From web logs to thousands of XML files to ecommerce data, data is growing at a tremendous rate. Hadoop is an emerging technology that can help you deal with the data you face every day that is big, unstructured, messy, or all three. Today, we're announcing native support for Apache Hadoop from Cloudera in Tableau 6.1.
The name "Hadoop" is usually used to mean several related technologies (Hadoop, Map/Reduce, and Hive) that can be used together to help you work with that data.
Hadoop, together with the Map/Reduce framework, is a distributed system that lets you query across multiple, potentially different data sources at once. Technically, HDFS is the distributed file storage part of the system and Map/Reduce is the programming model that processes queries across that system, while Hadoop is the overall technology for managing distributed execution and resilience to node failure. Hive is a technology that essentially lets you run SQL-like queries across the Hadoop cluster. It includes a number of functions that make it easier to process and transform data from Hadoop.
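To make the Map/Reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API. The log format, the sample lines, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made up for illustration. It counts page hits in a handful of web-log lines using the same three steps a Hadoop job goes through: map each record to key/value pairs, group the pairs by key, then reduce each group to one result.

```python
from collections import defaultdict

# Hypothetical web-log lines in the form "<client ip> <requested path>".
LOG_LINES = [
    "10.0.0.1 /home",
    "10.0.0.2 /products",
    "10.0.0.1 /home",
    "10.0.0.3 /cart",
    "10.0.0.2 /home",
]


def map_phase(line):
    """Map step: emit one (key, value) pair per record, here (path, 1)."""
    _ip, path = line.split()
    yield path, 1


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


hits = run_job(LOG_LINES)
```

On a real cluster the map and reduce steps would run in parallel on many nodes, each close to its own slice of the data. With Hive, the same count would typically be written as a single query that groups by path, and Hive would turn it into Map/Reduce work for you.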
Why is this important?
Data is only getting larger and more complex, with new data sources like web logs, bar code data, and more. The most important data is also increasingly found in disparate places: XML files, databases, and various unstructured formats. Hadoop is a technology and open source project that is leading the way in dealing with these new mountains of data.
Tableau's mission is to help people see and understand data. That includes data that is not handled well by traditional databases, and that's why it's important to support Hadoop.
What is Tableau releasing?
- A native data connector to Apache Hadoop Hive from Cloudera.
- A new set of string functions that work with Hadoop and Hive on a variety of data sources, including XML.
Example of the new string functions that can be used with Hadoop and Hive to work with XML objects and other data types.
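As a rough illustration of what a string function over XML does, the Python sketch below pulls individual fields out of XML documents stored as plain text, one document per row. It does not show Tableau's or Hive's actual function names or syntax. The sample rows and the helper `xml_field` are hypothetical, and the example uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: each one holds a whole XML document as a plain string.
ROWS = [
    "<order><id>1</id><customer>Ann</customer><total>19.99</total></order>",
    "<order><id>2</id><customer>Bob</customer><total>5.00</total></order>",
]


def xml_field(xml_text, path, default=None):
    """Return the text of the first element matching `path`, or `default`."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    return node.text if node is not None else default


# Turn the XML strings into ordinary columns that can be analyzed.
customers = [xml_field(row, "customer") for row in ROWS]
totals = [float(xml_field(row, "total")) for row in ROWS]
```

The point is the shape of the operation: a string goes in, a path says which piece you want, and a plain value comes out that can be used like any other field.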
How can I try it?
If you are running a Hadoop cluster with Hive, there are three ways to try it:
- If you're a Tableau customer, you can download Tableau 6.1.4 (available today). Contact us to enable the license key for the Hadoop connector and you'll be ready to go.
- If you're a Tableau customer and want to join the Tableau 7 beta, contact your sales rep. The Hadoop connector is in the beta.
- You can wait for the release of Tableau 7 this winter.
Where can I find out more?
This page on Hadoop has more information as well as links to a demo, whitepapers and Cloudera's site.