Tableau 6.1 Now Supports Hadoop

By Ellie Fields November 8, 2011

Whether it comes from web logs, thousands of XML files, or ecommerce systems, data is growing at a tremendous rate. Hadoop is an emerging technology that can help you deal with the data you face every day that is big, unstructured, messy, or all three. Today, we're announcing native support for Apache Hadoop from Cloudera with Tableau 6.1.

What's Hadoop?

The name Hadoop is often used to mean several related technologies (Hadoop, Map/Reduce, and Hive) that can be used together to help you work with that data.

Hadoop, together with the Map/Reduce framework, is a distributed system for storing and processing large data sets across a cluster of machines, including data drawn from multiple and potentially different sources. Technically, HDFS is the distributed file storage part of the system and Map/Reduce is the programming model that processes data in parallel across that system, while Hadoop is the overall technology for managing distributed execution and resilience to node failure. Hive is a technology that essentially lets you run SQL-like queries across the Hadoop cluster by turning them into Map/Reduce jobs. It also includes a number of functions that make it easier to process and transform data stored in Hadoop.
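To make the Map/Reduce idea a little more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; the log format and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions made up for illustration. On a real cluster, the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit (status_code, 1) for one log line.

    Assumes a hypothetical log format whose last field is the HTTP status.
    """
    fields = line.split()
    yield fields[-1], 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical sample data, not taken from any real log.
logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /cart 200",
]
counts = run_job(logs)
print(counts)
```

With Hive, the same question could be asked declaratively with something like `SELECT status, COUNT(*) FROM logs GROUP BY status`, where the table and column names are again hypothetical.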

Why is this important?

Data is only getting larger and more complex, with new data sources like web logs, bar code data and more. The most important data is also increasingly found in disparate places: XML files, databases, and various unstructured formats. Hadoop is an open source project that is leading the way in dealing with these new mountains of data.

Tableau's mission is to help people see and understand data. That includes data that is not handled well by traditional databases, and that's why it's important to support Hadoop.

What is Tableau releasing?

Two things:

  • A native data connector to Apache Hadoop Hive from Cloudera.
  • A new set of string functions that work with Hadoop and Hive on a variety of data sources, including XML.


Example of the new string functions that can be used with Hadoop and Hive to work with XML objects and other data types.
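As a rough idea of what pulling values out of XML stored in a string column involves, here is a small Python sketch using only the standard library. It does not show Tableau's or Hive's actual functions; the helper name `xpath_string` and the sample records are assumptions chosen for illustration, and `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

def xpath_string(xml_text, path):
    """Return the text of the first element matching `path`, or "" if none.

    Hypothetical helper for illustration; not a Tableau or Hive API.
    """
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None or node.text is None:
        return ""
    return node.text

# Hypothetical rows, each holding one XML document as a plain string.
rows = [
    "<order><customer>Ann</customer><total>19.99</total></order>",
    "<order><customer>Bob</customer><total>5.00</total></order>",
]

# Turn the XML strings into ordinary columns that can be analyzed.
customers = [xpath_string(row, "customer") for row in rows]
totals = [float(xpath_string(row, "total")) for row in rows]
print(customers, totals)
```

The point of functions like this is that the XML never has to be flattened ahead of time: the extraction happens at query time, field by field.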

How can I try it?

If you are running a Hadoop cluster with Hive, there are three ways to try it:

  1. If you're a Tableau customer, you can download Tableau 6.1.4 (available today). Contact us to enable the license key for the Hadoop connector and you'll be ready to go.
  2. If you're a Tableau customer and want to join the Tableau 7 beta, contact your sales rep. The Hadoop connector is in the beta.
  3. You can wait for the release of Tableau 7 this winter.

Where can I find out more?

This page on Hadoop has more information as well as links to a demo, whitepapers and Cloudera's site.

Comments

Submitted by Robert M.

We have also created some Knowledge Base articles to help analysts, data scientists and administrators quickly get started with Tableau and Hive, or jump straight into advanced features. The set of articles is easily found with this KB search on Hadoop: http://www.tableausoftware.com/search/kb_article/hadoop

Submitted by Steven K.

Really look forward to seeing some user case studies of Hadoop and Tableau at future customer conferences.

Submitted by Guest (not verified)

I hope this means support for other NoSQL databases is on the way.

Riak and MongoDB support would be most helpful.

Submitted by Andrew Fisher (not verified)

Would love to see Riak connected as a back end too, with the ability to use JavaScript Map/Reduce functions. That would really raise the usefulness of Tableau in our org, as we currently pipeline data from Riak into a database in order to use it.

Great step forward though...
