Hadoop is hot. It's hot because it works well with big data, messy data, and nested data of the kind often found in XML files. Senior Software Engineer and Data Rockstar Robert Morton has been working with Apache Hadoop from Cloudera here at Tableau, writing a connector from Tableau to a Hadoop cluster. Here's a first look at what he's been able to do. In this demo Robert covers:
      The easy connection from Tableau to Apache Hadoop from Cloudera
      Working with XML data in Hadoop
      Working with nested data and tuning your workload to your cluster
      Using Hadoop together with Tableau's in-memory data engine
The great thing about using Hadoop with Tableau is that you get all the powerful capabilities of Hadoop without having to do anything different in Tableau: just connect to your cluster as you would any other data source. The other great thing is that the connector should be available very, very soon. Want to find out more?
      If you're using the Tableau 7 beta, you already have the ability to connect to a Hadoop cluster. Just look in your Data Sources window.
      If you're a customer and want to try it out, contact us to get on the beta.
Or watch this space for more news about Hadoop and Tableau.

Colour me impressed, that's definitely cool. I can see it getting more and more capable from here.

This is very powerful!
This Tableau connector to Hadoop enables data scientists to explore data without having to learn Hive/Pig/Java!! I particularly liked querying the XML data set in the demo!! Well done!!

Nathanial and Ravi, glad you like it! Ravi, have you been trying this out in the 7.0 beta?

Awesome work Robert!

Thanks for putting this together. One question:
- How big was the data you were working with?


Hi Jagjit,

In the order of appearance in my demo:

  • The Kiva microfinance data has 2.4 million records and about 45 fields. In HDFS this takes 28 GB of storage for the prejoined, bucketed and materialized view of the lenders and loans tables. The raw lenders, loans and relationships tables are 170 MB, 1.3 GB and 82 MB, but yield a Cartesian product when joined.
  • The weather data came from ~2200 separate XML files in HDFS, collectively taking 261 MB. In Tableau I unpivoted this twice to yield about 0.5 million records and about 30 fields.
  • The two biggest tables in the blended airline visualization were the airfare data and airline on-time performance data, both freely available from bts.gov.
    • The airfare data has 324 million records and 39 fields, and came from a collection of 74 zip files totaling 4.5 GB.
    • The airline on-time performance data has 140 million records and 93 fields, and came from a collection of 287 zip files totaling 4.3 GB.


Does Tableau support connectivity to HBase?

Bump. Are there plans for direct connectors to HBase, or support for Hive+HBase integration? Thanks.
