Our Recent Big Data Announcements

By Ted Wasserman 23 Oct, 2012

Today’s blog post will be longer than usual, but I wanted to take time to explain and provide more context around the various announcements that Tableau made this morning, timed with the Strata + Hadoop World conference in New York City. First, it’s important to reiterate what Tableau’s strategy is around big data. Simply put, we want to be the de facto front-end visual analytics tool that customers use to see and understand their (big) data.

Our customers choose the technologies that help them solve their specific business challenges. As a result, we need to support a broad set of platforms that will make them successful. Tableau now directly connects to more than 25 data sources, and that list grows every year. We give customers the ability to access data sources that support open standards, such as ODBC, and will be providing additional ways to move data into Tableau with new APIs coming in Tableau 8. We will continue to invest in supporting the technologies that our customers are using today, while also keeping an eye on the future and supporting the ones we believe will become mainstream.

We also believe in the democratization of data, meaning that organizations should provide people access to all the data they need and let them do the analyses. What good is data sitting in a database if it can’t be tapped by knowledge workers? The business value of data is being able to analyze it and gain information from it. It is no longer enough to only provide access to the elite few – data scientists, programmers, and engineers who have the technical know-how needed to access the data. That’s where Tableau comes in. There’s no better product on the market to help you quickly orient to what your data is telling you. A visual peek at your data to identify outliers, spot some trends, orient yourself to the bounds and shape of your data – these are all standard steps that are needed when working with data, and are even more important to do visually when working with big data. Try scanning petabytes of data listed out in a spreadsheet to find the interesting pattern that requires more analysis. This is where Tableau shines!

Today’s announcements strengthen Tableau’s leadership in this space by providing our customers access to a broad range of technologies that will help them solve real business challenges. Let’s quickly review each announcement and describe why it’s important:

  • Hortonworks – The Hortonworks distribution of Hadoop is growing in popularity. Several of our customers use Hortonworks and need to be able to analyze their Hadoop data in Tableau. We created a new Hortonworks connector that customers can use to quickly access and explore their Hadoop data without any programming or scripting required. The Hortonworks connector is similar to the Cloudera and MapR connectors and is based on Hive connectivity.
  • DataStax – DataStax Enterprise embeds several technologies in a single platform, including Cassandra, Hadoop, and Solr. Many of our customers are interested in doing analytics on their Cassandra data, but there are very few BI tools on the market today that do a good job at that. Cassandra is one of the most popular ‘NoSQL’ technologies that companies have adopted in lieu of a relational database because of its scalability, throughput, flexibility, and TCO. Tableau has created a new connector to the DataStax platform so that customers can now visually analyze their Cassandra data using Tableau. DataStax allows Cassandra column families to be exposed through Hive, which is the mechanism Tableau uses to connect.
  • Hadapt – While the Hadoop environment is powerful and allows you to churn through huge volumes of data, I think it would be difficult to find anyone who would describe it as ‘real-time’. That’s where Hadapt comes in. Hadapt’s technology, which essentially distributes an RDBMS across all the nodes of a Hadoop cluster and leverages them for query processing, speeds up analytical queries by an order of magnitude. Hadapt provides fast query response times for customers who need to have an interactive conversation with their data and for whom the latency of Hadoop is not good enough.
  • Karmasphere – Karmasphere is another vendor in the Hadoop ecosystem with a product that makes Hadoop exploration easier for non-programmers. Their Collaborative Workspace allows teams to explore and share Hadoop data, artifacts, and results. One of the missing pieces of that puzzle was being able to do visual analytics on the results. Karmasphere will be using one of the new APIs in Tableau 8 (coming in 2013) to make Hadoop data available in Tableau for users who want to do more complex visual analysis.
  • Greenplum Chorus – Chorus is a platform that data scientists, analysts, and others can use to share, communicate, and collaborate around big data. The notion of social or team-based analytics is gaining traction, and Chorus provides an interesting platform to enable that. As part of Chorus being contributed to the open source community, our friends at Greenplum wanted to show how Chorus could be used to integrate with a BI tool like Tableau. The integration uses new APIs in Tableau 8 to integrate Tableau into the project-based workflow. For example, you can publish a dataset from within a Chorus workspace as a Tableau Workbook, and then view a published dataset by having it automatically open in Tableau. For teams of people needing a framework to collaborate and do analytics over big data, Chorus and the integration with Tableau provide an interesting solution.
  • Digital Reasoning – Digital Reasoning provides a platform to consume structured and unstructured text as input and automatically develop an understanding of the entities at the factual/relationship level instead of somebody needing to spend hours reading the documents and drawing the links between entities and events manually. Many of our customers do not leverage unstructured data to its full advantage, and have expressed great interest in being able to do analysis on unstructured content (blogs, emails, newswires, tweets, etc.). Our partnership with Digital Reasoning provides a new option for our customers to do this type of analysis.
  • Cirro – Cirro is a relatively new vendor on the big data scene with a product that makes it easy to combine data from disparate systems into a single view for further analytics. The Cirro Data Hub ‘looks’ like a relational database to Tableau and enables access to ‘views’ that have been defined. Behind the scenes, those views might be based on federated queries that reach across multiple sources of structured, semi-structured, and unstructured data and return the result as a single view. For example, a single query might integrate data residing in Hadoop and a relational database. Cirro is an interesting solution for customers looking to integrate data residing in different systems without going down the traditional ETL route, where you would physically need to move data into a staging system or data mart and do the reporting off of that. Views in Cirro can be created quickly and dynamically. Tableau integrates with Cirro through ODBC/SQL connectivity.
  • Simba – Simba is the ‘glue’ that makes a lot of our connectivity possible. They build ODBC drivers for many of our database partners that we leverage for connectivity. We have had a close working relationship with them for years and look forward to continued collaboration with them on other new big data technologies.
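
To make the federated-view idea from the Cirro item a bit more concrete, here is a minimal sketch in Python. It is not Cirro's implementation or API. It uses SQLite from the Python standard library, with two attached in-memory databases standing in for separate systems (one playing the role of Hadoop-resident data, the other a relational database). The database, table, and column names and the sample rows are all hypothetical. The point is only that a client queries one view while the join across the two stores happens behind it.

```python
import sqlite3


def build_federated_view():
    """Return a connection exposing one view that spans two separate stores."""
    conn = sqlite3.connect(":memory:")

    # Each ATTACH creates an independent in-memory database.
    # "weblogs" stands in for data living in Hadoop, "crm" for a relational
    # database. Both names are hypothetical.
    conn.execute("ATTACH DATABASE ':memory:' AS weblogs")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute(
        "CREATE TABLE weblogs.page_views (customer_id INTEGER, views INTEGER)"
    )
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, region TEXT)")

    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO weblogs.page_views VALUES (?, ?)",
        [(1, 10), (2, 5), (1, 7), (3, 2)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "East"), (2, "West"), (3, "West")],
    )

    # In SQLite only a TEMP view may reference tables in other attached
    # databases, so the cross-store view is created as TEMP.
    conn.execute(
        """
        CREATE TEMP VIEW views_by_region AS
        SELECT c.region AS region, SUM(p.views) AS total_views
        FROM weblogs.page_views AS p
        JOIN crm.customers AS c ON c.customer_id = p.customer_id
        GROUP BY c.region
        """
    )
    return conn


conn = build_federated_view()

# The client sees a single relational view and needs no knowledge of where
# the underlying rows live.
rows = conn.execute(
    "SELECT region, total_views FROM views_by_region ORDER BY region"
).fetchall()
print(rows)
```

A BI tool connecting over ODBC/SQL would issue the final SELECT in much the same way, without moving the data into a staging system first.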

NOTE: some of the new integrations mentioned above are not part of the current Tableau release yet, but will be soon. Others will be part of Tableau 8. If you are interested in trying one out, please let me know and we can set you up with a beta version.

In summary, it’s a very busy and exciting time to be in this space. The pace of innovation, new technologies, and opportunities is incredible. I love my job!


Submitted by Dan Murray (not verified) on

Exciting stuff!

Submitted by Budy S. on

Now that Tableau supports many Hive and NoSQL databases, do we still need to extract/import from the database? Or can we use the connect live option with the ODBC connector?
Considering that Hive and NoSQL databases are intended for very large data, we should not extract the data into TDE storage, but what about the performance of the data source, and what about the SQL-compliant ODBC driver manager that is used to connect to the data source?

Submitted by Ted W. on

Hi Budy - our customers use different approaches when working with Hive. For example, some customers use Tableau Data Extracts to improve performance as they explore their data, build their dashboards, etc. Once they have built the views they are interested in, they'll toggle the connection to use the live Hadoop cluster so that the results are based on all the data. For some customers, the latency of Hive is acceptable (in fact, I even had a customer who told me it performed better than the relational database they were using). Newer technologies such as Cloudera's Impala and Hadapt's Interactive Query are changing the game for doing real-time analytics on Hadoop for large data sets.

Submitted by Budy S. on

Hi Ted, it's interesting to hear about new tech for distributed databases such as Impala. That will be an interesting approach to visual analysis of big data.
Of course, importing some data for a first round of exploration is a must, but it should not be a hindrance to visual analysis of very big data, such as transaction-based data spanning many years of history. Hence, the connect live option for Tableau will be a game changer in BI tools.

Submitted by Budy S. on

Oh, one big obstacle when creating a TDE from large data sets is that it's very time consuming, especially since there are no recovery steps when it fails (it must start from the beginning). An extract can take an hour or more, and when it fails we need to start over and wait another hour or more.
I think there must be a better approach that lets us resume after a failure. The common failures: storage, network, internal error(?), etc.
