Our Recent Big Data Announcements
Today’s blog post will be longer than usual, but I wanted to take time to explain and provide more context around the various announcements that Tableau made this morning, timed with the Strata + Hadoop World conference in New York City. First, it’s important to reiterate what Tableau’s strategy is around big data. Simply put, we want to be the de facto front-end visual analytics tool that customers use to see and understand their (big) data.
Our customers choose the technologies that help them solve their specific business challenges. As a result, we need to support a broad set of platforms that will make them successful. Tableau now directly connects to more than 25 data sources, and that list grows every year. We give customers the ability to access data sources that support open standards, such as ODBC, and will be providing additional ways to move data into Tableau with new APIs coming in Tableau 8. We will continue to invest in supporting the technologies that our customers are using today, while also keeping an eye on the future and supporting the ones we believe will become mainstream.
We also believe in the democratization of data, meaning that organizations should provide people access to all the data they need and let them do the analyses. What good is data sitting in a database if it can’t be tapped by knowledge workers? The business value of data is being able to analyze it and gain information from it. It is no longer enough to only provide access to the elite few – data scientists, programmers, and engineers who have the technical know-how needed to access the data. That’s where Tableau comes in. There’s no better product on the market to help you quickly orient to what your data is telling you. A visual peek at your data to identify outliers, spot some trends, orient yourself to the bounds and shape of your data – these are all standard steps that are needed when working with data, and are even more important to do visually when working with big data. Try scanning petabytes of data listed out in a spreadsheet to find the interesting pattern that requires more analysis. This is where Tableau shines!
Today’s announcements strengthen Tableau’s leadership in this space by providing our customers access to a broad range of technologies that will help them solve real business challenges. Let’s quickly review each announcement and describe why it’s important:
- Hortonworks – The Hortonworks distribution of Hadoop is growing in popularity. Several of our customers use Hortonworks and need to be able to analyze their Hadoop data in Tableau. We created a new Hortonworks connector that customers can use to quickly access and explore their Hadoop data without any programming or scripting required. The Hortonworks connector is similar to the Cloudera and MapR connectors and is based on Hive connectivity.
- DataStax – DataStax Enterprise embeds several technologies in a single platform, including Cassandra, Hadoop, and Solr. Many of our customers are interested in doing analytics on their Cassandra data, but there are very few BI tools on the market today that do a good job of that. Cassandra is one of the most popular ‘NoSQL’ technologies that companies have adopted in lieu of a relational database because of its scalability, throughput, flexibility, and TCO. Tableau has created a new connector to the DataStax platform so that customers can now visually analyze their Cassandra data using Tableau. DataStax allows Cassandra column families to be exposed through Hive, which is the mechanism Tableau uses to connect.
- Hadapt – While the Hadoop environment is powerful and allows you to churn through huge volumes of data, I think it would be difficult to find anyone who would describe it as ‘real-time’. That’s where Hadapt comes in. Hadapt’s technology, which essentially distributes an RDBMS across all the nodes of a Hadoop cluster and leverages them for query processing, speeds up analytical queries by an order of magnitude. Hadapt provides fast query response times for customers who need to have an interactive conversation with their data and for whom the latency in Hadoop is not good enough.
- Karmasphere – Karmasphere is another vendor in the Hadoop ecosystem with a product that makes Hadoop exploration easier for non-programmers. Their Collaborative Workspace allows teams to explore and share Hadoop data, artifacts, and results. One of the missing pieces of that puzzle was being able to do visual analytics on the results. Karmasphere will be using one of the new APIs in Tableau 8 (coming in 2013) to make Hadoop data available in Tableau for users who want to do more complex visual analysis.
- Greenplum Chorus – Chorus is a platform that data scientists, analysts, and others can use to share, communicate, and collaborate around big data. The notion of social or team-based analytics is gaining traction, and Chorus provides an interesting platform to enable that. As part of Chorus being contributed to the open source community, our friends at Greenplum wanted to show how Chorus could be used to integrate with a BI tool like Tableau. The integration uses new APIs in Tableau 8 to integrate Tableau into the project-based workflow. For example, you can publish a dataset from within a Chorus workspace as a Tableau Workbook, and then view a published dataset by having it automatically open in Tableau. For teams of people needing a framework to collaborate and do analytics over big data, Chorus and the integration with Tableau provide an interesting solution.
- Digital Reasoning – Digital Reasoning provides a platform to consume structured and unstructured text as input and automatically develop an understanding of the entities at the factual/relationship level instead of somebody needing to spend hours reading the documents and drawing the links between entities and events manually. Many of our customers do not leverage unstructured data to its full advantage, and have expressed great interest in being able to do analysis on unstructured content (blogs, emails, newswires, tweets, etc.). Our partnership with Digital Reasoning provides a new option for our customers to do this type of analysis.
- Cirro – Cirro is a relatively new vendor on the big data scene with a product that makes it easy to combine data from disparate systems into a single view for further analytics. The Cirro Data Hub ‘looks’ like a relational database to Tableau and enables access to ‘views’ that have been defined. Behind the scenes, those views might be based on federated queries that reach across multiple sources of structured, semi-structured, and unstructured data and return the result as a single view. For example, a single query might integrate data residing in Hadoop and a relational database. Cirro is an interesting solution for customers looking to integrate data residing in different systems without wanting to go down the traditional ETL route, where you would physically need to move data into a staging system or data mart and do the reporting off of that. Views in Cirro can be created quickly and dynamically. Tableau integrates with Cirro through ODBC/SQL connectivity.
- Simba – Simba is the ‘glue’ that makes a lot of our connectivity possible. They build ODBC drivers for many of our database partners that we leverage for connectivity. We have had a close working relationship with them for years and look forward to continued collaboration with them on other new big data technologies.
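For readers who want a feel for the ‘single view over several sources’ idea described in the Cirro item above, here is a minimal sketch. It is only an analogy: it uses Python’s built-in sqlite3 module with two in-memory databases standing in for separate systems, not Cirro, Hadoop, or any Tableau API. The table names, the view name, and the sample data are all made up for illustration.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two separate systems.
# All names and data below are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")                      # plays the relational database
conn.execute("ATTACH DATABASE ':memory:' AS weblogs")   # plays the second data store

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE weblogs.page_views (customer_id INTEGER, views INTEGER)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "East"), (2, "West")])
conn.executemany("INSERT INTO weblogs.page_views VALUES (?, ?)",
                 [(1, 10), (1, 5), (2, 7)])

# A TEMP view may span attached databases; to a client it looks like one table.
conn.execute("""
    CREATE TEMP VIEW views_by_region AS
    SELECT c.region AS region, SUM(p.views) AS total_views
    FROM main.customers AS c
    JOIN weblogs.page_views AS p ON p.customer_id = c.id
    GROUP BY c.region
""")

# The client issues plain SQL against the view, unaware of the two sources.
rows = conn.execute(
    "SELECT region, total_views FROM views_by_region ORDER BY region"
).fetchall()
print(rows)
```

The point of the sketch is the last query: the consumer sees one relational view and never has to know that the rows behind it come from two different stores, and no data was copied into a staging area first.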
NOTE: some of the new integrations mentioned above are not part of the current Tableau release yet, but will be soon. Others will be part of Tableau 8. If you are interested in trying one out, please let me know and we can set you up with a beta version.
In summary, it’s a very busy and exciting time to be in this space. The pace of innovation, new technologies, and opportunities is incredible. I love my job!