Tableau’s Vision on Big Data


Tableau’s (Big) Data Strategy

Tableau is on a mission to help users see and understand their data. Central to this mission is our belief in the democratization of data, meaning “the people who know the data should be the ones empowered to ask questions of the data.” Everyday knowledge workers should be able to easily access their data wherever it resides, and to analyze and discover insights about it without assistance from the elite few: the data scientists and IT developers.

Visualizing data is important regardless of the size of the data because it translates information into insight and action. The approach to visualizing Big Data is especially important because the cost of storing, preparing and querying data is much higher. Therefore, organizations must leverage well-architected data sources and rigorously apply best practices to allow knowledge workers to query Big Data directly. The Big Data space has seen a great deal of innovation in recent years, so there are many options available, each with its own strengths. Tableau’s vision is to support any Big Data platform that becomes relevant to our users, and to help them have a real-time conversation with their data.

To achieve this Big Data vision, Tableau has focused on six pillars:

  1. Broad access to Big Data platforms - Part of our vision is to enable analysis of Big Data, wherever it lives. Tableau supports over 40 different data sources today as well as countless others through our extensibility options. As new data sources emerge and become valuable to our users, we will continue to incorporate them into our product to lower the friction for accessing data. Our named connectors for the Big Data ecosystem include:
    • Hadoop: Cloudera Impala & Hive, Hortonworks Hive, MapR Hive, Amazon EMR with Impala & Hive, Pivotal HAWQ, IBM BigInsights
    • NoSQL: MarkLogic, Datastax
    • Spark: Apache Spark SQL
    • Cloud: Amazon Redshift, Google BigQuery
    • Operational Data: Splunk
    • Fast Analytical Databases: Actian Vectorwise & ParAccel, Teradata Aster, HP Vertica, SAP Hana, SAP Sybase, Pivotal Greenplum, EXASOL EXASolution
  2. Self-service visualization of Big Data for business users - Business users can visualize their data using drag-and-drop operations without writing complex SQL, Java code or MapReduce jobs. Tableau simplifies the task of analyzing data - users can discover visual insights about their data faster than they ever could before.
  3. Hybrid data architecture for optimizing query performance - Tableau can connect live to data sources or bring the data in-memory. Live connectivity works well when connecting to fast, interactive query engines, even against large datasets. However, we can also augment and accelerate slower data sources by creating an extract of the data and bringing it into our in-memory Data Engine (see the first sketch after this list).
  4. Data blending for performing analytics across data sources - Distributed data is often an even bigger challenge than Big Data. It is rare that an analyst’s data is neatly packaged in a single place; instead, it resides in disparate technologies and platforms. Tableau enables users to traverse data sources by blending Big Data with other data sources (e.g. Salesforce, MySQL, Excel files), allowing organizations to keep their data assets where they reside (see the second sketch after this list).
  5. Overall platform query performance - As data volumes grow, Tableau continues to invest in core query performance improvements that help facilitate real-time conversations with data. Most recently, this includes capabilities such as parallel queries, query fusion, and external query caching. Tableau also now leverages vectorization for processors that support it.
  6. Powerful and homogeneous visual interface to data - Tableau provides analytical tools such as filtering, forecasting and trend line analysis through simple actions. It interprets a user’s actions and selects the best way to represent the data based on visual best practices. Once connected, users work with a single visual interface that is consistent across all data sources.
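
To make the live-versus-extract tradeoff in pillar 3 concrete, the sketch below builds a small extract file programmatically in Python. It uses the Tableau Hyper API, the later successor to the in-memory Data Engine discussed above, rather than the Data Engine itself; the file name, schema and columns are illustrative assumptions, not something taken from the whitepaper.

    # Minimal sketch of building an extract with the Tableau Hyper API
    # (successor to the Data Engine). All names below are hypothetical.
    from tableauhyperapi import (
        HyperProcess, Telemetry, Connection, CreateMode,
        TableDefinition, TableName, SqlType, Inserter,
    )

    # Hypothetical layout for the extracted table.
    events = TableDefinition(
        table_name=TableName("Extract", "Events"),
        columns=[
            TableDefinition.Column("event_id", SqlType.big_int()),
            TableDefinition.Column("event_type", SqlType.text()),
            TableDefinition.Column("revenue", SqlType.double()),
        ],
    )

    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint,
                        database="events.hyper",
                        create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
            connection.catalog.create_schema(schema=events.table_name.schema_name)
            connection.catalog.create_table(events)
            # In practice these rows would be pulled from the slower source
            # that the extract is meant to accelerate.
            with Inserter(connection, events) as inserter:
                inserter.add_rows([
                    (1, "page_view", 0.0),
                    (2, "purchase", 49.99),
                ])
                inserter.execute()

Workbooks can then query the resulting extract file locally, or stay on a live connection when the underlying engine is fast enough to answer queries interactively.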
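
Pillar 4’s data blending can be pictured as aggregating each source to a common linking field and then left-joining the secondary source onto the primary one. The sketch below is not Tableau’s blending engine, just a rough illustration of that idea in Python with pandas; the Salesforce and MySQL exports, file names and column names are assumptions made for the example.

    # Rough illustration of the blending idea: aggregate each source on the
    # linking field, then left-join the secondary source onto the primary.
    # File names and columns are hypothetical.
    import pandas as pd

    # Primary source, e.g. opportunities exported from a cloud CRM.
    opportunities = pd.read_csv("salesforce_opportunities.csv")  # region, amount
    # Secondary source, e.g. a targets table pulled from MySQL.
    targets = pd.read_csv("mysql_sales_targets.csv")             # region, target

    # Aggregate each source to the level of the linking field ("region")...
    won_by_region = opportunities.groupby("region", as_index=False)["amount"].sum()
    target_by_region = targets.groupby("region", as_index=False)["target"].sum()

    # ...then blend: a left join that keeps every row from the primary source.
    blended = won_by_region.merge(target_by_region, on="region", how="left")
    blended["attainment"] = blended["amount"] / blended["target"]
    print(blended)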

Our vision aligns well with how the overall data landscape is evolving. The new normal is that many customers are dealing with a diverse set of Big Data technologies. Technologies such as Hadoop and Spark have become part of the data architecture alongside data warehouses for their ability to store and process data. In parallel, customers are right-sizing their data warehouses based on their Hadoop deployments. NoSQL databases are frequently chosen over relational databases as the backend for applications because of their flexible data models, low latency and application-specific design. Lastly, cloud data sources are ubiquitous: cloud CRM & ERP systems have become the preferred choice for managing business processes, and the pay-as-you-go consumption model is becoming popular for cloud storage and data processing. With back-ends this diverse, users need a front-end tool like Tableau that can connect across Big Data platforms, cloud data sources and relational databases, giving them the agility they need for analyzing data.
