Tableau & Spark SQL: Big Data Just Got Even More Supercharged

By Jeff Feng 15 Oct, 2014

Tableau + Spark SQL

Update 2-20-2015: The connector for Spark SQL is now released and available for version 8.3.3 and newer.

We are thrilled to announce that Tableau is launching a new native Spark SQL connector (currently in beta), providing users an easy way to visualize their data in Apache Spark.

Spark is an open source processing engine for Big Data that brings together an impressive combination of speed, ease of use and advanced analytics. Spark enables applications in Hadoop clusters to run up to 100x faster than MapReduce when running in memory, while also delivering significant speed-ups when running purely on disk. Spark SQL provides an interface for users to query their data from Spark RDDs as well as other data sources such as Hive tables, Parquet files and JSON files. Spark’s APIs in Python, Scala and Java make it easy to build parallel apps. Lastly, Spark provides strong support for streaming data and for complex analytics that rely on iterative calculations, such as machine learning and graph algorithms. This is where Spark shines brightest. Spark’s versatility has led users to call it “the Swiss Army knife” of processing engines, as users can combine all of these capabilities in a single platform and workflow.

Spark is also the hottest open source big data project currently on the planet. Involvement from the open source community has grown rapidly, with over 330 contributors in the last 12 months alone. Spark is more than just hype though. Within the last 8 months, all of the major Hadoop distributors, including Cloudera, Hortonworks and MapR, have committed to shipping Spark as part of their distributions and to helping accelerate the development of the project.

Tableau’s integration with Spark brings tremendous value to our customers by providing a fast and versatile data processing engine at their fingertips. Our integration also provides new capabilities to the Spark community - users can visually analyze their data without writing a single line of Spark SQL code. That’s a big deal because creating a visual interface to your data expands the Spark technology beyond data scientists and data engineers to all business users. The Spark connector takes advantage of Tableau’s flexible connection architecture that gives customers the option to connect live and issue interactive queries, or use Tableau’s fast in-memory database engine. Tableau also provides users the capability to blend Spark data with data from any of our other 40+ direct connectors, empowering users to leverage their existing data assets wherever they are.

In the beta launch, Tableau supports Spark SQL as a named connector in Tableau Desktop on both Windows and Mac. To see Tableau and Spark SQL in action, we have created a short video demonstrating how users can connect to a Spark cluster and interact with data in Tableau.

Read: To learn more about what Tableau’s integration means to Spark users and Tableau’s recent addition to Databricks’ “Certified on Spark” program, please check out our guest post on the Databricks blog.

Respond: Do you have an interesting big data use case? We’d love to hear about it. Please reach out and let us know.


Submitted by rajesh r. on

Hi Sir, you provided great info on SAP. It is interesting and useful for starters. I feel this is the right place to get info. Thanks for the info, and keep posting more.

Submitted by Gaurav T. on

Hi,

I am interested in exploring Tableau with Spark. Can you please tell me how I can get this? I tried downloading the free trial, but the Tableau version (8.3.15) doesn't have Spark. I read on your blog that it's available in 8.3.30. Can you please tell me how I can get that version?

Submitted by Jaya M. on

Excellent news! Good to know about the Spark SQL integration with Tableau.

Submitted by VenkataReddy (not verified) on

Hi,

Spark SQL itself is neither a datastore, nor does it have a metastore. I see that when you expand the default, it gives the list of tables. Where are these tables listed from, and where does the data reside?


Submitted by Anvesh (not verified) on

Hi Venkat,

You have asked a very good question. I wondered the same thing when I saw this video. I am hoping for a complete, explanatory answer from the experts.


Submitted by Milind D. on


I am trying to connect to the Spark server, but I am getting the following error:

[Simba][SparkODBC] (34) Error from server: connect() failed: errno = 10061.

Can you please tell me what causes this issue and what I am doing wrong?
