Updated February 4, 2020: As of January 15, 2020, the Databricks connector is available in Tableau Online! You can now publish a live connection to Databricks from Tableau Desktop to Tableau Online or connect directly to Databricks from the web authoring experience in Tableau Online. Activate a free trial of Tableau Online today to try it out for yourself.
Tableau 2019.3 was a momentous release for a number of reasons. Along with the unveiling of Tableau Catalog and Explain Data came a new native connection to Databricks for Tableau Desktop and Tableau Server. The new connector offers better performance, a straightforward connection experience, and high-quality error handling.
At Tableau, we’re thrilled to partner with Databricks to empower the full spectrum of data people throughout an organization. Databricks is helping data teams solve the world’s toughest problems, while Tableau makes it fast and easy to connect, explore, and make data-driven decisions. The two platforms are on a mission to make data more accessible and to equip organizations with self-service analytics, so creating a finely tuned connector was an obvious next step for the partnership. This native connection is intended to better serve our customers as their organizations scale and their data strategies evolve.
Data lake vs. Delta Lake
While data lakes are the foundation of a modern data strategy, they are typically considered cold storage because large volumes of data are constantly appended with no cohesive schema. This can yield suboptimal performance and incomplete analysis when attempting to analyze your entire data lake in Tableau. The need to explore your data lake still remains, whether you want to visualize IoT or transactional data in real time, or drill into the underlying details of your dashboards. In response to the big data problem that organizations face, Databricks created the open source project Delta Lake.
Delta Lake was created to solve the challenges that traditional data lakes face at scale, when they store tens of petabytes and ingest hundreds of terabytes each day. Here are some of the benefits of Delta Lake:
- Automatically compacts data and executes de-duplication tasks to improve performance.
- Makes ETL processes much faster on the front end. It also enforces a cohesive schema and streamlines the movement of data for analysis in Tableau.
- Provides a storage layer that brings ACID (atomicity, consistency, isolation, durability) transactions to Apache Spark™ and data lakes. This eliminates the creation of incomplete datasets and enables clean content reads while data is changing.
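To make the ACID point concrete, here is a minimal toy sketch in Python of what "clean reads while data is changing" can look like. It is not Delta Lake's implementation or API: `ToyTable`, `commit`, and `snapshot` are hypothetical names invented for illustration, and the example assumes a single in-memory writer. It only shows the general idea that a write is either published in full or not at all, and that a reader keeps a consistent view while new data arrives.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical toy table: readers only ever see fully committed versions."""
    _versions: list = field(default_factory=lambda: [()])  # version 0 is empty

    def snapshot(self):
        """Return the latest committed version as an immutable tuple of rows."""
        return self._versions[-1]

    def commit(self, new_rows):
        """Publish all of new_rows as one new version, or nothing if staging fails."""
        staged = list(self._versions[-1])
        for row in new_rows:
            if not isinstance(row, dict):  # stand-in for any failure mid-write
                raise ValueError(f"bad row: {row!r}")
            staged.append(row)
        # Single publish step: the new version becomes visible all at once.
        self._versions.append(tuple(staged))
        return len(self._versions) - 1


table = ToyTable()
table.commit([{"device": "a", "temp": 21}])
reader_view = table.snapshot()  # a dashboard query starts reading here
table.commit([{"device": "b", "temp": 19}])  # new data lands during the read
try:
    table.commit([{"device": "c", "temp": 20}, "corrupt"])  # fails part-way
except ValueError:
    pass  # nothing from the failed write was published

print(len(reader_view), len(table.snapshot()))
```

The reader that took its snapshot early still sees one row, later readers see two, and the failed write leaves no partial rows behind, which is the behavior the bullet above describes as eliminating incomplete datasets.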
With Delta Lake and the Tableau Databricks Connector, you can quickly and reliably access your data as soon as it’s written to your data lake, without waiting for an ETL process to run. The direct connection lets Tableau users take advantage of Delta Lake and the output of the work that the Databricks platform facilitates: data science and data engineering.
How the Tableau Databricks Connector works
Prior to 2019.3, you could connect to Databricks through the generic Spark SQL connector in Tableau, but the overall experience and performance needed improvement. Now, when you click on the Databricks Connector, you’ll see a simplified dialog, a faster initial connection, and noticeably faster loading times.
The connector has been optimized so that queries are sent and correctly translated to Databricks SQL, which results in faster, more reliable queries. With fewer errors and faster feedback while dragging and dropping, Tableau users can expect to stay in the flow of analysis when exploring their data.
It’s worth noting that this means faster queries for live connections as well as reduced time when extracting your data, so even data strategies that rely heavily on extracts will see an improvement.
Download Tableau 2019.3 and try out the Databricks Connector for yourself. Be sure to provide feedback as we’re excited to hear about your experience with this new feature. If you’re interested in hearing how our joint customers are using Databricks and Tableau, register for the Data Lake Analytics Virtual Summit, taking place on October 29th.