Tableau Cloud tips: Extracts, live connections, & cloud data
Editor’s Note: Tableau Online is now Tableau Cloud.
This post is part of our series covering tips, tricks, and ideas in Tableau Online, our cloud collaboration and sharing platform.
In our last tip, we gave you five quick ways to get up and running with Tableau Online. In this post, we’ll explore the difference between data extracts and live connections, and when to use them. We’ll also look at publishing data sources to Tableau Online.
Data extracts vs. live connections
“Extract” is a word you’re going to hear a lot in Tableau. Extracts are one of the most powerful but overlooked tools in Tableau’s arsenal. Tableau Data Extracts are snapshots of data optimized for aggregation and loaded into system memory to be quickly recalled for visualization. Extracts tend to be much faster than live connections, especially in more complex visualizations with large data sets, filters, calculations, etc. For a deep dive into how Tableau extracts are created, check out Gordon Rose’s fantastic blog post on the subject.
When you create an extract from a local file (such as a .csv or an Excel workbook) or an on-premises database, you’re copying the data into a form optimized for Tableau, which speeds up the workbook. Tableau then doesn’t need the database to build the visualization. Instead, Tableau’s in-memory data engine queries the extract directly. However, because an extract is a snapshot of the data, it needs to be refreshed to pick up changes in the original data source, whether that is a local file or an on-premises database.
Live connections offer the convenience of real-time updates, with any changes in the data source reflected in Tableau. But live connections also rely on the database for all queries. And unlike extracts, databases are not always optimized for fast performance. With live connections, your data queries are only as fast as the database itself. There are also more variables at play when using a live connection. Workbook speeds are affected by a variety of factors, including your network speed, traffic on that network, and any custom SQL.
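To make the snapshot idea concrete, here is a toy sketch in Python. It is not how Tableau’s data engine works internally; it uses an in-memory SQLite table as a stand-in database, and the function names (`take_extract`, `live_total`, and so on) are made up for illustration. It only shows why an extract goes stale until it is refreshed, while a live query always sees the current data.

```python
import sqlite3


def make_source():
    """Create a stand-in 'database' with a couple of sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("West", 250.0)],
    )
    conn.commit()
    return conn


def live_total(source):
    """Live connection: every question goes back to the source database."""
    return source.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


def take_extract(source):
    """Extract: copy the rows once; later queries never touch the source."""
    return source.execute("SELECT region, amount FROM sales").fetchall()


def extract_total(extract):
    """Answer the same question from the local snapshot only."""
    return sum(amount for _region, amount in extract)


source = make_source()
extract = take_extract(source)

# The source changes after the snapshot was taken.
source.execute("INSERT INTO sales VALUES ('North', 50.0)")
source.commit()

live_after = live_total(source)        # sees the new row
stale_total = extract_total(extract)   # still reflects the old snapshot

extract = take_extract(source)         # a "refresh" re-copies the data
refreshed_total = extract_total(extract)
```

After the insert, the live total includes the new row while the extract total does not; only the refresh brings the two back in line.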
An extract or a live connection—which to use?
Both types of connections have their place. Hospitals that monitor incoming patient data need to make real-time decisions. These situations necessitate a live database connection. But in the same hospital, there may also be visualizations that monitor daily or weekly trends. For these analytics, using an extract of the data source helps build a faster workbook.
A common misconception is that Tableau Online can only use data extracts. While that once was the case, we’ve made strides in offering live connections for common cloud data sources.
Tableau Online currently supports live connections to the following cloud-hosted data sources:
- Amazon Redshift
- Amazon Aurora
- Google BigQuery
- Google Cloud SQL
- Hive and Impala on Amazon Elastic MapReduce
- HP Vertica
- Microsoft SQL Server
- Microsoft Azure SQL Data Warehouse
- Microsoft Azure Database (Marketplace DataMarket)
- SAP HANA
- Spark SQL
Additionally, you can use data from web applications in Tableau Online. With scheduled extract refreshes, workbooks can connect to data from the following cloud applications:
- Google Analytics
- Google Sheets
- QuickBooks Online
All other connections will use extracts and our Online Sync Client to keep data fresh, but we’ll cover that in a future post.
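If it helps to see the guidance above as a rule of thumb, here is a small Python sketch. The function name `suggest_connection` and the returned labels are our own invention for illustration, not part of any Tableau API, and the source lists are simply copied from this post, so they reflect what was supported when it was written.

```python
# Sources this post lists as supporting live connections from Tableau Online.
LIVE_SOURCES = {
    name.casefold()
    for name in (
        "Amazon Redshift",
        "Amazon Aurora",
        "Google BigQuery",
        "Google Cloud SQL",
        "Hive and Impala on Amazon Elastic MapReduce",
        "HP Vertica",
        "Microsoft SQL Server",
        "Microsoft Azure SQL Data Warehouse",
        "Microsoft Azure Database (Marketplace DataMarket)",
        "SAP HANA",
        "Spark SQL",
    )
}

# Web applications this post lists as reachable through scheduled extract refreshes.
REFRESHABLE_APPS = {
    name.casefold()
    for name in ("Google Analytics", "Google Sheets", "QuickBooks Online")
}


def suggest_connection(source, needs_real_time):
    """Return a rough suggestion for how to connect to `source`.

    A hypothetical helper: it only encodes the reasoning in this post.
    """
    key = source.casefold()
    if key in LIVE_SOURCES:
        # Real-time decisions call for live; trend reporting is faster on an extract.
        return "live" if needs_real_time else "extract"
    if key in REFRESHABLE_APPS:
        return "extract with scheduled refresh"
    # Everything else relies on an extract kept fresh by the sync client.
    return "extract kept fresh with the sync client"
```

For example, `suggest_connection("Amazon Redshift", needs_real_time=True)` suggests a live connection, while a local Excel file falls through to the last case.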
Why use published data sources?
Now that you know what kind of data connections are in your arsenal, let’s talk about how to manage those sources. Each Tableau Online site and project has a Data Sources tab which shows all the data connections published to that area of Tableau Online. You’ll see whether the connection is live or an extract, and what the originating data source is. In the screen capture above, we have an extract originally based on an Excel file (“weekly sales update”) and a live connection to a cloud-hosted Amazon Redshift database (“coffee store”).
Publishing data sources gives you a centralized, managed location where users can access data. People no longer need to establish connections to databases themselves. Instead, the published data source provides simple and secure access through a user’s Tableau Online account.
Publishing a data source to Tableau Online also captures any metadata you’ve built in Tableau Desktop. If you created new calculated fields, groups, sets, or hierarchies in the data pane of your workbook, all these modifications will be reflected in the data source published to Tableau Online. We’ve found this helpful in curating easy-to-use data sources for organizations.
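The same information shown on the Data Sources tab can also be read programmatically. Below is a minimal sketch using the Tableau Server Client library for Python (`tableauserverclient`), which this post does not otherwise cover. It assumes you sign in with a personal access token, and that each data source item exposes `name`, `datasource_type`, and `has_extracts`; check these against the version you have installed. The helper names and the sample type strings are illustrative only.

```python
from types import SimpleNamespace


def describe(datasource):
    """Return (name, underlying type, 'extract' or 'live') for one data source."""
    kind = "extract" if datasource.has_extracts else "live"
    return (datasource.name, datasource.datasource_type, kind)


def list_published(server_url, token_name, token_value, site_id):
    """List every published data source on a site (requires network access)."""
    # Imported inside the function so describe() can be tried without the package.
    import tableauserverclient as TSC

    auth = TSC.PersonalAccessTokenAuth(token_name, token_value, site_id=site_id)
    server = TSC.Server(server_url, use_server_version=True)
    with server.auth.sign_in(auth):
        # Pager walks through all pages of results for us.
        return [describe(ds) for ds in TSC.Pager(server.datasources)]


# Offline demonstration with stand-in objects that mimic the two data
# sources described above; the type strings are illustrative values.
sample = [
    SimpleNamespace(name="weekly sales update",
                    datasource_type="excel-direct", has_extracts=True),
    SimpleNamespace(name="coffee store",
                    datasource_type="redshift", has_extracts=False),
]
rows = [describe(ds) for ds in sample]
```

Calling `list_published(...)` with your own site’s URL, token, and site name would return the same kind of rows for real data sources.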
How to publish a data source
Say you have a cloud-hosted Amazon Redshift database with one main account, but you want all your users to have access to the database for use in Tableau. You’ll want to publish the data source to Tableau Online with your Redshift login credentials embedded.
1. First create a new connection to the data source in Tableau Desktop.
2. Choose the data you want to bring into Tableau.
3. Sign in to Tableau Online in the Server menu, using the address online.tableau.com.
4. In the same menu, publish the data source.
5. Choose the project in which you want the data source to live. You can also add a name, tags, permissions, and authentication. In this case, I’m going to choose embedded credentials so my users won’t need to enter credentials every time they use the connection.
6. Now we have our data source hosted on Tableau Online!
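The steps above use Tableau Desktop, but the sign-in, project choice, and embedded-credentials publish can also be scripted. Here is a rough sketch with the Tableau Server Client library for Python (`tableauserverclient`), which is our addition and not part of the walkthrough. It assumes you have already saved the data source to a file from Desktop (for example a .tdsx), that you sign in with a personal access token, and that the server address for your site is the one you would enter in Desktop. The function and parameter names are hypothetical.

```python
def normalize_server_url(address):
    """Turn an address like 'online.tableau.com' into a full https URL."""
    address = address.strip().rstrip("/")
    if not address.startswith(("http://", "https://")):
        address = "https://" + address
    return address


def publish_with_embedded_credentials(address, token_name, token_value, site_id,
                                      project_name, datasource_path,
                                      db_user, db_password):
    """Publish a saved data source file with the database login embedded."""
    # Imported inside the function so the URL helper works without the package.
    import tableauserverclient as TSC

    auth = TSC.PersonalAccessTokenAuth(token_name, token_value, site_id=site_id)
    server = TSC.Server(normalize_server_url(address), use_server_version=True)

    with server.auth.sign_in(auth):
        # Find the project the data source should live in (step 5).
        project = next(
            (p for p in TSC.Pager(server.projects) if p.name == project_name),
            None,
        )
        if project is None:
            raise ValueError(f"No project named {project_name!r} on this site")

        # embed=True stores the database login with the data source, so
        # users are not prompted each time they use the connection.
        credentials = TSC.ConnectionCredentials(db_user, db_password, embed=True)
        item = TSC.DatasourceItem(project.id)
        return server.datasources.publish(
            item,
            datasource_path,
            TSC.Server.PublishMode.CreateNew,
            connection_credentials=credentials,
        )
```

Keeping the database password out of the script itself (for example, reading it from an environment variable) is a sensible precaution when you embed credentials this way.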
How to connect to a published data source
On the user end, connecting to the published data source is extremely simple.
1. In Tableau Desktop, choose “Tableau Server” as the database and enter “online.tableau.com” as the server URL.
2. Choose the published data source from the menu.
3. You’re ready to create a viz!
If you regularly use more than one database, published data sources allow for easy organization and connection to any number of databases.
In an upcoming post, we’ll take an in-depth look at keeping your data up to date. Until then, check out some of our current resources including our Product Help and our Tableau Server and Tableau Online administration community. And if you haven’t yet tried Tableau Online, start your free trial today!