Why Use Tableau Data Extracts

By Gordon Rose July 31, 2014

This is part two in a three-part series about Tableau Data Extracts. In the first post, we looked at how Tableau Data Extracts are built and used by Tableau. If the content of the first post did not already sell you on the benefits of TDEs, then here are several reasons that Tableau Data Extracts (TDEs) are valuable—even essential—to Tableau users:

7 Reasons for Using Tableau Data Extracts:

  1. Performance

    Data extraction not only offers increased performance when the underlying data source is unacceptably slow, but it also can speed up the performance when the use of CustomSQL slows it down (see here).

  2. Reduced load

    Replacing a live connection to an OLTP database—or any database—with a TDE reduces the load on the database that can result from heavy Tableau user traffic.

  3. Portability

    A TDE can be bundled with Tableau visualizations in a packaged workbook for easy sharing and collaboration.

  4. Pre-aggregations

    When creating a TDE, Tableau gives you the option to aggregate your data for all visible dimensions. This is known as an aggregated extract. An aggregated extract is smaller and contains only aggregated data, as the name implies—not all of the row-level data that is stored in a standard TDE. Accessing the values for additive aggregations in a visualization becomes near-instantaneous because all of the work to derive the values has already been done. So, the most basic reason to use an aggregated extract is performance.

    You can also choose to roll the aggregations up to the selected level—e.g. month, quarter, year, etc. —of one of the date fields in the underlying data source. This further reduces the size of the extract by reducing number of aggregate values stored in the extract, and for that particular level of aggregation, further increases performance. For more information, check out the following articles here and here.

  5. Materialization of calculated fields

    When you optimize a Tableau extract, all of the calculated fields that have been defined are converted to static values upon the next full refresh. At that point, they essentially become additional data fields that can be accessed and aggregated as quickly as any other field in the TDE. The increase in performance can be especially strong when working with string calculations as string calculations are significantly slower than numeric and/or date calculations. So, as was the case with aggregated extracts, the most basic reason to optimize a TDE is again performance.

  6. Publishing to Tableau Public and Tableau Online

    Tableau Public only supports TDEs. While Tableau Online can connect live to cloud-based data sources, TDEs are the most common data source used in that environment.

  7. Support for functionality not available when using MS Jet

    Versions 8.1 and earlier of Tableau use the MS Jet engine for accessing Excel, MS Access and text files. By creating an extract, certain features not supported by Jet—count distinct, for example—can be used. (In version 8.2, Tableau replaced MS Jet for accessing Excel and text files with a new, more performant and functional engine.)

Example Use Cases

Representing all of the possible use cases for TDEs would not be possible in a blog post as short as this one. What follows is meant to give the reader a sense of the unique kinds of things that can be done with TDEs to extend the functionality of Tableau.

  • Compare an aggregate for all rows in an underlying source with the same aggregate for a subset of the rows. By blending a data source with an aggregated extract based on the same data source, you can filter and slice data to compare aggregations of the subset to the entire data set (this can also be done using RAW SQL functions or Custom SQL).
  • Create “double aggregates.” For instance, if the default aggregate for a measure is SUM, creating a pre-aggregated extract would allow you to calculate an AVG of SUMs in the visualization.
  • Build a KPI-style dashboard that combines worksheets based on aggregated extracts with worksheets based on live connections. This design pattern has performance advantages in that KPI-style aggregations are pre-calculated and do not require a live connection, reducing the load on the underlying data source(s). By the way, here’s a nice article from this same blog about KPIs.
  • tableau data extraction
    Fig. 1 A dashboard that combines aggregated extracts and live connections for ease of navigation and performance

    Hopefully this post has given you an even better sense of how and why Tableau Data Extracts can help you use Tableau to see and understand your data. Next week, we’ll wrap up the series with an extensive list of tips, tricks and best practices.


Submitted by Joshua M. on

I've really been enjoying this series and found the first part very helpful in understanding some details about extracts. I also find the use cases presented in this part very useful.

I do feel that reason # 5: Materialization of calculated fields, should have a little more explanation. Not all calculated fields are materialized. Aggregations and table calculations are definitely not. Row-level calculations using user functions or date functions are not. And even some other row-level calculations may not be materialized based on what the optimizer determines.

Submitted by Anand Chandarana (not verified) on

In 8.2, are tde files no longer created and stored in the same folder locations as tbwx files? In the past, when I created a tbwx file, the tde file would always appear in the same folder as the tbwx. Now, it looks like the tde automatically goes to "C:\Users\achandarana\AppData\Local\Temp\TableuTemp" folder. I tried following this article, which didn't help: http://kb.tableausoftware.com/articles/howto/changing-the-file-path-for-extracts?lang=en-us
Haven't tried this one yet, because I think I'd need Admin access on my laptop from our corporate-IT: http://kb.tableausoftware.com/articles/knowledgebase/default-location-when-creating-extracts?lang=en-us
The bigger issue I'm having related to this is that now when I go back into my source Excel data files and update data, the tbwx doesn't pull in new columns (of text formatted as General in Excel) properly. It will pull in the header for the columns I add, but not the data below it, which all comes in as null.
What's even more bizarre is that when I take that same column and put it on a new Excel tab with just the unique identifier, Tableau will recognize that new spreadsheet tab and pull in the new column of data just fine.

Perplexed in Chicago

Submitted by stephen b. on

I have use cases where most user activity on a dashboard could use extract aggregate data, but, occasionally, those users might need to drill to row-level detail. Is there an easy way to design a dashboard so that the transition between using an extract and drilling to row level int he original data source is transparent to the user?

Submitted by Uday (not verified) on

Loving it! Thank you again and I am eagerly awaiting for the final post in this series.

Submitted by Mo (not verified) on

This was very informative. It would be nice to know if there are situations where you wouldn't recommend using TDEs. Because TDEs tend to like flat files there might be some more complicated data models where they may not be appropriate.
Any suggestions on whether TDEs should be used when the dataset contains 1B records and:
1. Number of columns: Is there a recommended max limit? What if it would contain 150+ columns?
2. Complicated model: Needs to resolve many-to-many relationships
3. Data modification: Underlying source data experiences inserts/updates/deletes.
4. Limited Refresh window: Refresh needs to complete within 12 hours.

What we'd like to do is build some kind of decision matrix that will help guide us to make the appropriate choice to get the best performance from Tableau using an appropriate data source that works with the constraints for that project.

Submitted by Gordon R. on

In version 8.x I recommend steering away from extracts with 1B or more rows. Stay tuned for firm guidance around that topic and Tableau version 9.x.

To speed refresh time, be sure to eliminate (hide) any columns in the data source you don't need before creating/refreshing the extract.

Submitted by Gordon R. on

You are correct, Joshua. Thanks for posting those clarifications.

Submitted by Guest (not verified) on

These posts are very informative. I was unable to find the 3rd part of the series. Can anyone please post the link here? Or has it not been posted yet?


Submitted by Miguel C. on

Thank you. Excellent post !!

Submitted by Lena (not verified) on

WHat is the link to Part 3? please

Submitted by Lena G. on

Apologies, it was right above and I missed it!

Submitted by srinidhi (not verified) on

Tableau Data Extraction explained very well thank you for sharing the post. i do practice on this thank sooo much. who are want to learn tableau online training i suggest you this site

Submitted by geetha (not verified) on

when i am clicking on aggregated extracts in the pre-aggregations it is telling like "file not found". Could you please help me out to get the more information

Add new comment

non-humans click here