By Gordon Rose August 15, 2014

This is the last post in a three-part series about Tableau data extracts. In the first post, we looked at how Tableau data extracts are built and used by Tableau. In the second, we looked at why you would want to use TDEs and some sample use cases. In this last post, we will look at tips, techniques and best practices.

Data Extracts!

Some General Things to Keep in Mind:

  • If you hide fields in Tableau after creating an extract and before creating an incremental or full refresh, the fields will continue to be hidden but are not physically removed from the extract. The hidden fields can be unhidden at any time.
  • Tableau Desktop uses the location pointed to by the system’s %TEMP% environment variable for all temporary files used in the process of creating extracts. Tableau Server uses the folder :\ProgramData\Tableau\Tableau Server\data\tabsvc\temp.
  • Tableau data extracts attempt to retain the case sensitivity of the source data. For instance, string comparisons are by default case insensitive in SQL Server but case sensitive in Oracle. Extracts based on those data sources would match their respective case sensitivity for the purpose of making string comparisons.
  • A user with the permission to schedule an extract refresh can change the type of the refresh from incremental to full. To prevent users from being able to schedule full extract refreshes during business hours, make sure to configure your schedules accordingly.
  • A Tableau extract written to a contiguous area of free space will be more performant than one written in fragments across a hard drive. Regularly defragmenting a Tableau Server’s hard drive or using SSD drives are two ways to insure that extract files are contiguous to the extent possible.
  • As of version 8.1, the Tableau Data Engine API can be used to create both full and incremental extracts.
  • In a Tableau Server environment, it’s important to make sure that the backgrounder has enough disk space to store existing Tableau extracts as well as refresh them and create new ones. A good rule of thumb is the size of the disk available to the backgrounder should be two to three times the size of the extracts that are expected to be stored on it.
  • Tabcmd (a command-line utility) can be used to refresh extracts, as well as to publish TDEs to Tableau Server

Tips for Incremental Refreshes

  • Incremental extracts retrieve only new records from the data source, reducing the amount of time required to create an up-to-date extract. If there are no new records to add during an incremental extract, the bulk of the processes associated with performing an incremental extract still execute. Incremental extracts become less performant over time. This is because, by definition, incremental extracts only grow in size, and as a result, the amount of data and areas of memory that must be accessed in order to satisfy requests only grow as well. In addition, larger files are more likely to be fragmented on a disk than smaller ones.
  • When performing an incremental extract, records are not replaced. Therefore, using a date column such as “Last Updated” to drive an incremental refresh could result in duplicate rows in the extract.
  • When creating an incremental refresh against an Excel data source, only Date columns will be available to use for defining new rows. This is because Excel is not strongly typed, and Tableau to does know with certainty that a column that contains numbers contains only integers.
  • Incremental refreshes are not possible after an additional file has been appended to a file based data source because the extract has multiple sources at that point.
  • When publishing an extract that will not or should not be refreshed, connect directly to the extract file as a data source before publishing.

Tips for Aggregated Extracts

  • Use caution with aggregations such as COUNTD or other non-additive aggregations. In order for Tableau to be able to recalculate a distinct count, it has to bring all of the unique members of the base field into the extract. For instance, if an aggregated extract included COUNTD(Customer_ID), the group by clause of the SQL query used to retrieve data from the underlying source would include Customer_ID.
  • Use caution with row level calculations that involve a parameter. Since the value of the parameter can change, Tableau includes all fields referenced in row level calculations on the extract’s Level of Detail.
  • “Number of Records” is special in aggregated extracts. Its default aggregation is SUM. If using Number of Records after creating an aggregated extract, it will show the number of records in the underlying data source - i.e. the original number of records. This is actually very useful for many types of secondary calculations that might depend on knowing the original rows in the underlying source. To know how many rows are in the extract, simply change the default aggression to COUNT.

Data Source Considerations

  • When extracting data from SAP BW, be aware that some limitations exist.
  • When creating an initial extract from a data source, Tableau retrieves all objects and the extract creation can be time consuming – several hours in some cases. Allow the initial extract to complete and use incremental refreshes to keep the data up-to-date.

In summary, be sure to keep in mind these simple best practices:

  • Use full refreshes when possible
  • Incrementally refreshed extracts should be fully refreshed at regular intervals (e.g. every weekend or monthly) in order to maximize performance
  • Publish extracts to Data Server to help avoid redundant extracts in the Tableau Server environment
  • Hide unused columns before creating an extract in order to speed extract creation and to preserve storage space
  • Make sure there is enough contiguous free disk space for the largest extract in use in order to optimize extract performance. One way to achieve this is to use SSD drives as they do not become fragmented (a function of how data is retrieved from an SSD, not of where it is stored).

Hopefully these posts have helped you to better understand TDEs and as a result, to make more effective use of them. Many thanks to Marc Rueter, Russell Christopher and David Spezia from Tableau for their very valuable contributions to this series of posts.

Happy Vizzing!


These are some very useful tips. I especially had not thought about defragging / SSD and extract performance.

I am a little confused about the final tip under "Incremental Refreshes". What does that mean that connecting live, then extracting, then publishing results in only the live connection being published? That doesn't seem right, or maybe I'm mis-reading it. Wouldn't a live, then extracted connection be published as a .tdsx with the data extracted but the original connection to the source remembered? Isn't that essential (versus publishing a connection to a .tde) if the extract is going to be refreshed (full or incremental) on Server?


Hi Joshua, you're right - the wording wasn't quite what I was looking for so I updated it to make it clearer. Thanks for your comment!



Is there a recommendation about the maximum number of columns that should be in a TDE? Does TDE performance begin to falter as more columns are added?



Please provide better link from post1 & post 2 to this post.

hi Gordon,

On tableau server, how do I refresh an extract that's not part of a schedule? If it is part of a schedule, I can go to the "task" and select "run now" but I couldn't find a way of doing if it isn't.


Hi Mohit, the biggest impact is on creation time. It is a recommended best practice to hide unused columns before creating/refreshing an extract.

Hear, hear!


How to edit a saved/published extract using Tableau Desktop 8.3? For example, i would like to add one more table/view to the extract using a JOIN. How would i do this?

Hi Naizam,

Assuming you have the permission, you can download the extract from Tableau Server as a .tdsx file. You can then open an instance of Tableau Desktop and connect to the .tdsx as a data source. In Tableau Desktop, you would right-click on the data source and uncheck Use Extract. You can then modify the joins, etc., recreate the extract and publish back to server.

Thanks for the excellent sharings!

I have one question which as per my knowledge is not possible but if there is any workaround please tell me. The question is that - Can we create another extract from an already existing extract? For e.g. A large data set and then smaller datasets from it.

Hi Experts,

Please help me on this strange issue, We have Tableau reports connected to salesforce objects. When published on server, we have used Full refresh of the objects extracts. The connection strings are embedded while publishing. But full refresh is not working in tableau server. It shows error as Out of Date.

Anyone come across this issue. Please help on this.


Thanks for this concise and informative 3-part article. It provided some great insights to help me decide when & how to use extracts vs. live data connections.

I hate to do this, but I should point out a typo above under "Tips for Aggregated Extracts" - the last sentence refers to "default aggression" vs. "default aggregation". Don't you love spell check?!

Thanks again - I look forward to reading more of your posts.


I there a way to keep the sort ascending / descending buttons on the top ribbon when publishing a dashboard on the server?
When using the filters the sorting gets distorted and these buttons offer an easy and neat way of re-sorting in the correct order.
Many thanks for any tips!

Thank you for this article. Is it possible to apply row level security on an extract without using filters within the workbook (which can be easily removed by a user)?

Hi Gordon,
Great articles about extracts. My question is a bit unorthodox- is it possible with the API to programmatically open an existing extract and iterate through it? i.e. some other developer publishes an awesome TDE and I want to get it into my Data Warehouse table. Can I write a python script to open the extract, inspect each row, and INSERT each (or bulk load) to a sql database?
Many thanks,

I have a question regarding the extract refresh. I created an extract from a table connecting to SQL Server. Published the extract to tableau server. But I'm unable to refresh the data. I'm sure, I'm doing some basic mistake. Kindly help If anyone understand.



thank for this article. But also I have a question to the section "Tips for Incremental Refreshes".
Does anyone of you has an article that can explain this process exactly?


How can I use Extracts and apply data Security on them?
What I mean is, I have a Data Extract for a report. I need the logged in user to ONLY be able to see a specific set of data from that extract on the report.


I am using the tableau Desktop and Tableau online.
and using Sql Server stored procedure with Date parameter,

When I ran the report in live data connection then Date parameter fetch the data accordingly but when I changed connection to Extract then my report parameter not fetch the data.

Means My report parameter not working in Extract connection.

Please suggest me how to resolve this problem.

Same Problem I am facing with Sql Server stored procedure parameter


I'm getting the following error when trying to refresh an extract:

The query time resource limit (10800 seconds) was exceeded.

It's about 50m rows and 16 columns, runs from a sql stored procedure. sql query runs in about 40 mins, but tableau server is timing out at 3 hrs. It's running on a virtual box with 64GB RAM, with 8 cores.

any help MASSIVELY appreciated!


I have the same issue Ben. What I have learned from searching and trial and error, is that you should not connect a Tableau server to Sql directly. You should use Extracts(TDE) from the Sql. Tableau and SQL, don't play well together.

Thanks for your reply Scott. My scheduled refresh is to create an extract within Tableau server. I am not using a "live" feed into SQL. Are you saying I shouldn't use a SQL connection to refresh an extract, if so, what technology are you using to get round it?


Tell me this - is the extract created from the database or from the workbooks?

Tell me this - is the extract created from the database or from the workbooks?

Is the extract created from the database or from the workbooks? We were trying to watch it build from the database and do not see a login.

Is the extract created from the database or from the workbooks? We were trying to watch it build from the database and do not see a login.

Is the extract created from the database or from the workbooks? We were trying to watch it build from the database and do not see a login.


Is there a way to chose the ORDER of the columns of the .csv extract?


Whenever you are using with extract you need to refresh data source from Tableau Desktop.then it will work else does not. For live connection it is working fine.

I want to make an extract, but i have a lot of data almost 142000000 rows, and the extract doesnt do. Some recomendations ?
because when i work in the live mode is very slow, one consult takes like 20 minutes to run.

Ben - not sure if you've resolved this now, and don't know the details but I do believe there is some 'different' behaviour if you use Stored Procedures and that's what's causing the lengthy process... I have a vague recollection of it calling the procdure at least twice during the extracaton process and if its taking 40 mins just to run once.... If you can use SQL views instead you shouldn't have a problem.

hide the fields which is of no use then try to create the extract hopefully it will work

Add new comment