Understanding Tableau Data Extracts

By Gordon Rose July 18, 2014

This is the first post in a three-part series that will take a large amount of information about Tableau data extracts, highly compress that information, and place it into memory — yours.

Or better yet, it will make that information available to you so you can grab what you need now and come back later for more. That’s much closer to the architecture-aware approach used by Tableau’s fast, in-memory data engine for analytics and discovery.

What is a Tableau Data Extract (TDE)?

A Tableau data extract is a compressed snapshot of data stored on disk and loaded into memory as required to render a Tableau viz. That’s fair for a working definition. However, the full story is much more interesting and powerful.

There are two aspects of TDE design that make them ideal for supporting analytics and data discovery. The first is that a TDE is a columnar store. I won’t go into detail about columnar stores – there are many fine documents that already do that, such as this one.

However, let’s at least establish the common understanding that columnar databases store column values together rather than row values. As a result, they dramatically reduce the input/output required to access and aggregate the values in a column. That’s what makes them so wonderful for analytics and data discovery.
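To make that concrete, here is a toy sketch (plain Python, not Tableau's actual implementation) contrasting a row store with a column store. Summing the hypothetical "sales" column in the columnar layout touches only that column's contiguous array, while the row layout forces a scan of every full record:

```python
# Row store: each record holds all of its fields together.
rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 250},
    {"region": "East", "sales": 175},
]

# Summing "sales" requires reading every record, even the "region" values.
row_total = sum(r["sales"] for r in rows)

# Column store: each column lives in its own contiguous array.
columns = {
    "region": ["East", "West", "East"],
    "sales": [100, 250, 175],
}

# The same aggregation touches only the "sales" array.
col_total = sum(columns["sales"])

assert row_total == col_total == 525
```

On real hardware the difference is much larger than this toy suggests, because reading one contiguous column avoids pulling the other columns' bytes through disk and CPU caches at all.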

Figure 1 - A columnar store makes it possible to quickly operate over the values in any given column

The second key aspect of TDE design is how they are structured, which affects how they are loaded into memory and used by Tableau. This is a very important part of what makes TDEs "architecture aware". Basically, architecture-awareness means that TDEs use all parts of your computer's memory, from RAM to hard disk, and put each part to work as best fits its characteristics.

To better understand this aspect of TDEs, we’ll walk through how a TDE is created and then used as the data source for one or more visualizations.

When Tableau creates a data extract, it first defines the structure for the TDE and creates separate files for each column in the underlying source. (This is why it’s beneficial to minimize the number of data source columns selected for extract).

As Tableau retrieves data, it sorts, compresses and adds the values for each column to their respective file. With 8.2, the sorting and compression occur sooner in the process than in previous versions, accelerating the operation and reducing the amount of temporary disk space used for extract creation.

People often ask whether a TDE is decompressed as it is loaded into memory. The answer is no. The compression used to reduce a TDE's storage requirements and make it more efficient is not file compression.

Rather, several different techniques are used, including dictionary compression (where common column values are replaced with smaller token values), run length encoding, frame of reference encoding and delta encoding (you can read more about these compression techniques here). However, good old file compression can still be used to further reduce the size of a TDE if you’re planning to email or copy it to a remote location.
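As a minimal illustration (again a sketch, not Tableau's code), here are two of the techniques named above applied to a sorted column: dictionary compression replaces repeated values with small integer tokens, and run-length encoding then collapses runs of identical tokens into (token, count) pairs. Sorting the column first, as the extract process does, makes the runs longer and both techniques more effective:

```python
def dictionary_encode(values):
    """Replace each value with a small integer token; return tokens and the dictionary."""
    dictionary = {}
    tokens = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        tokens.append(dictionary[v])
    return tokens, dictionary

def run_length_encode(tokens):
    """Collapse runs of repeated tokens into [token, run_length] pairs."""
    runs = []
    for t in tokens:
        if runs and runs[-1][0] == t:
            runs[-1][1] += 1
        else:
            runs.append([t, 1])
    return runs

column = ["CA", "CA", "CA", "NY", "NY", "TX"]  # already sorted, so runs are long
tokens, dictionary = dictionary_encode(column)
print(tokens)                     # [0, 0, 0, 1, 1, 2]
print(run_length_encode(tokens))  # [[0, 3], [1, 2], [2, 1]]
```

Six string values have become three small pairs plus a tiny dictionary, and the data can be aggregated directly in this encoded form without a decompression pass.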

Figure 2 - Compression techniques are used to further optimize the TDE columnar store. Each column becomes a memory-mapped file in the TDE store

To complete the creation of a TDE, the individual column files are combined with metadata to form a memory-mapped file (or, to be more accurate, a single file containing as many individual memory-mapped files as there are columns in the underlying data source). This is a key enabler of the TDE's carefully engineered architecture-awareness. (Even if the term is unfamiliar, you've come across memory-mapped files before: they are a feature of every modern operating system (OS). Read more about them here.)

Because a TDE is a memory-mapped file, when Tableau requests data from a TDE, the data is loaded directly into memory by the operating system. Tableau doesn't have to open, process or decompress the TDE to start using it. If necessary, the operating system continues to move data in and out of RAM to ensure that all of the requested data is made available to Tableau. This is a key point: it means that Tableau can query data that is bigger than the available RAM on a machine!
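The mechanics are easy to see with any memory-mapped file. The short Python sketch below (generic OS behavior, not Tableau internals; the file here is just a stand-in for one column of a TDE) maps a file without reading it, then touches the first and last bytes; the OS pages in only the regions that are actually accessed:

```python
import mmap
import os
import tempfile

# Create a sample file standing in for one column of a TDE.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x01\x02\x03\x04" * 1024)  # 4 KB of data

with open(path, "rb") as f:
    # Map the file into the address space; nothing is copied into RAM yet.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touching a byte causes the OS to page in just that region on demand.
    first = mm[0]
    last = mm[len(mm) - 1]
    mm.close()

os.remove(path)
print(first, last)  # 1 4
```

Because paging is handled by the OS, the mapped file can be far larger than physical RAM: pages are brought in when touched and evicted when memory is needed elsewhere, which is exactly the behavior that lets Tableau query extracts bigger than the machine's memory.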

Only data for the columns that have been requested is loaded into RAM. There are also some other, subtler optimizations. For example, a typical OS-level optimization is to recognize when access to data in a memory-mapped file is contiguous and, as a result, to read ahead in order to increase access speed. A memory-mapped file is also loaded only once by the OS, no matter how many users or visualizations access it.

Since it isn't necessary to load the entire contents of a TDE into memory for it to be used, the hardware requirements (and so the costs) of a Tableau Server deployment are kept reasonable.

Lastly, architecture-awareness does not stop with memory: TDEs work on Mac OS X and Linux in addition to Windows, and are 32- and 64-bit cross-compatible. It doesn't get much better than that for a fast, in-memory data engine. If you're interested, you can read about other important breakthrough technologies in Tableau here.

Now that you understand what makes a TDE such a technical breakthrough, we're ready to turn our attention to why to use them and to some example use cases. We'll cover those topics in the next article in this three-part series.


Submitted by Daniel Seisun (not verified) on

This is awesome stuff! I love getting an understanding of the underpinnings of these magical black boxes that make everything run faster.

I was curious though, when the application maps a tde to memory for the first time, if the TDE is massive (say around 2-5 GB) would we potentially see some slowdown on the first run as it gets loaded? Would trying to get it loaded into memory beforehand (some sort of report warmer) be beneficial?

Submitted by Stacey (not verified) on

Very interesting - makes me re-think how I've done the data source for some of my workbooks!

Submitted by Ken Black (not verified) on

Very informative and well-written, Gordon. I'm looking forward to parts 2 and 3. Thanks much! This helps me understand some of the testing results I just completed.

Submitted by Daniel S. on

Great post! I really appreciate the additional insight on what enables tde's to operate so efficiently.

Submitted by John K. on

Thanks for the post! It is helpful to understand how TDE work!

Submitted by hans (not verified) on

Now I understand why it is sooo fast!

Submitted by KK Molugu (not verified) on

Interesting article Gordon. Always good to know 'Under the hood'.


Submitted by Shankar S. on


This is such a well-written article. I have never had anyone explain columnar store databases as well as the author has done. If anyone can make the subject of data extracts exciting it is this author :) I can tell that the author loves Tableau and it's technology. So do all of us ... :) :)

Submitted by suresh pulapalli (not verified) on

this is an excellent write-up. thanks for the detailed information

Submitted by asdfsdfa s. on

I am working on a rest api that usually outputs reports in JSON and CSV. Is it possible to output data in TDE format using a REST API?

Submitted by Matt L. on

This is a great article. Why am I just now discovering it, I wonder?

Submitted by Uday (not verified) on

Thanks Gordon for the insights! Simple, clean and beautifully written as the visualizations of Tableau.

Submitted by Xavier M. on

That's brilliant ! Thanks for this very informative article Gordon

Submitted by Lokesh (not verified) on

I'm working on a similar task. Did you make it work? Can you share your learnings and understanding?

Submitted by lokesh (not verified) on

I'm working on a similar task (tde file outputs in JSON and CSV) . Did you make it work? Can you share your learnings and understanding?

Submitted by veena (not verified) on

I am interested to know more about this. Is there any limitation for this TDE file? Suppose if the TDE is bigger size then how it will handle? Can you share your learning's and understanding?

Submitted by Gordon R. on

Because Tableau is architecture aware, you can create TDE files that are larger than the amount of available RAM you have. Generally speaking, up through Tableau 8.x, TDEs in the hundreds of millions are performant with somewhere below 500 million rows being closer to the "sweet spot". Customers do successfully run larger extracts, but that is the advice I give my customers. The practical limits are higher with version 9.x - some amazing improvements in the Data Engine are key features in Tableau 9.0.

Submitted by Bala (not verified) on

What is the role of Tableau Server in the data extract operation initiated in Tableau Desktop?

Submitted by Gordon R. on

Hi - you can publish an extract you created in Desktop to Tableau Server. Based on how you configure permissions on the published extract, other users will then be able to connect to it as a data source and build new visualizations based on it. The extract can also be refreshed automatically on a scheduled basis.

Submitted by Joshua M. on

As the extracts are memory-mapped, would storing a TDE on an encrypted or compressed volume have significant performance implications?

Submitted by ste5787 (not verified) on

when i publish an extract on Tableau Server and a user connect to it, will Tableau work on extract in-memory? (in-memory on the server or on the client web app?)

Submitted by Jeremy Patoc (not verified) on

Can you hyperlink parts 2 & 3?

Submitted by Raphael S. on

Great explanation! helpful to understand correct sizing for the server.

Submitted by Bill Medrano (not verified) on

Hi Gordon! This is the greatest material I have ever come across. I reference it every Server Admin class I teach. Thanks for doing this!

Submitted by Poorab P. on

This article for me demystified the Tableau performance. Thanks a ton Gordon.

Submitted by Deepika (not verified) on

Awesome post.I was trying to find what is data extract for long time.This explains clearly.Great.PLease post technical stuffs which people work without knowing what is that.Example is TDE....can u explain other formats like TWB,TWBX and other formats.

Submitted by prem (not verified) on

superb...awesome Gordon Rose

Submitted by Guest (not verified) on

If part2 and part3 of this series have already been published, please teach us links. Thanks
