This is the first post in a three-part series that will take a large amount of information about Tableau data extracts, highly compress that information, and place it into memory — yours.

Or better yet, it will make that information available to you so you can grab what you need now and come back later for more. That’s much closer to the architecture-aware approach used by Tableau’s fast, in-memory data engine for analytics and discovery.

What is a Tableau Data Extract (TDE)?

A Tableau data extract is a compressed snapshot of data stored on disk and loaded into memory as required to render a Tableau viz. That’s fair for a working definition. However, the full story is much more interesting and powerful.

There are two aspects of TDE design that make them ideal for supporting analytics and data discovery. The first is that a TDE is a columnar store. I won’t go into detail about columnar stores – there are many fine documents that already do that, such as this one.

However, let’s at least establish the common understanding that columnar databases store column values together rather than row values. As a result, they dramatically reduce the input/output required to access and aggregate the values in a column. That’s what makes them so wonderful for analytics and data discovery.
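To make that concrete, here is a minimal sketch in Python contrasting the two layouts. The data and structures are hypothetical, purely to illustrate the idea — they are not how Tableau stores anything internally:

```python
# Hypothetical illustration of row vs. columnar layout -- not Tableau code.

# Row store: each record keeps all of its fields together.
rows = [
    {"region": "East", "product": "Widget", "sales": 120.0},
    {"region": "West", "product": "Gadget", "sales": 340.0},
    {"region": "East", "product": "Gadget", "sales": 75.0},
]

# Column store: each field's values are stored together.
columns = {
    "region":  ["East", "West", "East"],
    "product": ["Widget", "Gadget", "Gadget"],
    "sales":   [120.0, 340.0, 75.0],
}

# SUM(sales) against the row store touches every field of every row.
row_total = sum(r["sales"] for r in rows)

# Against the column store it reads one contiguous list and nothing else,
# which is why columnar layouts need far less I/O for aggregations.
col_total = sum(columns["sales"])

assert row_total == col_total == 535.0
```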

Figure 1 - A columnar store makes it possible to quickly operate over the values in any given column

The second key aspect of TDE design is how TDEs are structured, which affects how they are loaded into memory and used by Tableau. This is a very important part of how TDEs are “architecture aware”. Basically, architecture-awareness means that TDEs use all parts of your computer’s memory hierarchy, from RAM to hard disk, and put each part to work in the way that best fits its characteristics.

To better understand this aspect of TDEs, we’ll walk through how a TDE is created and then used as the data source for one or more visualizations.

When Tableau creates a data extract, it first defines the structure for the TDE and creates a separate file for each column in the underlying source. (This is why it’s beneficial to minimize the number of data source columns selected for an extract.)

As Tableau retrieves data, it sorts, compresses and adds the values for each column to their respective file. With 8.2, the sorting and compression occur sooner in the process than in previous versions, accelerating the operation and reducing the amount of temporary disk space used for extract creation.

People often ask if a TDE is decompressed as it is being loaded into memory. The answer is no. The compression used to reduce a TDE’s storage requirements and make it more efficient is not file compression.

Rather, several different techniques are used, including dictionary compression (where common column values are replaced with smaller token values), run length encoding, frame of reference encoding and delta encoding (you can read more about these compression techniques here). However, good old file compression can still be used to further reduce the size of a TDE if you’re planning to email or copy it to a remote location.
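If you're curious what those encodings look like in practice, here is a toy sketch in Python of dictionary compression, run-length encoding, delta encoding, and frame-of-reference encoding. The functions and data are hypothetical illustrations of the general techniques, not the Data Engine's actual format:

```python
# Toy sketches of the encodings named above -- illustrative only.

# Dictionary compression: repeated values become small integer tokens.
def dictionary_encode(values):
    dictionary = {}
    tokens = []
    for v in values:
        tokens.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, tokens

# Run-length encoding: runs of identical values become (value, count) pairs.
def run_length_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# Delta encoding: store each value as its difference from the previous one,
# so slowly changing sequences shrink to small numbers.
def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

# Frame-of-reference encoding: store small offsets from a common base value.
def frame_of_reference_encode(values):
    base = min(values)
    return base, [v - base for v in values]

states = ["WA", "WA", "WA", "OR", "OR", "CA"]
print(dictionary_encode(states))  # ({'WA': 0, 'OR': 1, 'CA': 2}, [0, 0, 0, 1, 1, 2])
print(run_length_encode(states))  # [['WA', 3], ['OR', 2], ['CA', 1]]
print(delta_encode([1000, 1001, 1003, 1006]))               # [1000, 1, 2, 3]
print(frame_of_reference_encode([1000, 1001, 1003, 1006]))  # (1000, [0, 1, 3, 6])
```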

Figure 2 - Compression techniques are used to further optimize the TDE columnar store. Each column becomes a memory-mapped file in the TDE store

To complete the creation of a TDE, the individual column files are combined with metadata to form a memory-mapped file – or, to be more accurate, a single file containing as many individual memory-mapped files as there are columns in the underlying data source. This is a key enabler of its carefully engineered architecture-awareness. (And even if the term is unfamiliar, you’ve come across memory-mapped files before. They are a feature of any modern operating system (OS). Read more about them here.)

Because a TDE is a memory-mapped file, when Tableau requests data from a TDE, the data is loaded directly into memory by the operating system. Tableau doesn’t have to open, process or decompress the TDE to start using it. If necessary, the operating system continues to move data in and out of RAM to ensure that all of the requested data is made available to Tableau. This is a key point - it means that Tableau can query data that is bigger than the available RAM on a machine!
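Here is a small Python sketch of memory mapping itself, using the standard mmap module. The file name and on-disk layout are made up for illustration (this is not the TDE file format); the point is that the OS pages in only the data you actually touch:

```python
# A minimal sketch of OS-level memory-mapped file access.
# File name and layout are hypothetical -- not the TDE format.
import mmap
import struct

# Write a small "column file" of 64-bit floats.
values = [120.0, 340.0, 75.0]
with open("sales.col", "wb") as f:
    f.write(struct.pack(f"<{len(values)}d", *values))

# Map the file into memory. The OS pages data in on demand, so only the
# pages actually touched are read into RAM -- which is why a mapped file
# can be larger than physical memory.
with open("sales.col", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Read just the second value (bytes 8..16) without reading the rest.
        (second,) = struct.unpack_from("<d", mm, 8)
        print(second)  # 340.0
```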

Only data for the columns that have been requested is loaded into RAM. However, there are also some other, subtler optimizations. For example, a typical OS-level optimization is to recognize when access to data in a memory-mapped file is contiguous and, as a result, read ahead in order to increase access speed. A memory-mapped file is also loaded only once by the OS, no matter how many users or visualizations access it.

Since it isn’t necessary to load the entire contents of TDEs into memory for them to be used, the hardware requirements — and so the costs — of a Tableau Server deployment are kept reasonable.

Lastly, architecture-awareness does not stop with memory – TDEs are supported on Mac OS X and Linux in addition to Windows, and are 32- and 64-bit cross-compatible. It doesn’t get much better than that for a fast, in-memory data engine. If you’re interested, you can read about other important breakthrough technologies in Tableau here.

Now that you understand what makes a TDE such a technical breakthrough, we’re ready to turn our attention to why you should use them and some example use cases. We’ll cover those topics in the next article in this three-part series.

Comments

This is awesome stuff! I love getting an understanding of the underpinnings of these magical black boxes that make everything run faster.

I was curious, though: when the application maps a TDE to memory for the first time, if the TDE is massive (say, around 2-5 GB), would we potentially see some slowdown on the first run as it gets loaded? Would trying to get it loaded into memory beforehand (some sort of report warmer) be beneficial?

Very interesting - makes me re-think how I've done the data source for some of my workbooks!

Very informative and well-written, Gordon. I'm looking forward to parts 2 and 3. Thanks much! This helps me understand some of the testing results I just completed.

Great post! I really appreciate the additional insight on what enables TDEs to operate so efficiently.

Awesome!!

Now I understand why it is sooo fast!

Interesting article Gordon. Always good to know 'Under the hood'.

Excellent writeup on Tableau data extracts

This is such a well-written article. I have never had anyone explain columnar store databases as well as the author has done. If anyone can make the subject of data extracts exciting, it is this author :) I can tell that the author loves Tableau and its technology. So do all of us ... :) :)

this is an excellent write-up. thanks for the detailed information

This is a great article. Why am I just now discovering it, I wonder?

Thanks Gordon for the insights! Simple, clean and beautifully written as the visualizations of Tableau.

That's brilliant! Thanks for this very informative article, Gordon

I'm working on a similar task (TDE file outputs in JSON and CSV). Did you make it work? Can you share your learnings and understanding?

I am interested to know more about this. Are there any limitations for a TDE file? Suppose the TDE is very large; how will it be handled? Can you share your learnings and understanding?

Because Tableau is architecture aware, you can create TDE files that are larger than the amount of available RAM you have. Generally speaking, up through Tableau 8.x, TDEs with row counts in the hundreds of millions are performant, with somewhere below 500 million rows being closer to the "sweet spot". Customers do successfully run larger extracts, but that is the advice I give my customers. The practical limits are higher with version 9.x - some amazing improvements in the Data Engine are key features in Tableau 9.0.

What is the role of Tableau Server in the data extract operation initiated in Tableau Desktop?

Hi - you can publish an extract you created in Desktop to Tableau Server. Based on how you configure permissions on the published extract, other users will then be able to connect to it as a data source and build new visualizations based on it. The extract can also be refreshed automatically on a scheduled basis.

As the extracts are memory-mapped, would storing a TDE on an encrypted or compressed volume have significant performance implications?

When I publish an extract on Tableau Server and a user connects to it, will Tableau work on the extract in-memory? (In-memory on the server, or on the client web app?)

Yes

Can you hyperlink parts 2 & 3?

Hi Gordon! This is the greatest material I have ever come across. I reference it every Server Admin class I teach. Thanks for doing this!

Awesome post. I was trying to find out what a data extract is for a long time, and this explains it clearly. Great. Please post about other technical things people work with without knowing what they are - the TDE is one example. Can you explain other formats, like TWB and TWBX?

superb...awesome Gordon Rose

If parts 2 and 3 of this series have already been published, please share the links. Thanks

Superbly written and very informative.

Awesome explanation...

Wonderful piece of information. Thanks!

