Amazon Redshift is a popular data source for Tableau Desktop and Tableau Server, empowering people to analyze datasets that run into the millions and even billions of rows. It is also the most popular data source for Tableau Online customers. The aim of this page is to provide resources and help you understand how to optimize Tableau and Redshift so that you can deliver good performance to your end users. This article is broken into three sections:

  • Design – Dashboard and visualization design is one of the most important factors for performance.
  • Monitoring – Being able to effectively monitor what is happening under the covers helps guide tuning.
  • Best Practices – High-level considerations for Redshift and Tableau.

These sections are supplemented with links to many resources for further reading. Before we go further, here are some excellent resources to start with:

Dashboard and Visualization Design

Design of your visualizations is one of the most important performance-related factors. Following best practices and tweaking dashboard design can provide significant improvements in performance.
The Designing Efficient Workbooks whitepaper is the go-to resource for best-practice dashboard design.

It is a long but very insightful document; some key points for efficient design include:

  • Trying to show too many charts or too much data can cause slow performance. Keep it simple: show data at a high level, then let your users drill down into the details rather than showing everything at once and letting them filter.
  • Work with only the data you need. Filter up front if you can, and aggregate to the level you need if you are dealing with large, granular datasets. This allows Tableau to generate simpler, faster queries (see the pre-aggregation sketch after this list).
  • Different calculations have different performance characteristics. In general, working with numbers or Booleans is faster than working with strings or dates, and basic calculations, Level of Detail (LOD) expressions and Table Calculations each perform differently.
  • Use the performance recorder to guide your analysis of performance.
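For the "work with only the data you need" point, one approach is to build a summary table in Redshift at the grain the dashboard actually needs and point Tableau at that instead of the raw detail. Below is a minimal sketch; the table and column names (sales_detail, sales_daily_summary, sale_date and so on) are illustrative only, not from a real schema:

    -- Pre-aggregate a large, granular fact table to the level the dashboard needs.
    -- Filtering up front keeps the summary to just the rows users will query.
    CREATE TABLE sales_daily_summary AS
    SELECT
        sale_date,
        region,
        product_category,
        SUM(sale_amount) AS total_sales,
        COUNT(*)         AS order_count
    FROM sales_detail
    WHERE sale_date >= '2016-01-01'
    GROUP BY sale_date, region, product_category;

Connecting Tableau to a summary table like this means every query touches far fewer rows, while the detail table remains available for drill-down through a second data source.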

Monitoring Performance

The first step to improving performance is to understand where the performance issues occur: for example, is it the query sent to Redshift, or is it Tableau's table calculations or layout computation? Tableau has a built-in Performance Recorder to help you begin investigating issues, and below we document some further tools and techniques for monitoring performance. Please note that some of the tools below are open-source or third-party projects and are not officially supported by Tableau.

Creating a performance recording is straightforward; see the following links for a how-to guide:

Additionally, learning how to read and interpret Redshift query plans is important: understanding the steps Redshift performs during a query helps you identify where to start tuning.
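As a starting point, you can prefix any query (for example, one captured from Tableau's logs) with EXPLAIN to see the plan Redshift intends to run. The query and table below are illustrative only:

    -- Show the plan, including join/distribution steps and relative costs.
    EXPLAIN
    SELECT region, SUM(sale_amount) AS total_sales
    FROM sales_detail
    GROUP BY region;

In the resulting plan, join steps labelled DS_BCAST_INNER or DS_DIST_BOTH indicate that Redshift is broadcasting or redistributing data across nodes, which is usually a sign that distribution keys are worth revisiting.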


Options for finding the queries generated by Tableau against Redshift

  • In a Tableau Performance Recording: selecting a query (green bar) shows you the query that was run against Redshift.
  • The query can be pulled directly from the tabprotosrv.txt log files; use Tableau Log Viewer (https://github.com/tableau/tableau-log-viewer) to make life easier and look for the end-query events.
  • LogShark: an analysis tool for Tableau Server logs that makes it straightforward to analyze query performance over time so that you can focus on the most expensive queries.
  • A guide to LogShark for Redshift


Options for understanding Tableau queries on the Redshift side

  • Disable Tableau’s use of cursors for investigative purposes and view the queries directly in the console.
  • Leave cursors enabled and query the Redshift system tables STL_QUERY, STL_QUERYTEXT and STV_ACTIVE_CURSORS to find the queries run by Tableau (see the example query after this list).
  • More detailed information
  • Redshift Monitor – Consider using Redshift Monitor if you would like to capture, view and analyze detailed information about Tableau's interaction with Redshift; see this video for an introduction to Redshift Monitor, as well as this blog post.
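As a minimal sketch of the STL_QUERY approach above (with cursors left enabled), the query below lists the longest-running statements recently executed by the account Tableau connects with. The user name 'tableau_user' is a placeholder; note that STL_QUERY truncates the query text, and STL_QUERYTEXT holds the full statement split into 200-character rows:

    -- Longest-running queries from the Tableau service account in the last hour.
    SELECT q.query,
           q.starttime,
           DATEDIFF(ms, q.starttime, q.endtime) AS duration_ms,
           TRIM(q.querytxt)                     AS query_text
    FROM stl_query q
    JOIN pg_user u ON u.usesysid = q.userid
    WHERE u.usename = 'tableau_user'
      AND q.starttime > DATEADD(hour, -1, GETDATE())
    ORDER BY duration_ms DESC
    LIMIT 20;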

Redshift Best Practices

There are best practices and basics that need to be considered when deploying and scaling Redshift databases to help provide good performance. The points below are summaries; specific requirements and choices for your environment may differ, so please read the Amazon documentation for further information. This section assumes some working knowledge of Redshift and of database concepts in general.

Great starting resources on this topic can be found below:


Redshift best practice considerations

  • Distribution Styles – This defines how data is spread across the cluster. Distributing data so that fact and dimension rows for a given key end up on the same node helps Redshift avoid going across the network during joins. Smaller dimension tables may be suitable for the ALL distribution style so that a copy resides on every node (see the schema sketch after this list). Further reading.
  • Sort Keys – The order data is stored on disk. This helps Redshift filter and join efficiently, so columns that are likely to appear in filters or join clauses make good candidates for sort keys. See Choosing sort keys for more guidance.
  • Analyze and Vacuum – These steps give the Redshift query planner up-to-date statistics about the data and ensure that data is correctly sorted on disk. They are especially important after significant changes, loads or deletions (see the maintenance example below this list). See reclaiming space and analyzing tables.
  • Use Compression – Columns that are appropriately compressed result in faster queries. Further reading: compressing data on disk and loading tables with auto-compression.
  • Concurrency – Redshift defaults to a single workload management (WLM) queue that allows up to 5 concurrent queries. You can increase this limit or add additional queues (see the WLM query below this list); AWS's current guidance is a maximum of 15 concurrent queries. In Tableau 9 we were conservative with the number of queries we would fire at Redshift and limited concurrent queries to 2; in Tableau 10 we increased this to 8.
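To make the distribution, sort key and compression points concrete, here is a hedged sketch of a small star schema. All table and column names are hypothetical, and the encodings shown are only examples; in practice, loading with COPY and its automatic compression analysis is often the simplest way to choose encodings:

    -- Small dimension: replicate to every node so joins never cross the network.
    CREATE TABLE dim_product (
        product_id   INTEGER NOT NULL,
        product_name VARCHAR(100) ENCODE LZO,
        category     VARCHAR(50)  ENCODE BYTEDICT
    )
    DISTSTYLE ALL
    SORTKEY (product_id);

    -- Large dimension and fact share a DISTKEY so joins on customer_id stay node-local.
    CREATE TABLE dim_customer (
        customer_id  INTEGER NOT NULL,
        region       VARCHAR(50) ENCODE BYTEDICT
    )
    DISTKEY (customer_id)
    SORTKEY (customer_id);

    CREATE TABLE fact_sales (
        sale_id      BIGINT  NOT NULL,
        customer_id  INTEGER NOT NULL,
        product_id   INTEGER NOT NULL,
        sale_date    DATE    NOT NULL,
        sale_amount  DECIMAL(12,2)
    )
    DISTKEY (customer_id)
    SORTKEY (sale_date, customer_id);  -- common filter (date) and join column first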

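A brief example of the Analyze and Vacuum maintenance commands, using the hypothetical table from the sketch above; SVV_TABLE_INFO is a convenient way to spot tables with stale statistics or a large unsorted region:

    -- Run after significant loads, updates or deletes.
    VACUUM fact_sales;    -- re-sort rows and reclaim space left by deleted rows
    ANALYZE fact_sales;   -- refresh the statistics the query planner relies on

    -- Find tables that most need attention.
    SELECT "table", stats_off, unsorted
    FROM svv_table_info
    ORDER BY stats_off DESC;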

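Concurrency itself is changed through the cluster's WLM configuration (parameter group) rather than SQL, but you can check the current settings from a query tool. A small sketch, assuming the default numbering where the low service classes are reserved for system queues:

    -- num_query_tasks is the number of queries each queue runs concurrently.
    SELECT service_class, name, num_query_tasks
    FROM stv_wlm_service_class_config
    WHERE service_class > 4;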
Schema design, cluster design and other considerations

  • Schema design – Express relationships using primary and foreign keys to allow Tableau to implement join culling (see the constraint example after this list).
  • Schema design – Columns you JOIN on should use the NOT NULL constraint in their host table. Tableau generates additional query complexity to validate the data when there is a chance NULLs may exist in the join.
  • Schema design – Simplify or denormalize complex schemas. Consider merging small dimension tables into the fact table.
  • Schema design – Consider aggregate tables or aggregate extracts. Tableau makes it simple to work with multiple connections to data sources in a single dashboard, which can significantly improve perceived performance for large datasets; a tiered approach to working with very large datasets can be useful.
  • Keep columns only as wide as they need to be to reduce memory usage.
  • Cluster Design – Refer to the Amazon documentation and work with their team to determine the appropriate cluster size.
  • Cursors and Extracts – By default, Tableau uses cursors with Redshift. Cursors cause your Tableau result set to be materialized on the leader node. The maximum cursor result set size depends on the cluster node size and may be exceeded by large extracts, causing the extract to fail. In this case larger nodes may be necessary, or for large extracts you can UNLOAD the data into S3 and make it available to Tableau through something like EMR (see the UNLOAD sketch below).
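As a sketch of the first two schema-design points above, using the hypothetical tables from the earlier DDL example: primary and foreign keys are informational in Redshift (not enforced), but declaring them lets Tableau cull unnecessary joins, while NOT NULL on join columns (declared at table creation, as in the earlier sketch) avoids the extra NULL-handling logic Tableau would otherwise add:

    -- Declare the relationship between the fact and dimension tables.
    ALTER TABLE dim_customer
        ADD CONSTRAINT pk_dim_customer PRIMARY KEY (customer_id);

    ALTER TABLE fact_sales
        ADD CONSTRAINT fk_sales_customer
        FOREIGN KEY (customer_id) REFERENCES dim_customer (customer_id);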

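For the cursors-and-extracts point, here is a hedged sketch of the UNLOAD route; the bucket, prefix, IAM role ARN and table are placeholders:

    -- Write the query results to S3 as compressed, parallel files so a very large
    -- dataset can be staged outside the leader node's cursor limits.
    UNLOAD ('SELECT * FROM fact_sales WHERE sale_date >= ''2016-01-01''')
    TO 's3://my-bucket/tableau-staging/fact_sales_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-unload-role'
    GZIP
    PARALLEL ON;

The unloaded files can then be made available to Tableau through something like EMR or another engine that reads from S3, as described above.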
You might also be interested in...