Tableau and data science
Since its inception, the Tableau mission has been to help people make better decisions by allowing them to see and understand their data. This includes helping organizations solve one of the most pressing challenges in data science today: getting the value of advanced analytical insights into the hands of business decision makers. Industry analysts report that many data science efforts fail to deliver return on investment, a key reason being the communication gap between data science teams and business decision makers. Communicating the process and results of data science work in Tableau can close the gap with accessible interactivity and exploration for business stakeholders.
In this post, I’m excited to share the latest Python integration features we’ve built to support data science and diverse analytics environments in Tableau.
Python is a ubiquitous tool for analysts and data scientists across industries for applications from cleaning and shaping data to implementing cutting-edge machine learning algorithms. To better support data science integrations in Tableau at scale, our team has been working to expand the features and security of our Python server, TabPy. Tableau has supported dynamic integration with Python via TabPy in Desktop and Server since version 10.3 and in Prep since version 2019.3. You can find great use cases for integrating Python in Tableau in:
- Sessions from TC18 and TC19
- Blog post on scripting in Tableau Prep
- How-to guide for understanding and using table calculations
Since the initial release, we have improved TabPy based on your requests and feedback. Now, with the release of Tableau 2020.1, we are happy to designate TabPy as a 1.0 release, indicating that it is an officially supported Tableau product and is ready for scaled-up use. Read on to learn more about the features available in TabPy 1.0.
In its initial iteration, TabPy was installed as a Python pip package called tabpy-server and, by default, required an installation of the Anaconda data science framework. Deploying functions required a second package, tabpy-client. To streamline this process, we’ve combined the functionality of both packages into a single package called tabpy and removed the dependency on Anaconda. Running TabPy in an Anaconda virtual environment is still a great solution, but TabPy can now be installed easily in other Python setups as well. To install TabPy on any machine with a Python 3.6+ environment, run:
    pip install tabpy
To start the TabPy server, run the following from the command line:
    tabpy
Once TabPy is running (by default it listens on port 9004), you can connect Tableau Desktop by navigating to Help->Settings and Performance->Manage External Service Connections and entering your connection information.
In Tableau Server, a connection can be configured by running the TSM security command.
Pre-built statistical functions
Once TabPy is installed and the server is running, you can install a library of pre-built statistical functions by running the following command on the same machine:
    tabpy-deploy-models
These functions include Principal Component Analysis (PCA), sentiment analysis, a t-test, and ANOVA. Once installed, any of these functions can be called by name from any Tableau Desktop or Server instance connected to TabPy. For example, the t-test function can be used for web A/B testing.
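To make the A/B testing case concrete, here is a rough sketch of the statistic behind a two-sample t-test, written with only the Python standard library. It is not TabPy's own implementation of the pre-built function. The function name pooled_t_statistic and the sample data are invented for illustration, and the sketch assumes independent samples with a common variance. Turning the statistic into a p-value would additionally require the t-distribution, which a statistics library can supply.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Assumes the samples are independent and share a common variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)

    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both samples have zero variance")
    return (mean_a - mean_b) / std_err, df


if __name__ == "__main__":
    # Made-up metric values for two page variants
    variant_a = [1, 2, 3, 4, 5]
    variant_b = [2, 3, 4, 5, 6]
    t_stat, df = pooled_t_statistic(variant_a, variant_b)
    print(f"t = {t_stat:.3f}, df = {df}")
```

A t value far from zero, relative to the degrees of freedom, suggests that the two variants differ by more than chance alone would explain.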
The tabpy_tools library that ships with TabPy allows you to define and deploy your own Python functions, including scoring with machine learning models. To try it yourself, simply use these instructions.
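As a minimal sketch of that workflow, the snippet below defines a small function and shows how it might be deployed with the tabpy_tools client. The function add_columns, the endpoint name, and the server URL http://localhost:9004/ are illustrative assumptions; the deploy step needs a running TabPy server, so it is wrapped in a helper that is not called automatically.

```python
def add_columns(x, y):
    """Element-wise sum of two equal-length lists.

    Tableau passes each script argument to TabPy as a list of values.
    """
    return [a + b for a, b in zip(x, y)]


def deploy_to_tabpy(url="http://localhost:9004/"):
    """Publish add_columns as a named endpoint (requires a running server)."""
    # Imported here so the rest of the file works without tabpy installed
    from tabpy.tabpy_tools.client import Client

    client = Client(url)
    # override=True replaces any endpoint already deployed under this name
    client.deploy(
        "add_columns",
        add_columns,
        "Adds two columns element-wise",
        override=True,
    )


if __name__ == "__main__":
    # Check the function locally before deploying it
    print(add_columns([1, 2, 3], [10, 20, 30]))
```

Once deployed, the endpoint can be called by name from a Tableau calculated field, for example inside a SCRIPT_REAL expression that returns tabpy.query('add_columns', _arg1, _arg2)['response'].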
Secured connections and authentication
TabPy supports secure connections over HTTPS using SSL, as well as username and password authentication using HTTP basic authentication. Secured connections can be configured in the TabPy configuration file as shown here. Starting with Tableau 2020.1, Tableau Desktop and Server read SSL certificates from the OS keystore and no longer require a certificate to be specified in Tableau. Authentication is configured through a utility included in the tabpy package and is documented here.
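For readers calling TabPy from their own scripts rather than from Tableau, the sketch below shows what basic authentication looks like at the HTTP level: the username and password are joined with a colon, Base64-encoded, and sent in an Authorization header. The helper name, the credentials, and the host are made up for illustration, and the request targets TabPy's /info endpoint. Nothing is sent over the network unless you uncomment the last line.

```python
import base64
import urllib.request


def build_info_request(base_url, username, password):
    """Build (but do not send) an authenticated GET request for /info."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/info",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    # Hypothetical host and credentials
    request = build_info_request("https://localhost:9004", "alice", "secret")
    print(request.full_url)
    # To actually send it against a running, secured server:
    # print(urllib.request.urlopen(request).read())
```

Because basic authentication only encodes the credentials and does not encrypt them, it makes sense to pair it with the HTTPS setup described above.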
TabPy can be started with custom settings defined in a configuration file that you specify when starting the server. Find the specifications for the configuration file and a sample here. Configurable features include SSL, authentication, logging, maximum data size, and timeout. To start TabPy with a custom configuration, add the config startup parameter, as in this example:
    tabpy --config=path/to/config/file.conf
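To give a feel for what such a file contains, the sketch below generates one with Python's configparser. The setting names follow the TabPy sample configuration as best I recall it, so check them against the configuration reference before relying on them; every path and value here is a placeholder, not a recommendation.

```python
import configparser
import io


def build_tabpy_config():
    """Return a ConfigParser holding an example [TabPy] section."""
    parser = configparser.ConfigParser()
    # Keep option names upper-case instead of configparser's default lower-casing
    parser.optionxform = str
    parser["TabPy"] = {
        "TABPY_PORT": "9004",
        # SSL: serve over HTTPS with a certificate and key (placeholder paths)
        "TABPY_TRANSFER_PROTOCOL": "https",
        "TABPY_CERTIFICATE_FILE": "/path/to/cert.crt",
        "TABPY_KEY_FILE": "/path/to/cert.key",
        # Authentication: password file managed with the bundled utility
        "TABPY_PWD_FILE": "/path/to/passwords.txt",
        # Logging: include request details in the log
        "TABPY_LOG_DETAILS": "true",
        # Max data size (MB) and script timeout (seconds), example values
        "TABPY_MAX_REQUEST_SIZE_MB": "100",
        "TABPY_EVALUATE_TIMEOUT": "30",
    }
    return parser


def render_config(parser):
    """Serialize the configuration to text in .conf (INI) format."""
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_config(build_tabpy_config()))
```

Saving the printed text to a file and passing that file's path through the config startup parameter would start TabPy with these settings.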
We’ve expanded TabPy’s logging features to support auditing of Python code run against the server and tracking which users ran which code. When TabPy is connected to Tableau Server, logging can be set to record the Tableau username of the Server user. Find instructions for configuring logging here.
With all of these features we’ve made dynamic Python in Tableau more flexible and powerful than ever before. We’re always looking at what’s next though, so please reach out with your questions and feedback.