New in Tableau Prep: Automatically identify data quality issues with Data Roles

Discover how these features save you time in getting your data ready for analysis.

Let’s face it: data often contains inaccuracies that you have to find and fix, and it would save a lot of time if your data preparation tool could find these data quality issues for you. We are excited to introduce Data Roles in the September release of Tableau Prep (2018.2.3). Tell Prep what your data values represent, and it will automatically identify the values that don’t match.

This release of Tableau Prep also includes user experience improvements like the ability to directly fix join clauses and get better feedback when you rename or group values. Plus, you can now connect to data in MongoDB.

Let’s see how these features save you time in getting your data ready for analysis.

Identify data quality issues by assigning Data Roles

When you clean data, you often have to find inaccurate data values that represent real-world entities like country or airport names. This can be a tedious and error-prone process as you validate data values manually or bring in expected values from other data sources. Tableau Prep now recognizes a set of real-world entities including the eight geographic roles Tableau Desktop knows about, as well as email addresses and URLs.

Now you can assign a Data Role to a field to tell Tableau Prep what real-world entity the field represents. Tableau Prep uses the role to validate the field’s values and automatically identifies the invalid ones so you can clean them. You can filter the field to focus on invalid values and fix them by renaming the values or filtering them out. Data Roles let you set expectations for a data field, so Prep can do the heavy lifting of analyzing the data quality. We will continue to add roles in upcoming releases and would love your feedback in the forums.
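To make the idea concrete, here is a minimal Python sketch of role-based validation. It is not how Tableau Prep implements Data Roles; the function names, the `ROLE_VALIDATORS` table, and the simplified email pattern are all assumptions made for illustration. It only shows the general pattern: a role maps to a validation rule, and values that fail the rule are surfaced for cleaning.

```python
import re
from urllib.parse import urlparse

# Deliberately simplified pattern for illustration; real email validation
# accepts more forms than this (assumption, not Tableau Prep's rule).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(value):
    """Return True if the value looks like name@domain.tld."""
    return bool(EMAIL_PATTERN.match(value))


def is_valid_url(value):
    """Return True if the value has an http(s) scheme and a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical lookup from a role name to its validation rule.
ROLE_VALIDATORS = {"email": is_valid_email, "url": is_valid_url}


def find_invalid_values(values, role):
    """Return the values that do not match the given role, in input order."""
    validator = ROLE_VALIDATORS[role]
    return [v for v in values if not validator(v)]


emails = ["ana@example.com", "bob[at]example.com", "carol@example"]
invalid_emails = find_invalid_values(emails, "email")

urls = ["https://www.tableau.com", "www.tableau.com"]
invalid_urls = find_invalid_values(urls, "url")
```

In this sketch, `invalid_emails` holds the two malformed addresses and `invalid_urls` holds the entry without a scheme, which mirrors filtering a field down to its invalid values before renaming or removing them.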

Fix data mismatches directly in a join step

Finding mismatches in a join step is easy in Tableau Prep because they show up as red text. Until now, though, you couldn’t fix these mismatches in the join step itself. Now you can directly edit unexpected values in join clauses, whether the clause uses a single field or multiple fields, so that they match as intended. This helps you stay in the flow of your task.
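As a rough illustration of what matching up join values means, the Python sketch below finds the values that appear on only one side of a join and then applies a rename so they line up. It is an assumption-laden sketch, not Tableau Prep’s behavior: `find_join_mismatches`, `apply_fixes`, and the sample country values are hypothetical. For a join clause with multiple fields, each key could be a tuple of values instead of a single string.

```python
def find_join_mismatches(left_keys, right_keys):
    """Return (left_only, right_only): key values with no match on the other side."""
    left, right = set(left_keys), set(right_keys)
    return sorted(left - right), sorted(right - left)


def apply_fixes(keys, fixes):
    """Rename keys using a mapping of old value -> corrected value."""
    return [fixes.get(key, key) for key in keys]


# Hypothetical join-key values from two inputs.
orders_countries = ["USA", "Canada", "U.K."]
regions_countries = ["USA", "Canada", "UK"]

# Before the fix, one value on each side has no partner.
left_only, right_only = find_join_mismatches(orders_countries, regions_countries)

# Edit the unexpected value so both sides agree, then check again.
fixed_orders = apply_fixes(orders_countries, {"U.K.": "UK"})
remaining = find_join_mismatches(fixed_orders, regions_countries)
```

Here `left_only` and `right_only` play the part of the values shown in red, and `remaining` comes back empty once the rename has been applied.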