Best practices for tidy data using Tableau Prep
Data can be generated, captured, and stored in a dizzying variety of structures, but when it comes to analysis, not all data formats are created equal.
Data preparation is the process of cleaning dirty data, restructuring ill-formed data, and combining multiple sets of data for analysis. It involves transforming the data’s structure, such as its rows and columns, and cleaning up things like data types and values. The speed and efficiency of your data prep process directly affect the time it takes to discover insights. Understanding the scope of the data you’re analyzing and seeing the changes you make to it can accelerate the entire process.
Think about your data holistically
Before you get started, it’s important to think about how people will use the data that you’re preparing. Understanding this context will help you determine which data set to use, how much data to bring into your data prep tool, and how to ultimately structure and shape the data. To get started, you’ll need to answer some basic questions: how will the data be used, who will use it, and where does it live?
Know the basic structure of your data
Now that you understand how the data will be used, who will use it, and where it lives, it’s essential to understand how it’s constructed. You would never do a home remodel without first knowing the location of your load-bearing walls. Similarly, you don’t want to start data prep without knowing which fields are dependent on or related to each other, how the data was entered (manually or through an automated process), or its level of detail. Knowing your data structure lets you develop the blueprint before you move forward in the data prep process.
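One way to confirm the level of detail is to test which combination of fields identifies each row exactly once. The sketch below is plain Python, not a Tableau Prep feature; the sample rows, the field names, and the helper `is_unique_key` are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical sample rows; in practice these would come from your source data.
rows = [
    {"order_id": 1, "product": "Chair", "region": "East"},
    {"order_id": 1, "product": "Desk", "region": "East"},
    {"order_id": 2, "product": "Lamp", "region": "West"},
]

def is_unique_key(rows, fields):
    """Return True if the combination of `fields` occurs at most once across rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return all(n == 1 for n in counts.values())

# order_id repeats, so the data is more detailed than one row per order.
print(is_unique_key(rows, ["order_id"]))             # False
# order_id plus product is unique: one row per order line.
print(is_unique_key(rows, ["order_id", "product"]))  # True
```

If the field you expected to be unique is not, the data is stored at a finer level of detail than you assumed, and any aggregation or join should account for that.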
Keep track of your steps
Staying organized throughout your preparation process is essential when you need to revisit and make a change to some step in the process. While you don’t need to follow a specific set of instructions to clean your data (in fact, you should prepare the data in a way that makes sense to you), your data prep process will be a lot easier to edit and update if you know where you made changes.
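Tableau Prep shows your steps visually in the flow. If you also prepare data in scripts, a similar record can be kept by naming each step and logging what it did. The following is a minimal sketch under that assumption; `run_steps` and the two cleaning functions are hypothetical names invented for illustration.

```python
def run_steps(rows, steps):
    """Apply each (name, function) step in order and record row counts."""
    log = []
    for name, step in steps:
        before = len(rows)
        rows = step(rows)
        log.append((name, before, len(rows)))
    return rows, log

# Hypothetical cleaning steps.
def drop_missing_region(rows):
    # Remove rows whose region is empty or absent.
    return [r for r in rows if r.get("region")]

def normalize_region(rows):
    # Trim whitespace and standardize capitalization.
    return [{**r, "region": r["region"].strip().title()} for r in rows]

raw = [
    {"id": 1, "region": " east "},
    {"id": 2, "region": ""},
    {"id": 3, "region": "WEST"},
]

cleaned, log = run_steps(raw, [
    ("drop rows with no region", drop_missing_region),
    ("normalize region names", normalize_region),
])

for name, before, after in log:
    print(f"{name}: {before} -> {after} rows")
```

Because every change has a name and a before-and-after row count, it is easier to find the step to edit when a result looks wrong later.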
Spot check throughout
It’s important to stay aware of what is happening to the data as you clean and change it. You don’t want to get too far into the process only to realize you joined on the wrong two fields. This goes back to knowing your data. If you have a good sense of what the data should look like, it will be easier to recognize during these spot checks when something isn’t right.
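A common spot check after a join is to compare row counts and list the key values that found no match. The sketch below shows the idea in plain Python rather than in Tableau Prep; the tables, field names, and the helpers `inner_join` and `unmatched_keys` are hypothetical.

```python
def inner_join(left, right, key):
    """Join two lists of dicts on `key`, keeping only matching rows."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]

def unmatched_keys(left, right, key):
    """Return the key values in `left` that have no match in `right`."""
    right_keys = {rrow[key] for rrow in right}
    return sorted({lrow[key] for lrow in left} - right_keys)

# Hypothetical sample tables.
orders = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 2, "amount": 35},
    {"customer_id": 4, "amount": 15},
]
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]

joined = inner_join(orders, customers, "customer_id")
print(len(orders), len(joined))                          # 3 2
print(unmatched_keys(orders, customers, "customer_id"))  # [4]
```

A joined row count that is lower or higher than expected, or a long list of unmatched keys, is a sign that the join fields deserve a second look before you continue.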
Data preparation is an ongoing process. It’s not over once you have corrected all the misspellings or fixed the joins. When the data set updates, your questions may change, or you may find that you need to add another field. With Tableau Prep’s “Open sample in Tableau Desktop” feature, it’s easy to test how the data will appear later, in the analysis portion of your journey.
Run the flow and start the analysis
Now that you’ve cleaned, restructured, and filtered your data, it’s time to make sense of what it’s telling you. Unlike many data prep tools, Tableau Prep integrates into your full business intelligence platform. Publish the extract to Tableau Server or Tableau Cloud so that others can start their analysis. Bring it into Tableau Desktop to start asking and exploring deeper questions. You’ve just finished the most laborious part of the data analysis process. Now it’s time to enjoy the fruits of your labor: the insights!