Self-service data prep is an extension of the same paradigm as self-service analytics: allowing people to answer their own questions with data that was previously inaccessible. Adding data sources helps answer further questions and lets the data worker give more context to the consumers of their analyses. You can greatly reduce your analysis time when you can produce the data sets yourself. You can also more easily validate data sets to be sure that they accurately reflect reality. Everyone wins!
This is a lesson we’ve learned with the launch of Tableau Prep Builder and now with our Preppin’ Data project, a weekly project designed to get people hands-on with data preparation. Each week a data set is released along with a series of requirements and an end goal describing how the data needs to be prepared. During the week, participants share and discuss their ways of tackling the problem, and the following week a possible solution is released. These projects explore various facets of data preparation. They aim not only to provide realistic examples of data preparation problems, but also to keep participants on the cutting edge of Tableau Prep Builder releases by highlighting new or interesting features.
Let’s dive into why self-service data prep is important for individual analysts and organizations, along with some further lessons learned from our time with Preppin’ Data.
The importance of self-service data prep
Data is everywhere, but to make use of it as analysts, we first need to gather, structure, and clean it. Traditionally, this was done formally within databases by IT teams, and informally within Excel workbooks. The formal “wrangling” of data for an analysis involved writing SQL code within the database itself. Most business users and business-side analysts simply didn’t know how to write SQL. Those who could often didn’t have the access or permission within the database to write the queries.
For the informal data files, wrangling the data involved a lot of copying and pasting or deleting rows and columns. This informal wrangling burned hours of people’s time on an uninspiring task that would have to be repeated the next day, week, or month. Data was hard to source and hard to load into the tools that business users and analysts were already using to answer questions. This challenge is not limited to business data. It is compounded by data sources now living on the internet, behind APIs, in PDFs, and more. This is why self-service data prep is so important.
How to get started with data prep
1. Understand the basics of data prep
There are a few main tenets of data preparation: cleaning the data, restructuring the data, combining data, and validating the data. These tasks are achieved through various means, such as removing unnecessary data, using formulas and calculations to derive new data or modify existing data, and merging different data sets through joins and unions.
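To make these tenets concrete, here is a minimal sketch in plain Python rather than Tableau Prep Builder. The sample data and the helper names (`clean`, `union`, `left_join`, `validate`) are our own assumptions for illustration; they are not part of any product.

```python
# Hypothetical sample data: two monthly sales extracts and a product lookup.
january = [
    {"product_id": "P1", "units": " 10 "},  # stray whitespace
    {"product_id": "P2", "units": "5"},
    {"product_id": "", "units": "3"},       # missing key: unusable row
]
february = [
    {"product_id": "P1", "units": "7"},
    {"product_id": "P3", "units": "4"},     # product missing from the lookup
    {"product_id": "P2", "units": "n/a"},   # not a number
]
products = {"P1": "Soap", "P2": "Shampoo"}


def clean(rows):
    """Cleaning: trim whitespace, drop rows with no key or non-numeric units."""
    cleaned = []
    for row in rows:
        product_id = row["product_id"].strip()
        units = row["units"].strip()
        if product_id and units.isdigit():
            cleaned.append({"product_id": product_id, "units": int(units)})
    return cleaned


def union(*tables):
    """Union: stack tables that share the same columns."""
    return [row for table in tables for row in table]


def left_join(rows, lookup):
    """Join: add the product name; keys with no match get None."""
    return [{**row, "product": lookup.get(row["product_id"])} for row in rows]


def validate(rows):
    """Validation: return the rows the join could not match."""
    return [row for row in rows if row["product"] is None]


sales = left_join(union(clean(january), clean(february)), products)
unmatched = validate(sales)
print(f"{len(sales)} rows prepared, {len(unmatched)} without a product name")
```

Each function corresponds to one tenet, so the last lines read like a small flow: clean each input, union them, join the lookup, then check what failed to match.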
2. Understand how self-service data prep can shape and improve your analysis
Self-service data preparation helps prevent you from being restricted to the data that you’ve been provided. The ability to restructure and combine data to suit your specific needs can reveal insights that were hidden or difficult to discover. It can speed up ongoing analysis by pre-formatting your data so less time is wasted on aggregating, calculating, and comprehending. It allows you to be more confident in your analysis by ensuring the validity of the data yourself and providing a deeper understanding of the values in your data and how these values link together.
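As a rough illustration of restructuring and validating data yourself, the sketch below pivots a wide table (one column per month) into rows and then checks that the totals still agree. It uses plain Python with made-up figures; the function names `pivot_to_rows` and `total_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical wide-format extract: one column per month.
wide = [
    {"region": "North", "Jan": 100, "Feb": 120},
    {"region": "South", "Jan": 80, "Feb": 90},
    {"region": "North", "Jan": 20, "Feb": 10},
]
months = ["Jan", "Feb"]


def pivot_to_rows(rows, value_columns):
    """Restructure: turn each month column into its own row."""
    return [
        {"region": row["region"], "month": column, "sales": row[column]}
        for row in rows
        for column in value_columns
    ]


def total_by(rows, key):
    """Aggregate: sum sales for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["sales"]
    return dict(totals)


long_rows = pivot_to_rows(wide, months)
by_region = total_by(long_rows, "region")
by_month = total_by(long_rows, "month")

# Validate: reshaping and aggregating must not change the grand total.
grand_total = sum(row[month] for row in wide for month in months)
totals_match = sum(by_region.values()) == grand_total == sum(by_month.values())
print(by_region, by_month, totals_match)
```

Once the data is in the long shape, the same rows answer both the by-region and the by-month question, and the reconciliation check gives some confidence that nothing was lost along the way.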
A great starting point for understanding the basics of data prep and how it can help your analysis, as well as how Tableau Prep Builder fits into these processes, is this short whitepaper on data preparation best practices.
3. Practice with public data sets: The Preppin’ Data project
So how do you start learning and applying self-service data prep? That’s what we set out to solve with the Preppin’ Data project, as we were conscious that not everyone has had exposure to this type of work. Preppin’ Data is designed to help the new ‘self-service data prepper’ gain experience with the most common challenges they are likely to encounter in their day-to-day work. Let’s face it, data work is all about accuracy. If you make mistakes, your analysis (and future analysis) will likely be ignored, rendering the work useless. Learning how to prepare data sets correctly is a fundamental building block for great data analysis. But data prep is still a skill that needs to be practiced. This is why Preppin’ Data has lots of experienced users joining in alongside the ‘newbies’. Everyone needs to practice this skill.
What we’ve learned from the Preppin’ Data project
Preppin’ Data wasn’t just for the people participating in the challenges. It was also for the ‘Dr Preppers’ (Jonathan and Carl). The initiative started with a conversation over a coffee machine at the Data School, where it quickly became apparent that others would benefit too. Jonathan, a new Data School consultant, asked Carl, the ‘Other Head Coach’ at the Data School UK, about the best way to get more hands-on practice with Tableau Prep Builder. Simply put, apart from a few great blog posts, there were limited opportunities to practice regularly with Prep Builder or to work on challenges from a wide range of industries. Preppin’ Data is a weekly challenge: it is posted on a Wednesday, and the solution post is shared the following Tuesday. The challenges have taught both Jonathan and Carl far more than they originally expected.
Participants submit their thoughts, questions, and solutions (solely pictures of flows for the time being) on Twitter and via the Preppin’ Data site. This has led to several unexpected benefits:
- Learning how to document a Tableau Prep flow in the best way:
  - Use step renaming and descriptions.
  - Change the colour of the steps so the data flows feel like they are actually mixing (obviously, a blue flow joined with a yellow flow makes green).
- Bug fixes:
  - By using the product more, Preppin’ Data participants have uncovered a few previously unknown bugs. This is great, as the Tableau team is quick to fix the identified issues.
- Building a Tableau Prep community:
  - As people discuss their work with Tableau Prep more and more, solutions are shared and best practices are formed.
  - Some people complete Preppin’ Data challenges in other tools such as R, Python, SQL, and Alteryx as they explore which tool is best for the job. This conversation helps us learn how to teach Tableau Prep Builder to people who currently use other tools.
Tableau Prep Builder has lowered the barrier to entry for a lot of people who have never had the chance to self-serve their data prep needs. The common aesthetic, calculation syntax, and user focus have enabled many more people to get hands-on with data prep. Preppin’ Data will continue (as long as it keeps being useful) to provide examples that let people make a start with preparing data and work with Prep Builder in a safe space.
Submit images of your flow on Twitter and discuss your techniques with others by using the #PreppinData hashtag. There is never one correct way, and this has been a really important lesson for the 100+ participants we have had to date. The feedback between participants has started to form a Prep Builder community that is still in its fledgling state. We want you all to come and take part, to help it flourish and create even more learning opportunities.