My summer job made me angry, so I looked to the data
Note: The following is a guest post by Malte Witt, a student of Tourism Management at the Munich University of Applied Sciences (MUAS). This piece kicks off our back-to-school series.
Okay, I'll admit, the headline might be a little bit too much. But really, the inspiration for this post came from my workplace.
I spent my summer days helping people book their holidays. More specifically, I work for tripodo.de, which is a distribution platform for high-quality, high-cost holidays.
A few of those weeks were really annoying because I had to tell many customers that their dream trips were already sold out. I had a feeling that the time span between the initial inquiry and the desired departure date was getting shorter every day. Naturally, I needed to validate my feelings with hard facts.
Luckily, my boss allowed me to extract some data from our database as long as it was anonymized.
Why do you fail me, Excel?
I first tried visualizing the data in Excel, which soon turned out to be a huge pain. It seems as though Excel is amazing for cleaning and shaping the data (at least amazing enough for my purposes), but not that great at visualizing the data.
Good thing I am a student. Enter: the free Tableau license for students. I applied for the free license, got a response on the same day, and am now a happy customer for life. Well done, Tableau.
With Tableau, it was an absolute breeze to visually explore the data and then create the graphs you see below. From now on, my workflow will probably always be: importing, cleaning, and shaping the data in Excel, then visualizing and exploring the data in Tableau.
Let's validate those feelings
With Tableau, it was really easy to aggregate the average number of days between inquiry and departure date for each month over a couple of years. A small side note: I hate the fact that bar graphs are so good at visualizing this data set. Bar graphs are boring, but effective.
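For anyone without Tableau at hand, the same aggregation can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what I actually ran: the records are made up, and the names `inquiries` and `average_lead_time_by_month` are my own inventions for this example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical, made-up records: (inquiry date, desired departure date).
# The real data is anonymized and not reproduced here.
inquiries = [
    (date(2014, 1, 10), date(2014, 7, 1)),
    (date(2014, 1, 25), date(2014, 8, 15)),
    (date(2014, 4, 3), date(2014, 4, 12)),
    (date(2014, 4, 20), date(2014, 5, 1)),
    (date(2014, 7, 2), date(2014, 7, 6)),
]


def average_lead_time_by_month(records):
    """Average days between inquiry and departure, keyed by inquiry month."""
    days_by_month = defaultdict(list)
    for inquired, departs in records:
        # Subtracting two dates gives a timedelta; .days is the lead time.
        days_by_month[inquired.month].append((departs - inquired).days)
    return {month: mean(days) for month, days in sorted(days_by_month.items())}


for month, avg_days in average_lead_time_by_month(inquiries).items():
    print(f"month {month:2d}: {avg_days:.1f} days on average")
```

Grouping by the month of the inquiry (rather than the month of departure) matches the question I was asking: how far ahead are people planning at the moment they contact us?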
I could end the post here. The graph clearly shows that inquiries made between April and July come on much shorter notice than inquiries made during the winter months.
Why are the averages still so high? I did not bother to calculate the standard deviation, but the spread of values is clearly very wide. My guess is that for July, a very simplified data set would look something like 4, 4, 4, 4, 160, 160, 150: a handful of long-planned trips pull the average far above what the typical customer does.
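To see how much a few long-planned trips can distort the average, here is a quick check on that guessed July sample. The seven values are my simplified guess from above, not real data, and `july_days` is just a name picked for this sketch.

```python
from statistics import mean, median, stdev

# The simplified, guessed July sample from the text (days between
# inquiry and departure). Illustrative only, not real data.
july_days = [4, 4, 4, 4, 160, 160, 150]

print(f"mean:   {mean(july_days):.1f} days")
print(f"median: {median(july_days)} days")
print(f"stdev:  {stdev(july_days):.1f} days")  # sample standard deviation
```

The mean lands at roughly 69 days, while the median is just 4. For a sample like this, the median would probably describe the typical last-minute customer better than the average does.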
I was hooked and wanted to know more. How long does it take our customers on average to make the decision to book?
This validates my theory even further. In April, customers are surprised, like: "Whaaat? Easter holidays again?!" and need to book a cool trip on very short notice. During August, customers realize the weather in Germany is terrible and they become very booking-happy. I can't really explain the data in February, though. If anyone has an idea, I'd love to hear it.
One more metric that strengthens my theory is the average number of days between booking and departure.
I was so into my analysis by this point that I didn't really think about whether I needed any more graphs; my mind just demanded them. So I plotted the average trip duration by month as well as the average cost. (Unfortunately, the costs come without labels, because, you know, business secrets.)
One major thing I’m taking away from this data analysis is my new workflow with Tableau. I now have a tool at my disposal which allows me to visually understand my data. Before, I had to work through the numbers in Excel and then choose the right graph. Now I can just drag and drop, and switch things around at my leisure without worrying about anything else but the view.
I hope you enjoyed my frustration-born journey into my employer’s data. As always, comments and constructive criticism are very welcome.
To learn more about Malte's data journey, check out his blog, The Sigma.