Subject Area: Calculations
Level of complexity: Intermediate
Approximate Time to complete: 60 mins
Companion Workbook (twbx): CohortCalcs.twbx
Additional Resources on the web:
- http://kb.tableau.com/articles/knowledgebase/cohort-analysis-tips
- http://drawingwithnumbers.artisart.org/tag/cohort-analysis/
Cohort calculations are a convenient way to capture a ‘filtered' measure to use for comparison analysis. An example of a cohort calc is "Table Sales".
The logic of this calculation is this: For each row, we check if the field Category is "Tables," if it is we return the value of the Sales field. If it is not, then we return 0. In effect, we are creating a new column which looks like this.
What is the benefit of this technique? Here is a common example: You want to do a comparison of Sales in 2012 and 2013. The first step is to create a cohort calc for each year. Here is 2012:
And here is 2013:
Now you can create a year-over-year growth calculation which simply subtracts 2012 Sales from 2013 Sales.
(The above calculation can also benefit from being changed to an aggregate calc, e.g. SUM([2013 Sales]) - SUM([2012 Sales])
Putting these in table gives us the YoY comparison.
Why would one prefer this to simply using a YoY Table Calculation? Because our cohort calcs are row-level, they are more flexible than Table Calculations which are at an aggregated level and more importantly depend on the structure of the table. Here is an example of the Table Calculation approach.
As you can see, there is no way to not display the blank cell for 2012 "Difference in Sales". That highlights the inflexibility of building tables and visualizations when using Table Calcs. Cohort calcs on the other hand give us ‘pills' for each cohort which give us flexibility in creating our views. Another example of this flexibility is creating a YoY map. This is possible with Table Calculations but tricky. With our cohort calcs built, we simply need to create a map with the YoY Sales as the measure.
What are the downsides of cohort calcs? For one, there is a bit of work required to create a cohort calc. If you wanted to view percent of total for a dimension that had dozens of values, for example, you would need a cohort calc for each of those values. In that case, you might prefer to use Table Calculations. Also, creating a visualization that uses a lot of complex calculations can make the queries that Tableau sends to your database more complex. At some point, you might reach a point that many complex calculations are causing an acceptable performance degradation.