For many years, data users, whether they knew it or not, have struggled with database cardinality issues. I am excited to share a simple solution to this challenge.

Those of you who've tried your hand at table joins within or outside of Tableau likely found that some joining scenarios leave you with incorrect sums. This occurs because the numbers in one of the joined tables, such as quota or budget, have been duplicated in the resulting table. After the join, when you take the sum of quota or budget, the number appears to be way bigger than expected.

If the data set is of a manageable size, you may notice the incorrect numbers immediately. But in many cases, we may overlook the problem and unknowingly work with incorrect numbers. This is a topic of cardinality, specifically a one-to-many relationship between tables.

To better understand the problem and how it may apply to you, let’s take a look at an example of quota attainment.

## The Problem

In this scenario, each salesperson makes many sales and has a single quota to reach. Sales numbers are stored in one table and the quota is stored in a second table. We can link these tables in a join via the salesperson’s name to find out who has reached quota.

Notice that the data from the quota table will be replicated three times for Bronson when it is joined to the sales data. This is where our problem begins. Bronson’s quota of \$20 is now repeated three times. And when we take the sum of quota for Bronson, it will show \$20 x 3 = \$60, which is clearly incorrect.

## The Solution + Two Options to Avoid

So the challenge is clear, but how do we solve for it? You might consider data blending, or taking the minimum quota figure instead of the sum. Both of these options have caveats, however. Data blending can perform poorly on larger data sets. And taking the minimum quota figure relies on having the salesperson dimension in the view.

This brings us to our solution: using a Level of Detail Expression.

This approach is scalable, dynamic, and reusable. Since the quota is always replicated across the salesperson dimension, we can fix the level of detail to the salesperson and always take the minimum quota:

This calculation will give the correct value, whether we are looking at the view by salesperson or by total quota.

While this is one example, the application of this simple LOD Expression is vast. At Tableau, we use this across many of our data sources not only to improve performance, but also to ensure that others don’t fall victim to using incorrect numbers in their own analysis. I recommend changing the name of the duplicated metric to “incorrect value” and naming the LOD version “correct value” to ensure people use the right one.

## コメント

This is great! I recently made a tutorial video about Complex Data Joins in Tableau, and this issue was the last thing I went over in the video. I wasn't sure what words to use to talk about the issue though, but now I do – I may make a part two tutorial video to go over this issue more in-depth, because I imagine a lot of people don't realize this could be happening with their data.

Very Good. But in practice, the scenario is more complex. In a project management, there will be work break down structure (WBS) hierarchy and every WBS will have budget and actual costs (also commitment costs) in multiple line items in separate table.

This is similar to your example but budgets are having hierarchy. It will be great if you show some example to handle this scenario also.

I encountered one more issue. My quota is having hierarchy and my actuals need to aggregate hierarchically to display my Quota Vs Actual graph. But actuals are not aggregated on hierarchical basis. I used LOD for budget and so showing right numbers for budgets (because quotas are not bottom up and adding them will not match higher level quota and also at the root)

imagine this as salespersons in the department are having hierarchy where in the boss owns the total quota and part of it is distributed to his team managers. These team managers, keeping part of their quota to themselves and distributed remaining to their team members. This hierarchy can be n levels.

Now if we can show a drill down chart with quota vs actuals, that will be excellent and very useful. This is my actual requirement in showing the project budgets vs actuals. There are few more issues related to this which i will share later.

Hi Satya,

I am not sure if I completely understand the use case. It sounds like you have a high level quota for a department or region that needs to be divided among managers and sales reps. Is it distributed evenly? Is there a table that describes what percentage of that quota should go to whom? It may help to see a concrete example of what you are trying to accomplish.

Thanks Bronson. There is a table that describes what quantity of quota should go to whom. But this quota is not distributed evenly (i.e bottom up will not match the top level quota) because part of quota is kept by the person for self and the remaining is distributed further.

With your solution, the quota for every sales person is coming correctly! The actuals against each sales person is also coming correctly.

The issue is, i need one more view where for each sales person, i want to show the actuals that includes self value (which is coming correctly) and team value.

For example, manager quota is 100 and distributed 30 to sales person A and 50 to sales person B and kept himself 20. The actuals are 10 for manager, 25 for sales person A an 35 for Sales person B. For manager, the report shows quota as 100 and actual as 10. But it should show actual of 70 (10 + 25 + 35). Then I can know how the manager is performing as team. the actuals are coming from multiple line items (for example, for manager the actual of 10 is coming from 3 line items with 2, 5 and 3.

So the quota should not be aggregated (your formula is working excellent here) but actuals should be aggregated hierarchically (with your formula actuals are also coming without aggregation to next higer level (at sales person level it is aggregating).

I hope now i am clear about my requirement. Let me know if there is any solution.

I am sure there is solution to your problem, but it is really difficult to solve without knowing the data structure. I would suggest posting it on our community forums with a sample data set and an image/description of the expected results you are looking for: http://community.tableau.com/community/forums

My guess is that you would not want to use an LOD Expression on the actuals because these values are probably not getting duplicated. In my example I would not write and LOD Expression for the sales values because each row is an actual sale and summing them all up already gives the correct amount.

Bronson: This is a great solution for this problem. Before L.o.D. calcs, I solved the problem in your example by setting the default aggregation to Avg(Quota) or Min(Quota). It worked as long as I had the appropriate dimension (in this example Salesperson) from the table on the right side of the join (in your example, the quota table).

My sol'n pretty much limited me to always having the Salesperson dimension in a worksheet. So, we can better understand how L.o.D. calcs work, can you provide a little more info about how and why my sol'n is limited, and how LoD calcs overcome that limit?

I am unfamiliar with what sol'n is, but I can provide some context as to why LOD Expressions work here. The fundamental issue is that joining the tables duplicates our quota. Taking the sum of quota from this table is not a viable option when it is duplicated. So, what we would really like to do is take the sum of quota from the original table (without duplication) while still maintaining the join. An LOD Expression recreates the data for quota at the original granularity to remove duplication and then brings that result to the view. Let’s break down the Expression: { FIXED [Salesperson] : MIN( [Quota] ) }

Fixed [Salesperson] tells tableau to regroup the data behind the scenes into a table that has one row for each salesperson. If you look at the SQL for this is something like this:

GROUP BY Salesperson

MIN( [Quota] ) looks at all the rows of data for quota and takes the minimum value in those rows for each salesperson. Since the quota is just duplicated for all the rows for each salesperson the minimum will give us the expected result. You could also use MAX or AVG here and get the same results.

Now we have a new table with one row for each sales person with no duplication of quota. Whenever we drag that LOD Expression to the view it queries this table behind the scenes to give the correct amount.

Hi Bronson,

can this solution also be for a case of many-to-many relation? we have one revenue table which is in transaction granularity (many rows for one webiste) and cost table which is on daily granularity (adwords data, many rows for one day for one webiste). the relationship is many to many and we use blending to show cost, revenue and profit per website, This make the dashbord quite slow. I have also experienced slow outcome with FIXED (EXCLUDE and INCLUDE work well).

Do you think we can use it in this case? Doesn't FIXED also disregard filters like date and website?

we use amazon redshift as DWh.

Thanks

Hi Nir

This can absolutely be used to over come a many-to-many relationship. Same concept as above - find the dimension or combination of dimensions that are causing the duplication and use an LOD to take the minimum value at that level. You can use FIXED or INCLUDE in these scenarios. FIXED happens before filters while INCLUDE/EXCLUDE happen after filtering. Certainly worth keeping in mind as you move forward with one or the other.

Most carnality issues I have seen can use FIXED or INCLUDE interchangeably.

Hi Bronson,

We have used join & LOD instead of data blending and worked great. Performance had improved and we are much happier with the end result. Thank you for that.
I would like to add that I would have preferred to use blending as it is more intuitive but with blending the queries cannot run in parallel, so this maybe a point that should be handled. I understand this goes to the very foundation of blending but it's worth mentioning.

Thank you

Nir

Hi, Its a nice post. However, converting sum(quota) to attribute, attr(quota) will give the correct result.
Please correct me if am missing something and if attr approach is wrong.

Thank you.