2008/07/30
One of the blogs I read regularly is Flowing Data, which discusses effective visualization techniques for making sense of data.  A recurring topic is a challenge to the readers: can you improve this graph?

The most recent challenge at Flowing Data is a graph that attempts to demonstrate a correlation between suicide rates and unemployment levels in Japan. Nathan identifies some areas for improvement and links to the source data, which I've used to build a Tableau visualization. You can see my results in the attached image.

The first step I took was to transpose the row/column orientation of the Excel file, and then connect to it with Tableau. Both the "Unemployment Rate" and "Suicide Rate" columns have missing data points, which were fairly straightforward to resolve. For the unemployment rate, I converted "Unemployment Rate" to a numeric Measure instead of a textual Dimension, and then filtered the data to start at the year 1980. I created a simple line graph to show the unemployment rate against time, and used "Suicide Rate" to control the width of the line. To fill in the missing suicide rate data points, I used a Table Calculation in Tableau to make a moving window for the suicide rate, averaging up to two data points within +/- 4 years.
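For readers who want to see the moving-window idea outside of Tableau, here is a minimal Python sketch. It is not the Table Calculation from the workbook, only an illustration of the same approach under my own assumptions: the function name `window_average`, the `half_width` parameter, and the sample values are all made up for this example and are not taken from the source data.

```python
from typing import Dict, Optional


def window_average(
    series: Dict[int, Optional[float]], half_width: int = 4
) -> Dict[int, Optional[float]]:
    """Average the known values within +/- half_width years of each year.

    Hypothetical helper for illustration; Tableau's own window
    calculation may behave differently.
    """
    smoothed: Dict[int, Optional[float]] = {}
    for year in series:
        # Collect every non-missing value whose year falls inside the window.
        nearby = [
            value
            for other, value in series.items()
            if value is not None and abs(other - year) <= half_width
        ]
        # Leave the year empty if no known value falls inside the window.
        smoothed[year] = sum(nearby) / len(nearby) if nearby else None
    return smoothed


# Made-up values reported every fifth year (NOT the real figures).
rates: Dict[int, Optional[float]] = {year: None for year in range(1980, 1991)}
rates[1980] = 10.0
rates[1985] = 20.0
rates[1990] = 30.0

filled = window_average(rates)
for year in sorted(filled):
    print(year, filled[year])
```

With values reported every fifth year, a window of +/- 4 years never contains more than two known points, which is why each average uses at most two values. Years that have a reported value keep it unchanged, because no other reported year falls inside their window.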

I've attached a Tableau 4.0 Packaged Workbook for Beta users to explore. One week from today we release Tableau 4.0, and you will be able to download the free trial if you're interested in exploring Tableau Desktop!

Very nice Robert. I decided to break with convention and use a bar with a redundant encoding on the color and size of the bar.

This draws a strong visual correlation between the increase in unemployment and suicide rates.

Looks a bit like the chart of Napoleon's campaign into Russia (and back), which is in one of Edward Tufte's books: see here

Ian W.

"first understand the data"

Exactly. Understand where it comes from, try to understand what it means, and try to understand the relationships by first showing (in your own private workbook) all relationships, and singling out the meaningful ones.

Austin, what does that line represent? Is it a least-squares regression line, or (better) an orthogonal regression line?

Hi Hadley, I took a look at Austin's workbook - it's a normal least-squares regression. Perhaps our other readers already know, but could you explain to me what an orthogonal regression is and how it would be a better approach for this data? Thanks for your feedback!

Although the line and bar charts are attractive and make a powerful case, I think the scatter plot is a more honest depiction of the data. The scatter plot is less sexy, but it makes it very clear that the apparent correlation is based only on six data points.

That doesn't mean there isn't a trend, but just that the visualization shouldn't hide the small sample size from the reader.

One of the risks of using powerful visualization tools is that it is possible to downplay some aspects when trying to emphasize others. The real problem here is that the suicide rate was reported only every fifth year, and the data was filled in by linear interpolation for the intervening years.

If the sample size were larger, I'd say that all three of the visualizations are effective -- and I'd make all of them available. It's always good to look at the data in different ways to catch just this sort of thing.

It looks like suicide rate correlates more closely to unemployment rate than to long-term unemployment rate.

This is an awesome layout. Very well put together. I was wondering if you had a graph of suicide vs. unemployment in the USA?

Hi Jennifer,
I don't have any such data on hand, but that would be an interesting cultural contrast with the analysis here.