In his book Beautiful Evidence, Edward Tufte talks about small X-Y graphs that he calls sparklines. Sparklines are very good at conveying time-varying data in a small area. Tufte's examples include things like patient glucose levels but you can probably think of your own uses.
One day last spring, I started thinking about how I could plot some of the time-varying climate data I play around with on our new maps. With a bit of effort (and some help from Jock Mackinlay) I managed to get it to work. Since then I've spent a little time perfecting the technique and now I'm ready to share it with the world.
In order to plot a sparkline, you need to know:
- what goes on the X axis
- what goes on the Y axis
- where each sparkline will go (the level of detail)
In my example, I will be plotting surface temperature anomalies from NASA's GISS project. Each record consists of a date (at month resolution), a latitude/longitude pair on a fixed grid, and the surface temperature expressed as a deviation from some baseline. In order to even out seasonal variation, we will average the anomalies for each year. So that means:
- YEAR([date]) goes on the X axis
- AVG([anomaly]) goes on the Y axis
- each sparkline will go at each [lat]/[long] grid location
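To make the yearly averaging concrete outside Tableau, here is a minimal Python sketch of what AVG([anomaly]) computes once the data is split by year and grid location. The record layout (year, month, lat, long, anomaly) and the function name `annual_means` are assumptions made for illustration; they are not part of the GISS data format or of Tableau.

```python
from collections import defaultdict


def annual_means(records):
    """Average monthly anomalies for each (lat, long, year) combination.

    `records` is an iterable of (year, month, lat, long, anomaly) tuples.
    This tuple layout is a hypothetical stand-in for the real data source.
    """
    # Running [sum, count] for each grid cell and year.
    sums = defaultdict(lambda: [0.0, 0])
    for year, _month, lat, long_, anomaly in records:
        cell = sums[(lat, long_, year)]
        cell[0] += anomaly
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Averaging over the months of a year is what evens out the seasonal swing, leaving one Y value per year for each grid location.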
The key to putting sparklines on a map is to create two calculations that "jitter" the underlying latitude and longitude fields. These calculations will then go on the X (i.e. Columns) and Y (i.e. Rows) shelves, while the location of each sparkline (i.e. latitude and longitude) goes onto the Level of Detail shelf. The sparkline itself is a Line mark with the X axis value on the Path shelf. I also find it helpful to put the Y axis value on the Color shelf to provide a kind of labelling for the axis, but this is not strictly necessary. You can also put a third variable on the Size shelf if the sparkline is not too dense, but I have not done that in this example because there are no unused variables.
I like to create these jittered calculations in two stages. The first stage is to make two calculations called [Sparkline X] and [Sparkline Y] that compute the X and Y values for the sparkline and normalise them both to the range [-1,1]. This makes it easier to centre them on geographical points and scale them to the grid. These calculations should be aggregate calculations so that Tableau will not try to group by them, which would break the line up into a set of discrete points. In my example, the two calculations are:
- [Sparkline X] = (MAX(YEAR([date])) - 1980) / 30
- [Sparkline Y] = AVG([anomaly]) / 2
MAX (or MIN) is a convenient way to make a dimension into an aggregate without changing its value. Notice that each calculation is of the form
- (AGG - midpoint) / (half of the range)
[anomaly] varies from about -2 to +2 in my data set so the midpoint is actually 0, and I am only going to plot YEAR([date]) from about 1950 onwards, which gives a range of about 60 years centred at 1980.
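The normalisation can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the midpoints and half-ranges quoted above (1980 and 30 for the year, 0 and 2 for the anomaly); the function names are hypothetical and the real work is still done by the Tableau calculations.

```python
def normalise(value, midpoint, half_range):
    """Map a value onto [-1, 1]: (value - midpoint) / (half of the range)."""
    return (value - midpoint) / half_range


def sparkline_x(year):
    # Years 1950..2010 are centred at 1980 with a half-range of 30.
    return normalise(year, 1980, 30)


def sparkline_y(avg_anomaly):
    # Anomalies of about -2..+2 are centred at 0 with a half-range of 2.
    return normalise(avg_anomaly, 0, 2)
```

With these constants, 1950 maps to -1, 1980 to 0 and 2010 to +1. Values outside the assumed range simply land outside [-1,1], so the sparkline would spill past its grid cell.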
The second stage is to make the actual jittered calculations using the two calculations from the first stage:
- [Longitude X] = MAX([long]) + [Sparkline X]
- [Latitude Y] = MAX([lat]) + [Sparkline Y] * .75
The .75 is used to scale the Y axis to the grid used by the data. The X axis seems fine with a scaling of 1 for my grid. Again, notice that these are aggregate calculations and I have used the MAX trick on the [lat] and [long] fields. For this type of data visualisation, you should make sure that [lat] and [long] are Dimensions, not measures. Also, watch out for axis reversal - we tend to say "X and Y" and "Latitude and Longitude", but Latitude should have the Y values, not the X values! One final note: If you create the Latitude/Longitude calculations by right-clicking on your [lat]/[long] fields when they are already marked with the correct geographical role, Tableau will create the new calculation with that role already assigned.
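As a rough check of the second stage, the following Python sketch applies the same offsets to a single grid point. The 0.75 and 1.0 scale factors are the ones used above for my grid; the function name `jitter` and its argument order are assumptions for illustration only.

```python
Y_SCALE = 0.75  # vertical scaling used above to fit the data's grid
X_SCALE = 1.0   # horizontal scaling; 1 is what works for this grid


def jitter(lat, long_, spark_x, spark_y, x_scale=X_SCALE, y_scale=Y_SCALE):
    """Return (longitude_x, latitude_y) for one point on a sparkline.

    `spark_x` and `spark_y` are the normalised values in [-1, 1].
    Note the order of the result: longitude is the X value and
    latitude is the Y value.
    """
    longitude_x = long_ + spark_x * x_scale
    latitude_y = lat + spark_y * y_scale
    return longitude_x, latitude_y
```

For example, a grid point at latitude 47 and longitude -123 with normalised values (-1, 0.5) is drawn at longitude -124 and latitude 47.375, so each sparkline spans at most one degree either side of its grid point horizontally and 0.75 degrees vertically.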
(Incidentally, splitting up the calculations this way is an example of what we Software Engineers call "separation of concerns": the Sparkline calculations are "concerned" with the properties of the X/Y values and the Latitude calculations are "concerned" with the properties of the grid. Then if you need to adjust something, you know exactly where to go to fix it.)
Now that we have everything ready, we just need to build the visualisation! (It may help to turn off automatic updates until the visualisation is ready.) Double click on the [Longitude X] and [Latitude Y] calculations to get them onto the Columns and Rows shelves. This gives us our map background. (Make sure that the appropriate Background Map is selected in the Data menu.) Put [lat] and [long] onto the Level of Detail shelf so that the lines are split up onto the grid. Choose the Line mark on the Marks card and put the X axis value onto the Path shelf (YEAR([date]) in my example). Now turn on automatic updates and you should have a beautiful set of sparklines on your map!
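If it helps to picture what the Level of Detail and Path shelves are doing, here is a small Python sketch of the same idea: points are grouped by grid location (one line per location) and ordered by year within each group. This illustrates the concept only and is not how Tableau is implemented; the tuple layout and the name `build_paths` are hypothetical.

```python
from collections import defaultdict


def build_paths(points):
    """Group jittered points into one ordered line per grid location.

    `points` is an iterable of (lat, long, year, x, y) tuples, where
    (x, y) are the jittered longitude and latitude values.
    """
    by_cell = defaultdict(list)
    for lat, long_, year, x, y in points:
        # Level of Detail: one group of vertices per (lat, long) cell.
        by_cell[(lat, long_)].append((year, x, y))
    # Path: connect the vertices of each line in year order.
    return {
        cell: [(x, y) for _year, x, y in sorted(vertices)]
        for cell, vertices in by_cell.items()
    }
```

Without the grouping step, every point would be joined into a single line wandering across the whole map, which is why [lat] and [long] have to be on the Level of Detail shelf.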
As I mentioned above, you can add visual cues for the Y axis by putting the Y value onto the Colour shelf. In my example, this is AVG([anomaly]) and I edited the colour encoding to use 5 steps to make it easier to read. I also changed the default colour ramp to a blue-red diverging palette because we tend to associate blue with cold and red with hot, and the range is symmetric (like the colours).
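To show why a stepped, symmetric colour encoding is easy to read, here is a minimal Python sketch that bins an anomaly into one of 5 equal-width steps across -2 to +2. The equal-width binning, the clamping and the palette names are all assumptions for illustration; Tableau's own stepped colour logic may differ in detail.

```python
# Hypothetical palette names, ordered from cold to hot.
PALETTE = ["dark blue", "light blue", "neutral", "light red", "dark red"]


def colour_step(value, limit=2.0, steps=5):
    """Return the index of the colour step that `value` falls into.

    The range [-limit, +limit] is split into `steps` equal-width bins;
    values outside the range are clamped to the nearest end.
    """
    clamped = max(-limit, min(limit, value))
    width = 2 * limit / steps
    index = int((clamped + limit) / width)
    # The upper edge of the range belongs to the last bin.
    return min(index, steps - 1)
```

With an odd number of steps, the middle bin straddles zero, so small anomalies of either sign get the neutral colour and only the clear departures from the baseline turn blue or red.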
And here is the final image:
I have attached a packaged workbook containing the example data and calculations for you to play with. The original data set is quite large, so I have created an extract containing only the data in a single worksheet. Let me know how it tastes!
There is something interesting lurking in this example besides how to put sparklines on maps. Have a look at the sparklines on the left of the map and compare them to the ones on the right. Notice that the coastal marks show a much less pronounced upswing in recent decades. This is because the ocean is a much better heat sink than the land (a fact that we here in Seattle are very thankful for in late July!). It is also a nice graphical illustration of why the term "global warming" is being replaced by "climate change" in public discourse: the warming is not uniform and is often affected by the local topography. Looking at the full data set (which is sadly too large to include with this posting), there are a number of other patterns that jump out when one has the high data density afforded by sparklines. I'll follow up if I find anything unexpected.