It is easy to assume that breeders continue to produce faster and faster horses each year. The truth is that Kentucky Derby winners have been slowing down for some time. As you can see, since the 1960s the winning run has trended (ever so slightly) slower. Of course, the differences in speed we are talking about are far too minute to be perceived, but there is no doubt that the fastest decade on average was the 1960s.

Is this because of continuous inbreeding among Thoroughbreds? Or is it perhaps just a fluke of nature that will soon right itself? After all, if you drag the slider back to inception you can see that the trend has been undeniably upwards since the 1870s.



Come on, Tableau, you can do better than this! The data is inherently noisy, and it's a really thin sample set (winners only), so it is very hard to draw any conclusions from it. Your slowing conclusion isn't justified by your data or by your overly simplistic graphical analysis. If you look at the variance a bit closer, you can see that speeds are more variable to the slow side than to the fast side: a logical situation, and one that is heavily influencing your conclusion. And while a simplistic analysis would say that the speed is the same or slower over the last fifty years, a more careful and informed analysis would show small but significant gains over the last decade or so. Sticking just with Tableau's graphical analysis, a more interesting question to highlight would be why the performance variance was so low from the mid-'70s to the late '80s.

Coloring the data points by track condition and marking Triple Crown winners with a different shape or size would probably shed light on the biggest variable in performance outside of the horses' abilities: the weather.

I think Frank said it all above. What's the standard deviation? Are the results statistically valid? Beyond statistically valid, are the results meaningful? I love infographics, but I love unbiased information more.

My criticism is this: We all know statistics can be misleading. We also know that small changes in the look of graphical information can bias the interpretation of an unsophisticated information consumer. In the above case, the range of the vertical axis relative to the range of its data points is enormous compared with the range of the horizontal axis relative to its data points. Visually, this compresses the volatility and makes the data look like a clear straight line with a slight downward slope. If you increased the range of the x axis to match the ratio on the y axis, the regression line would be drawn through a little cloud of data and it would start to look ridiculous.

Okay, so it can be difficult to tease out the bias without a wealth of information. Keeping the ratio of data range to axis range consistent might help in some cases. However, the very least we can do is accompany infographics with standard deviations and with t- and F-tests for the formula and variables.
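
To make that concrete, here is a minimal sketch of the numbers I would want printed next to a chart like this: the standard deviation of the winning times, the fitted slope, R-squared, and the F statistic for the straight-line model. The helper name trend_summary is hypothetical (it is not anything Tableau provides), and the sample values are made-up placeholders, not real Derby results.

```python
import math


def trend_summary(years, times):
    """Spread of the times plus a least-squares line of time against year.

    Assumes at least three points and a fit that is not perfect
    (otherwise the F statistic divides by zero).
    """
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(times) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    syy = sum((y - y_mean) ** 2 for y in times)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, times))

    slope = sxy / sxx
    ss_model = slope * sxy      # variation explained by the line
    ss_resid = syy - ss_model   # variation left unexplained

    return {
        "std_dev": math.sqrt(syy / (n - 1)),        # sample standard deviation
        "slope": slope,                             # seconds per year
        "r_squared": ss_model / syy,
        "f_stat": ss_model / (ss_resid / (n - 2)),  # 1 and n - 2 degrees of freedom
    }


# Made-up placeholder values, NOT real Derby results.
years = list(range(1960, 1970))
times = [122.0, 121.0] * 5
summary = trend_summary(years, times)
print(summary)
```

For these placeholder values the line explains only about 3% of the variation (F = 0.25), which is the kind of context a reader cannot get from the picture alone.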

I understand it's the user's bias, and not Tableau's, but if we're going to have stunning and convincing infographics, then perhaps it's even more important to have stunning transparency.

If you hover over the trend line, you can see the p-value for the model.
Since the p-value is higher than any standard significance level, the model is not significant. If we had the p-values for each coefficient, we would see that the intercept is significant but the slope is not. Thus, we cannot reject the null hypothesis of a zero slope. So, over this period, Derby times show no statistically significant trend.
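
For anyone who wants the per-coefficient tests, here is a rough sketch of how they could be computed by hand. The function name coefficient_tests is hypothetical, the sample values are made-up placeholders rather than real Derby results, and the p-values use a normal approximation to the t distribution, which is an assumption that is only reasonable for larger samples.

```python
import math
from statistics import NormalDist


def coefficient_tests(years, times):
    """t statistics and approximate two-sided p-values for slope and intercept."""
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(times) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, times))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # fitted value at year 0

    # Residuals are computed around the centered line for numerical stability.
    residuals = [y - (y_mean + slope * (x - x_mean)) for x, y in zip(years, times)]
    resid_var = sum(r * r for r in residuals) / (n - 2)

    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1 / n + x_mean ** 2 / sxx))

    def approx_p(t):
        # Assumption: normal approximation to the t distribution with n - 2 df.
        return 2 * (1 - NormalDist().cdf(abs(t)))

    t_slope = slope / se_slope
    t_intercept = intercept / se_intercept
    return {
        "slope": (slope, t_slope, approx_p(t_slope)),
        "intercept": (intercept, t_intercept, approx_p(t_intercept)),
    }


# Made-up placeholder values, NOT real Derby results.
years = list(range(1960, 1970))
times = [122.0, 121.0] * 5
tests = coefficient_tests(years, times)
print(tests)
```

Two caveats: the normal approximation understates p-values for small samples (a t distribution with n - 2 degrees of freedom is the proper reference), and the intercept is the fitted value at year 0, so its test depends entirely on how the year is coded.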