Note: This piece was co-authored by Tim Chartier, associate professor of mathematics and computer science at Davidson College. A version of this post appeared on The Huffington Post.
Cincinnati will host Major League Baseball’s All-Star Game this week. The midsummer classic has garnered attention this year for the players selected to the game—and those who perhaps should have been but were not.
The players are chosen by one of three different methods. Fans select the starting lineup of each league. Peers pick the eight pitchers and eight positional backups for each team. And managers fill out the remaining roster spots to create teams of 34.
Who's In & Who's Out This Year
This year the Kansas City Royals, coming off of their first World Series appearance since 1985 and boasting one of the most galvanized fan bases in the game, managed to vote in four starters to the All-Star Game through the fan balloting process.
This is in stark contrast to previous years, when larger-market teams with larger fan bases dominated the fan vote. In fact, the game in Cincinnati will mark the first time in baseball history that no player from either the New York Yankees or Boston Red Sox is in the starting lineup. Also notably left off the roster is Alex Rodriguez, baseball’s firebrand, who has been enjoying an incredible season at 40 years old following his year-long suspension in 2014.
Fortunately for fans like me, baseball is a sport inundated with statistics that simplify comparison of player performance. So, by combining baseball stats with some data analysis, we can critically assess the numbers and ask the question: Did the players who most deserve to be called “all-stars” crack the roster?
The first step in probing a question like this is to find a data set that contains the relevant baseball statistics. I found a suitable one on Fangraphs.com and exported the spreadsheet from the 2015 Batting Leaders table. I also downloaded Fangraphs’ 2015 Steamer Hitters Rest of Season Projections so that I could see who is projected to perform well in the season’s second half.
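For readers who would rather script this step than work in a spreadsheet, here is a minimal sketch of combining a first-half table with a projections table by player name. It is not the workflow used for this post, which was built in Tableau. The column names (`Name`, `OPS`, `Proj_OPS`, and so on) and every row are made-up placeholders, not the actual Fangraphs export layout.

```python
import csv
import io

# Hypothetical stand-ins for the two exported spreadsheets; the column
# names and every value here are made up for illustration only.
FIRST_HALF_CSV = """Name,Team,OPS,HR
Player A,Team X,0.900,15
Player B,Team Y,0.750,8
"""

PROJECTIONS_CSV = """Name,Proj_OPS,Proj_HR
Player A,0.850,12
Player C,0.800,10
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_by_name(first_half, projections):
    """Attach projection columns to each first-half row with a matching Name.

    Players without a projection keep their first-half columns only.
    """
    proj_by_name = {row["Name"]: row for row in projections}
    merged = []
    for row in first_half:
        combined = dict(row)
        combined.update(proj_by_name.get(row["Name"], {}))
        merged.append(combined)
    return merged


players = merge_by_name(read_rows(FIRST_HALF_CSV), read_rows(PROJECTIONS_CSV))
for player in players:
    print(player["Name"], player["OPS"], player.get("Proj_OPS", "n/a"))
```

With real exports you would swap the inline strings for the downloaded files. Matching on name alone is an assumption that breaks if two players share a name, so a player ID column would be a safer key if the export provides one.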
How Each Player Got Selected & Where They Stand
The visualization below shows a scatter plot of Major League hitters. Each dot represents a hitter. The color indicates whether the player was elected to the All-Star Game roster, and if so, how. The user can control which statistics to display on the axes by selecting from the menu on the right-hand side of the graph.
The options are, for the Y axis:
• OPS: On-base + slugging percentage, measuring a hitter’s power and plate discipline.
• HR: Home runs from the first half of the 2015 season.
• Proj. OPS and Proj. HRS: Projections for the season’s upcoming second half, courtesy of Steamer Projections.
And for the X axis:
• WAR: Wins above replacement, a catch-all metric for determining the value of a player’s total contributions to the team's wins.
• BA: Batting average from the season’s first half.
• Proj. WAR and Proj. BA: Corresponding projections for the season’s second half.
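To make the OPS entry in the Y-axis options concrete, here is a small sketch of how the figure is built from its two halves using the standard definitions of on-base percentage and slugging percentage. The stat line passed in is invented for illustration and does not belong to any real player.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies
    )


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Made-up stat line for illustration only; not a real player.
obp = on_base_percentage(hits=100, walks=40, hit_by_pitch=5, at_bats=300, sac_flies=5)
slg = slugging_percentage(singles=60, doubles=20, triples=2, home_runs=18, at_bats=300)
ops = obp + slg
print(f"OBP {obp:.3f} + SLG {slg:.3f} = OPS {ops:.3f}")
```

Because OPS simply adds the two rates, a hitter can reach a high number through power, through getting on base, or both, which is why it works as a quick single-axis summary.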
Analyzing the scatter plot lets you see how some selected players fall far below the 90th percentile line for the displayed metrics. How did Alcides Escobar, Matt Holliday, and Salvador Perez beat out players like Alex Rodriguez, Joey Votto, and Adam Lind?
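The 90th percentile comparison can be sketched outside the viz as well. The snippet below assumes a list of (name, OPS, selected) records with invented values and uses Python's `statistics.quantiles` to find the cut point. The interpolation method Tableau uses for its reference line may differ, so treat the exact threshold as approximate.

```python
import statistics

# Invented records for illustration: (name, first-half OPS, made the roster?)
hitters = [
    ("Player A", 0.60, False),
    ("Player B", 0.65, True),
    ("Player C", 0.70, False),
    ("Player D", 0.72, False),
    ("Player E", 0.75, False),
    ("Player F", 0.78, True),
    ("Player G", 0.80, False),
    ("Player H", 0.82, False),
    ("Player I", 0.85, False),
    ("Player J", 0.90, False),
    ("Player K", 1.00, True),
]


def percentile_90(values):
    """Return the 90th-percentile cut point of the values."""
    # n=10 yields nine decile cut points; the last one is the 90th percentile.
    return statistics.quantiles(values, n=10, method="inclusive")[-1]


cutoff = percentile_90([ops for _, ops, _ in hitters])

# Roster members whose OPS sits below the 90th-percentile line.
selected_below = [name for name, ops, selected in hitters if selected and ops < cutoff]

print(f"90th percentile OPS: {cutoff:.3f}")
print("Selected but below the line:", selected_below)
```

The same function works for any of the metrics on either axis; only the value pulled from each record changes.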
The Stats of Those Who Made the Roster & Those Who Didn't
This next data visualization shows a different take on the same data. Here we see the distribution of all the players in the data set for whichever of the four metrics is selected in the right-hand menu.
This histogram is perhaps even easier to digest than the scatter plot. Selecting "1st half OPS" as our metric shows clearly that certain players farther to the left who made the All-Star Game (colored in brown, blue, and green) did not outperform others who had a higher OPS but were not selected (colored in yellow). At least the players in the top five columns all made it into the game, but poor Anthony Rizzo, whose 0.954 OPS ranks fourth in the Major Leagues, was snubbed by fans and had to rely on his peers to elect him to the All-Star roster.
The Numbers by Player and by Team
One of the most powerful parts of Tableau is the ability to combine single visualizations, like the two I have shown above, on a dashboard and create interactivity among them. By doing so, I can create a true exploratory environment where fans like me can probe the simple question posed earlier by easily accessing and manipulating the underlying statistics of those players.
The dashboard below features the scatter plot and histogram from above plus another chart, a box and whisker plot, showing the median and distributions of the four categories of elected and non-elected players in the data set. It also has a filter based on team, so the user can click on the logo of a favorite team and see where its players fall in the three charts below.
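For a sense of what the box and whisker plot summarizes, here is a rough sketch that computes a five-number summary per selection category. The group labels and OPS values are invented, only two of the four categories are shown, and the whiskers here simply run to the minimum and maximum, which is an assumption and may not match how the dashboard draws them.

```python
import statistics

# Invented OPS values per category, for illustration only.
groups = {
    "Fan vote": [0.70, 0.76, 0.81, 0.88, 0.95],
    "Not selected": [0.62, 0.68, 0.73, 0.79, 0.91],
}


def box_summary(values):
    """Return min, quartiles, median, and max for one group of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summaries = {label: box_summary(values) for label, values in groups.items()}

for label, summary in summaries.items():
    print(label, summary)
```

Comparing the medians across categories gives a quick read on whether one selection method tends to pick stronger hitters, while the spread between the quartiles shows how consistent each group is.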
The viz features the full stat line of all the players visualized in the data set. Just hover over a dot on any of the three charts and see the stat line for that player populate in the middle of the dashboard.
I hope that by exploring this dashboard, fans can ask and answer questions like: Are the Royals players worthy of their fan-voted All-Star spots? Should A-Rod have been an All-Star? Do the managers or the players pick better All-Stars? Should fans vote in more—or fewer—players?
You can also find data, create your own visualization, and become your own sports analyst. Let the numbers tell the tale with some Major League analysis!