DataRobot visualizes machine learning results in Tableau to predict 2020 MLB performance

COVID-19 has kicked off a lot of discussion about the future of professional sports. After months of conversations between Major League Baseball (MLB) and the Major League Baseball Players Association (MLBPA), they have enacted a 60-game schedule and, tentatively, a plan for playoffs.

Before the 2020 season started, we (the DataRobot team) predicted how the MLB season would unfold for teams and for individual players using DataRobot’s artificial intelligence platform, and we surfaced the results in Tableau dashboards. Once a model is built in DataRobot, its predictions can be shared across an organization through actionable dashboards in Tableau.

In our analysis, we looked at questions like: Who is projected to win each division in the abbreviated season? Who will make the wildcard? Who will be the top pitchers and batters? This article addresses these questions, reveals how we made our predictions, and looks at how these results would change if there were 16 playoff teams as originally planned.

Background into the machine learning technology behind D.R.I.V.E. MLB

DataRobot is the creator of AutoML (the automation of machine learning): the platform tries different prediction models on a dataset and quickly identifies the ones that predict most accurately. We created D.R.I.V.E. MLB using this same technology.
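To make the idea of "trying different models and keeping the most accurate one" concrete, here is a minimal sketch in plain Python. It is not DataRobot's API or the method behind D.R.I.V.E. MLB; the three candidate models, the function names, and the tiny dataset are all assumptions chosen for illustration, and the numbers are synthetic.

```python
from statistics import mean

# Three toy candidate models. Each "fit" function returns a predictor.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m  # always predict the training average

def fit_identity(xs, ys):
    return lambda x: x  # predict that the target equals the input

def fit_linear(xs, ys):
    # One-feature least-squares line: y = intercept + slope * x
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mae(model, xs, ys):
    # Mean absolute error on held-out data
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

def select_model(candidates, train, valid):
    # Fit every candidate on the training split, score it on the
    # validation split, and return the name with the lowest error.
    tx, ty = train
    vx, vy = valid
    scores = {name: mae(fit(tx, ty), vx, vy) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

CANDIDATES = {"mean": fit_mean, "identity": fit_identity, "linear": fit_linear}

# Synthetic data generated from y = 0.5 * x + 1 (illustrative only).
train = ([0, 2, 4, 6], [1.0, 2.0, 3.0, 4.0])
valid = ([1, 5], [1.5, 3.5])
best, scores = select_model(CANDIDATES, train, valid)
print(best, scores)
```

A production AutoML system compares far more model families and uses more careful validation, but the selection loop has the same shape: fit, score on held-out data, rank.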

For decades, people have made baseball predictions with traditional projection systems such as PECOTA, Steamer, and ZiPS. Building on this foundation, artificial intelligence takes things to the next level by trying to understand the complexities and subtleties of how information is connected, like the real-life human behavior of baseball players.

D.R.I.V.E. MLB predicts Wins Above Replacement (WAR), the most common metric for total player performance, for every player in Major League Baseball. It also predicts ancillary stats like Weighted On-Base Average (wOBA), Weighted Runs Created (wRC), and ERA Minus (ERA-), an adjusted form of Earned Run Average. By using these player-level predictions, we can also predict team win-loss records, division standings, playoff brackets, and even MVP and Cy Young winners. Explore the results of these analyses by clicking on the Tableau dashboard image below.
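One simple way to see how player-level WAR can roll up into a team record is the standard back-of-the-envelope approach: start from the wins a replacement-level team would be expected to collect, then add the roster's WAR. The sketch below assumes the WAR values are already scaled to the 60-game season and uses the widely cited public replacement level of a .294 winning percentage. Neither assumption is confirmed as part of D.R.I.V.E. MLB, and the function name and roster values are hypothetical.

```python
# Assumption: common public replacement-level convention (.294 winning
# percentage), not a value confirmed by the D.R.I.V.E. MLB methodology.
REPLACEMENT_WIN_PCT = 0.294

def projected_team_wins(player_wars, games=60,
                        replacement_win_pct=REPLACEMENT_WIN_PCT):
    """Estimate team wins as replacement-level wins plus total roster WAR.

    player_wars: iterable of per-player WAR, assumed to be scaled to a
    season of `games` games.
    """
    baseline = games * replacement_win_pct
    wins = baseline + sum(player_wars)
    # A team cannot win fewer than 0 or more than `games` games.
    return max(0.0, min(float(games), wins))

# Hypothetical roster: placeholder WAR values, not real projections.
example_roster = [2.0, 1.5, 0.5]
print(round(projected_team_wins(example_roster), 2))
```

Losses then follow as the number of games minus projected wins, and sorting teams by projected wins within each division gives a first pass at the standings.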

What would a hypothetical 16-team playoff field look like?

At one point, MLB was considering a 16-team playoff scenario, and we were curious how that field would look. Using the same projection system, we recut the results to take the top eight teams in each league and fill out the field.

Hypothetical 16-Team Bracket:

  • National League
    • (1) Dodgers* vs. (8) Diamondbacks
    • (2) Braves* vs. (7) Phillies
    • (3) Reds* vs. (6) Mets
    • (4) Padres vs. (5) Nationals
  • American League
    • (1) Astros* vs. (8) Red Sox
    • (2) Yankees* vs. (7) Rangers
    • (3) Twins* vs. (6) Rays
    • (4) Indians vs. (5) Angels

*Division Winners
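The bracket above is consistent with a simple seeding rule: division winners take the top seeds in order of projected wins, the remaining spots go to the next-best teams by wins, and the first round pairs the highest seed with the lowest. The sketch below assumes that rule; it is our reading of the bracket, not a documented part of the projection system, and ties are left in input order because the tie-breaker used is not stated. The function names are hypothetical. The usage example includes only the six National League teams whose projected win totals are listed later in this article.

```python
def seed_league(teams, field_size=8):
    """Seed one league.

    teams: list of (name, projected_wins, is_division_winner) tuples.
    Division winners are seeded first, then the best remaining teams.
    sorted() is stable, so tied teams keep their input order.
    """
    winners = sorted((t for t in teams if t[2]), key=lambda t: -t[1])
    others = sorted((t for t in teams if not t[2]), key=lambda t: -t[1])
    wild_cards = others[: max(0, field_size - len(winners))]
    return [t[0] for t in winners + wild_cards]

def first_round(seeds):
    """Pair seed 1 with the last seed, seed 2 with the second-to-last, etc."""
    n = len(seeds)
    return [(seeds[i], seeds[n - 1 - i]) for i in range(n // 2)]

# National League teams with projected wins reported in this article.
nl = [
    ("Dodgers", 37, True),
    ("Braves", 33, True),
    ("Reds", 31, True),
    ("Padres", 32, False),
    ("Nationals", 32, False),
    ("Mets", 32, False),
]
seeds = seed_league(nl, field_size=6)
print(seeds)
# ['Dodgers', 'Braves', 'Reds', 'Padres', 'Nationals', 'Mets']
print(first_round(seeds))
# [('Dodgers', 'Mets'), ('Braves', 'Nationals'), ('Reds', 'Padres')]
```

Note that the Reds are seeded third despite a lower projected win total than the three 32-win teams, which is what the winners-first rule produces.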

What’s important to remember about this projection, this season, and the variability of baseball is that even with eight teams per league making the playoffs, the difference in projected wins between the first seed and the ninth seed is only five wins in the American League and seven wins in the National League. With the actual field of five playoff teams per league, the margin for error will be even smaller.

See our blog post for additional details on the methodology and how we took roughly 1,500 season-specific statistics for each player and added 2,000 additional variables for each player, leveraging DataRobot’s enterprise AI platform to make our predictions.

The results of our machine learning analysis

Tableau helps people get maximum value from DataRobot’s AI-based projections. Our D.R.I.V.E. MLB Tableau dashboards above show DataRobot’s projections for the 60-game 2020 MLB season, with final win-loss records, division standings, and player performance. Many of our customers deploy the predictions they get from DataRobot through Tableau since it makes for a useful combination of insights and interpretation.

Given these results, we predict the following about the 2020 season:

2020 Playoffs Matchups:

Playoff teams project to the following wins:

  • American League: Astros (35), Yankees (35), Twins (34), Angels (33), Indians (33), Rays (33)
  • National League: Dodgers (37), Braves (33), Reds (31), Nationals (32), Mets (32), Padres (32)

2020 Major Individual Awards:

“What If” There Were 16 Teams in the Playoffs?

What would that have looked like, given DataRobot’s forecasts? In addition to the teams in the 10-team field, the Rangers and Red Sox would also have made the playoffs in the American League, as would the Phillies and Diamondbacks in the National League.

Our conclusions

The Dodgers, Yankees, and Astros are projected to be the top teams in the shortened season, and Mike Trout, Alex Bregman, and Mookie Betts the top players. With machine learning, baseball organizations can predict future performance based on past information. Similarly, any industry can predict future performance where chance, human behavior, and the complexities among various data sources are involved.
