Improving our Baseball Elo Model with Better Pitcher Rankings

In a recent article, we talked about how to build a baseball Elo model and the ins and outs of how it works. Because of the way baseball works, the best way to build a baseball Elo model is to assign a rating to both the starting pitchers and to the rest of the team. This lets us accurately measure the overall team quality while still capturing the game-to-game variance caused by changing starting pitchers.

One of our main conclusions was that the model, while very accurate overall, struggled to assign good ratings to the starting pitchers. We hypothesized that there simply wasn’t enough data to capture pitcher quality reliably. Moreover, pitcher performance can be highly variable from season to season.

As a result, we wondered whether an improved methodology for assigning ratings to the starting pitchers in the league would lead to a more accurate baseball Elo model. In this article, we look at exactly this. We changed the method for computing the starting pitcher’s Elo rating and saw the share of correctly predicted games rise by roughly 2 percentage points (from 58.4% to 60.2%).

Even more impressive, our model now predicts winners more accurately than the Vegas opening and closing lines.

In this article, we’re going to take a deep dive into the changes we’ve made and explain why this is going to be stamped as the official data jocks’ baseball Elo model going forward.

A Quick Refresher on the Baseball Elo Model

To start, let’s make sure we’re all on the same page with how Elo works and, in particular, how our two-pronged baseball Elo model works. Elo is perhaps best known as the way that chess ratings are assigned. In a sport or game with an Elo system, every competitor is assigned a rating. Higher ratings correspond to better players, so a rating reflects a competitor’s overall skill.

In a matchup, the player with the higher Elo should win most of the time. When the difference between Elo ratings becomes larger, the probability of the higher rated competitor winning gets larger too. Elo is normalized so that a 400 point difference in ratings between two competitors is approximately equal to a 91% chance of winning.
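As a quick check on that 91% figure, here is a minimal sketch of the standard Elo win probability curve. The function name is ours and purely illustrative.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that competitor A beats competitor B under the standard Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give a coin flip; a 400 point edge gives 10/11, or about 91%.
print(round(elo_win_probability(1500, 1500), 3))  # 0.5
print(round(elo_win_probability(1900, 1500), 3))  # 0.909
```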

Elo ratings are determined via a “point stealing” algorithm. When you beat someone, you steal a few of their points. The number of points stolen is a function of how much of an underdog you were. If you were heavily favored, you might only steal one or two points. If you were a huge underdog, you might steal something like 20-30 points. In a close matchup, you’ll probably steal about 10 points or so.
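The point stealing idea can be sketched with the classic Elo update, where the winner takes K times the probability that they would have lost. The K-factor of 20 below is an assumption chosen so that an even matchup moves 10 points; larger steals in the 20-30 range would need a larger K. The function names are illustrative.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def points_stolen(winner_rating: float, loser_rating: float, k: float = 20.0) -> float:
    """Points the winner takes from the loser (k is an assumed K-factor)."""
    expected = elo_win_probability(winner_rating, loser_rating)
    return k * (1.0 - expected)


def update_ratings(winner_rating: float, loser_rating: float, k: float = 20.0):
    """Return the new (winner, loser) ratings; the total number of points is conserved."""
    delta = points_stolen(winner_rating, loser_rating, k)
    return winner_rating + delta, loser_rating - delta


print(points_stolen(1500, 1500))            # even matchup: 10.0
print(round(points_stolen(1900, 1500), 1))  # heavy favorite wins: under 2 points
print(round(points_stolen(1500, 1900), 1))  # big underdog wins: over 18 points
```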

Our Specific Baseball Elo Model

Our baseball Elo model is quite a bit different from traditional Elo models. There are two main differences. First, we assign Elo ratings to both pitchers and teams instead of just overall ratings to a team. Second, our Elo ratings are not determined by a point stealing algorithm but instead by looking at betting data.

Because baseball winning probabilities depend so much on the starting pitchers in a matchup, it is key to include a component in our model which respects this fact. Therefore, each matchup starts with a base rating for each of the two teams. Then, each team’s rating is adjusted up or down by its starting pitcher’s Elo rating.
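A minimal sketch of this two-part rating is shown below. It assumes the pitcher adjustment is simply added to the team rating and that pitcher ratings are offsets centered at zero. Both are our illustrative assumptions, and no home-field term is included.

```python
def matchup_win_probability(
    home_team: float,
    home_pitcher: float,
    away_team: float,
    away_pitcher: float,
) -> float:
    """Home win probability from team ratings adjusted by starting pitcher offsets."""
    # Assumed additive adjustment: team rating plus that day's starter offset.
    home_total = home_team + home_pitcher
    away_total = away_team + away_pitcher
    return 1.0 / (1.0 + 10.0 ** ((away_total - home_total) / 400.0))


# Two evenly rated teams: the side starting the better pitcher becomes the favorite.
print(matchup_win_probability(1500, 0, 1500, 0))   # 0.5
print(matchup_win_probability(1500, 40, 1500, -20) > 0.5)  # True
```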

Second, instead of running a pure Elo system where the winning team and pitcher steal points from the losing team and pitcher, we do something a bit different. In particular, we use betting data to inform how our Elo ratings work. If one team is heavily favored in the betting markets, that means that its Elo rating should be much higher than its opponent’s.

In fact, we rely only on betting data instead of on the actual outcome of the game! This approach is very different but has merits. First of all, it is hard to argue with the accuracy. Second, betting markets can be seen as either efficient markets or as ensemble learners, and both are highly accurate ways of predicting outcomes. Moreover, we’ve shown before that augmenting the final scores of games can lead to improved model accuracy. This is just taking that idea to the next level by totally discounting the outcome of a game and looking only at the betting market.
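To make the idea of reading Elo off the betting market concrete, here is a rough sketch. It is not our exact procedure: it assumes American moneyline odds, removes the bookmaker's margin by simple normalization, and inverts the Elo curve. All function names are illustrative.

```python
import math


def moneyline_to_probability(moneyline: int) -> float:
    """Raw implied probability from an American moneyline (still includes the vig)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100.0)
    return 100.0 / (moneyline + 100.0)


def no_vig_probabilities(home_ml: int, away_ml: int):
    """Normalize the two raw probabilities so they sum to 1 (assumed vig removal)."""
    home_raw = moneyline_to_probability(home_ml)
    away_raw = moneyline_to_probability(away_ml)
    total = home_raw + away_raw
    return home_raw / total, away_raw / total


def implied_elo_difference(win_probability: float) -> float:
    """Elo rating gap that would produce the given win probability."""
    return -400.0 * math.log10(1.0 / win_probability - 1.0)


# A -150 home favorite against a +130 underdog implies a gap of about 56 Elo points.
home_p, away_p = no_vig_probabilities(-150, 130)
print(round(home_p, 3), round(implied_elo_difference(home_p), 1))
```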

In the next section, we look at how much the new pitcher ratings improve the overall accuracy of our model. At the end of this article, we describe the mathematics that we use to build our improved baseball Elo model.

Results

The results in this article should be compared to those in our previous article. For reference, they are posted below.

Model | Home Team | Pythagorean | Implied Elo (Old Model) | Vegas
Accuracy | 53% | 55% | 58.4% | 59.0%

In our new model, we used an improved method to measure the Elo rating of pitchers. The results are summarized below.

Model | Implied Elo (Old) | Implied Elo (New) | Vegas
Accuracy | 58.4% | 60.2% | 59.0%

Our new and improved Implied Elo model actually outperforms Vegas, picking winners about 1 percentage point more often than the Vegas opening line (60.2% versus 59.0%). This is not quite enough to profit, but it is notable for a model built from the betting lines themselves.

Pitcher Elo Ratings via Win Probability Added

In the previous iteration of our baseball Elo model, we used the betting data to infer both team ratings and starting pitcher ratings. What we found was that we didn’t accurately rate the best pitchers. To counteract this effect, we wanted to use a different method. Luckily, the world of baseball statistics is absolutely oversaturated with advanced stats. We can use some of these advanced stats to jump start the accuracy of our baseball Elo model.

At the end of the day, Elo models are all about estimating the probability that one team beats another. Assigning an Elo rating to a pitcher basically says “we think this guy is worth +/- XYZ% winning probability.” Fortunately, there are many pitching stats that can help us do this. Let’s describe two.

First, there is a stat called cumulative win probability added. This stat counts how much a player has contributed to their team’s probability of winning. For example, hitting a walkoff home run in a game that is tied in the bottom of the ninth would add around 50% (because the game went from a tossup to a guaranteed win).

Taking cumulative win probability added and dividing by the number of starts for a pitcher, we can estimate how much win probability a pitcher adds in games they start. Then, it is straightforward to convert this win probability added to an Elo score by just using the Elo formula.

The formula to convert win probability added per start (x) to Elo (y) is y = -400 \cdot \log_{10}(1/x - 1).
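Here is a small sketch of that conversion. The formula is the usual Elo inversion, so it needs an input strictly between 0 and 1. One reading that keeps it well-defined for below-average pitchers, and this is an assumption on our part rather than something stated above, is to feed it 0.5 plus the win probability added per start, so that an average pitcher lands at 0 Elo. The function names and the baseline parameter are illustrative.

```python
import math


def probability_to_elo(p: float) -> float:
    """Invert the Elo curve: y = -400 * log10(1/p - 1), valid for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


def pitcher_elo_from_wpa(cumulative_wpa: float, starts: int, baseline: float = 0.5) -> float:
    """Pitcher Elo offset from cumulative win probability added.

    Assumption: the per-start value is added to a 0.5 baseline before the
    conversion, so a pitcher with zero win probability added is rated 0.
    """
    wpa_per_start = cumulative_wpa / starts
    return probability_to_elo(baseline + wpa_per_start)


# A pitcher with 1.5 cumulative WPA over 30 starts adds 0.05 per start,
# which under this reading is worth roughly +35 Elo.
print(round(pitcher_elo_from_wpa(1.5, 30), 1))
```

Under the same reading, a pitcher with negative win probability added gets a negative offset of the same size, which matches the idea of adjusting a team’s rating up or down.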

An alternative way to do this calculation is to use other stats that let us try to estimate the same quantity: how much a pitcher’s contributions add to a team’s probability of winning. For example, one could use wins above replacement to measure a similar effect to cumulative win probability added.

At the end of the day, we need to pick some stat to use. Some will be swingier than others. Some will take too long to converge.