What are Pythagorean Wins and Pythagorean Expectation?
Much of sports analytics tries to estimate the true quality of a team. One of the simplest models for doing this is Pythagorean wins (sometimes called Pythagorean expectation). The model is pretty good and is built on very straightforward fundamentals.
In this article we look at Pythagorean wins and Pythagorean expectation from our perspective. Why is the stat defined the way it is and is it any good? How does the concept of Pythagorean wins differ from baseball to basketball? This and more below!
Using Run Differential, not Record
Pythagorean wins is supposed to be an improvement on just looking at a team’s win-loss record. In baseball the scores tend to be so close that the one-run games can do a ton to impact your shot at making the playoffs. It isn’t uncommon for a team to compete in dozens of one-run games throughout the year.
If we consider games this close to be basically 50-50, then a team’s overall record is subject to a lot of coin flips. Now, in the long run (and 162 games might be long enough) things will even out. The good teams will win more of their one-run games than they’ll lose. But it isn’t crazy for a team’s record in one-run games to dramatically shift their overall record one way or another. This can make it hard to tell how good a team actually is.
Pythagorean wins relies on run differential (AKA margin of victory) and not just on pure record. Take a look at the screen grab from the end of the 2022 season below.
All of the best teams have a significantly positive run differential or margin of victory (in the DIFF column). Good teams win by large margins and do so quite often. More importantly, the larger the run differential, the better we think a team is. Look now at the following graphic, which shows teams’ records against specific divisions in 2022.
The point of this graphic is that small sample sizes can lie extremely easily. Against the AL East, the Astros were basically a .500 team. Also against the AL East, the Guardians and the Mariners were below .500.
Looking at a team’s record can be very misleading. The idea behind Pythagorean wins and Pythagorean expectation is that run differential is more robust and harder to fake.
The Pythagorean Wins Formula
The Pythagorean wins formula is named “Pythagorean” because it looks a lot like the Pythagorean Theorem you learned in high school geometry. The Pythagorean theorem relates the lengths of the sides of a right triangle. If z is the length of the longest side and the other two sides have lengths x and y, then the formula is z^2 = x^2+y^2 .
Back to our setting now. We want to use runs scored and runs allowed to estimate how good a team is. Let O, D denote a team’s runs scored on offense (O) and runs allowed on defense (D).
The Pythagorean wins formula depends on Pythagorean expectation, which estimates a team’s winning percentage with the formula Win\% = \frac{O^2}{O^2+D^2} .
To go from Pythagorean expectation to Pythagorean wins, you just need to multiply the expected winning percentage by the number of games played.
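The two steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function names are my own, and the season totals (800 runs scored, 700 allowed, 162 games) are made-up numbers, not real team data.

```python
def pythagorean_expectation(runs_scored, runs_allowed):
    """Estimated winning percentage: O^2 / (O^2 + D^2)."""
    o2 = runs_scored ** 2
    d2 = runs_allowed ** 2
    return o2 / (o2 + d2)


def pythagorean_wins(runs_scored, runs_allowed, games):
    """Expected wins: estimated winning percentage times games played."""
    return games * pythagorean_expectation(runs_scored, runs_allowed)


# Hypothetical season totals, chosen only for illustration.
pct = pythagorean_expectation(800, 700)
wins = pythagorean_wins(800, 700, 162)
print(f"expected win%: {pct:.3f}, expected wins: {wins:.1f}")
```

With these made-up totals, the team projects to a bit under 92 wins over a full season.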
Notice how this formula kind of looks like the Pythagorean Theorem because everything is squared. The funny thing is, the similarities end there. In the next section we’ll “take this formula out for a spin” to see it in action.
Sanity Checking Pythagorean Expectation Formula
When I was still teaching mathematics, whenever I introduced a new formula I always always always encouraged my students to play around with the formula and see if it makes sense. Plug in numbers from very simple cases and see what you get. Can you explain the results? Usually this helps us understand why everything is where it is in the formula.
- What happens if our team has the best defense of all time and allows D=0 runs? Intuitively, our team should win every single one of our games. In fact, it is impossible to lose! If we plug these numbers into the Pythagorean expectation formula (assuming we score at least once, so O>0) we get Win\% = \frac{O^2}{O^2+0}=100\% .
- On the flip side, what if our team has the worst offense of all time and never scores? This means that O=0 . Again, intuitively, our team should never win because we can’t score any runs. The Pythagorean expectation formula comes out as Win\% = \frac{0}{0+D^2}=0\% . Again, the formula works exactly as expected.
- Finally, what if our team is as average as they come? Suppose we score exactly as many as we allow. Intuitively, we should win about as many as we lose. If we plug D=O into the Pythagorean expectation formula we get Win\% = \frac{O^2}{O^2+O^2}=50\% . The formula works yet again.
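If you would rather let a computer do the plugging in, the three checks above can be run as a short Python sketch. The run total of 750 is an arbitrary placeholder, and the guard for the case O = D = 0 is my own addition, since the formula is undefined there.

```python
def pythagorean_expectation(runs_scored, runs_allowed):
    """Estimated winning percentage: O^2 / (O^2 + D^2)."""
    o2 = runs_scored ** 2
    d2 = runs_allowed ** 2
    if o2 + d2 == 0:
        # The formula divides by zero when a team neither scores nor allows runs.
        raise ValueError("undefined when runs scored and runs allowed are both zero")
    return o2 / (o2 + d2)


# 750 is an arbitrary, hypothetical run total.
perfect_defense = pythagorean_expectation(750, 0)
no_offense = pythagorean_expectation(0, 750)
dead_average = pythagorean_expectation(750, 750)

print(f"{perfect_defense:.0%}, {no_offense:.0%}, {dead_average:.0%}")
# prints "100%, 0%, 50%"
```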
Pythagorean Wins NBA
Recently analysts have also developed a Pythagorean wins NBA formula. The tricky thing is that you can’t just use the exact same formula as you do in baseball. The reason is that far more points are scored in an NBA game than runs in an MLB game.
Margins of victory should be interpreted in baseball differently than in basketball. In baseball, it isn’t crazy or unusual for one team to score twice as many as the other team. A 6-3 or 8-4 game in baseball is normal.
In the NBA, though, doubling up a team is nearly unheard of. A 120-60 win is significant. This means that we need to adjust the formula to reflect the fact that margin of victory behaves differently in the two sports.
The Pythagorean wins NBA formula differs from the MLB formula by using a different exponent. The Pythagorean wins NBA formula is given by Win\% = \frac{O^{16.5}}{O^{16.5}+D^{16.5}}. The exponent changed from 2 to 16.5 in response to the different range of typical scores and outcomes in the NBA.
The same intuition works as before. A team that allows exactly as much as they score will win 50% of the games. However, what if a team scores 10% more than they allow?
In baseball, O=1.1D results in an expected Pythagorean winning percentage of Win\% = \frac{1.21}{1+1.21}\approx 55\% .
In basketball, though, a team that scores 10% more than they allow would be expected to win Win\% = \frac{1.1^{16.5}}{1+1.1^{16.5}}\approx 83\% of their games.
The 16.5 number isn’t anything special. It just happens to be the number that describes the data the best.
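To see how much the exponent matters, here is a small Python sketch that evaluates the same 10% scoring edge under both exponents. The helper name win_pct_from_ratio is my own. It uses the fact that dividing the top and bottom of the formula by D raised to the exponent leaves only the ratio O/D.

```python
def win_pct_from_ratio(ratio, exponent):
    """Pythagorean expectation in terms of ratio = O / D."""
    powered = ratio ** exponent
    return powered / (1 + powered)


# A team that scores 10% more than it allows, under each sport's exponent.
baseball = win_pct_from_ratio(1.1, 2)
basketball = win_pct_from_ratio(1.1, 16.5)
print(f"baseball: {baseball:.1%}, basketball: {basketball:.1%}")
```

The larger exponent makes the curve much steeper around an even team, which is why the same 10% edge translates into far more wins in basketball.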
Actual Pythagorean Wins MLB Formula
The old school formula for Pythagorean expectation and Pythagorean wins in the MLB used exponents of 2. However, recently it was found that it was more accurate to not use that exponent. After all, there is no “intuitive” reason why we should square the runs allowed and the runs scored.
Squaring things just so happened to match the data pretty well. But studies were done to find an exponent for baseball that matched the data even better. In fact, the modern Pythagorean wins formula for baseball uses an exponent of 1.83.
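As a rough sense of scale, the sketch below compares the exponents 2 and 1.83 for one made-up team. The totals (800 runs scored, 700 allowed, 162 games) are hypothetical, and the function name is my own.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent):
    """Expected wins with a configurable exponent."""
    o = runs_scored ** exponent
    d = runs_allowed ** exponent
    return games * o / (o + d)


# Hypothetical season totals, not real team data.
wins_classic = pythagorean_wins(800, 700, 162, exponent=2)
wins_modern = pythagorean_wins(800, 700, 162, exponent=1.83)
print(f"exponent 2: {wins_classic:.1f} wins, exponent 1.83: {wins_modern:.1f} wins")
```

For this team the two versions differ by roughly one win. The smaller exponent pulls every estimate slightly toward .500.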
Derivatives and What They Tell Us
There is useful information that can be extracted by using the idea of the derivatives (see here for a bit of a description on derivatives and critical points in sports). This section will be technical and can be skipped safely.
Let \rho = \frac{O}{D} denote the ratio of a team’s offensive to defensive production. For example, \rho =1.1 corresponds to the setting from last section where our team scored 10% more than we allowed.
The Pythagorean expectation formula simplifies in terms of \rho in the following way. We have O = \rho D. Then, Win\% = \frac{O^2}{O^2+D^2}=\frac{\rho^2D^2}{(1+\rho^2)D^2}=\frac{\rho^2}{1+\rho^2} .
Then, by the quotient rule, the derivative of a team’s Pythagorean winning percentage with respect to \rho is \frac{d\,Win\%}{d\rho}=\frac{2\rho}{(1+\rho^2)^2} .
You can use this formula to put “marginal value” on improving a team’s offense. For example:
- For a league-average team ( \rho=1 ), raising \rho by a small amount \Delta \rho increases the expected winning percentage by about \Delta \rho/2 . For example, \Delta \rho = 0.02 is worth about one percentage point.
- The best teams in the league often have roughly \rho =1.25 . For these teams, the same increase in \rho leads to an increase of only about 0.38 \Delta \rho in expected winning percentage.
These numbers mean that small increases in offense or defense are more valuable to average teams than to really good teams. I encourage you to play around with the derivative formulas for Pythagorean winning percentage as well.
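One way to start playing around is to check the derivative numerically. The Python sketch below compares the closed-form derivative against a central finite difference at the two values of \rho discussed above. The function names and the step size h are my own choices.

```python
def win_pct(rho):
    """Pythagorean expectation in terms of rho = O / D (exponent 2)."""
    return rho ** 2 / (1 + rho ** 2)


def d_win_pct(rho):
    """Closed-form derivative: 2*rho / (1 + rho^2)^2."""
    return 2 * rho / (1 + rho ** 2) ** 2


def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for rho in (1.0, 1.25):
    exact = d_win_pct(rho)
    approx = numeric_derivative(win_pct, rho)
    print(f"rho={rho}: formula {exact:.4f}, finite difference {approx:.4f}")
```

The two columns should agree to several decimal places, which is a quick confirmation that the marginal values of about 0.5 and 0.38 come straight from the formula.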
Pythagorean Wins and Luck
There is one last fun use for Pythagorean wins. Because the formula estimates how many games a team should have won given its runs scored and allowed, we can figure out whether the team was lucky or not. By comparing a team’s actual wins to their Pythagorean wins, we can see if they won significantly more or fewer games than they should have.
This can be a good way to predict regression for teams that are overperforming. It is also a good way to predict improvement for teams that have been underperforming. In this way it reminds me a lot of the other famous baseball stat, BABIP, which helps us gauge which hitters have been lucky.
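Here is a minimal Python sketch of that comparison. Everything in it is illustrative: the team names and season lines are invented, the function names are my own, and the default exponent of 1.83 is just the modern baseball value mentioned earlier.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=1.83):
    """Expected wins from runs scored and allowed."""
    o = runs_scored ** exponent
    d = runs_allowed ** exponent
    return games * o / (o + d)


def luck(actual_wins, runs_scored, runs_allowed, games, exponent=1.83):
    """Actual wins minus Pythagorean wins; positive means the team overperformed."""
    return actual_wins - pythagorean_wins(runs_scored, runs_allowed, games, exponent)


# Invented season lines: (name, actual wins, runs scored, runs allowed, games).
teams = [
    ("Team A", 95, 720, 700, 162),
    ("Team B", 81, 700, 700, 162),
    ("Team C", 78, 780, 690, 162),
]

for name, wins, scored, allowed, games in teams:
    print(f"{name}: luck = {luck(wins, scored, allowed, games):+.1f} wins")
```

In this made-up league, Team A won far more games than its run differential supports and would be a regression candidate, while Team C looks like a team whose record undersells it.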