NBA Power Rankings with Bayes Ensemble

2020 NBA Power Rankings

Designing ways to generate power rankings for a league is one of the central problems in sports analytics. About seven months ago, I introduced the Ensemble Method (or, as we'll say, Ensemble Ratings) for creating NBA power rankings. The technique was quite successful: its predictive accuracy was competitive with many top methods. However, because our original method was entirely free of prior assumptions and wasn't designed to learn from prior-season data, its accuracy at the beginning of the season was quite low. In spite of this slow start, our technique achieved near state-of-the-art accuracy in picking winners straight up toward the end of the season.

As the 2020 NBA season approaches, we are updating our technique to address these issues. The change we make relative to basic Ensemble is to give our model access to an estimate of team quality at the start of the season. Mathematically, this changes our model from a maximum likelihood estimator of team quality into a maximum a posteriori estimator. Statisticians will recognize the technique as a form of ridge regression. Concretely, this means our model will learn the appropriate team ratings much more quickly and should generate more accurate NBA power rankings both at the beginning of the season and at its end.

Parameter Estimation for NBA power rankings

Our original version of Ensemble NBA power rankings worked from the observation that every team's performance varies on a night-to-night basis. We translated this idea mathematically by modelling a team's performance on a given night as having some inherent randomness – as being normally distributed. Here's a quick example. At some point last year, we estimated the Milwaukee Bucks' quality to be normally distributed with a mean about 12 points better than an average team, while we estimated the Lakers to be normally distributed with a mean only 8 points better than average. If these two play on a neutral court, we would predict the Bucks to win by 4 (12 − 8 = 4). However, because of the inherent randomness in sports, this prediction is not a guarantee. In fact, we estimate the probability that the Bucks beat the Lakers as the probability that a random number drawn from the Bucks' distribution is larger than a random number drawn from the Lakers' distribution.

That is how our model's predictions work. Given each team's rating (the mean of its normal distribution), we can estimate margins of victory and win probabilities. Moreover, this Gaussian ensemble gives us a way to create NBA power rankings: rank the teams according to the means of their Gaussian distributions.
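As a minimal sketch of this prediction step – assuming, purely for illustration, a per-game standard deviation of 12 points for every team (the real value would be estimated from data) – the win probability is just a normal CDF evaluated at the standardized predicted margin:

```python
from math import erf, sqrt

def win_probability(rating_a: float, rating_b: float, sigma: float = 12.0) -> float:
    """P(team A beats team B) when each team's single-game performance is
    modeled as Normal(rating, sigma^2); the margin A - B is then
    Normal(rating_a - rating_b, 2 * sigma^2)."""
    margin_mean = rating_a - rating_b
    margin_sd = sigma * sqrt(2)
    # Standard normal CDF evaluated at the standardized mean margin.
    return 0.5 * (1 + erf(margin_mean / (margin_sd * sqrt(2))))

# Bucks rated +12, Lakers rated +8 on a neutral court:
p = win_probability(12, 8)  # predicted margin of 4 in favor of the Bucks
```

Under that assumed sigma, the Bucks come out at roughly a 59% chance of winning – a 4-point favorite, but far from a lock.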

Now, a natural question arises: how do we determine a team’s rating? It is nice that, given the ratings, we can do all that I claimed above, but how do we compute the ratings and how can we be sure they are accurate? In our original Ensemble method, we used the machinery given to us from the statisticians of the world to do parameter estimation. In particular, we used maximum likelihood estimation in order to determine these ratings.

Maximum Likelihood Estimation and Least Squares Minimization

What is maximum likelihood estimation (MLE)? Luckily in the NBA, we have observations – the scores of all the games played so far – that give evidence as to how good a team is and what their rating should be. For instance, if we watched the Nets beat the Warriors by 12 on opening night, then that tells us that the Nets' rating should be about 12 points higher than the Warriors' rating. If the Warriors go on to lose to the Bucks by 16 on Christmas Day, then that tells us that the Warriors' rating should be about 16 points lower than the Bucks' rating and, by extrapolation, the Bucks' rating should be about 4 points better than the Nets' rating.

As more and more games are played, we accrue more and more evidence as to what a team should be rated, and we can compute the teams’ ratings that match what we see. MLE is a technique to pick the parameters – in this case each team’s rating – that best explains the entire set of data we have observed. MLE picks the ratings so that the predicted margins of victory for every individual game match the actual margins of victory as closely as possible.

MLE behaves very well when used to estimate the means of Gaussian distributions. In fact, after applying the log-trick, the maximum likelihood estimate turns into a least squares optimization problem. Concretely, we compute each team's rating by minimizing the squared errors between the predicted margins of victory and the actual margins of victory over all the games in the season.
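To make the least squares formulation concrete, here is a small sketch on a hypothetical five-game, four-team schedule (the indices and margins are made up for illustration). Each game contributes one row of a design matrix: +1 for the first team, −1 for the second, with the observed margin as the target:

```python
import numpy as np

# Toy schedule: (team_i, team_j, margin) where margin = score_i - score_j.
# Teams are indexed 0..3; the numbers here are purely illustrative.
games = [(0, 1, 12), (1, 2, -16), (2, 3, 5), (0, 3, 7), (3, 1, -3)]

n_teams = 4
X = np.zeros((len(games), n_teams))
y = np.zeros(len(games))
for row, (i, j, margin) in enumerate(games):
    X[row, i] = 1.0   # predicted margin = rating_i - rating_j
    X[row, j] = -1.0
    y[row] = margin

# lstsq returns the minimum-norm least squares solution, which handles the
# fact that ratings are only identified up to an additive constant.
ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
ratings -= ratings.mean()  # center so ratings read like a +/- vs. average
```

The centering step is cosmetic: only rating differences enter the predictions, so we pin the average rating at zero to make the numbers readable.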

This relationship between maximum likelihood estimation and least squares regression has been observed time and again and is well known throughout the statistics, mathematics, and sports worlds. However, what is perhaps less well-known is a relationship between maximum a posteriori (MAP) estimation – a form of MLE where we incorporate an initial guess for a team’s quality – and penalized least squares regression.

MAP Estimation and Penalized Least Squares

Recall that our goal in developing an updated Ensemble Ratings system, which we will call Bayes Ensemble Ratings, is to incorporate some knowledge of team quality before the first game is even played, so as to generate the most accurate NBA power rankings possible. MAP estimation is a tool for combining observed data with a prior guess about the mechanism that drives the data. Again, an example – one I like to use when teaching statistics.

Suppose we have a jar full of 1000 coins, 999 of which are fair and one of which is double-headed. If I pick a coin at random, what is the probability that I have the unfair coin? Clearly, 1/1000. Now, as I start flipping the coin and observing the outcomes, I can revise my guess for how likely it is that I have the unfair coin. If my first flip is tails, then I know for a fact that I have a fair coin: I revise my probability to 0. However, if I see many heads in a row, I might think it more likely that I have the unfair coin. Suppose I flipped 10 straight heads. If I selected a fair coin – which happens 999/1000 times – 10 straight heads is a 1/1024 probability event. If I selected the unfair coin – which happens 1/1000 times – this event has a 100% chance of happening. So, in fact, if I flip 10 straight heads on my randomly chosen coin, it is actually slightly more likely that I selected the unfair coin than a fair one. If I flip 10,000 straight heads, I must think it is almost certain that I have the unfair coin.

In the previous example, I have both a prior assumption about the mechanism of randomness (that 1 coin out of 1000 is double-headed) and evidence about how correct that prior assumption was. The tool for combining this prior assumption with my observed evidence is Bayes' Theorem. In our case, combining the evidence from NBA games with a prior assumption about team quality to produce NBA power rankings will give us a maximum a posteriori estimation problem.
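The coin example can be checked directly with Bayes' Theorem; this small sketch computes the posterior probability of holding the double-headed coin after n straight heads:

```python
def posterior_unfair(n_heads: int, prior_unfair: float = 1 / 1000) -> float:
    """Posterior probability that the drawn coin is double-headed after
    observing n_heads consecutive heads, via Bayes' theorem."""
    likelihood_unfair = 1.0           # a double-headed coin always lands heads
    likelihood_fair = 0.5 ** n_heads  # a fair coin: (1/2)^n
    numerator = prior_unfair * likelihood_unfair
    evidence = numerator + (1 - prior_unfair) * likelihood_fair
    return numerator / evidence

posterior_unfair(0)   # 0.001 -- no evidence yet, so just the prior
posterior_unfair(10)  # ~0.506 -- just barely favoring the unfair coin
```

After 10 straight heads the posterior is about 0.506 – slightly favoring the unfair coin, exactly as argued above.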

MAP estimation can be difficult, but in some nice cases there are concrete, satisfying ways to solve the problem. In particular, if your data is Gaussian (or, normally distributed) then the MAP estimate of the parameters of interest can be formulated as a penalized least squares optimization problem. This is the justification we use in the next section.

Penalized Least Squares for Bayes Ensemble NBA Power Rankings

Ok, but how does all this – MAP estimation, Bayes’ Theorem, and penalized least squares – help us determine the best NBA teams? Through combining observational data with an ‘initial guess’ as to team quality!

The first ingredient in our NBA power ranking method is an initial guess about team quality. For us, this means assigning a number to each NBA team so that the difference between two teams' ratings predicts the expected margin of victory. From past experience, we find that the best NBA team is about 20 points better than the worst on a neutral court. Because we want our ratings to be centered around 0 (so that they read like +/-), the best team is assigned a provisional preseason rating of 10 and the worst team a provisional rating of -10. The rest of the teams' ratings interpolate, filling in the in-between values.
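One simple way to carry out that interpolation – evenly spaced linear interpolation is an assumption here, and the team names are placeholders – is to spread the ranked teams uniformly across the [-10, 10] range:

```python
import numpy as np

def preseason_ratings(teams_ranked_best_to_worst, spread=20.0):
    """Assign provisional ratings by linear interpolation: the best team
    gets +spread/2, the worst gets -spread/2, and the remaining teams
    fill in evenly spaced values between them."""
    n = len(teams_ranked_best_to_worst)
    values = np.linspace(spread / 2, -spread / 2, n)
    return dict(zip(teams_ranked_best_to_worst, values))

# Hypothetical four-team ranking, best to worst:
ratings = preseason_ratings(["Bucks", "Lakers", "Nets", "Knicks"])
# Bucks -> +10.0, Knicks -> -10.0, Lakers and Nets evenly spaced between
```

Because the spacing is symmetric, the ratings automatically average to zero and read like a preseason +/-.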

Now all we need is an accurate way of determining which team should start the season as the best, which as second best, and which as the worst. I have discussed previously how Vegas lines are interpretable as ensemble learners and, because of this, are quite accurate predictors. Therefore, our preseason team rankings are imputed from a specific sportsbook's preseason title odds. This technique isn't the best we could do – for instance, conference imbalance can skew title odds so that they don't reflect team quality directly – but it serves as a pretty good first guess at ranking the NBA teams. And when we remember that our original ensemble method started from the base assumption that all teams are of equal quality to start the season, we certainly expect to see improvements.

You can view our preseason Bayes Ensemble rankings here. Because no games have been played yet, the MAP estimate for each team's rating is nothing more than the mean of the prior. Simply: these rankings reflect each team's title odds in Vegas.

The second ingredient in creating our NBA power rankings is observed game data. As time goes on, our rankings will reflect observational evidence along with our initial guess. For those who are curious for a little extra detail, the next section contains the explicit mathematics and the problem we solve.

The Mathematics

While I usually write for a general audience, because this section is unnecessary for understanding the method I am going to allow myself a short stretch of precise mathematical language. Because it makes the optimization simplest, we assume that each team's rating is normally distributed with mean given by our preseason ratings and a variance to be determined later. In this case, the MAP estimate turns into a penalized least squares optimization problem where the penalty is proportional to the squared difference between the parameter and our preseason estimate. In particular, we minimize: \sum_{\text{Each Game}} \left(\text{Pred. Margin} - \text{Actual Margin} \right)^2 + \lambda \sum_{\text{Each Team}}\left(\text{Team Rating} - \text{Team Preseason Rating}\right)^2

Solving to minimize the above quantity simplifies to inverting a 30×30 matrix. In our original version of Ensemble Rankings, we only used the first term, corresponding to observed game scores. In that setting, we had issues with non-uniqueness of minima, so we used a Moore-Penrose pseudoinverse to compute each team's rating. The Moore-Penrose pseudoinverse has the desirable property of giving a more 'conservative' estimate when the solution is non-unique: the set of team ratings we get will not only solve the system but will be the solution that, considered as a vector, has the smallest Euclidean norm. However, because of the L2 penalty imposed in our Bayes Ensemble method, this non-uniqueness problem no longer occurs.
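As a sketch of that penalized solve – here on a hypothetical four-team league with made-up prior ratings – the zero-gradient condition reduces the whole problem to one small linear system:

```python
import numpy as np

def bayes_ensemble_ratings(X, y, prior, lam):
    """Minimize  ||X r - y||^2 + lam * ||r - prior||^2  over rating vectors r.
    Setting the gradient to zero gives the normal equations
    (X^T X + lam I) r = X^T y + lam * prior -- a single linear solve
    (30x30 for the full NBA, 4x4 in this toy example)."""
    n = X.shape[1]
    A = X.T @ X + lam * np.eye(n)
    b = X.T @ y + lam * prior
    return np.linalg.solve(A, b)

# Hypothetical preseason prior for four teams, best to worst.
prior = np.array([10.0, 3.0, -3.0, -10.0])

# Before any games are played, the MAP estimate is just the prior:
no_games = np.zeros((0, 4))
r0 = bayes_ensemble_ratings(no_games, np.zeros(0), prior, lam=1.0)

# After team 0 beats team 1 by 12, the ratings move toward the evidence:
X = np.array([[1.0, -1.0, 0.0, 0.0]])
y = np.array([12.0])
r1 = bayes_ensemble_ratings(X, y, prior, lam=1.0)
```

Note that the penalty makes the system matrix positive definite for any lam > 0, which is exactly why the non-uniqueness issue of the vanilla method disappears.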

Just a few notes about the choices we have made. First, our impetus for choosing an L2 penalty is largely the simplicity of the resulting optimization problem. However, one could conceivably choose an L1 penalty, which would still enjoy relatively straightforward gradient calculations. An L1 penalty corresponds to a Laplacian prior on the teams' ratings, whereas the L2 penalty corresponds to a Gaussian prior.

Second, the value of \lambda is up for debate. Large values place a heavy emphasis on adhering closely to our preseason expectations for the NBA power rankings, whereas smaller values place more emphasis on the observed data. Statistically, \lambda is inversely proportional to the variance of the normal distribution in our prior: smaller values correspond to a larger variance, meaning more room for error in our initial guess, while larger values mean we are fairly certain of the correctness of our preseason rankings. Choosing the optimal value is largely a matter of style and preference. For us, we will use cross-validation on data from previous years to get a good idea of which \lambda to pick.
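A rough sketch of that cross-validation step – the fold count, candidate grid, and tiny three-team dataset below are all illustrative assumptions, not our actual setup:

```python
import numpy as np

def cv_lambda(X, y, prior, candidates, n_folds=5, seed=0):
    """Pick lambda by k-fold cross-validation on historical games: fit
    ratings on the training folds, score squared error on the held-out
    games, and keep the lambda with the lowest total error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_lam, best_err = None, np.inf
    for lam in candidates:
        err = 0.0
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            # Penalized normal equations fit on the training games only.
            A = X[train_idx].T @ X[train_idx] + lam * np.eye(X.shape[1])
            b = X[train_idx].T @ y[train_idx] + lam * prior
            r = np.linalg.solve(A, b)
            err += np.mean((X[test_idx] @ r - y[test_idx]) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Toy data: 3 teams, 5 games, a made-up prior, a small candidate grid.
X_demo = np.array([[1, -1, 0], [0, 1, -1], [1, 0, -1], [-1, 1, 0], [0, -1, 1]], dtype=float)
y_demo = np.array([6.0, 3.0, 9.0, -5.0, -4.0])
prior_demo = np.array([5.0, 0.0, -5.0])
lam_star = cv_lambda(X_demo, y_demo, prior_demo, candidates=[0.1, 1.0, 10.0])
```

In practice one would run this over full past seasons rather than a toy schedule, but the mechanics are the same.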

Minutiae and Subtle Benefits of the Addition of Priors

One shortcoming of the vanilla Ensemble method was that there was no nice, satisfying way to incorporate information like 'Kawhi Leonard sat out this game' or 'Steph Curry just came back from injury'. That is, our model didn't know when large, meaningful events would dramatically impact a team's quality. If a team lost a player for the rest of the season, we had to wait until quite a few new games had been played before our model picked up that there had been a shift in team quality.

Now, if a significant event happens we have a tool to 'bake' this information in. We can update our prior to reflect the fact that a team isn't as good as we believed in the preseason. If, for instance, the Nuggets lose Nikola Jokic for the rest of the season, they would likely go from a top title contender to a fringe playoff team, and it would make sense to update our prior estimate of their quality accordingly. Again, we can use Vegas to do this. Instead of sticking with the same preseason rankings all year, we will use updated title odds to re-form our prior. In this way, our model should adjust quickly to large changes in a team's quality.

Finally, we need to comment on the amount of data we'll use. Last season we used all the available games, as they were played, to update our power rankings. However, we saw some evidence that weighting recent games more heavily than games played long ago led to increased predictive accuracy. Therefore, as the season goes on, we will only use roughly the 20 most recent games of data to derive our NBA power rankings. As before, the exact number of games to use will be chosen via cross-validation on previous years.

Commentary

First, the accuracy of vanilla Ensemble was quite impressive: our predictions held up even against some of the most accurate computer rankings available. Even with all its shortcomings, the technique worked well for deriving NBA power rankings. We expect our model to be even more accurate this year and to compete with the best prediction methods out there.

Second, our rankings are tied to available Vegas data. In general, such a trait is not desirable. The output of our model relies quite heavily on the output of another model. Our model cannot operate in a vacuum. Is this a problem? If you ask my mathematician friends they would say ‘YES, this is a huge problem’. However, in practice this shouldn’t really matter at all for our purposes: determining the best teams. In some sense our model should achieve very strong results because it is a form of ensemble learning which is known to be very accurate when used intelligently.

Third, because we predict margins of victory, we can actually estimate win probabilities in a given game. If you aspire to be a degenerate gambler, knowing win probabilities (or, at least, an estimate of win probabilities) can help you identify positive EV bets.
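As a rough illustration of that idea – the odds and probabilities below are made up – a bet is positive EV when the model's win probability beats the probability implied by the bookmaker's price:

```python
def expected_value(win_prob: float, american_odds: int, stake: float = 100.0) -> float:
    """Expected profit of a bet at American odds, given our model's win
    probability. Positive EV means the model thinks the price is favorable."""
    if american_odds > 0:
        profit = stake * american_odds / 100.0      # e.g. +110 pays 110 on 100
    else:
        profit = stake * 100.0 / abs(american_odds)  # e.g. -110 pays ~90.91 on 100
    return win_prob * profit - (1 - win_prob) * stake

# Model says 60%, but the book offers +110 (implied probability ~47.6%):
expected_value(0.60, +110)  # 0.6 * 110 - 0.4 * 100 = 26.0 per 100 staked
```

Of course, this only works to the extent the model's probabilities are well calibrated, which is exactly what the accuracy tracking above is meant to check.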

Fourth, home court advantage is an interesting topic this year. Typically, home court advantage is worth about 3 points (last year we estimated 2.5 points to be closer to the truth). However, in 2020 we may see something significantly different: some teams aren't allowing fans or are limiting them, teams generally won't travel as far or as often, and the benefit of playing at home may be much smaller than normal. If we treat home court advantage as a learnable parameter to be estimated, we can figure out how much playing at home is worth in this anything-but-normal year.

Finally, as time goes on and I have more and more time to work on this project, we will begin hosting more and more data that can be derived from our NBA power rankings. Because we have a method of predicting win probabilities, we can estimate things like title odds, expected seed, and predictions for individual games, to name a few. Hopefully this project grows into something successful and can help sports enthusiasts – whether generic fans, gamblers, applied mathematicians, or otherwise – derive more meaningful conclusions about the NBA as a whole.

To receive email updates when future articles are posted, please consider subscribing to our email notifications!