Sparse Impacts Model: Test and Evaluation

In a previous article, we introduced the Sparse Impacts Model, our attempt at a simple yet effective way to predict the outcome of NBA games. One goal of the Sparse Impacts Model is to find the perfect balance between a model’s complexity and its predictive power.

However, the main goal of this model is to correctly predict the outcome of NBA games. Therefore, the phrase “perfect balance between complexity and predictive power” actually means “the balance that results in our model being as accurate as possible”.

In this article, we will describe the Sparse Impacts Model in a bit more depth. Then we’ll explain how we optimized the model and look at how it performs on real data. Along the way, we hope to touch on a common, fun idea in machine learning: the bias-variance tradeoff.

To convince you to keep reading, though, we’re going to actually start with some results. Instead of building up to a grand reveal, we’re taking the Tarantino approach and giving you the bottom line up front. So, the first section below shows how our model performs compared to other models!


Sparse Impacts Model Accuracy

From my point of view, there are two ways to measure a model’s accuracy in the NBA. First, we can look at whether or not the model correctly predicts who won the game. Second, we can look at how accurate the model is in predicting the margin of victory.

Let’s start by looking at the number of games predicted correctly. To get a baseline, we’ll compare our results to those shown on ThePredictionTracker.com as of November 25th, 2023. Through that date, the Sparse Impacts Model correctly predicted 65.8% of games. The screen grab below shows the accuracy of the other top models at the same point in time.

NBA model accuracy

At that point in time, our model would rank near the top of those listed, losing only to the Massey Ratings and the Vegas line. This is already a really good sign for our model. The plot below shows the model’s running accuracy as the season goes on.

Sparse Impacts Model accuracy over time

This is pretty typical: while the season is just getting underway, the model’s reported accuracy is all over the place, but as more games are played, the running accuracy starts to converge. The point of showing this is that, while our sample size is fairly small, we can be reasonably confident in the quoted 65.8% accuracy mark.
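For clarity, here is a minimal sketch of what that running-accuracy curve computes. The win/loss flags are invented for illustration and nothing here comes from the actual model code.

```python
# Sketch of the running-accuracy curve plotted above: the share of games
# predicted correctly through each point of the season. The 1/0 "correct"
# flags are made up purely for illustration.
correct = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

running_accuracy = []
hits = 0
for games_played, was_correct in enumerate(correct, start=1):
    hits += was_correct
    running_accuracy.append(hits / games_played)

print(running_accuracy)  # noisy early on, settles down as games accumulate
```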

The second metric for model accuracy is error in predicting margin of victory. To get a slightly better idea of how the model works (and how much things can vary season-to-season), we built the table below over a few seasons.

| Season | Mean Absolute Error | Mean Squared Error |
| --- | --- | --- |
| 2023-2024 | 10.2 | 175 |
| 2022-2023 | 10.3 | 169 |
| 2021-2022 | 11.3 | 206 |

Comparing the results from the 2023-2024 season above to the screenshot from ThePredictionTracker shows that yet again our model is performing near the top of the pack.
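For concreteness, the two error metrics in the table can be computed from predicted and actual margins of victory as in the sketch below. The margins shown are made up, not our real predictions.

```python
# Sketch: mean absolute error and mean squared error of margin-of-victory
# predictions. Positive margins mean the home team won by that many points.
# The numbers are invented for illustration only.
predicted_margins = [4.5, -2.0, 7.0, 1.5]
actual_margins = [10, -5, 3, -8]

errors = [p - a for p, a in zip(predicted_margins, actual_margins)]
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)

print(f"MAE: {mae:.1f} points, MSE: {mse:.1f}")
```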

All of the results in this section were generated using the following parameters for our model:

| Parameter | Value |
| --- | --- |
| Home Court Advantage | 3.5 points |
| # Players Rated | 30 |
| Weight of Prior (# Games) | 10 |

In the next section, we’ll talk a little bit about how we landed on these values. More generally, we’ll talk about the process of hyperparameter tuning in data science.

Hyperparameter Tuning in Data Science

One of the more subtle arts in data science and machine learning is “hyperparameter tuning”. Analysts have a lot of decisions to make when building a model. Here are a few examples you might run into when building sports analytics models:

  • How many points is home field advantage worth?
  • How much should last year’s results influence our expectations for this year?
  • How much of an NFL team’s success is due to their quarterback?

When building a model, the analyst has to decide on the values for these things (which are called hyperparameters). Is home field worth 3 points or 4? Should we stop paying attention to the previous season’s stats halfway through the current season? A quarter of the way through?

These types of questions fundamentally change how a model performs. Building the best model requires choosing the best values for the hyperparameters. This process is called hyperparameter tuning.

How to Tune Hyperparameters

The previous few paragraphs described the what and why of hyperparameter tuning, but how is it done? It sounds kind of silly, but the standard way is to just “try a bunch of different values and see what works”.

For example, maybe we see how accurate our model is with a bunch of different values for home field advantage. We program in a 2.5-point advantage and see how accurate the model is. Then we try a 3-point advantage, a 3.5-point advantage, and so on, and see which value gives the most accurate model.
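Here is a minimal sketch of that kind of search. The function evaluate_accuracy is a hypothetical stand-in for “retrain the model with this home-court value and score it on held-out games”, not a function from our actual code.

```python
# Sketch of a simple grid search over the home-court-advantage hyperparameter.
def evaluate_accuracy(home_court_advantage):
    # Stand-in for: retrain the model with this value, then return its
    # accuracy on held-out games. The formula below is fake, just so the
    # example runs end to end.
    return 0.66 - 0.02 * abs(home_court_advantage - 3.5)

candidate_values = [2.5, 3.0, 3.5, 4.0, 4.5]
results = {hca: evaluate_accuracy(hca) for hca in candidate_values}
best_hca = max(results, key=results.get)

print(f"best home-court advantage: {best_hca} ({results[best_hca]:.3f} accuracy)")
```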

It is important when doing this to use a train-validation-test split of your data. This involves splitting your data into three different groups (a small code sketch of such a split follows the list). I like to describe these groups as follows:

  • Training set: The data used to train your model. For our NBA model, this is the data that determines the ratings for the teams and players.
  • Validation set: The data used to check that your choice of hyperparameters was correct. For example, if a 2.5-point home court advantage gives a higher accuracy on the validation set than a 3.5-point advantage, we pick 2.5 for that hyperparameter.
  • Test set: After all the parameters are chosen and the model is trained, we need to know how well the model performs. The test set tells us the overall model accuracy and importantly isn’t used to change the model values at all.
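A minimal sketch of such a split is below, assuming the games are stored in chronological order (for time-ordered sports data, splitting by date is usually more sensible than splitting at random).

```python
# Sketch of a chronological train / validation / test split.
# `games` is a placeholder for a list of game records sorted by date.
games = list(range(1000))

n = len(games)
train_end = int(0.70 * n)       # first 70%: fit the team and player ratings
validation_end = int(0.85 * n)  # next 15%: compare hyperparameter choices

train_set = games[:train_end]
validation_set = games[train_end:validation_end]
test_set = games[validation_end:]  # final 15%: report accuracy, never tune on it

print(len(train_set), len(validation_set), len(test_set))  # 700 150 150
```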

The reason we do all this is to make sure our model is as good as it can be. Splitting the data into separate subsets gives us an honest estimate of the model’s accuracy and helps us avoid overfitting, a common problem in machine learning.

Our Hyperparameters

Returning to our specific setting, what hyperparameters do we need to tune? There are a few important choices that we need to make in our model.

The first choice to make is the home court advantage. Teams at home tend to play better. This is partly due to the comfort of sleeping at home and sticking to your usual routine, and partly due to the home crowd giving a boost to its own players. Through hyperparameter tuning, we determined that our model works best with a 3.5-point home court advantage.
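To make the role of that number concrete, here is a minimal sketch of how a fixed home-court bump might enter a margin prediction. The rating-difference form and the example ratings are our illustrative assumptions, not the exact formula inside the model.

```python
# Sketch: a fixed home-court advantage added to a rating-based margin prediction.
HOME_COURT_ADVANTAGE = 3.5  # points, the value selected by our tuning

def predicted_margin(home_rating, away_rating):
    # Positive result means the home team is favored by that many points.
    return home_rating - away_rating + HOME_COURT_ADVANTAGE

print(predicted_margin(home_rating=2.0, away_rating=-1.0))  # 6.5
```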

The second hyperparameter to tune is how many players to assign ratings to. Remember, the whole point of the Sparse Impacts Model is that we only give ratings to a few players. But how many is a few?

If we rate too few players, our model doesn’t take into account all the information it has access to, and it loses accuracy. On the other hand, if we rate too many, our model can suffer from the sample size problem and overfitting described in our introductory article about the Sparse Impacts Model. The best value is somewhere in the middle. For now, our search determined that rating only the top 30 players is optimal.
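As a rough illustration of what “rate only the top 30 players” could look like in code, here is a sketch that picks the rated players by minutes played and pools everyone else. The selection rule, the names, and the numbers are all invented for illustration.

```python
# Sketch: give individual ratings only to the top-k players (here ranked by
# minutes played) and pool everyone else into one replacement-level bucket.
minutes_played = {
    "Player A": 2600, "Player B": 2450, "Player C": 980,
    "Player D": 400, "Player E": 150,
}

NUM_PLAYERS_RATED = 2  # our model uses 30; 2 keeps this toy example readable

ranked = sorted(minutes_played, key=minutes_played.get, reverse=True)
rated_players = set(ranked[:NUM_PLAYERS_RATED])

def rating_bucket(player):
    # Players outside the top k share one pooled rating instead of their own.
    return player if player in rated_players else "REPLACEMENT_LEVEL"

print({player: rating_bucket(player) for player in minutes_played})
```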

The final hyperparameter we tune in our model is the weight of the prior. To jump-start our model at the beginning of the season, we feed it data from the previous season. If we weight the previous season too heavily, then our model doesn’t learn from the new data as quickly as it could.

On the other hand, if we weight the previous season too lightly, then our model will suffer from small sample sizes at the beginning of the season. Through the same hyperparameter tuning methodology, we determined that weighting the past season’s data the same as 10 games of current-season data (the value listed in the parameter table above) is optimal.
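One natural way to implement “the prior counts like N games” is a weighted average of last season’s rating and the current-season estimate, with the prior given N pseudo-games of weight. The sketch below uses that assumption; it is not necessarily the exact update inside our model.

```python
# Sketch: blend last season's rating with the current-season estimate,
# counting the prior as PRIOR_WEIGHT_GAMES worth of evidence.
PRIOR_WEIGHT_GAMES = 10  # the value selected by our tuning

def blended_rating(prior_rating, current_rating, games_played):
    total_weight = PRIOR_WEIGHT_GAMES + games_played
    return (PRIOR_WEIGHT_GAMES * prior_rating
            + games_played * current_rating) / total_weight

# Early in the season the prior dominates; later, the new data takes over.
print(blended_rating(prior_rating=5.0, current_rating=-1.0, games_played=2))   # 4.0
print(blended_rating(prior_rating=5.0, current_rating=-1.0, games_played=40))  # 0.2
```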

Bias-Variance Tradeoff in the Sparse Impacts Model

Finally, we want to comment on how our model deals with the bias-variance tradeoff. The bias-variance tradeoff is a commonly discussed but poorly understood topic in machine learning communities.

Unfortunately, most people define the bias-variance tradeoff mathematically. Though I am a mathematician by training, I always prefer the intuitive approach. The bias of a model refers to its average systematic error. The variance refers to how much a model can change based on small changes to the training set. The reason variance is bad is that it tends to imply that the model will perform worse on test data than on training data.
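For readers who do want the math, the standard decomposition of expected squared prediction error (for squared-error loss, with true function f, fitted model f-hat, and irreducible noise variance sigma squared) makes both terms explicit:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```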

In general, you want both the bias and the variance of your model to be small. But the bias-variance tradeoff says you can’t really have both: decreasing bias tends to increase variance, and decreasing variance tends to increase bias.

For the Sparse Impacts NBA Model, we can directly tune the bias-variance tradeoff with the “number of players rated” hyperparameter. The fewer players we assign ratings to, the more systematic error (bias) our model will have. The more players we rate, the more we risk overfitting to the training data (variance).

To receive email updates when new articles are posted, use the subscription form below!