Sports Analytics and the Credit Assignment Problem (Why We Need More Tracking Data)
The credit assignment problem in sports analytics asks a fundamental question: how should the credit for a particular event be divided among the players involved? If Derrick Henry scores on a long touchdown run, was it because of his skill set, a big hole opened up by his offensive line, the defensive backs paying extra attention to AJ Brown, or some mix of these factors?
I think it is clear that the answer is always a mix of factors. What is not clear, and in some cases not clear at all, is how much of the credit should be assigned to each factor and ultimately to each player on the team. The credit assignment problem is fundamental to sports analytics because solving it is how we determine how good players are. In this article we'll first look at the credit assignment problem in a few different sports. Then we'll offer some commentary on the roles of expert opinion and tracking data in tackling it.
Credit Assignment in Football
To me, football is the major sport in which it is hardest to assign credit for the outcome of a play to individual players. The Derrick Henry example above illustrates why the problem is so hard, but even simple passing plays can be complicated to break down. Consider the different factors that go into a successful passing play:
- A good throw by the quarterback will make a completion more likely
- A receiver who runs a great route and gets wide open will make it easier for his quarterback to complete the pass
- Good offensive line play will give the quarterback more time to make a good throw and the receiver more time to get open
- A good block or a convincing play-action fake by the running back can occupy or draw the attention of defenders, again making things easier on the quarterback and receiver
- Good routes by the receivers who were not targeted can draw the safeties' attention, making a one-on-one matchup more likely
To complicate credit assignment further, factors outside the current play can also have an impact:
- A good WR1 can make things easier for the team's WR2, who gets to face weaker defenders
- Good play-calling by the OC can make players appear much better than they are
- Good defense by your team can keep the opposing defense on the field more often and playing tired
This is why football is so complicated yet so interesting to talk about. Unfortunately for me, though, it is also why football is so hard to analyze from a data perspective. Websites like Pro Football Focus make a living by trying to systematically split the credit for a team's success among the relevant players. They use the eye test to try to solve the credit assignment problem.
Some statistics, like expected points added (EPA), try to divide the credit for a long scoring drive by assigning a value to the outcome of each individual play. The NFL’s “Next Gen Stats” gets closer to solving the credit assignment problem by measuring subtler statistics about how players perform on individual plays. The true solution, though, will lie somewhere between these two approaches. We’ll discuss this more later when we talk about why tracking data will be crucial to sports analytics as we continue through the digital age.
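To make the expected-points idea concrete, here is a minimal Python sketch. The `expected_points` function is a made-up toy that depends only on yards to the goal line; real expected-points models are fit to historical play-by-play data and also account for down, distance, and game situation. A play's EPA is the expected points after the play minus the expected points before it, so the per-play values on a drive add up to the drive's total value.

```python
# Minimal EPA sketch. expected_points() is a HYPOTHETICAL toy model, not a
# fitted one: it uses field position only and ignores down, distance, and time.
from __future__ import annotations


def expected_points(yards_to_goal: float) -> float:
    """Toy model: rises linearly from -0.5 at the offense's own goal line
    (100 yards to go) to 6.5 at 1 yard from the end zone."""
    return 6.5 - 7.0 * (yards_to_goal - 1) / 99


def epa(yards_to_goal: float, yards_gained: float, touchdown_value: float = 7.0) -> float:
    """Expected points added by one play: EP after minus EP before.
    A touchdown is worth touchdown_value (assumed: 6 points plus the extra point)."""
    ep_before = expected_points(yards_to_goal)
    remaining = yards_to_goal - yards_gained
    if remaining <= 0:  # the play reached the end zone
        return touchdown_value - ep_before
    return expected_points(remaining) - ep_before


def drive_epa(start_yards_to_goal: float, gains: list[float]) -> list[float]:
    """Split a drive's value across its plays by computing each play's EPA."""
    values = []
    position = start_yards_to_goal
    for gain in gains:
        values.append(epa(position, gain))
        position -= gain
    return values


# Hypothetical drive: start at the offense's own 25 (75 yards to go),
# gain 10, then 25, then a 40-yard touchdown.
drive = drive_epa(75, [10, 25, 40])
print([round(v, 2) for v in drive])  # [0.71, 1.77, 3.26]
print(round(sum(drive), 2))  # 5.73
```

Because the values telescope, the drive's total is cleanly split across its plays. Notice, though, that each play's credit lands on the play as a whole, not on any individual player, which is exactly the gap the credit assignment problem is about.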
Credit Assignment in Baseball
At the other end of the spectrum, the division of credit problem in baseball is very nearly solved. While there are 10 players “involved” in any play in a baseball game, a huge proportion of the credit goes to just two of them: the pitcher and the batter. If a batter gets a hit, it’s because of his actions. If a pitcher gets a strikeout, it’s because of his actions. It is easy to figure out who is responsible for the good and bad things that happen in an inning.
Because it is easy to assign credit for individual events, advanced metrics like RE24 (based on run expectancy in each of the 24 base-out states) can put accurate values on an individual player’s contributions. Want to know how valuable the best base stealer in baseball is? Want to know the value provided by a home run hitter versus a Joey Votto or Joey Gallo-type guy who draws lots of walks? Because we can consistently and accurately assign credit for the outcome of an event to an individual player, these types of questions are easy to answer in baseball.
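RE24 credits each plate appearance with the change in run expectancy between the base-out state before and after it, plus any runs that scored on the play. The sketch below shows that arithmetic in Python. The run-expectancy numbers are illustrative placeholders, not real MLB averages, and the table covers only a few of the 24 base-out states.

```python
# Minimal RE24 sketch. Keys are (bases, outs); bases is a 3-character string
# for 1st/2nd/3rd, e.g. "1--" = runner on first only.
# The values are ILLUSTRATIVE placeholders, not real run-expectancy figures.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("---", 1): 0.27,
    ("---", 2): 0.10,
    ("1--", 0): 0.88,
    ("1--", 1): 0.52,
    ("1--", 2): 0.22,
}


def run_expectancy(bases: str, outs: int) -> float:
    """Expected runs for the rest of the inning from this base-out state."""
    if outs >= 3:  # the inning is over, so no further runs are expected
        return 0.0
    return RUN_EXPECTANCY[(bases, outs)]


def re24(start: tuple, end: tuple, runs_scored: int) -> float:
    """Credit for one plate appearance: change in run expectancy plus runs scored."""
    return run_expectancy(*end) - run_expectancy(*start) + runs_scored


home_run = re24(("---", 0), ("---", 0), runs_scored=1)   # solo homer, bases empty
walk = re24(("---", 0), ("1--", 0), runs_scored=0)       # leadoff walk
strikeout = re24(("1--", 1), ("1--", 2), runs_scored=0)  # K with a runner on first
print(round(home_run, 2), round(walk, 2), round(strikeout, 2))  # 1.0 0.38 -0.3
```

Summing these values over a season gives a batter's RE24 total, and the same changes can be charged against the pitcher. Because the starting and ending states are unambiguous, the credit lands cleanly on the two players involved.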
Credit Assignment in Basketball
Assigning credit to players in basketball is easier than in football but harder than in baseball. Part of the reason is the number of people involved in any individual play: 2 in baseball, maybe 2 to 5 in basketball, and nearly all 22 in football.
Perhaps a bigger reason that credit assignment in basketball is more tractable than in football is that most of the value drivers are recorded in the box score. If a player makes a shot, he gets credit for it. If a player grabs a tough rebound or gets a steal, he gets credit for that too. The value of these events is relatively easy to measure, and the credit is relatively easy to assign to individual players. In football, by contrast, there is no box-score stat for a lineman who successfully blocks his defender.
Unlike baseball, though, basketball still has “intangibles” that can be difficult to measure. While Klay Thompson might get credit for making an open 3, Draymond Green deserves some of the credit for making the pass that created the open look or for setting a hard screen to create the separation. Many people try to capture these intangibles with plus-minus stats, which look at a player’s overall impact while he is on the court. Credit assignment in basketball is fascinating because, while it is difficult, we can take a pretty good stab at it with some creative analytics.
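Raw plus-minus is simple enough to sketch in a few lines of Python: every player on the court during a stretch of play with an unchanged lineup (a "stint") is credited with the team's point differential over that stretch. The stints and player labels below are hypothetical.

```python
from collections import defaultdict


def raw_plus_minus(stints):
    """Each stint is (players_on_court, team_points, opponent_points).
    Every player on the court gets the full point differential of the stint."""
    totals = defaultdict(int)
    for players, points_for, points_against in stints:
        for player in players:
            totals[player] += points_for - points_against
    return dict(totals)


# Hypothetical stints for one team (player labels are made up).
stints = [
    (("A", "B", "C", "D", "E"), 12, 8),  # +4
    (("A", "B", "C", "D", "F"), 5, 9),   # -4
    (("B", "C", "D", "E", "F"), 10, 4),  # +6
]
print(raw_plus_minus(stints))
```

The weakness shows up even in this toy example: B, C, and D receive identical totals because they were always on the court together, so raw plus-minus alone cannot tell them apart. Adjusted plus-minus variants try to address this with regression over many lineups, though teammates who usually play together remain hard to separate.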
Credit Assignment in Golf
Golf is an even easier credit assignment problem than baseball. In baseball, there is ambiguity as to whether a hit occurred because of a bad pitch or a good swing. In golf, though, the ball is just sitting on the ground, so 100% of the outcome comes down to how well the golfer strikes it. As a result, we can come up with some pretty neat stats that describe how good a golfer really is. We can assess a golfer’s driving, chipping, and putting abilities to predict how well they might do on a particular course. We can look at who is getting better or worse to predict the outcome of tournaments.
Or, rather, we could if the PGA didn’t control its data so tightly. While I am not blind to the fact that sports leagues are businesses and their data is their IP, no other major sports league goes to the lengths the PGA does to protect its data. Generally, the NFL, NBA, and MLB are willing to let analysts use their data. It’s free advertising, after all. But the PGA intentionally obfuscates its data and makes it hard to collect for analytics without subscribing to its sponsored data feeds. So while some pretty good golf analytics sites do exist, for the foreseeable future this will not be one of them.
Resolving the Credit Assignment Problem in Sports
So what’s the solution? If credit assignment is such a problem, how in the world can we figure out whether a running back is actually good? How were we supposed to figure out that Kemba Walker was overperforming before he joined the Celtics? How are we supposed to determine whether a quarterback is worth a $500 million contract? For everything in sports analytics, there are two solutions: the first is the eye test, and the second is more data.
The Value of the Eye Test and Expert Opinion
The eye test is nothing more than watching a game and judging how good the players are by what you see. Given how data-driven and mathematically oriented I am, it may surprise you how much value I place on the eye test. The reason is that the eye test answers questions the data cannot, which makes it a perfect complement to data. If a running back goes for 10 yards, the box score won’t tell you whether the credit belongs to the running back or the blocking crew. But if you actually watch the game, you can judge fairly accurately how much of the credit each player deserves.
If the eye test weren’t valuable, there wouldn’t be a market for sports talk shows and sports podcasts. If the eye test weren’t valuable, there would be no reason to read fantasy football articles and watch film breakdowns hoping to find the most likely breakout candidate. Perhaps most convincingly, if the eye test weren’t valuable, pro sports teams wouldn’t spend millions of dollars on scouting departments to determine which players they should sign. In the current state of affairs, the best way to do sports analytics is to pair data with a healthy dose of the eye test.
Tracking Data is the Next Frontier in Sports Analytics
Finally, I want to talk about where I think the analytics community should go, and where front offices are likely already working behind the scenes. By tracking data, we mean data that tells you where all the players (and the ball) are on the court or field at any given time. We also want to include velocity data: how fast, and in which direction, the players and ball are moving.
Some of this data already exists and is becoming a more common part of fans’ vocabulary. Baseball has Statcast, which measures the launch angle and exit velocity of batted balls. The NFL has Next Gen Stats, which measures player speed and the routes run on a given play. The NBA has its own tracking data, too. These systems combine technologies such as radar, optical cameras, and RFID chips in player equipment, and the massive advances in computer vision and image processing since the publication of LeCun, Bengio, and Hinton’s landmark paper Deep Learning promise to make camera-based tracking even better. The technology continues to improve, and tracking data analytics will become more common.
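As a concrete picture of what tracking data involves, here is a minimal Python sketch of two position snapshots for one player and the velocity (speed and direction) estimated from them by finite differences. The `Frame` structure, the units (yards and seconds), and the 10-frames-per-second spacing are all assumptions for illustration; real tracking feeds have their own schemas and may supply speed and direction directly.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Frame:
    """Hypothetical tracking snapshot: position (yards) of one player at time t (seconds)."""
    t: float
    x: float
    y: float


def velocity(prev: Frame, curr: Frame) -> Tuple[float, float]:
    """Estimate speed (yards/second) and direction (degrees counterclockwise
    from the +x axis, in [0, 360)) from two consecutive frames."""
    dt = curr.t - prev.t
    if dt <= 0:
        raise ValueError("frames must be in increasing time order")
    dx, dy = curr.x - prev.x, curr.y - prev.y
    speed = math.hypot(dx, dy) / dt
    direction = math.degrees(math.atan2(dy, dx)) % 360
    return speed, direction


# Two frames 0.1 s apart (an assumed 10 frames per second).
before = Frame(t=0.0, x=20.0, y=10.0)
after = Frame(t=0.1, x=20.6, y=10.8)
speed, direction = velocity(before, after)
print(f"{speed:.1f} yd/s at {direction:.1f} degrees")  # 10.0 yd/s at 53.1 degrees
```

Differences between consecutive frames are noisy, so in practice you would likely smooth positions over several frames before estimating velocity.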
I’m going to close with some bullet points that should make it clear why tracking data will be so valuable to sports analytics, especially where the credit assignment problem is difficult. With tracking data,
- We can split the credit between a running back and his offensive line by looking at the size of the hole and how close the defenders are to the running back throughout the run.
- We can measure the accuracy of a quarterback by looking at completion percentage after controlling for how open the receivers were in the first place.
- We can measure how good a receiver is by looking at how open he gets, regardless of whether the QB completes the pass.
More broadly,
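The quarterback and receiver ideas can be sketched in Python under a deliberately simple assumption: "open" means the distance from the receiver to the nearest defender when the ball is thrown. The sketch learns a baseline completion rate for each separation bucket from many targets, then compares a quarterback's actual completions with what that baseline expects given how open his receivers were. The target data, the 2-yard bucket width, and every function name here are hypothetical.

```python
import math
from collections import defaultdict


def separation(receiver, defenders):
    """Distance (yards) from the receiver to the nearest defender; points are (x, y) tuples."""
    return min(math.dist(receiver, d) for d in defenders)


def bucket(sep, width=2.0):
    """Group separations into bins `width` yards wide (an arbitrary choice)."""
    return int(sep // width)


def completion_rate_by_bucket(targets):
    """targets: iterable of (separation_yards, completed) pairs.
    Returns the completion rate observed in each separation bucket."""
    attempts = defaultdict(int)
    completions = defaultdict(int)
    for sep, completed in targets:
        b = bucket(sep)
        attempts[b] += 1
        completions[b] += int(completed)
    return {b: completions[b] / attempts[b] for b in attempts}


def completions_over_expected(qb_targets, baseline_rates):
    """Actual completions minus those expected from receiver separation alone.
    Assumes every bucket in qb_targets also appears in baseline_rates."""
    expected = sum(baseline_rates[bucket(sep)] for sep, _ in qb_targets)
    actual = sum(int(completed) for _, completed in qb_targets)
    return actual - expected


print(separation((10, 10), [(13, 14), (10, 12)]))  # 2.0

# Hypothetical league-wide targets: (separation in yards, completed?)
league = [(0.5, False), (1.0, True), (1.5, False), (1.8, False),
          (2.5, False), (3.0, True), (3.5, True), (5.0, True)]
rates = completion_rate_by_bucket(league)

# Hypothetical targets for one quarterback.
qb = [(1.2, True), (3.1, True), (0.8, False)]
print(round(completions_over_expected(qb, rates), 2))  # 0.83
```

A positive value means the quarterback completed more passes than the openness of his receivers alone would predict. Averaging `separation` per receiver, with no reference to completions, gives the receiver-side measure.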
- We can measure just how impressive Steph Curry’s shooting is by looking at the difficulty of his shots based on how close the defenders are.
- We can evaluate fielders in baseball by their speed and the efficiency of the routes they take to the ball, as well as by the speed and accuracy of their throws.
- We can develop better blocking and tackling formations for punt and kickoff returns by identifying which formations tend to be more or less effective.
This list is not exhaustive, of course, but it should convince you that tracking data is the future of sports analytics and a powerful tool for solving the credit assignment problem.