Teaching the Exponential Distribution with Sports
In this edition of Teaching Math with Sports, we look at applications of the exponential distribution in sports. The exponential distribution is very closely related to both the Poisson distribution and the geometric distribution.
There are two intuitions for the exponential distribution. The first is that it models the waiting time between events that can occur at any point in continuous time. That is, it is a good model for the time between occurrences in a Poisson process. The second intuition is as a continuous generalization of the geometric distribution. The geometric distribution models how many discrete experiments must occur before an event happens; the exponential distribution models how long before the event happens when it can happen at any point in a continuous interval of time.
In this article we look at a few different examples that show how the exponential distribution can be used in sports analytics. We also include challenge questions with each example.
Exponential Distribution Background
The exponential distribution is specified by a single parameter, \lambda , which is usually called the “rate”. Exponential distributions are used to model waiting time for an event; the parameter \lambda is the average number of times that event is expected to happen per unit of time, so the mean waiting time is 1/\lambda .
The exponential distribution is typically denoted by Exp(\lambda) and if X \sim Exp(\lambda) then the probability density function is given by f(x) = \lambda e^{-\lambda x} for positive values x>0 .
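The density above integrates to the cumulative distribution function 1 - e^{-\lambda x} . As a minimal sketch, the helper functions below (the names exp_pdf, exp_cdf, and exp_mean are illustrative choices, not from any particular library) write these formulas out in Python.

```python
import math

def exp_pdf(x, rate):
    """Density f(x) = rate * exp(-rate * x) for x > 0, and 0 otherwise."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

def exp_cdf(x, rate):
    """P(X <= x) = 1 - exp(-rate * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def exp_mean(rate):
    """Mean waiting time, which is 1 / rate."""
    return 1.0 / rate
```

For example, exp_cdf(4, 0.1) is the probability that an event with a rate of 0.1 per minute has happened within the first 4 minutes.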
The most common application of the exponential distribution is to model waiting times between Poisson events. Recall that a Poisson process is one where an “event” can happen any time during a continuous interval of time. Common examples include customers arriving at a store and calls being placed to a call center. We discuss more examples of Poisson processes and Poisson random variables in a sports context in our article about Teaching the Poisson distribution.
Lastly, it is important to note the difference between a Poisson process and a Poisson distribution. A Poisson distribution models the number of events in a finite interval of time. A Poisson process models events over an unbounded interval of time. Restricting a Poisson process to a finite time interval results in a Poisson random variable. Exponential distributions model the waiting time between occurrences in a Poisson process, so the waiting times can be arbitrarily long.
This distinction might be subtle, but an example can be straightforward. A Poisson random variable could model the number of heart attacks worldwide in a day, or in a month, or in a year. A Poisson process (a specific type of mathematical object called a stochastic process) could be used to model the occurrences of heart attacks for all time. Restricting a Poisson process to a finite interval gives a Poisson random variable.
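The link between exponential waiting times and Poisson counts can be checked with a small simulation. The sketch below is illustrative only: the rate, time horizon, number of trials, and the function name count_events are all assumptions chosen for the demonstration. It draws exponential gaps between events and counts how many events land in a fixed window; for a Poisson random variable, both the mean and the variance of that count should be close to rate times horizon.

```python
import random

def count_events(rate, horizon, rng):
    """Count events in [0, horizon] when gaps between events are Exp(rate)."""
    t = rng.expovariate(rate)  # time of the first event
    count = 0
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)  # wait for the next event
    return count

rng = random.Random(0)  # fixed seed so the run is repeatable
rate, horizon, trials = 2.0, 3.0, 20000  # illustrative values only
counts = [count_events(rate, horizon, rng) for _ in range(trials)]
mean_count = sum(counts) / trials
var_count = sum((c - mean_count) ** 2 for c in counts) / trials
# Both values should be close to rate * horizon = 6.
print(mean_count, var_count)
```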
Example 1: Modeling Length of Overtime in Hockey
In our Poisson distribution article, we described how the number of goals scored in one of the first three periods can be described with a Poisson random variable. However, we argued that a Poisson distribution was inappropriate to use to model hockey overtimes.
By contrast, the exponential distribution is a natural fit for hockey overtimes. This is because hockey overtime is sudden death: the overtime period continues until a goal is scored. Goals being scored in hockey could reasonably be modeled as a Poisson process. Then, overtime in hockey can be viewed as “waiting for an event to occur in a Poisson process”. That means the exponential distribution is the perfect model to use.
What value should be used for \lambda , though? In hockey, roughly 6.5 goals are scored per game. The game has 60 minutes of play time while the overtime period is only 5 minutes long. The parameter \lambda is the average number of occurrences over a given period of time. If 6.5 goals are scored on average over 60 minutes, then on average 6.5/12 \approx 0.54 goals should be scored during the 5 minute overtime period.
To model the probability of a goal being scored in the first four minutes of overtime, first let X \sim Exp(0.54) and let \Phi(x) be the cumulative distribution function of X. Then the value \Phi(0.8) gives this probability. The 0.8 arises because 4 minutes is 80% of the 5 minute overtime. The same calculation could instead be done using an Exp(0.108) distribution (goals scored per minute, since 6.5/60 \approx 0.108 ) and calculating \Phi(4) .
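The overtime calculation can be sketched in a few lines of Python. This example computes the rates directly from the figure of 6.5 goals per 60 minutes rather than from rounded values, and it assumes (as the example does) that the scoring rate in overtime matches regulation play. The variable names are illustrative. Both choices of time unit give the same answer, roughly 0.35.

```python
import math

goals_per_game = 6.5      # figure quoted in the article
game_minutes = 60.0
overtime_minutes = 5.0

# Rate per 5-minute overtime period and rate per minute.
rate_per_overtime = goals_per_game * overtime_minutes / game_minutes
rate_per_minute = goals_per_game / game_minutes

def exp_cdf(x, rate):
    """P(X <= x) for an exponential waiting time with the given rate."""
    return 1.0 - math.exp(-rate * x)

# Probability of a goal in the first 4 minutes, in each time unit.
p_overtime_units = exp_cdf(4.0 / overtime_minutes, rate_per_overtime)
p_minute_units = exp_cdf(4.0, rate_per_minute)
print(round(p_overtime_units, 3), round(p_minute_units, 3))
```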
Challenge Question 1: How could you use the exponential distribution to predict the probability the game is tied after the overtime period?
Example 2: Fouls in the NBA
Almost ubiquitously in both college and professional basketball, coaches bench players that get into foul trouble early. This inevitably leads to incessant discussion, debate, and even argument about whether the coach benching the player is the right call. We’ve previously used nonhomogeneous Poisson distributions to argue that it is never strictly correct to bench a player in foul trouble. However, you could also argue that players in foul trouble play scared and therefore they should be benched to get their head on straight. It’s a tough debate.
The exponential distribution can show up when discussing occurrences of fouls in the NBA. Suppose a specific player averages a foul once every 12 minutes of play time. Then the waiting time in minutes between fouls by this player can be modeled with an Exp(1/12) distribution. Two examples of questions you can answer using this model include:
- Suppose a player has 5 out of 6 allowable fouls with 6 minutes left in the game. If the player is not going to be subbed out, what is the probability they foul out?
- What is the probability that a team gets no fouls at all in the first quarter of a game?
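As a sketch of the first question, the player fouls out exactly when the waiting time until their next foul is at most 6 minutes, so the answer is the Exp(1/12) cumulative distribution function evaluated at 6, which is about 0.39. The variable names below are illustrative. The second question would need a team-level foul rate, which is not given here, so it is not computed.

```python
import math

minutes_per_foul = 12.0          # from the example above
rate = 1.0 / minutes_per_foul    # fouls per minute
minutes_left = 6.0

# The player fouls out if the wait for the next foul is at most 6 minutes.
p_foul_out = 1.0 - math.exp(-rate * minutes_left)
print(round(p_foul_out, 3))
```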
Challenge Question 2: Suppose a player averaging a foul every 12 minutes receives his 5th out of 6 fouls with 6 minutes left in the game. The player will not be subbed out and will leave the game if and only if they foul out. What is the expected value of minutes that player is on the court over the last 6 minutes of game time?
Example 3: Pitching and the Geometric-Exponential Connection
In the introduction to this article, we described the exponential distribution as a continuous generalization of the geometric distribution. In this example, we show how to use the exponential distribution to approximate the geometric distribution.
First, though, what is the intuition for why the geometric and exponential distributions are related? The exponential distribution describes waiting time for an event to happen when the event can happen at any point in time. The geometric distribution describes repeating an experiment until an event occurs. Let’s see this more explicitly with an example.
Sometimes in baseball, at-bats can go on for a very long time. In particular, if a batter has a full count, then hitting foul balls can extend an at-bat indefinitely. Suppose that a batter has a knack for extending at-bats and they can foul the ball off 95% of the time if they really want to. What is the probability that they foul the ball off at least 20 times?
To solve this using a geometric distribution, you would need to compute the cumulative distribution function for fouling the ball off 19 or fewer times. Then, your answer is the complement of this number. The cumulative distribution function of the geometric distribution can sometimes be hard to compute, though. In particular, without access to a calculator, it would take you an hour or more to calculate this number.
Luckily, the exponential distribution provides an alternative! A player having a 95% chance of fouling the ball off means we expect a mean waiting time of 20 pitches before a non-foul-ball occurs. This means that on average 0.05 non-foul balls occur per pitch. Therefore, using an exponential distribution with \lambda = 0.05 will let us model the probability we are interested in. If \Phi is the cumulative distribution function for the Exp(0.05) distribution, then the probability we are concerned with can be well-approximated with the value 1-\Phi(20) .
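To see how close the approximation is, the sketch below compares it with the exact geometric answer. It assumes each pitch is an independent trial with a 95% chance of a foul ball, so the exact probability of at least 20 foul balls is 0.95^{20} . The variable names are illustrative. The two values come out near 0.358 and 0.368.

```python
import math

p_foul = 0.95   # chance of fouling off any given pitch (from the example)
n = 20          # number of consecutive foul balls

# Exact geometric answer: the first 20 pitches are all fouled off.
p_exact = p_foul ** n

# Exponential approximation with rate 0.05 per pitch: 1 - CDF(20).
rate = 1.0 - p_foul
p_approx = math.exp(-rate * n)
print(round(p_exact, 3), round(p_approx, 3))
```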
Challenge Question 3: Why does the answer use \Phi(20) and not \Phi(19) ?
Challenge Question 4: From a computational perspective, why might we prefer to use the exponential distribution instead of the geometric distribution? What settings might this make sense in?
Challenge Question Answers
- The game is tied after an overtime period if no goal is scored in the first five minutes. This event is the complement of the event “a goal is scored in the first five minutes”. Therefore, we can use 1 minus the CDF at 5 minutes to get our answer.
- We can split into two cases: the player fouls out, or they don’t. If \varphi is the pdf of the relevant exponential distribution, then for t < 6 the probability density of playing exactly t minutes is \varphi(t) . The probability of playing the full 6 minutes is 1-\int_0^6 \varphi(\tau) d \tau . The expected value is then \int_0^6 t \varphi(t) dt + 6 (1-\int_0^6 \varphi(\tau) d \tau) , and the first integral can be found with an integration by parts.
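- Because the exponential distribution is continuous and not discrete. Fouling off at least 20 pitches means the first non-foul ball arrives after pitch 20, which corresponds to a waiting time greater than 20.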
- For very large numbers of experiments and very low probabilities of success, the geometric distribution requires raising 1-p to a very large power, which can lose precision in floating-point arithmetic. The exponential distribution, having already taken the limit and approximating the pdf with an appropriate Taylor series, does not incur these numerical instabilities.
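For the answer to Challenge Question 2, carrying out the integration by parts gives the closed form (1 - e^{-6\lambda})/\lambda for the expected minutes played. The sketch below evaluates it with \lambda = 1/12 and checks it against a simple midpoint-rule approximation of the integral; the step count and variable names are arbitrary choices for illustration. The result is about 4.72 minutes.

```python
import math

rate = 1.0 / 12.0   # fouls per minute, from Example 2
limit = 6.0         # minutes remaining in the game

# Closed form of the expected minutes played: (1 - exp(-rate * limit)) / rate.
closed_form = (1.0 - math.exp(-rate * limit)) / rate

# Numerical check: midpoint rule for the integral of t * pdf(t) on [0, 6],
# plus 6 minutes times the probability of not fouling out.
steps = 100000
width = limit / steps
integral = sum(
    (i + 0.5) * width * rate * math.exp(-rate * (i + 0.5) * width) * width
    for i in range(steps)
)
numeric = integral + limit * math.exp(-rate * limit)
print(round(closed_form, 3), round(numeric, 3))
```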