Applications of Measures of Central Tendency in Sports

We all remember the measures of central tendency from middle school: mean, median, and mode. We heard plenty about what they are, but never about why and when to use each one. As always, I want to take a sports analytics approach to measures of central tendency. We’re going to go beyond the basic mean, median, and mode you learned in grade school to talk about other measures of central tendency, and we’ll discuss how to use them in a sports context.

In sports analytics, everything is about how good your team is on average. But measuring that average quality is more complicated than it sounds, which is why we need to consider many different measures of central tendency. We’re going to look at a number of options and apply them to NFL statistics, including Expected Points Added (EPA).

The first half of this article explains some measures of central tendency you may not have heard of before. You can follow this link to skip ahead to see how we apply them to a very simple NFL prediction problem!



Alternative Measures of Central Tendency

The simplest possible way to describe data and summarize statistics is to describe where the “middle” of the data set lies. That is, we want to understand what a typical value from a data set looks like. This idea is called central tendency.

There are lots of ways to describe the middle of a data set, and many different measures of central tendency are used throughout statistics. Part of the reason there are so many is simply that it is easy to come up with new ways to describe a typical value from a dataset.

The bigger reason for the existence of many measures of central tendency is that different measures work better in different scenarios. It all depends on context and exactly what you want to get out of your data set.

The Big 3 Measures of Center

The main three measures of central tendency are mean, median, and mode. These are typically taught in a high school stats class because they are simple to compute and provide three very different measures of center. I won’t spend too much time on them because most people are familiar, but here is a quick recap so that we’re all on the same page:

  • Mean – the average of all the data points.
  • Median – the value that is in the middle when the numbers are sorted from smallest to largest.
  • Mode – the number that appears most frequently in the data set.

These three measures of central tendency are typically taught because they use the main three ideas that show up in all measures of center: averaging, ordering, and frequency.

The mean represents the simplest way to average data. The median is the simplest and most straightforward way to find the middle for ordered data. The mode describes the “middle” by picking the highest frequency entry.

Nearly every other measure of central tendency builds on the basic ideas of averaging, ordering, and frequency but does so in more complicated ways.
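
To make the three ideas concrete, here is a minimal sketch using Python’s built-in statistics module. The scores list is made-up example data, not anything from the NFL data discussed later.

```python
# A minimal sketch of the big three measures of center using Python's
# standard library. The scores below are hypothetical example values.
from statistics import mean, median, mode

scores = [3, 7, 7, 10, 14, 21, 45]  # hypothetical per-game point totals

print(mean(scores))    # averaging: sum of the values divided by their count
print(median(scores))  # ordering:  the middle value of the sorted data
print(mode(scores))    # frequency: the most common value (7 here)
```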

Other Measures of Center

There are countless measures of central tendency, enough that we can’t possibly cover them all. However, we’ll consider at least the following:

  • The midhinge and trimean (and, more generally, L-estimators)
  • Truncated Means
  • Winsorized Means
  • Generalized Means (including root mean square or RMS)

Midhinge, Trimeans, and L-Estimators

The median relies on ordering the data and picking the value right in the middle. Sometimes, though, the median alone doesn’t capture the nuance of the overall dataset. The usual reason to use the median instead of the mean is that it is robust to outliers, meaning it isn’t adversely affected by them.

However, in certain applications, the median might be too robust. By only looking at the middle value, you might not get a sense for how the distribution looks. This is why other tools including box and whisker plots exist. It is also why the midhinge and trimean exist.

These measures of central tendency depend on the data being ordered like the median, but don’t only use the middle value to report where the center is!

The midhinge averages the 25th and 75th percentile of a dataset as a way to measure the center. The trimean builds on this idea and reports the dataset center to be the average of the median (50th percentile) and the midhinge.
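
As a quick sketch, here is how the midhinge and trimean might be computed with NumPy percentiles. The data array is hypothetical; in our setting it could be a team’s first-half EPA values.

```python
# A sketch of the midhinge and trimean built from NumPy percentiles.
import numpy as np

def midhinge(data):
    """Average of the 25th and 75th percentiles."""
    q1, q3 = np.percentile(data, [25, 75])
    return (q1 + q3) / 2

def trimean(data):
    """Average of the median and the midhinge, i.e. (Q1 + 2*Q2 + Q3) / 4."""
    q2 = np.percentile(data, 50)
    return (q2 + midhinge(data)) / 2

data = np.array([-0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 3.5])  # hypothetical EPA values
print(midhinge(data), trimean(data))
```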

It isn’t difficult to imagine more ways to combine various percentiles as ways to measure the center. Maybe we average the 40th, 50th, and 60th percentiles. Maybe we also add in the 10th and 90th to capture the tails of the distribution. In general, estimators like this are called L-estimators.

L-estimators are often preferred to the median in contexts where sample sizes are small. In such cases, L-estimators are better ways to estimate the underlying shape of the distribution. There are other ways to do this, too, as we’ll see.

Truncated Means

One of the main reasons to prefer medians over means is their robustness to outliers. However, one problem with medians, midhinges, and L-estimators in general is that they don’t take into account every piece of data.

You can arbitrarily inflate all values larger than the median without changing the median. You can deflate all values smaller than the median without changing it. You can do both these things at the same time, but so long as you leave the middle value unchanged, the median will remain the same.
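
A quick numeric illustration of this insensitivity, using made-up numbers:

```python
# Inflating every value above the median leaves the median unchanged,
# but the mean moves dramatically.
import numpy as np

data = np.array([1, 2, 3, 4, 5])
inflated = np.array([1, 2, 3, 400, 500])  # values above the median blown up

print(np.median(data), np.median(inflated))  # both are 3.0
print(np.mean(data), np.mean(inflated))      # 3.0 vs 181.2 -- the mean explodes
```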

Truncated means are a way of reporting measures of central tendency that takes into account more individual pieces of data than medians and midhinges.

The idea of a truncated mean is to first specify a percentage of the data to ignore, then compute the mean of the rest. For example, if we want to compute a 10% truncated mean, then we measure the center of the data set by taking the mean of the middle 80%. That is, we take the mean of the 10th to the 90th percentile of the data.
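
In practice this is easy to compute. Here is a sketch using SciPy’s trim_mean, which cuts the given proportion from each tail (so 0.1 keeps the middle 80% of the data, matching the definition above), alongside a manual NumPy version. The data values are made up, and the two approaches can differ slightly on small samples because of how percentiles are interpolated.

```python
# A sketch of a 10% truncated mean on hypothetical data.
import numpy as np
from scipy import stats

data = np.array([-2.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 0.6, 0.9, 8.0])

# SciPy: drop 10% of the points from each end, then average the rest.
print(stats.trim_mean(data, 0.1))

# Manual version: keep only values between the 10th and 90th percentiles.
def truncated_mean(values, cut=0.1):
    lo, hi = np.percentile(values, [100 * cut, 100 * (1 - cut)])
    kept = values[(values >= lo) & (values <= hi)]
    return kept.mean()

print(truncated_mean(data))
```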

Doing this retains the nice properties of means while not being susceptible to outliers. If there is a crazy outlier in our data set, we simply ignore it.

It is interesting to note that this truncated mean idea already shows up in sports! This is the way that the Olympics scores judged events. If 5 judges assign scores after a dive or a gymnastics routine, it is standard practice to toss out the highest and the lowest score and average the rest. This is an example of a 20% truncated mean! We get rid of the highest and lowest 20% of scores.

Winsorized and Generalized Means

There are two other types of measures of central tendency we want to talk about, both of which are different ways of computing averages. These two measures of center are Winsorized and generalized means.

Computing a Winsorized mean involves first augmenting the data set to remove outliers. While truncated means simply ignore the outliers, Winsorized means keep them in the data set but change their value to something less extreme.

For example, the 10% Winsorized mean takes every data point above the 90th percentile and sets its value equal to the 90th percentile. It does the same at the other end, setting every value below the 10th percentile equal to the 10th percentile. The mean is then computed on this modified data set.
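
Here is a quick sketch of this using SciPy’s winsorize helper (which clips the most extreme values at each end to the nearest retained value) on made-up data:

```python
# A sketch of a 10% Winsorized mean on hypothetical data.
import numpy as np
from scipy.stats import mstats

data = np.array([-2.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 0.6, 0.9, 8.0])

# Clip the bottom 10% and top 10% of values instead of discarding them.
winsorized = mstats.winsorize(data, limits=(0.1, 0.1))
print(winsorized)         # -2.0 has become -0.5 and 8.0 has become 0.9
print(winsorized.mean())  # the 10% Winsorized mean
```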

Generalized means rely on using exponents to combine the data in new and potentially meaningful ways. They are often used in engineering and science applications. The classic example is the root mean square (RMS) measure of central tendency. Instead of averaging all the values directly, we first square them all, then average the squares, then take the square root of the result to undo the effect of the squaring.
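
As a sketch, the generalized (power) mean raises each value to a power p, averages, and then takes the p-th root; p = 2 gives the RMS. The function below uses absolute values so that negative data (like negative EPA plays) is handled, which is an assumption about how we would apply it here rather than part of the standard definition.

```python
# A sketch of the generalized (power) mean; p = 2 is the RMS.
import numpy as np

def power_mean(data, p):
    data = np.abs(np.asarray(data, dtype=float))
    return np.mean(data ** p) ** (1 / p)

signal = np.array([1.0, -2.0, 3.0, -4.0])
print(power_mean(signal, 2))  # RMS = sqrt((1 + 4 + 9 + 16) / 4), about 2.74
print(power_mean(signal, 1))  # p = 1 is just the mean of the absolute values
```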

The classic example of RMS is in electrical engineering when we want to compute the “average amplitude of a signal”. However, in this application one usually cares more about the power of an electrical signal which depends on the square of the amplitude. Therefore, the RMS value of a waveform gives average amplitude in a way that is representative of the overall power of the signal.

Applying Measures of Center to EPA in Football

One fun way to motivate the application of other measures of central tendency to sports is to look at expected points added in football. Lots of people use average EPA per play as a measure of the potency of an offense or the quality of a defense. One problem with average EPA, though, is that it is heavily influenced by big plays; it is susceptible to outliers.

To see this effect, we set up a simple experiment where we try to predict how the second half of an NFL game goes by looking at data from the first half. We want to predict the home team’s “margin of victory” in the second half of the football game. To do this, we use the EPA per play from the first half.

The idea behind this is that first half trends should be predictive of second half trends. We used various measures of central tendency computed over the first half EPA values to predict the second half margin. Our measure of success is the R value from a scatter plot of the two quantities. Note: this is not meant to be rigorous analysis, just to show that avoiding the mean can help reduce the effect of outliers.
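
For readers who want to try something similar, here is a rough sketch of the setup. It assumes play-by-play data in the nflfastR format, pulled here via the nfl_data_py package; the single 2022 season, the column handling, and the particular measures computed are illustrative assumptions rather than the exact pipeline behind the table below.

```python
# A rough sketch: correlate first-half EPA summaries with second-half margin.
import numpy as np
import pandas as pd
import nfl_data_py as nfl
from scipy import stats

# Load one season of nflfastR-style play-by-play data (assumed for illustration).
pbp = nfl.import_pbp_data([2022])
pbp = pbp[pbp["epa"].notna()]

rows = []
for game_id, game in pbp.groupby("game_id"):
    home = game["home_team"].iloc[0]
    first = game[game["game_half"] == "Half1"]
    second = game[game["game_half"] == "Half2"]
    if first.empty or second.empty:
        continue

    # First-half EPA from the home team's perspective: flip the sign of
    # plays where the away team had the ball.
    epa = np.where(first["posteam"] == home, first["epa"], -first["epa"])

    # Second-half home scoring margin = change in (home - away) score.
    margin_half = first["total_home_score"].iloc[-1] - first["total_away_score"].iloc[-1]
    margin_final = second["total_home_score"].iloc[-1] - second["total_away_score"].iloc[-1]

    rows.append({
        "mean": epa.mean(),
        "median": np.median(epa),
        "trimmed_10": stats.trim_mean(epa, 0.1),
        "second_half_margin": margin_final - margin_half,
    })

df = pd.DataFrame(rows).dropna()
for col in ["mean", "median", "trimmed_10"]:
    r = np.corrcoef(df[col], df["second_half_margin"])[0, 1]
    print(col, round(r, 3))
```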

The following table shows the R values between various measures of central tendency for first half EPA when used to predict second half scoring margin in the NFL:

Measure                R Value
Mean                   .03
Median                 .16
Midhinge               .12
Trimean                .14
10% Truncated Mean     .10
25% Truncated Mean     .13
10% Winsorized Mean    .08
RMS                   -.05

While all of these relationships are extremely weak, these numbers do tell us a lot about the strengths of various measures of central tendency on data with outliers.

  • The mean has basically no predictive power when compared to the median. This is because of the effect of outliers.
  • The median does better than all the other “robust to outliers” metrics like truncated means and the midhinge. Each of these is designed as a balance between the median and the mean. For this data set, the median is the best, so the further away from the median we go, the worse the predictive power gets.
  • The fact that the RMS value is negative is extremely fascinating. RMS actually places more weight on large values/outliers than the mean does. The negative R value suggests that if your performance in the first half was buoyed by a few big plays, your fortunes are likely to flip in the second half.

Conclusions

In the future, our intent is to use measures of central tendency on EPA data to predict the outcomes of NFL games. This article is meant to introduce some of the ideas we’ll rely upon for reference in the future.