Parity in March Madness: Do Underdogs Overperform?

It seems like almost every year, all anyone talks about is the increased parity in March Madness relative to the regular season. The underdogs always seem to steal the spotlight until the eventual champion wrests it away. We've even spent significant time ourselves trying to predict upsets in March Madness. But I constantly wonder whether this is all confirmation bias. I mean, the word madness is in the title of the event. We're predisposed to focus on the upsets and not on the chalky results. But is there actually an increase in parity in March Madness relative to the regular season?

Luckily, parity in March Madness is something we can actually study. All we need to do is look at how well underdogs do in the regular season, how well they do in the tournament, and compare the two. For example, maybe the fact that at least one 12 seed tends to advance to the second round isn't evidence that the tournament increases parity but is rather simply a consequence of the law of large numbers. If each 12 seed has about a 20% chance of winning against its 5 seed, then across the four such games it is more likely than not that at least one of them wins. So, is there actually an increase in parity in March Madness or is it the law of large numbers and confirmation bias?
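As a quick sanity check on that arithmetic, here is a minimal sketch assuming four 12-versus-5 games each season and an independent 20% win probability for each 12 seed (both assumptions made purely for illustration):

```python
# Probability that at least one of the four 12 seeds beats its 5 seed,
# assuming each has an independent 20% chance of winning (an assumption
# made purely for illustration).
p_single_win = 0.20
n_games = 4

p_all_lose = (1 - p_single_win) ** n_games   # 0.8^4 ≈ 0.41
p_at_least_one = 1 - p_all_lose              # ≈ 0.59

print(f"P(at least one 12 seed wins) ≈ {p_at_least_one:.2f}")
```

So even with no extra tournament magic, a 12-over-5 upset is the expected outcome more often than not.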

This article looks at thirty years of tournament data and tries to answer this question. Do underdogs tend to overperform in March Madness relative to the regular season? I'll explain the methodology twice: the first pass will be intuitive, while the second will use more formal statistical language. If you'd rather skip the methodology, you can jump ahead to the discussion of the results.

Testing for Increased Parity in March Madness: Intuitive Explanation

Our Ensemble method for predicting margins of victory in sports is an unbiased way to predict which team will win a game and by how much. If we look at how much our predicted margin of victory differs from the actual margin of victory, we get a probability distribution. A picture approximating this distribution is shown below.

[Figure: the distribution of prediction errors in college basketball, used to test for parity in March Madness]

Notice that this distribution trails off as you get further away from 0. This reflects the fact that our estimates tend to be pretty good and that large errors are quite unlikely. Before you ask: yes, there have actually been examples in the NCAA tournament of both the favorite and the underdog overperforming their expectation by more than 40 points.

In 2016, Villanova was favored over Oklahoma by about 4 but ended up winning by 44. The biggest upset came just two seasons later: in 2018, Virginia was favored over UMBC by almost 22 but lost by 20.

Our goal is to see if the shape of this curve is different for the tournament than it is for the regular season. We'll also look at the mean of this distribution to see if underdogs tend to over- or underperform in the NCAA tournament. These tools will help us determine if there is increased parity in March Madness.

The next section is going to talk a bit about the formal statistics that will be used and can comfortably be skipped without losing the ability to understand the end results.

Testing for Increased Parity in March Madness: Kolmogorov-Smirnov Statistics

Formally, we want to see if our predictive model performs any differently when we try to predict the results of tournament games versus regular season games. If it does, that may be evidence of increased parity in March Madness. We will use two statistical tests to do this.

The first test is one of my favorites: the Kolmogorov-Smirnov test for difference in distribution. The Kolmogorov-Smirnov test is what I refer to as "the thing you never learn in elementary statistics but really should have". It is a very powerful tool. At its heart, the Kolmogorov-Smirnov test relies on the following facts:

  1. The empirical cumulative distribution function is easy to compute without any assumptions on the underlying distribution.
  2. The empirical cumulative distribution function completely determines the probability distribution.
  3. The convergence of the empirical cumulative distribution function to the actual cumulative distribution function is well-understood.

Using these three facts, one can compare the empirical distribution functions of two samples to obtain a meaningful statistic measuring the extent to which the two underlying distributions agree. For us, we'll use the Kolmogorov-Smirnov test to determine whether underdogs' performance in the regular season is any different from their performance in the tournament. While this test won't tell us if there is increased parity in March Madness, it will tell us whether or not the tournament and the regular season are at all different.
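To make the mechanics concrete, here is a minimal sketch of how such a comparison might look in Python with scipy. The error arrays below are simulated stand-ins for our model's prediction errors, not the actual data:

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for prediction errors (actual margin minus predicted
# margin, from the underdog's perspective); not the real data.
rng = np.random.default_rng(0)
regular_season_errors = rng.normal(loc=-0.1, scale=11, size=5000)
tournament_errors = rng.normal(loc=-0.3, scale=11, size=600)

# Two-sample Kolmogorov-Smirnov test: the statistic is the largest vertical
# gap between the two empirical cumulative distribution functions.
ks_stat, p_value = stats.ks_2samp(regular_season_errors, tournament_errors)
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```

With real prediction errors in place of the simulated arrays, this is essentially the whole test.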

Second, we'll use a test for difference in means to determine whether underdogs perform better or worse in the tournament than in the regular season.

Part 1: Are the Tournament and Regular Season Different?

The first step is to compare how accurate our predictions were in the regular season and in the tournament. Our model is unbiased in the sense that it doesn’t change how it makes predictions in the tournament versus how it makes predictions in the regular season. We are looking to see if the distribution of errors in the tournament is different from the distribution of errors in the regular season. The figure below shows the shapes of these distributions of errors.

[Figure: the tournament and regular season error distributions, compared to test for parity in March Madness]

The first thing that strikes me about this figure is how similar the curves are. The curves overlap significantly in both their location and their spread. This suggests that the mean and the standard deviation of the two distributions are similar. However, because of the large scale of this plot – it ranges from negative to positive sixty – it is very difficult to visually detect a difference in the precise location of the curves.

A second very noticeable aspect of these curves is that the regular season curve is…spiky…around 0. This means that in the regular season, a high percentage of games had prediction errors very close to zero. The red curve corresponding to the tournament prediction errors is much closer to a normal distribution.

Determining whether these differences are statistically significant can be done using the Kolmogorov-Smirnov test for difference in distribution. Running this test on these two distributions yields a p-value of 0.003. At any reasonable significance level, this is sufficient statistical evidence that the tournament and regular season distributions are different.

To put that another way, something is different in how underdogs perform in the tournament and in the regular season. This is the first step on our way to studying whether there is increased parity in March Madness.

Part 2: Do Underdogs Overperform in March Madness?

Now that we know something is different between the regular season and the tournament, we want to determine whether that difference is underdogs consistently overperforming. To see if the average underdog performance differs between the tournament and the regular season, we'll use a t-test.
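Here is a minimal sketch of that comparison, again using simulated stand-ins for the prediction errors rather than the real data:

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for underdog prediction errors, as in the KS sketch.
rng = np.random.default_rng(0)
regular_season_errors = rng.normal(loc=-0.1, scale=11, size=5000)
tournament_errors = rng.normal(loc=-0.3, scale=11, size=600)

# Welch's two-sample t-test on the mean underdog prediction error:
# regular season versus tournament.
t_stat, p_value = stats.ttest_ind(
    regular_season_errors, tournament_errors, equal_var=False
)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

A large p-value here would mean the observed gap in average underdog performance is well within what chance alone could produce.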

During the regular season, underdogs underperformed their predicted margin by about 0.1 points. This number was obtained by comparing how much our model predicted the underdog to lose by with how much they actually lost by. In other words, in the regular season the underdogs lost by 0.1 points more than we predicted, on average. If underdogs do better in the tournament than in the regular season, we'd want to see them beat their predicted margin, not fall short of it, when we look at the tournament data.

In the tournament, underdogs underperformed their predicted margin by about 0.3 points. That's right: underdogs actually do worse in the tournament than they do in the regular season. The difference is not statistically significant, but that only strengthens the point: there is no basis for the claim that underdogs tend to overperform in the tournament. That is, there is no increase in parity in March Madness.

Biggest Upsets in March Madness History

Just for fun, and because I already had the data readily available, I wanted to compute the biggest upsets in March Madness history. The methodology was to look at the difference between the actual margin of victory and the predicted margin of victory and see which games were the most disparate. However, doing this blindly results in a lot of the top entries being games that were expected to be close but got out of hand. For example, in 2000 UCLA was predicted to lose to Maryland by 2 but won by 35. To me, that doesn't belong on the list of biggest upsets even though the underdog over-performed by 37 points.
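To illustrate, here is a sketch of how that ranking and filtering might look with pandas. The column names and the three example rows are made up for illustration; they are not our actual dataset:

```python
import pandas as pd

# Hypothetical game-level data. "predicted" and "actual" are the underdog's
# margin of defeat; negative means the underdog won. Values are illustrative.
games = pd.DataFrame({
    "season":    [2018, 2016, 2000],
    "underdog":  ["UMBC", "MTSU", "UCLA"],
    "favorite":  ["Virginia", "Mich St.", "Maryland"],
    "predicted": [22, 18, 2],
    "actual":    [-20, -9, -35],
})

# Overperformance = predicted margin of defeat minus actual margin of defeat.
games["diff"] = games["predicted"] - games["actual"]

# Keep only games where the underdog was expected to lose by at least 15,
# then rank by how far the underdog beat the prediction.
upsets = games[games["predicted"] >= 15].sort_values("diff", ascending=False)
print(upsets)
```

The 15-point cutoff is exactly the filter described next; it is what keeps close-game blowouts like that UCLA win off the list.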

So, the following list is restricted to games where the underdog was expected to lose by at least 15 points. The table below shows the poster children of parity in March Madness: the biggest upsets in March Madness history.

| Season | Winner | Loser | Predicted | Actual | Diff |
|--------|--------|-------|-----------|--------|------|
| 2018 | UMBC | Virginia | +22 | -20 | 42 |
| 2016 | MTSU | Mich St. | +18 | -9 | 27 |
| 2012 | Norfolk St. | Mizzou | +21 | -2 | 23 |
| 1992 | ETSU | Arizona | +16 | -7 | 23 |
| 1986 | Ark. LR | ND | +15 | -7 | 22 |

(Predicted and Actual are the winning underdog's margin of defeat, so a negative Actual means they won outright; Diff is the underdog's overperformance.)

An honorable mention goes to…ETSU again. In the 1989 tournament they were predicted to lose to Oklahoma by 23 and ended up losing by only 1. If they had found a way to score just two more points, they would have claimed both the 3rd and 5th biggest tournament upsets in history (and just 3 years apart!).

To me, this list is fun to look at on its own. But seeing it in its entirety truly puts into perspective just how much of an outlier the UMBC win over Virginia was. That game wasn't expected to be close and, well, it wasn't. Converting UMBC's 42-point over-performance to a probability using a normal distribution indicates that the probability of such an event is about 0.000013, or 0.0013%. That's close to a one in one hundred thousand performance. Absolutely nuts. Go Retrievers.
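For the curious, here is how that tail probability might be computed. The 10-point standard deviation below is an assumption made for illustration (it is roughly the spread that reproduces the 0.000013 figure); the actual error distribution from our model is not shown here:

```python
from scipy import stats

# Tail probability of a 42-point overperformance under a normal model of
# prediction errors. The 10-point standard deviation is assumed purely for
# illustration, not taken from the article's data.
p = stats.norm.sf(42, loc=0, scale=10)
print(f"P(overperform by 42+ points) ≈ {p:.6f}")  # ≈ 0.000013
```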

Summary

The main takeaway from the analysis contained here is that March Madness isn’t any crazier than the regular season. That is, there is no increase in parity in March Madness. The perceived madness is mostly attributable to confirmation bias.