AUC & ROC Curves Explained

In machine learning and predictive statistics, the name of the game is maximizing prediction accuracy (true positives/negatives) while minimizing error (false positives/negatives). Both ROC Curves and the idea of AUC (Area under the ROC curve) are helpful tools in achieving this goal.

Both ROC curves and AUC deal with classification problems. Classification problems are like trying to predict who will be the MVP next year or which teams will make the playoffs. All classification problems boil down to putting teams or players into categories, and the predictions of these classifiers can be definitively marked right or wrong. In building a classifier, the goal is to be right as often as possible.

In this article we’ll explain both of these commonly used tools and show an application to sports analytics.

Classification Accuracy

In their simplest form, classifiers predict true or false for some object. For example, we could

  1. Train a classifier to predict which baseball players will hit over .300 next year. True means their average is >.300 and false means otherwise.
  2. Train a classifier to predict whether or not a team will make the playoffs next year. True means they make the playoffs, false means they don’t.
  3. Train a classifier to predict whether or not a team will win in a given game. True means they win, false means they lose.

The goodness of a classifier is determined by its accuracy. Unfortunately, something as simple and seemingly straightforward as accuracy can be difficult to define. Here is an example.

Suppose I’ve built a classification model to predict whether or not a college basketball team will win March Madness next year. However, I want to go to bed early, so I just make a very simple model that predicts FALSE for each team. That is, no matter which team you input to the model, it predicts that team won’t win the tournament.

Seems like a pretty bad classifier, right? Well, for 361 out of 362 teams, our model is correct! This means our model is 99.7% accurate. That 99.7% figure, though, is misleading about the true quality of our classifier.
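To make that arithmetic concrete, here is a minimal sketch (not taken from the notebook referenced later) of the accuracy computation, assuming a field of 362 Division I teams with exactly one eventual champion:

```python
# Hedged sketch: accuracy of the "always predict FALSE" March Madness model,
# assuming 362 Division I teams and exactly one champion.
n_teams = 362
actual = [True] + [False] * (n_teams - 1)   # one team actually wins it all
predicted = [False] * n_teams               # our lazy model says "no" to everyone

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / n_teams
print(f"Accuracy: {accuracy:.1%}")          # ~99.7%, despite never finding the champion
```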

The point is, measuring the accuracy of a classifier is a subtle art. ROC curves and the AUC statistic help to do this.

True Positives and False Positives

Before diving into ROC curves, we first need to understand true/false positives and negatives. Take a look at the graphic below:

true positives, false positives, true negatives, and false negatives

ROC Curves look at how true positives relate to false positives. The more true positives you get with fewer false positives, the better your classifier is. A perfect classifier stays totally in the green regions above.

Two numbers are important in understanding ROC curves:

  1. The true positive rate is the percentage of things whose real label is true that are predicted to be true,
  2. The false positive rate is the percentage of things whose real label is false that are predicted to be true.

Looking back at our sports example, the true positive rate would be the percentage of teams that actually became champions that we also predicted to be champions. For our classifier which always predicts “no”, this is 0%! The false positive rate is the percentage of non-champion teams that our model predicted would win – again 0% for us.
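As a quick sketch with made-up confusion counts (none of these numbers come from a real model), the two rates are computed directly from the four cells of the graphic above:

```python
# Hedged sketch: true positive rate and false positive rate from confusion counts.
# The counts below are invented purely for illustration.
tp, fn = 8, 2     # actual "true" cases: caught vs. missed
fp, tn = 30, 322  # actual "false" cases: wrongly flagged vs. correctly rejected

tpr = tp / (tp + fn)  # fraction of real "true" cases we predicted to be true
fpr = fp / (fp + tn)  # fraction of real "false" cases we predicted to be true
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```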

Comparing the values of true positive rate and false positive rate is the basis for ROC curves.

The ROC Curve

ROC stands for receiver operating characteristic. The name is reflective of the original application of these curves in understanding how old radar systems had to trade off detection ability for false alarms.

The ROC curve is a plot of true positive rate on the y-axis and false positive rate on the x-axis. Typically, the false positive rate is something you can control when building your model: by adjusting the decision threshold, you allow for more or fewer false positives. The number of true positives typically goes up as the number of false positives does. The ROC curve shows us how.

On the same graph, often the line y=x is also drawn (more on this later!). An example of this is shown in the graphic below.

An example ROC Curve with random classifier reference

The dotted y=x line above is a helpful visual reference. If you build a classifier that randomly guesses true or false, then its ROC curve will (on average) fall along the line y=x. This random classifier serves as a good baseline against which to evaluate performance.

A ROC curve for a good classifier is one which lies above the y=x reference. This is because a good classifier maximizes the number of correct predictions while minimizing the number of incorrect ones. The further above the y=x line the red ROC curve lies, the better the classifier is. Notice that the red curve above catches more true positives for the same number of false positives.
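In practice these curves are rarely drawn by hand. Here is a minimal sketch using scikit-learn’s roc_curve; the labels and scores are randomly generated stand-ins for a real model’s output:

```python
# Hedged sketch: plotting a ROC curve with scikit-learn and matplotlib.
# y_true / y_score are invented stand-ins for real labels and model scores.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)            # real labels (0/1)
y_score = y_true * 0.5 + rng.random(200) * 0.8   # noisy scores that favor the true class

fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, color="red", label="classifier")
plt.plot([0, 1], [0, 1], "k--", label="random classifier (y = x)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```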

One way to measure how far above the line a ROC curve sits is the “Area Under the ROC Curve”, commonly abbreviated AUC.

AUC

Consider the two ROC curves shown below:

Two different ROC Curves

Clearly, in the above graphic, the red ROC curve corresponds to a better classifier than the orange ROC curve. No matter what false positive percentage is selected, the red curve is above the orange curve, so red is the better classifier. However, what about the more complicated example below?

The AUC metric helps compare different ROC Curves

Which classifier would you pick? One way to decide: if our false positive percentage is “fixed”, i.e. dictated to us, pick the curve that is on top at that x-value. However, an alternative is to use AUC.

AUC for ROC curves is an attempt to wrap up all of the information contained in the curve into a single measure of “how good the classifier is”. Curves that are pushed further up and to the left also enclose more area between themselves and the x-axis.

As a result, the area under the ROC curve gives a quantitative measure of the quality of a classifier. Even better, computing AUC is a very easy math problem. If the curve is known in a closed analytic form, AUC can be computed via an integral.

However, ROC curves will rarely have closed forms because they are generated experimentally. Therefore, discrete (Riemann sum) estimates of the integral giving AUC can be substituted. Either way, AUC is an overall measure of statistical quality.
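As a hedged sketch of that discrete estimate, scikit-learn can compute AUC either directly from labels and scores or from an experimentally generated curve; the labels and scores below are invented for illustration:

```python
# Hedged sketch: two equivalent ways to estimate AUC from experimental data.
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)            # invented labels
y_score = y_true * 0.5 + rng.random(200) * 0.8   # invented scores

auc_direct = roc_auc_score(y_true, y_score)      # AUC straight from labels and scores

fpr, tpr, _ = roc_curve(y_true, y_score)
auc_curve = auc(fpr, tpr)                        # trapezoidal area under the sampled curve

print(round(auc_direct, 3), round(auc_curve, 3))  # the two estimates agree
```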

Let’s see how this all works with an example about ROC curves in the NBA.

ROC Curve in the NBA

All the code used in this section is contained in our ROC AUC Github repository in a Jupyter notebook. The data used is the same as in our XGBoost tutorial and a similar analysis could be done for those classifiers. But for now, something simpler.

To show an example of how ROC curves work and how to compute AUC, we are going to look at NBA data. We are going to try to predict whether or not a team will win at least one playoff game. To do this, we’ll use their scoring and rebounding numbers from the regular season.

First, we made the ROC curve for points scored. To generate this plot, we varied the prediction threshold (if points scored > ____ , then the team will win a playoff game) and computed the true positive rate and false positive rate at each threshold.
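Here is a minimal sketch of that threshold sweep. The DataFrame and its column names (points_per_game, won_playoff_game) are hypothetical stand-ins, not the actual columns used in the linked notebook:

```python
# Hedged sketch of the threshold sweep described above, on a hypothetical DataFrame.
import numpy as np
import pandas as pd

def roc_points(df: pd.DataFrame, feature: str, label: str):
    """Sweep a threshold over `feature` and record (FPR, TPR) at each step."""
    fpr_list, tpr_list = [], []
    positives = df[label].sum()
    negatives = len(df) - positives
    for threshold in np.sort(df[feature].unique()):
        predicted = df[feature] > threshold                       # predict "wins a playoff game"
        tpr_list.append((predicted & df[label]).sum() / positives)
        fpr_list.append((predicted & ~df[label]).sum() / negatives)
    return fpr_list, tpr_list

# Example usage with a toy frame (a real analysis would load the notebook's data):
toy = pd.DataFrame({
    "points_per_game": [98, 104, 110, 112, 101, 108],
    "won_playoff_game": [False, False, True, True, False, True],
})
fpr, tpr = roc_points(toy, "points_per_game", "won_playoff_game")
```

The same sweep could be run over a rebounds column to produce the second curve.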

We did the same thing for predicting whether or not a team would win a playoff game based on how many rebounds they secured.

Both of these curves are firmly above the “random classifier line”. This means that both points and rebounds are valuable metrics in determining how likely a team is to win a playoff game.

But which is more valuable? This is where AUC comes in.

To compute AUC for the blue and the red ROC curves, we use Riemann sum estimation. In particular, we use the trapezoidal rule. Once we account for the fact that the points are not uniformly spaced along the x-axis, the calculation is very straightforward. We found:

  • AUC for the POINTS classifier was 0.626
  • AUC for the REBOUNDS classifier was 0.608

This means that the points-based classifier is ever so slightly better than the rebounds-based classifier. This very roughly translates to points being a better predictor of overall team quality than rebounds.
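For readers who want to see the trapezoidal calculation spelled out, here is a hedged sketch with invented (FPR, TPR) points rather than the notebook’s actual values; it handles the uneven spacing by using each segment’s own width:

```python
# Hedged sketch: trapezoidal-rule AUC for unevenly spaced ROC points.
# The (fpr, tpr) pairs below are invented for illustration only.
fpr = [0.0, 0.1, 0.25, 0.6, 1.0]
tpr = [0.0, 0.35, 0.55, 0.85, 1.0]

auc = sum(
    0.5 * (tpr[i] + tpr[i + 1]) * (fpr[i + 1] - fpr[i])  # trapezoid area for each segment
    for i in range(len(fpr) - 1)
)
print(round(auc, 3))
```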

Before finishing, we want to share some further thoughts and caveats related to ROC curves and AUC.

AUC and ROC Final Thoughts

Often, building a classifier is a tradeoff between true positives and false positives. We can always make our model “find more true positives” by making the classifier more aggressive. This process, unfortunately, also leads to more false positives. The ROC curve helps visualize and measure this effect.

AUC is widely used in the ML community to decide which of two different models is superior. And while this is a reasonable approach with certain statistical justifications, it is not always the end of the story.

For example, some classification systems are designed with a maximum allowable false alarm rate. In such cases, comparing two classifiers reduces to only looking at the value of the ROC curve at this false positive rate. Whichever curve performs better at this point is the better classifier for your application. That is, the whole curve doesn’t always need to be taken into account.
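A hedged sketch of that fixed-false-alarm comparison, with both ROC curves invented for illustration, might look like this:

```python
# Hedged sketch: compare two classifiers at a fixed, application-dictated false positive rate.
import numpy as np

fpr_a, tpr_a = [0.0, 0.2, 0.5, 1.0], [0.0, 0.6, 0.8, 1.0]    # invented curve A
fpr_b, tpr_b = [0.0, 0.1, 0.4, 1.0], [0.0, 0.3, 0.85, 1.0]   # invented curve B

allowed_fpr = 0.3  # maximum false alarm rate dictated by the application

tpr_at_limit_a = np.interp(allowed_fpr, fpr_a, tpr_a)  # read each curve at the limit
tpr_at_limit_b = np.interp(allowed_fpr, fpr_b, tpr_b)
better = "A" if tpr_at_limit_a >= tpr_at_limit_b else "B"
print(f"At FPR = {allowed_fpr}, classifier {better} catches more true positives.")
```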

Finally, we want to comment on curves that dip below the dotted y=x line. These types of classifiers are curious because while they might “appear” bad, they are actually not. Think of it this way: if you’re trying to pick between 2 labels for an object and you are wrong 100% of the time, this is just as good as being right 100% of the time.

If we’re wrong and we’re consistently wrong, then a really good classifier can be made by just picking the opposite of what we say. Therefore, curves that go below the y=x line can often be “reflected” (by inverting the predictions) and a better classifier can be generated.
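As a final hedged sketch (again with invented labels and scores), inverting such a classifier is as simple as negating its scores, which reflects the curve and sends an AUC of a to 1 - a:

```python
# Hedged sketch: a classifier with AUC below 0.5 becomes useful once inverted.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)
bad_scores = -y_true * 0.5 + rng.random(200)   # scores that tend to rank negatives higher

print(round(roc_auc_score(y_true, bad_scores), 3))   # well below 0.5
print(round(roc_auc_score(y_true, -bad_scores), 3))  # reflected: 1 minus the original AUC
```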