March Madness Machine Learning 2020
No single sporting event causes more debate, excitement, talk of ‘strength of schedule’, and interest in analytics than March Madness. Every budding data scientist or mathematician tries their hand at building a March Madness machine learning model. There is even a March Madness machine learning competition hosted by Google every year, where contestants don’t just pick a bracket; they have to estimate win probabilities for every game. People take this stuff seriously.
As we’ll see shortly, simply telling you ‘which teams are likely to win each individual matchup’ will be neither that interesting nor that helpful in creating a good bracket. In fact, most machine learning models end up being pure chalk. So, instead of telling you which teams are likely to win each game, I will provide percentages of certain teams making the second round, Sweet Sixteen, etc. Then, you can use information like ‘Abilene Christian has a 20% chance of making the Sweet Sixteen – highest of any 10+ seed’ to pick reasonable upsets. (Note: That number is actually true. Abilene Christian is my super sleeper pick.)
First, though, let me explain my methods. If you prefer, skip ahead to my Cinderella picks at the end of the article.
My March Madness Machine Learning Model
The model I use in this article is not so different from my Bayes Ensemble method that ranks NBA teams. Long story short: I generate team ratings for everyone in the field using a mixture of observed game scores and recorded Las Vegas line data. Obviously, using games played this season gives us a very good estimate of how good teams are relative to one another. But why Vegas data? I wrote an article a few months ago about how Vegas lines are interpretable as ensemble learners and, therefore, are extremely accurate measures of how much better one team is than another. Mixing Vegas data and game scores should give us a very good (though potentially biased) way of determining the relative quality of teams.
What do these rankings tell us? Very simply: the difference between two teams’ ratings is my predicted margin of victory. Gonzaga is roughly a 26, Illinois a 21. Therefore, I would predict Gonzaga to beat Illinois by 5 points on a neutral court on average.
However, just using the ratings to predict who wins and who loses is boring. If I did that, all I would know is that the most likely final four is all one seeds and the most likely champion is Gonzaga. That doesn’t tell me much. Rather, I want to know how likely each of these things is. How likely is it that Abilene Christian makes the elite 8? How likely is it that Gonzaga wins?
One way to do this is to look up a table that converts lines to winning probabilities. You can look at my piece performing a historical betting analysis on NFL data to see how one might do this using logistic regression.
A second possibility for converting predicted margin to winning probability is to use a normality assumption on the game score. For instance, I think that Gonzaga will beat Illinois by 5 on average. However, it is fairly reasonable to assume that Gonzaga’s actual margin of victory will resemble a normally distributed random variable with mean 5 and some standard deviation.
Without worrying too much about how, I determined that a standard deviation of about 8.5 points fits the data best. This means that if a team is favored by 8.5 points, I can compute their win probability by querying the cumulative normal distribution at z=1. Using this, we can see that a line of 8.5 points means the favorite should win about 84% of the time, roughly matching what we see in the tables.
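To make that conversion concrete, here is a minimal sketch of the normal-margin calculation in Python. The function name and layout are my own choices for illustration; the only inputs taken from the article are the 8.5-point standard deviation and the idea that a rating difference is a predicted margin.

```python
import math

SIGMA = 8.5  # standard deviation of the final margin in points (value quoted above)


def win_probability(predicted_margin: float, sigma: float = SIGMA) -> float:
    """P(actual margin > 0) when the margin is Normal(predicted_margin, sigma)."""
    z = predicted_margin / sigma
    # Standard normal CDF, written with the error function from the standard library.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A team favored by exactly one standard deviation (z = 1).
print(round(win_probability(8.5), 3))

# Gonzaga over Illinois using the rounded ratings 26 and 21 from the text.
print(round(win_probability(26 - 21), 3))
```

The first call comes out at roughly 0.84, the figure quoted above. The second lands a little above 0.72 for a 5-point favorite.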
In exactly this way I can compute the probability of any tournament team beating anyone else in the field. How, then, can we get to things like ‘What is the probability that Abilene Christian makes the Sweet Sixteen’? We return to old faithful: Monte Carlo simulation.
I set up the bracket and simulated the tournament 40,000 times. For each game in each simulation, I picked the winner according to the probabilities computed from my rankings system. This way, I don’t pick Gonzaga to win in the first round every time. They actually lost in 4 of these 40,000 simulations. As Virginia taught us, this is a distinct possibility and we cannot simply discard it as unlikely.
Finally, I counted how many times each team advanced to each phase of the tournament. This is what will help us make our bracket picks.
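The simulation loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not my production code: the function names are invented for this example, the bracket is cut down to Gonzaga's four-team pod, and the ratings are copied from the table further down. The 8.5-point standard deviation and the 40,000 runs come from the text.

```python
import math
import random
from collections import Counter

SIGMA = 8.5  # margin standard deviation quoted earlier in the article


def win_probability(rating_a, rating_b, sigma=SIGMA):
    """Chance that team A beats team B under the normal-margin assumption."""
    z = (rating_a - rating_b) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_bracket(bracket, ratings, rng):
    """Play one tournament. `bracket` lists teams in bracket order, so
    neighbours meet in round one. Returns the number of wins per team."""
    wins = Counter()
    alive = list(bracket)
    while len(alive) > 1:
        survivors = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_probability(ratings[a], ratings[b])
            winner = a if rng.random() < p else b  # weighted coin flip
            wins[winner] += 1
            survivors.append(winner)
        alive = survivors
    return wins


def advancement_rates(bracket, ratings, n_sims=40_000, seed=0):
    """Fraction of simulations in which each team survives round 1, 2, ..."""
    rng = random.Random(seed)
    n_rounds = len(bracket).bit_length() - 1  # bracket size must be a power of 2
    counts = {team: [0] * n_rounds for team in bracket}
    for _ in range(n_sims):
        for team, w in simulate_bracket(bracket, ratings, rng).items():
            for k in range(w):
                counts[team][k] += 1
    return {team: [c / n_sims for c in row] for team, row in counts.items()}


# Gonzaga's pod, in bracket order, with ratings copied from the rankings table.
ratings = {"Gonzaga": 26.47, "Norfolk/App": -4.33, "Oklahoma": 12.84, "Missouri": 11.05}
rates = advancement_rates(list(ratings), ratings)
for team, row in rates.items():
    print(team, [round(p, 4) for p in row])
```

For this pod, the two printed columns play the role of the R32 and S16 columns in the table, and the values should land close to the published ones up to simulation noise. Extending the sketch to the full field only requires a longer bracket list in the correct order.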
Using These Results
Like I said, just picking the better team in every game is going to get you a bracket where you almost always just pick the higher seed. Nobody likes that guy in their bracket pool. Doing this is boring. If you’re anything like me, you have a visceral need to make ignorant picks and talk about how right you were for five years after. I still talk about my Loyola-Chicago call from a few years ago.
So, how can we use these computer results (which simply tell us ‘higher seeds are better’) to make a fun, reasonable bracket? It’s easy actually: let the model help you identify which teams are better/worse than their seed and which teams are more likely to advance further than their seed would suggest.
It is more likely than not that Texas will beat Abilene Christian in the first round. They are the better team. However, my model thinks Texas is way overrated and Abilene is underrated. Put all this together and we get Abilene with a 44% chance of beating Texas!
March Madness Machine Learning Team Rankings
Before moving on to discuss probabilities and Cinderellas, I think it is best to simply present the rankings and ratings of each team in the field. Remember, the numbers in the Rating column are predicted margins of victory against a league-average opponent on a neutral court.
The number in each column represents the probability of a team making it to that particular round. R32 is the probability of making it to the round of 32, S16 is probability of advancing to sweet 16, etc.
Team | Rating | R32 | S16 | E8 | F4 | Final | Championship | Seed |
---|---|---|---|---|---|---|---|---|
Gonzaga | 26.47 | 0.9999 | 0.9528 | 0.8567 | 0.6812 | 0.5461 | 0.4278 | 1 |
Norfolk/App | -4.33 | 1e-04 | 0 | 0 | 0 | 0 | 0 | 16 |
Oklahoma | 12.84 | 0.5804 | 0.0335 | 0.013 | 0.0033 | 0.001 | 1e-04 | 8 |
Missouri | 11.05 | 0.4196 | 0.0137 | 0.004 | 0.001 | 1e-04 | 0 | 9 |
Creighton | 16.08 | 0.9023 | 0.5035 | 0.0636 | 0.0253 | 0.009 | 0.0026 | 5 |
UC Santa Barbara | 5.23 | 0.0977 | 0.0165 | 4e-04 | 1e-04 | 0 | 0 | 12 |
Virginia | 16 | 0.8723 | 0.4591 | 0.0619 | 0.0231 | 0.0088 | 0.003 | 4 |
Ohio | 6.43 | 0.1277 | 0.0209 | 4e-04 | 0 | 0 | 0 | 13 |
USC | 15.34 | 0.8071 | 0.4398 | 0.1423 | 0.023 | 0.0088 | 0.0028 | 6 |
Wichita St. | 8.02 | 0.1929 | 0.0473 | 0.0053 | 2e-04 | 0 | 0 | 11 |
Kansas | 15.24 | 0.9093 | 0.502 | 0.1504 | 0.0238 | 0.0087 | 0.0035 | 3 |
Eastern Washington | 3.73 | 0.0907 | 0.0109 | 5e-04 | 0 | 0 | 0 | 14 |
Oregon | 12.78 | 0.5644 | 0.1013 | 0.0403 | 0.0043 | 9e-04 | 3e-04 | 7 |
VA Commonwealth | 11.38 | 0.4356 | 0.0581 | 0.0201 | 0.0015 | 4e-04 | 1e-04 | 10 |
Iowa | 21.16 | 0.984 | 0.8386 | 0.6408 | 0.2132 | 0.1287 | 0.0754 | 2 |
Grand Canyon | 2.45 | 0.016 | 0.002 | 3e-04 | 0 | 0 | 0 | 15 |
Michigan | 21.39 | 0.9976 | 0.7455 | 0.5627 | 0.4137 | 0.1616 | 0.096 | 1 |
Texas Southern | -6.0 | 0.0024 | 1e-04 | 0 | 0 | 0 | 0 | 16 |
LSU | 13.42 | 0.3727 | 0.074 | 0.0324 | 0.0148 | 0.0024 | 7e-04 | 8 |
St. Bonaventure | 12.84 | 0.6273 | 0.1804 | 0.0998 | 0.0567 | 0.0114 | 0.0041 | 9 |
Colorado | 16.04 | 0.9103 | 0.5941 | 0.2084 | 0.1147 | 0.0266 | 0.0107 | 5 |
Georgetown | 9.17 | 0.0897 | 0.0179 | 9e-04 | 1e-04 | 0 | 0 | 12 |
Florida St. | 14.64 | 0.7422 | 0.3289 | 0.0893 | 0.0388 | 0.0057 | 0.0017 | 4 |
NC Greensboro | 4.41 | 0.2578 | 0.0591 | 0.0065 | 0.0014 | 1e-04 | 0 | 13 |
BYU | 12.69 | 0.7274 | 0.4035 | 0.1968 | 0.0678 | 0.0137 | 0.0039 | 6 |
Michigan St. | 10.88 | 0.2726 | 0.0904 | 0.0247 | 0.005 | 3e-04 | 0 | 11 |
Texas | 15.11 | 0.564 | 0.2982 | 0.1299 | 0.042 | 0.007 | 0.002 | 3 |
Abilene Christian | 6.35 | 0.436 | 0.2079 | 0.0787 | 0.0198 | 0.0024 | 0.0011 | 14 |
Connecticut | 14.94 | 0.4947 | 0.1179 | 0.0418 | 0.0067 | 6e-04 | 1e-04 | 7 |
Maryland | 13.23 | 0.5053 | 0.115 | 0.0397 | 0.0079 | 8e-04 | 2e-04 | 10 |
Alabama | 17.15 | 0.9827 | 0.7661 | 0.4882 | 0.2105 | 0.0549 | 0.0224 | 2 |
Iona | -1.02 | 0.0173 | 0.001 | 2e-04 | 1e-04 | 0 | 0 | 15 |
Baylor | 21.97 | 0.9509 | 0.7896 | 0.6099 | 0.4613 | 0.2641 | 0.1056 | 1 |
Hartford | -2.57 | 0.0491 | 0.0111 | 0.0025 | 3e-04 | 0 | 0 | 16 |
North Carolina | 14.6 | 0.5263 | 0.1113 | 0.0488 | 0.0209 | 0.0057 | 7e-04 | 8 |
Wisconsin | 17.29 | 0.4737 | 0.088 | 0.0373 | 0.0133 | 0.0024 | 4e-04 | 9 |
Villanova | 17.39 | 0.7894 | 0.4738 | 0.1659 | 0.09 | 0.034 | 0.0068 | 5 |
Winthrop | 5.96 | 0.2106 | 0.0646 | 0.0067 | 0.0014 | 1e-04 | 0 | 12 |
Purdue | 15.22 | 0.8863 | 0.4488 | 0.1279 | 0.0596 | 0.0198 | 0.0031 | 4 |
North Texas | 9.68 | 0.1137 | 0.0128 | 0.001 | 0 | 0 | 0 | 13 |
Texas Tech | 16.1 | 0.5814 | 0.2632 | 0.0894 | 0.0232 | 0.0054 | 8e-04 | 6 |
Utah St. | 10.91 | 0.4186 | 0.1525 | 0.0411 | 0.0091 | 0.0018 | 0 | 11 |
Arkansas | 15.48 | 0.8505 | 0.5421 | 0.2476 | 0.0832 | 0.0256 | 0.004 | 3 |
Colgate | 14.06 | 0.1495 | 0.0422 | 0.0048 | 2e-04 | 0 | 0 | 14 |
Florida | 11.69 | 0.5767 | 0.2318 | 0.1291 | 0.0412 | 0.0132 | 0.0033 | 7 |
Virginia Tech | 11.63 | 0.4233 | 0.1436 | 0.0724 | 0.0206 | 0.0051 | 6e-04 | 10 |
Ohio St. | 18.28 | 0.9848 | 0.6241 | 0.4156 | 0.1757 | 0.0697 | 0.017 | 2 |
Oral Roberts | 0.32 | 0.0152 | 5e-04 | 0 | 0 | 0 | 0 | 15 |
Illinois | 21.96 | 0.9898 | 0.817 | 0.6604 | 0.4685 | 0.3075 | 0.133 | 1 |
Drexel | 2.05 | 0.0102 | 7e-04 | 0 | 0 | 0 | 0 | 16 |
Loyola-Chicago | 14.68 | 0.654 | 0.1409 | 0.0716 | 0.0287 | 0.0089 | 0.0013 | 8 |
Georgia Tech | 11.43 | 0.346 | 0.0414 | 0.0154 | 0.0041 | 0.001 | 2e-04 | 9 |
Tennessee | 16.01 | 0.8754 | 0.6062 | 0.1839 | 0.0869 | 0.036 | 0.0081 | 5 |
Oregon St. | 6.06 | 0.1246 | 0.0328 | 0.0017 | 1e-04 | 0 | 0 | 12 |
Oklahoma St. | 12.65 | 0.84 | 0.3416 | 0.0659 | 0.0222 | 0.0058 | 8e-04 | 4 |
Liberty | 4.17 | 0.16 | 0.0194 | 0.0011 | 0 | 0 | 0 | 13 |
San Diego St. | 12.93 | 0.5769 | 0.2578 | 0.0735 | 0.0158 | 0.0042 | 5e-04 | 6 |
Syracuse | 11.19 | 0.4231 | 0.1561 | 0.0374 | 0.0064 | 9e-04 | 0 | 11 |
West Virginia | 14.53 | 0.9659 | 0.5836 | 0.2001 | 0.0559 | 0.0222 | 0.0038 | 3 |
Morehead St. | -1.06 | 0.0341 | 0.0025 | 1e-04 | 0 | 0 | 0 | 14 |
Clemson | 11.17 | 0.3657 | 0.0659 | 0.0284 | 0.0046 | 0.0016 | 0 | 7 |
Rutgers | 14.08 | 0.6343 | 0.1779 | 0.0928 | 0.0258 | 0.0092 | 0.0012 | 10 |
Houston | 19.25 | 0.9968 | 0.7562 | 0.5677 | 0.281 | 0.1558 | 0.0503 | 2 |
Cleveland St. | -3.12 | 0.0032 | 0 | 0 | 0 | 0 | 0 | 15 |
For the remainder of the article, I am going to split the field up into four sections: the longshots (seeds 13-16), the Cinderellas (seeds 9-12), the spoilers (seeds 5-8), and the favorites (seeds 1-4). For each group, I’ll highlight a few teams that are likely to over- or underperform their seeds and make a nice run.
It is important to note that sorting teams by ‘Rating’ is only half the story. The other half of the story is quality of opponent. An upset alert takes only a mild mixture of an over-seeded favorite and an under-seeded underdog. In each of the sections below I’ll look at both ‘who the best teams are’ and ‘who is most likely to make a run’. The second takes into account strength of region and difficulty of matchup.
The Longshots
Abilene Christian is my team. They have the highest chance of making the sweet 16 that I can remember seeing from a 14 seed. Not only is Abilene about 12-seed good, their potential opponents – Texas and BYU – are significantly over-seeded. Disclaimer: I haven’t watched Abilene Christian play a single game, but that isn’t the point of my blog. I am not an eye-test guy. I am a ‘here is what the numbers are screaming to me’ guy. And the numbers, for whatever reason, think Abilene Christian is way better than a 14 seed.
Nobody else in this group seems to have much of a chance. I’ll let you figure out what is going on for yourself using the table below.
Team | Rating | R32 | S16 | E8 | F4 | Final | Championship | Seed |
---|---|---|---|---|---|---|---|---|
Norfolk St. | -4.33 | 1e-04 | 0 | 0 | 0 | 0 | 0 | 16 |
Ohio | 6.43 | 0.1277 | 0.0209 | 4e-04 | 0 | 0 | 0 | 13 |
Eastern Washington | 3.73 | 0.0907 | 0.0109 | 5e-04 | 0 | 0 | 0 | 14 |
Grand Canyon | 2.45 | 0.016 | 0.002 | 3e-04 | 0 | 0 | 0 | 15 |
Texas Southern | -6 | 0.0024 | 1e-04 | 0 | 0 | 0 | 0 | 16 |
NC Greensboro | 4.41 | 0.2578 | 0.0591 | 0.0065 | 0.0014 | 1e-04 | 0 | 13 |
Abilene Christian | 6.35 | 0.436 | 0.2079 | 0.0787 | 0.0198 | 0.0024 | 0.0011 | 14 |
Iona | -1.02 | 0.0173 | 0.001 | 2e-04 | 1e-04 | 0 | 0 | 15 |
Hartford | -2.57 | 0.0491 | 0.0111 | 0.0025 | 3e-04 | 0 | 0 | 16 |
North Texas | 9.68 | 0.1137 | 0.0128 | 0.001 | 0 | 0 | 0 | 13 |
Colgate | 14.06 | 0.1495 | 0.0422 | 0.0048 | 2e-04 | 0 | 0 | 14 |
Oral Roberts | 0.32 | 0.0152 | 5e-04 | 0 | 0 | 0 | 0 | 15 |
Drexel | 2.05 | 0.0102 | 7e-04 | 0 | 0 | 0 | 0 | 16 |
Liberty | 4.17 | 0.16 | 0.0194 | 0.0011 | 0 | 0 | 0 | 13 |
Morehead St. | -1.06 | 0.0341 | 0.0025 | 1e-04 | 0 | 0 | 0 | 14 |
Cleveland St. | -3.12 | 0.0032 | 0 | 0 | 0 | 0 | 0 | 15 |
The Cinderellas
This is typically the most fun group to play with. Want to send a 9 seed to the elite 8? It can happen. A 12 seed makes the sweet 16? Almost every year. This is the group that has the memorable runs deep into the tournament where they have no business being. Those types of events are extremely hard to predict. However, what I can do is help you understand which teams are probably better than their seed indicates so you can make your own picks.
Team | Rating | R32 | S16 | E8 | F4 | Final | Championship | Seed |
---|---|---|---|---|---|---|---|---|
Missouri | 11.05 | 0.4196 | 0.0137 | 0.004 | 0.001 | 1e-04 | 0 | 9 |
UC Santa Barbara | 5.23 | 0.0977 | 0.0165 | 4e-04 | 1e-04 | 0 | 0 | 12 |
Drake | 8.02 | 0.1929 | 0.0473 | 0.0053 | 2e-04 | 0 | 0 | 11 |
VA Commonwealth | 11.38 | 0.4356 | 0.0581 | 0.0201 | 0.0015 | 4e-04 | 1e-04 | 10 |
St. Bonaventure | 12.84 | 0.6273 | 0.1804 | 0.0998 | 0.0567 | 0.0114 | 0.0041 | 9 |
Georgetown | 9.17 | 0.0897 | 0.0179 | 9e-04 | 1e-04 | 0 | 0 | 12 |
MSU/UCLA | 10.88 | 0.2726 | 0.0904 | 0.0247 | 0.005 | 3e-04 | 0 | 11 |
Maryland | 13.23 | 0.5053 | 0.115 | 0.0397 | 0.0079 | 8e-04 | 2e-04 | 10 |
Wisconsin | 17.29 | 0.4737 | 0.088 | 0.0373 | 0.0133 | 0.0024 | 4e-04 | 9 |
Winthrop | 5.96 | 0.2106 | 0.0646 | 0.0067 | 0.0014 | 1e-04 | 0 | 12 |
Utah St. | 10.91 | 0.4186 | 0.1525 | 0.0411 | 0.0091 | 0.0018 | 0 | 11 |
Virginia Tech | 11.63 | 0.4233 | 0.1436 | 0.0724 | 0.0206 | 0.0051 | 6e-04 | 10 |
Georgia Tech | 11.43 | 0.346 | 0.0414 | 0.0154 | 0.0041 | 0.001 | 2e-04 | 9 |
Oregon St. | 6.06 | 0.1246 | 0.0328 | 0.0017 | 1e-04 | 0 | 0 | 12 |
Syracuse | 11.19 | 0.4231 | 0.1561 | 0.0374 | 0.0064 | 9e-04 | 0 | 11 |
Rutgers | 14.08 | 0.6343 | 0.1779 | 0.0928 | 0.0258 | 0.0092 | 0.0012 | 10 |
The Spoilers
The best of this group looks to be Villanova, with a bunch of other teams close behind. However, in this range, matchups are everything. The surest bets to make the sweet 16 seem to be Tennessee and Colorado. If you want to pick a team from this group to make a final four appearance, I would suggest Colorado. They have by far the easiest route. There are no huge upsets lurking in this group. San Diego State is the most likely of the teams seeded 6 or better to get bounced in the first round, but they are still favored. Connecticut is the best relative to their seed. BYU is the worst relative to their seed.
Team | Rating | R32 | S16 | E8 | F4 | Final | Championship | Seed |
---|---|---|---|---|---|---|---|---|
Oklahoma | 12.84 | 0.5804 | 0.0335 | 0.013 | 0.0033 | 0.001 | 1e-04 | 8 |
Creighton | 16.08 | 0.9023 | 0.5035 | 0.0636 | 0.0253 | 0.009 | 0.0026 | 5 |
USC | 15.34 | 0.8071 | 0.4398 | 0.1423 | 0.023 | 0.0088 | 0.0028 | 6 |
Oregon | 12.78 | 0.5644 | 0.1013 | 0.0403 | 0.0043 | 9e-04 | 3e-04 | 7 |
LSU | 13.42 | 0.3727 | 0.074 | 0.0324 | 0.0148 | 0.0024 | 7e-04 | 8 |
Colorado | 16.04 | 0.9103 | 0.5941 | 0.2084 | 0.1147 | 0.0266 | 0.0107 | 5 |
BYU | 12.69 | 0.7274 | 0.4035 | 0.1968 | 0.0678 | 0.0137 | 0.0039 | 6 |
Connecticut | 14.94 | 0.4947 | 0.1179 | 0.0418 | 0.0067 | 6e-04 | 1e-04 | 7 |
North Carolina | 14.6 | 0.5263 | 0.1113 | 0.0488 | 0.0209 | 0.0057 | 7e-04 | 8 |
Villanova | 17.39 | 0.7894 | 0.4738 | 0.1659 | 0.09 | 0.034 | 0.0068 | 5 |
Texas Tech | 16.1 | 0.5814 | 0.2632 | 0.0894 | 0.0232 | 0.0054 | 8e-04 | 6 |
Florida | 11.69 | 0.5767 | 0.2318 | 0.1291 | 0.0412 | 0.0132 | 0.0033 | 7 |
Loyola-Chicago | 14.68 | 0.654 | 0.1409 | 0.0716 | 0.0287 | 0.0089 | 0.0013 | 8 |
Tennessee | 16.01 | 0.8754 | 0.6062 | 0.1839 | 0.0869 | 0.036 | 0.0081 | 5 |
San Diego St. | 12.93 | 0.5769 | 0.2578 | 0.0735 | 0.0158 | 0.0042 | 5e-04 | 6 |
Clemson | 11.17 | 0.3657 | 0.0659 | 0.0284 | 0.0046 | 0.0016 | 0 | 7 |
The Favorites
I need to start by talking about Gonzaga. Gonzaga is a heavy, heavy favorite this year. I ran a similar (but simpler) analysis last year to simulate the missing March Madness, and the most likely champion only won something like 20% of the time. Gonzaga is more than twice as likely to win this year as a normal ‘first overall seed’. Gonzaga wins over FORTY percent of my simulations. This is absurd. It’s boring, it’s blasé, it’s vanilla, but I am taking Gonzaga to win in every bracket I enter. My model favors Gonzaga by about 5 points over the next best team. This translates to somewhere between a 70% and 75% chance of beating the second best team in the country on any given night. I am not the only numbers junkie to favor Gonzaga this highly; KenPom agrees.
This group is pretty much what you would expect the top 4 seeds to look like. My model thinks Virginia is closer to a high 2 than a 4 seed. My model thinks Texas and WVU should be 4s, not 3s. Small things like that. However, if you sort the table below by Sweet 16 chances, some interesting things show up.
It is more likely than not that Texas, Florida St., Oklahoma St., Purdue, and Virginia get bounced before the sweet 16. Texas in particular (shoutout Abilene Christian) looks to be in danger. I found only about a 56% chance that Texas makes it out of the first round.
Team | Rating | R32 | S16 | E8 | F4 | Final | Championship | Seed |
---|---|---|---|---|---|---|---|---|
Gonzaga | 26.47 | 0.9999 | 0.9528 | 0.8567 | 0.6812 | 0.5461 | 0.4278 | 1 |
Virginia | 16 | 0.8723 | 0.4591 | 0.0619 | 0.0231 | 0.0088 | 0.003 | 4 |
Kansas | 15.24 | 0.9093 | 0.502 | 0.1504 | 0.0238 | 0.0087 | 0.0035 | 3 |
Iowa | 21.16 | 0.984 | 0.8386 | 0.6408 | 0.2132 | 0.1287 | 0.0754 | 2 |
Michigan | 21.39 | 0.9976 | 0.7455 | 0.5627 | 0.4137 | 0.1616 | 0.096 | 1 |
Florida St. | 14.64 | 0.7422 | 0.3289 | 0.0893 | 0.0388 | 0.0057 | 0.0017 | 4 |
Texas | 15.11 | 0.564 | 0.2982 | 0.1299 | 0.042 | 0.007 | 0.002 | 3 |
Alabama | 17.15 | 0.9827 | 0.7661 | 0.4882 | 0.2105 | 0.0549 | 0.0224 | 2 |
Baylor | 21.97 | 0.9509 | 0.7896 | 0.6099 | 0.4613 | 0.2641 | 0.1056 | 1 |
Purdue | 15.22 | 0.8863 | 0.4488 | 0.1279 | 0.0596 | 0.0198 | 0.0031 | 4 |
Arkansas | 15.48 | 0.8505 | 0.5421 | 0.2476 | 0.0832 | 0.0256 | 0.004 | 3 |
Ohio St. | 18.28 | 0.9848 | 0.6241 | 0.4156 | 0.1757 | 0.0697 | 0.017 | 2 |
Illinois | 21.96 | 0.9898 | 0.817 | 0.6604 | 0.4685 | 0.3075 | 0.133 | 1 |
Oklahoma St. | 12.65 | 0.84 | 0.3416 | 0.0659 | 0.0222 | 0.0058 | 8e-04 | 4 |
West Virginia | 14.53 | 0.9659 | 0.5836 | 0.2001 | 0.0559 | 0.0222 | 0.0038 | 3 |
Houston | 19.25 | 0.9968 | 0.7562 | 0.5677 | 0.281 | 0.1558 | 0.0503 | 2 |
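The claim about top-4 seeds that are more likely than not to miss the Sweet 16 can be checked directly against the table above. Here is a small sketch in Python with the Seed and S16 columns copied by hand into a list. The variable names are mine, and the 0.5 cutoff is simply ‘more likely than not’.

```python
# (team, seed, probability of reaching the Sweet 16), copied from the favorites table.
favorites = [
    ("Gonzaga", 1, 0.9528), ("Virginia", 4, 0.4591), ("Kansas", 3, 0.502),
    ("Iowa", 2, 0.8386), ("Michigan", 1, 0.7455), ("Florida St.", 4, 0.3289),
    ("Texas", 3, 0.2982), ("Alabama", 2, 0.7661), ("Baylor", 1, 0.7896),
    ("Purdue", 4, 0.4488), ("Arkansas", 3, 0.5421), ("Ohio St.", 2, 0.6241),
    ("Illinois", 1, 0.817), ("Oklahoma St.", 4, 0.3416),
    ("West Virginia", 3, 0.5836), ("Houston", 2, 0.7562),
]

# Keep teams with less than a coin flip's chance, most vulnerable first.
at_risk = sorted(
    (row for row in favorites if row[2] < 0.5),
    key=lambda row: row[2],
)

for team, seed, p in at_risk:
    print(f"{team:<13} seed {seed}  S16 {p:.1%}")
```

The same filter works on any of the other tables. For example, swapping in the seeds 10 and up with a cutoff of `row[2] >= 0.14` gives a list of plausible Cinderellas.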
My Machine Learning March Madness Cinderella Picks
I am planting my flag on Abilene Christian. First of all, my model thinks that Abilene is better than their seed and Texas is worse than their seed. Even better, the potential second round matchup – BYU – is also bad for their seed. Normally a 14 seed has to beat a 3 and a 6 to make the sweet 16. Tall task. However, my ratings think Abilene is 12-seed capable. Moreover, Texas is of 5-6 seed quality and BYU is of 9 seed quality. In these terms, this Cinderella story seems much closer to possible.
Will it happen? Probably not. The odds still say it is more likely than not that Abilene Christian loses in the first round and none of this happens. But this is March Madness; weirder things have happened.
The other 10+ seeds I think have a very good chance of making the sweet 16 and beyond are Rutgers, Syracuse, Utah State, and Virginia Tech. My model thinks that each of these teams has a 14% or greater chance of making the sweet 16. I would be willing to bet a good sum that at least one – and maybe two – from this group make it that far.
Other than that, I will let you draw your own conclusions from the bounty of numbers I’ve provided. I’m picking Gonzaga to win, and I think you should too, but I’ll leave it up to you. Use these numbers as a guide, as a suggestion, on which scenarios are most likely. At the end of the day, though, remember this: there is absolutely no way to find clarity in the madness.