Super Bowl Squares Probability Charts by Quarter: Which are Best?
For those of us who wish the general public to better appreciate mathematics, few settings are better than large, commonly held, well-understood sporting events. It is easy to motivate a heavy-handed use of math when the application is March Madness pools, or fantasy football, or (as we’ll see in this article) a super bowl squares pool. We ask: What are the best squares for a super bowl squares pool? How do the best squares evolve from quarter to quarter?
To answer this, I scraped the quarter-by-quarter scores of the roughly 1,500 most recent football games from Pro Football Reference. Then, making just a few assumptions, we are able to compute the probability of each square hitting in the upcoming super bowl. To see the specific probabilities, feel free to skip ahead to the super bowl squares probabilities for the 1st quarter, 2nd quarter, 3rd quarter, or end of game below.
What is Super Bowl Squares?
‘Super Bowl Squares’ is a game where people enter into a pool and are assigned a pair of single-digit numbers like “7/6”, “3/3”, or “1/2”. These numbers represent the ones digit of the scores for the home team and the away team. For example, if the home team is winning 12-10 at the end of the first quarter, whoever has the number pairing “2/0” wins a portion of the buy-in pot. Awards are paid out at the end of the first, second, and third quarters as well as at the end of the game. Usually, the board looks something like this:
General consensus amongst football fans is that squares with combinations of 0’s, 3’s, and 7’s are the best for a super bowl squares pool. And, in this case, the general consensus is right. But how much better are these squares than others? And, within the set of 9 ordered pairs of 0, 3, and 7, which squares are better than others?
Frequency of Numbers in Super Bowl Squares
To determine the frequency of each square, I scraped (using the excellent BeautifulSoup package in Python) game scores at the end of each quarter for every regular season and playoff game since 2015. That data can be found on our Data Page. Other analyses of super bowl squares have only used playoff games or actual super bowl data to determine the best squares. However, other than being a matchup between the two best teams, the super bowl is not particularly different from any other game. So, using regular season data as well as playoff data simply expands our data set, increasing accuracy.
For the first graphic below, I looked at the scores at the end of the first quarter for all the games in our data set. Then, I counted the frequency that each square in the super bowl squares grid would have won. For instance, I found that in a whopping 16.5% of games since 2015, both teams’ scores to end the first quarter ended in a 0. This means the “0/0” cell would have hit 16.5% of times. This is reasonably strong evidence that in the super bowl, there should be about a 16.5% chance of the “0/0” cell winning the first quarter pool.
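The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the scraping code used for the article: the function name `square_frequencies` and the toy scores are assumptions made up for the example.

```python
from collections import Counter

def square_frequencies(scores):
    """Map each (home digit, away digit) square to its share of games.

    `scores` is an iterable of (home_score, away_score) pairs taken at
    the end of a given quarter.
    """
    # The ones digit of a score is the remainder after dividing by 10.
    counts = Counter((home % 10, away % 10) for home, away in scores)
    total = sum(counts.values())
    return {square: n / total for square, n in counts.items()}

# Made-up first-quarter scores, for illustration only (not the scraped data).
toy_scores = [(0, 0), (7, 0), (10, 0), (3, 7), (0, 0), (14, 3), (7, 7), (0, 3)]
freqs = square_frequencies(toy_scores)
print(freqs[(0, 0)])  # 3 of the 8 toy games land on "0/0" -> 0.375
```

Note that a 10-0 score lands on the same “0/0” square as a 0-0 score, which is part of why that square is so common early in games.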
One last thing: The following images are colorized to indicate those squares that are better/worse than the others. Two shades of red and green are chosen to indicate those squares that are slightly better/worse than average and those that are significantly better/worse than average. The rows represent the home team’s score’s final digit and the column represents the away team’s score’s final digit. Here is what we found at the end of the first quarter.
The main conclusion from this table is that, at the end of the first quarter, only a handful of squares have even a puncher’s chance of winning the super bowl squares pot. The 9 squares that contain only combinations of 0’s, 3’s, and 7’s make up more than 75% of the outcomes at the end of the first quarter. To put that another way, if your square has something other than a 0, 3, or 7 in either spot, you are fighting with 90 other ‘bad’ squares for less than 25% of the winning chances.
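To make that gap concrete, here is a quick back-of-the-envelope calculation. It assumes a round 75% share for the 9 ‘good’ squares (the text says “more than 75%”) and only compares averages; individual squares within each group differ a lot.

```python
good_squares = 9                   # ordered pairs built from 0, 3, and 7
bad_squares = 100 - good_squares   # the other 91 squares on the board
good_share = 0.75                  # assumed round figure, not an exact value

avg_good = good_share / good_squares        # average chance of a 'good' square
avg_bad = (1 - good_share) / bad_squares    # average chance of a 'bad' square
ratio = avg_good / avg_bad

print(f"average good square: {avg_good:.2%}")  # 8.33%
print(f"average bad square:  {avg_bad:.2%}")   # 0.27%
print(f"ratio: {ratio:.1f}x")                  # 30.3x
```

Under these assumptions, the average ‘good’ square is roughly 30 times as likely to win the first quarter as the average ‘bad’ square.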
Now, let’s take a look at the second quarter.
Here again, the same squares give you an edge, but if you have an ending digit other than 0, 3, or 7, you now have at least a chance of winning.
The end of the third quarter distribution is perhaps a surprising anomaly. The third quarter’s distribution is almost as heavily centered on 0’s, 3’s, and 7’s as the first quarter’s, and is far more centered on these outcomes than the second quarter’s. My gut instinct before doing this analysis was that, as the game goes on, the probabilities should become more uniform across all numbers. While that trend started out true (the second quarter was more uniform than the first), it seems to have reversed here. At the end of the third quarter, combinations of 0’s, 3’s, and 7’s win the third-quarter payout almost 72% of the time.
Finally, here is the ‘end of game’ score probability distribution.
Just like that, the game is much fairer. Certainly, having 0’s, 3’s, and 7’s is still an advantage for the final score payout, but it is not even close to as dominant as the other quarters. Those 9 squares combine to make up 21% of the wins for the end-of-game pool in super bowl squares. This is above average – if there were 100 squares and all were equally likely, you would expect 9 squares to account for 9% of the wins.
Something interesting happens on the diagonal, where both teams’ scores end in the same digit. The square for which this is most obvious is “0/0”. In the first three quarters, “0/0” dominates. However, in our last picture, “0/0” accounts for only 2% of the wins. The sums of the diagonal probabilities in the four charts are:
- First Quarter: 28%
- Second Quarter: 20.7%
- Third Quarter: 25.7%
- End of Game: 8.2%
This phenomenon has a very easy explanation. Ready? Ties are very rare in the NFL. A diagonal square such as “0/0” hits either when the game is tied or when one team is up by 10, 20, etc. At the end of each of the first three quarters, both cases are possible. At the end of the game, however, ties are overwhelmingly rare, so a diagonal square can essentially only hit when one team wins by 10, 20, etc.
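One way to gauge how deficient a diagonal is: compare the observed diagonal share with the share you would expect if the two teams’ digits had nothing to do with each other. The sketch below does this for a made-up 3×3 grid (a real board is 10×10); the grid values and function names are illustrative assumptions, not the article’s data.

```python
def diagonal_share(grid):
    """Total probability of the squares where both digits match."""
    return sum(grid[i][i] for i in range(len(grid)))

def independent_diagonal_share(grid):
    """Diagonal share expected if home and away digits were independent."""
    home = [sum(row) for row in grid]        # marginal for each home digit
    away = [sum(col) for col in zip(*grid)]  # marginal for each away digit
    return sum(h * a for h, a in zip(home, away))

# Made-up grid of probabilities (rows: home digit, columns: away digit).
toy_grid = [
    [0.05, 0.20, 0.15],
    [0.20, 0.02, 0.13],
    [0.15, 0.10, 0.00],
]
observed = diagonal_share(toy_grid)
expected = independent_diagonal_share(toy_grid)
print(round(observed, 3), round(expected, 3))  # 0.07 0.342
```

In this toy grid, the diagonal hits far less often than the marginal digit frequencies alone would suggest, which is the same flavor of gap seen in the end-of-game chart.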
In the next two sections, we will deviate from our simple frequency analysis for a moment to test whether a version of this idea – that the diagonal entries are significantly under-represented in the final board – is more likely to be true or is more likely just due to random sampling error.
Statistical Tests
Introductory statistics is typically broken down into three parts: descriptive statistics, probability, and inferential statistics. Descriptive statistics is summarizing, visualizing, and explaining data. Probability is the study of measuring certainty of things happening. Inferential Statistics is the branch of statistics devoted to making predictions and testing hypotheses. We’ll use an idea from inferential statistics to test the diagonal of the super bowl squares board.
Statistical tests generally have two steps. First, you form a hypothesis about a property that your data has or must satisfy. Then, you boil your data down to a single number (called a p-value) that helps us determine whether our hypothesis was incorrect.
Truly, the second step is what requires some deep knowledge about statistics, including my all-time favorite mathematical statement: the Central Limit Theorem. However, for the purposes of this article, we don’t need to know the statistics; we only need to study the conclusions.
The Chi-Squared Test for Independence
We shall highlight one particularly useful test in this section. This test is called the chi-squared test (sometimes with ‘for independence’ appended). This test makes use of the chi-squared distribution, the concept of independent events, and, yes, as I proffered above, the Central Limit Theorem makes an appearance. I’ll skip a discussion of the chi-squared distribution and the Central Limit Theorem, but I would like to talk at greater length about independence of events.
Events in statistics are nothing more than ‘things that can happen’. If I roll a die, some example events are ‘a 1 shows up’, ‘an even number shows up’, or ‘a 5 or a 6 shows up’. Events are the things to which we can assign probabilities.
When two events are independent, we can think of them as having no effect on one another. The setting for the quintessential example of independence is rolling two dice. The events ‘the first die shows a 2’ and ‘the second die shows a 3’ are independent. Why? If you told me that the first die showed a 2, I wouldn’t think it any more or less likely that the second die shows a 3 than I did before. That is, the first die doesn’t impart any information about what happens on the second die, and vice versa! We are going to use ideas of independence to investigate the super bowl squares board a little bit more robustly.
Above, we discussed that at the end of the game, the diagonal squares were much less likely to win. Even though 0 was an extremely common number for a team’s score to end in, the “0/0” square was less likely than we might expect. My hypothesis (for those in the know, this is the alternative hypothesis of my test) is that the number the home team’s score ends in is not independent of the number the away team’s score ends in. If I know the home team’s score ends in a 0, my certainty that the away team’s score ends in a 0 is much lower than it would otherwise be. Evidence for this hypothesis would support, in a statistical sense, the claim that the diagonal is deficient at the end of the game.
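The dice example can be checked by brute force. Two events A and B are independent exactly when P(A and B) = P(A) × P(B), and with only 36 equally likely rolls we can simply count. The helper `prob` below is a name made up for this sketch.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a true/false test on a roll."""
    return Fraction(sum(1 for roll in outcomes if event(roll)), len(outcomes))

def first_is_2(roll):
    return roll[0] == 2

def second_is_3(roll):
    return roll[1] == 3

def total_at_least_10(roll):
    return roll[0] + roll[1] >= 10

# Independent: knowing the first die tells us nothing about the second.
p_both = prob(lambda r: first_is_2(r) and second_is_3(r))
print(p_both == prob(first_is_2) * prob(second_is_3))  # True (1/36 == 1/6 * 1/6)

# Not independent: a first die of 2 makes a total of 10 or more impossible.
p_joint = prob(lambda r: first_is_2(r) and total_at_least_10(r))
print(p_joint == prob(first_is_2) * prob(total_at_least_10))  # False (0 vs 1/36)
```

The second pair of events is the kind of relationship we are looking for on the squares board: knowing one digit changes what we believe about the other.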
I ran the chi-squared test for independence of the away and home team’s scores at the end of the game and achieved a p-value of roughly 2×10^-16. The correct translation for this p-value is ‘If the columns and rows actually were independent, the probability that our data just randomly happens to stray this far from independence (as the deficient diagonal does) is 0.0000000000000002’. When making conclusions, we weigh two possibilities to determine which is more likely:
- That the diagonal is not deficient and we simply happened to observe an event which is roughly as likely as winning two consecutive lotteries on two consecutive tickets, or
- That the diagonal is, in fact, deficient
Though the data can never prove beyond any doubt that the diagonal is deficient – it is always possible that our sample is really, really weird – it is easier to believe the second bullet above. Let’s go back to constructing our best guess of the super bowl squares probabilities.
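For readers who want to see the mechanics of the test before moving on, here is a minimal sketch of the chi-squared statistic for independence. It is not the code behind the p-value above: the counts are made up, and the table is collapsed to 2×2 (score ends in 0 or not) so that the p-value can be computed with the standard library alone. For the full 10×10 table, a statistics package such as SciPy’s `scipy.stats.chi2_contingency` would be the usual choice.

```python
import math

def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    Assumes every row and column total is positive, so that every
    expected count is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Made-up final-score counts for 1,000 hypothetical games.
# Rows: home score ends in 0 / does not. Columns: away score ends in 0 / does not.
toy_table = [
    [10, 190],
    [190, 610],
]
stat, dof = chi_squared_statistic(toy_table)

# With exactly 1 degree of freedom, the chi-squared tail probability
# has a closed form; this shortcut is NOT valid for larger tables.
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, dof)  # 35.15625 1
print(p_value < 1e-6)  # True
```

In the toy table, independence would predict about 40 games on “0/0”, but only 10 were observed, and the tiny p-value reflects that mismatch.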
Symmetry of the Super Bowl Squares Probabilities
In the regular season and all rounds of the playoffs save the super bowl, there is a fairly distinct difference between the home team and the away team. Home teams are more likely to win and, so, to score more points. Therefore, it is actually not particularly reasonable to expect the probabilities in our super bowl squares matrix to be symmetric. What do I mean? Because home and away are noticeably distinguishable, I wouldn’t a priori expect the probabilities for “3/0” and “0/3” to be the same.
In the super bowl, though, there is no such thing as home field advantage. (I want to point out that I wrote this copy before it was certain that Tampa Bay would in fact be playing the first ever home Super Bowl. However, because the composition of fans will not resemble a home crowd and because teams don’t have to endure travel lag thanks to the week-long break, I still posit that nobody has a home field advantage this year.) Therefore, we would expect symmetry. We would expect “3/0” and “0/3” to be equally likely. So, if we want to estimate the probability that an individual square will win in the super bowl squares board, we should force the board to be symmetric. The following tables will be highly reminiscent of the tables I showed during the ‘Frequency of Numbers in Super Bowl Squares’ section. In fact, they will be exactly the same, but updated to be symmetric. I will forgo any discussion in the following sections and just let the images speak for themselves.
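One simple way to force symmetry is to average each square with its mirror image, so that “3/0” and “0/3” both receive the mean of their two observed frequencies. This is an assumption about how the symmetric charts could be built, sketched on a made-up 3×3 grid; the exact method used for the charts is not spelled out here.

```python
def symmetrize(grid):
    """Average each square with its mirror image across the diagonal."""
    n = len(grid)
    return [[(grid[i][j] + grid[j][i]) / 2 for j in range(n)] for i in range(n)]

# Made-up grid of observed frequencies (rows: home digit, columns: away digit).
toy_grid = [
    [0.10, 0.20, 0.05],
    [0.10, 0.15, 0.10],
    [0.15, 0.05, 0.10],
]
sym = symmetrize(toy_grid)
print(round(sym[0][1], 3), round(sym[1][0], 3))  # 0.15 0.15
```

Averaging leaves the diagonal squares unchanged and keeps the total probability at 1, so the symmetric board is still a valid set of probabilities.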
First Quarter Super Bowl Squares Chart
Second Quarter Super Bowl Squares Chart
Third Quarter Super Bowl Squares Chart
Fourth Quarter Super Bowl Squares Chart
One Last Point
I’ll offer just one last thought. If somebody were in the business of actually predicting which squares would come up, it would be beneficial to analyze the two teams who are actually participating. Certainly, knowledge of the matchup and the two teams’ offensive and defensive capabilities could change which squares we think are likely to occur. For instance, if the Super Bowl matchup were Rams-Bears, I would think it extremely likely that the ‘3/0’ or ‘0/3’ square would come up. I leave this question open for others to study.