Page Rank Sports Ranking System and Graph Theory in Sports
Development of a page rank sports ranking system is not a new idea. In fact, it gets to the heart of a central question in sports. If team A beats team B and team B beats team C, should we expect team A to beat team C? Or is there something in sports that can cause cycles of dominance?
We’re describing the situation shown in the picture below.
Sports are complicated, and cycles like A>B, B>C, C>A happen all the time. You can’t avoid them. The page rank sports ranking system tries to embrace this fact rather than run from it. The page rank sports method is just one example of using an entire field of math called graph theory in sports. At its core, page rank is a classy way of handling the strength of schedule problem in sports analytics.
In this article we take a quick look at the world of graph theory, describe the page rank algorithm and how you use it every day without realizing it, and talk about extending this idea to the page rank sports algorithm.
What is Graph Theory?
Graph theory is one of the first math topics you’ll branch into once you move past calculus and algebra. This field is all about studying objects and connections between them. An example is shown below.
Here, A is connected to (we say “adjacent to”) B, C, and D but is not adjacent to F and E. A graph can contain as many bubbles (we call them “vertices”) and connections (we call them “edges”) as we want. While this feels very abstract, graph theory in sports has natural applications.
The simplest example of using graph theory in sports is to think of the vertices as teams and of each edge as recording that one team played another. This lets us visualize the league’s schedule. We can use directed edges by replacing lines with arrows to represent that one team beat another. See the example below.
An arrow from B to A represents that team A beat team B. Counting the number of arrows into and out of a vertex tells us a team’s record: arrows in are wins and arrows out are losses. For example, both team C and team D have 2 arrows in and 2 arrows out; both of their records are 2-2.
However, C is clearly better than D. First of all, C beat D head to head. And, one of D’s wins is against the 0-2 team E. The exercise of ranking the teams is fairly straightforward when there are only 6 options. It gets much harder when there are dozens or hundreds of teams.
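The arrow-counting idea above can be sketched in a few lines of Python. The results in `GAMES` are made up for illustration (they are not the exact graph from the picture), and the helper names `build_edges` and `records` are my own. Each game becomes an arrow from the loser to the winner, so arrows in are wins and arrows out are losses.

```python
from collections import Counter

# Hypothetical results as (winner, loser) pairs; not the exact graph from the picture.
GAMES = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("C", "B"), ("C", "D"), ("F", "C"),
    ("D", "E"), ("D", "F"), ("B", "E"),
]


def build_edges(games):
    """Turn each game into a directed edge (loser, winner): the arrow points at the winner."""
    return [(loser, winner) for winner, loser in games]


def records(edges):
    """Count arrows in (wins) and arrows out (losses) for every team."""
    wins = Counter(winner for _, winner in edges)
    losses = Counter(loser for loser, _ in edges)
    teams = sorted(set(wins) | set(losses))
    return {team: (wins[team], losses[team]) for team in teams}


RECORDS = records(build_edges(GAMES))
for team, (won, lost) in RECORDS.items():
    print(f"{team}: {won}-{lost}")
```

In this made-up league C and D both finish 2-2, which is exactly the kind of tie that a raw record cannot break.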
We have to look for patterns of arrows pointing in the same direction, what we might call flow in the graph. This flow analogy is actually apt; if you think of arrows as pipes carrying water in the direction of the arrow, then we can envision the best team as the one where most of the water ends up.
It turns out that graph theorists have studied these types of problems for decades, and in some cases centuries, and know how to do exactly the type of analysis that we want to do in generating sports rankings. Applying graph theory in sports is a natural thing to do. Let’s look at one specific example and talk about the page rank sports algorithm.
What is Page Rank?
I’m probably bugging a lot of the more technical people by calling it the page rank sports algorithm. Page rank was not invented for sports analytics. It isn’t called the page rank sports algorithm, it is just called “PageRank”. And, it turns out, it is one of the most important algorithms in modern life.
Page rank is based on some very old ideas from the 1800s. However, page rank as we know it was invented by Larry Page and Sergey Brin in the late 1990s as a way to rank websites by popularity and authority. If these names are familiar to you, they should be. These two are the founders of Google.
The page rank algorithm models the internet using graph theory like we described in the last section. Every page is a vertex in the graph and the edges are determined by links. If one page links to another, it is seen as a vote of confidence that the other page is valuable.
Those pages with the most inbound links are seen as the best, the most authoritative. One of the key ideas in page rank is the idea that if a high authority page links to you, that is seen as more valuable than if a low authority page does.
To say that again, links from good websites are worth more than links from bad websites. The magic of the algorithm is we don’t know which websites are good (and therefore which links are worth a lot) until the algorithm is run. It looks as though the algorithm has to know where it ends before it even starts. In practice, it gets around this by starting from a guess and refining it over and over until the scores stop changing.
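To make the “links from good pages are worth more” idea concrete, here is a minimal sketch of page rank computed by repeated refinement (usually called power iteration). The page names in `LINKS` are invented, and the damping factor of 0.85 and the handling of pages with no outbound links are common conventions that I am assuming here, not details taken from this article.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Score pages by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly used value.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial guess: every page is equally good
    for _ in range(max_iter):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed convention: a page with no links spreads its score over everyone.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # scores have stopped moving
            break
    return rank


# Hypothetical mini web: "x" and "y" each have exactly one inbound link,
# but x's link comes from the popular "hub" and y's comes from the obscure "d".
LINKS = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["x"],
    "d": ["y"],
    "x": [], "y": [],
}

RANK = pagerank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 4))
```

Both `x` and `y` have a single inbound link, yet `x` comes out ahead because its link comes from a page that the algorithm itself has decided is important.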
The idea of valuing links from good pages more than links from bad pages ports over directly to sports analytics. Try to figure it out yourself before I explain it explicitly in the next section.
The Page Rank Sports Algorithm
Page rank basically has three steps, though the second and third happen concurrently:
- Build a graph of link relations between pages
- Determine which links are worth more based on the quality of the website containing the link
- Combine the number of links and link quality to rank internet pages.
Think about this in relation to the strength of schedule problem in sports. We want wins that come against good teams to count more than wins against bad teams. Adapting page rank to sports is as simple as:
- Build a graph of wins between teams
- Determine which wins are worth more based on the quality of the team that was beaten
- Combine the number of wins and win quality to rank sports teams
By replacing “links” with “wins” and “internet pages” with “sports teams”, we get an algorithm well designed for the sports world. When you know the details of the algorithms, you can’t help but apply page rank to sports.
Examples of PageRank in Sports Rankings
If you do a Google search for page rank sports applications, you won’t have to look too hard to find that people have been doing this for a while.
Let’s take a look at how some others have done this in the past. I found this GitHub page which implements the page rank sports algorithm and makes code available for you to play with. It is also helpful to look at implementation details (which I will discuss briefly later).
There is also a story on Towards Data Science talking about applying page rank to college basketball. College basketball is a very good example of when the page rank sports algorithm has a chance to shine.
College basketball, like the internet, contains lots of things to rank and sometimes not a lot of connections between them. You won’t see lots of links between internet niches like sports and fashion just like you won’t see lots of games between ACC and Mountain West teams. I expect page rank to work well in college basketball.
I will provide more details on how to use page rank in sports later on, but first I want to return to graph theory in sports more broadly.
Other Applications of Graph Theory in Sports
Sports analysts don’t only talk about ranking teams and graph theorists don’t only talk about page rank. There are other concepts that sports enthusiasts can borrow from their more mathematical brethren. I’ve got a few examples.
- Making a league schedule can be done by making a graph of which teams need to play each other. Then, creating a schedule is known as finding a “perfect matching decomposition” of the graph. If we’re allowed BYE weeks, then the schedule is just a matching decomposition.
- At the beginning of a college football season, it can be hard to determine relative strength of teams. This is because there isn’t always an “A played B, B played C, C played …” relation between every two possible teams. This type of relation is key to figuring out relative strength by direct comparisons. Graph theorists call the question of whether such a chain exists between every pair of teams the “connectivity” of the graph. If this relationship or path between some two teams doesn’t exist, the graph is called disconnected.
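- A “Circle of Suck” is a circle of teams where each one beat the next and the last one beat the first. For example, A beats B, B beats C, and C beats A. Finding these things by hand can be hard. In graph theory, this is called a directed cycle. When the circle passes through every team in the league exactly once, it is called a Hamiltonian cycle.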
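The last two ideas in the list above are easy to try out in code. The sketch below is only an illustration with invented results: `is_connected` uses a breadth-first search to check whether every team can be reached from every other through a chain of games, and `find_cycle` uses a depth-first search to return one circle of teams if any exists. Both function names are my own. Note that `find_cycle` returns whatever circle it happens to hit first, not necessarily one through every team; searching specifically for a Hamiltonian cycle is a much harder problem in general.

```python
from collections import defaultdict, deque


def is_connected(teams, games):
    """True if a chain of games played links every team to every other team."""
    teams = list(teams)
    if not teams:
        return True
    neighbors = defaultdict(set)
    for a, b in games:  # who won does not matter for connectivity
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {teams[0]}
    queue = deque([teams[0]])
    while queue:
        team = queue.popleft()
        for other in neighbors[team]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen >= set(teams)


def find_cycle(games):
    """Return one list of teams in which each beat the next and the last beat the first.

    games holds (winner, loser) pairs. Returns None if no such circle exists.
    """
    beat = defaultdict(list)
    for winner, loser in games:
        beat[winner].append(loser)
    unvisited, on_path, done = 0, 1, 2
    state = defaultdict(int)
    path = []

    def visit(team):
        state[team] = on_path
        path.append(team)
        for loser in beat.get(team, []):
            if state[loser] == on_path:  # we looped back onto the current path
                return path[path.index(loser):]
            if state[loser] == unvisited:
                cycle = visit(loser)
                if cycle:
                    return cycle
        path.pop()
        state[team] = done
        return None

    for team in list(beat):
        if state[team] == unvisited:
            cycle = visit(team)
            if cycle:
                return cycle
    return None


# Hypothetical early-season results as (winner, loser) pairs.
GAMES = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
TEAMS = ["A", "B", "C", "D", "E"]

print(is_connected(TEAMS, GAMES))  # D and E have not played anyone in the A-B-C group
print(find_cycle(GAMES))
```

Adding a single game between the two groups, for example C beating D, is enough to make this small league connected.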
No matter what you do, graph theory in sports is a hugely valuable tool that can be used to solve tons of problems.
More Detail on Page Rank in Sports
Finally, I want to provide a few more details on how to implement page rank in sports. This will be by far the most technical of the sections, which is why I saved it for last.
The page rank sports algorithm starts by creating a vertex for each team we wish to rank. The original page rank algorithm uses weighted edges. A weighted edge is nothing more than a connection which has a number associated with it. Think of it like differentiating between “team A beat team B” versus “team A beat team B by X points”.
In page rank, there is an edge from vertex j to vertex k if there is a link in that direction. The weight of this edge is the proportion of all of j’s links that go to k. There are a few ways to implement this for sports. We could:
- Assign the edge from team A to team B with weight given by the proportion of ALL team A’s losses that were to team B.
- For example, if team A had 5 losses and 2 were to team B, then the edge from A to B would have weight 0.4.
- Assign the edge from team A to team B with weight given by the total number of points B beat A by, divided by the total number of points A lost by across all of their losses.
- For example, suppose that across 5 losses team A had a combined margin of -30 (they lost by 30 points in total). If team B beat team A by 11 points in one game, then the weight from A to B would be 11/30.
These two methods will generally lead to different results, and both are valid choices. Doing this for all teams gives us the graph to compute the page rank sports rankings. The actual rankings can be computed using some pre-built functionality in your favorite coding language.
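Here is a rough sketch of both weighting schemes, followed by a hand-written ranking loop so the example does not depend on any library. Everything in it is an assumption made for illustration: the results in `GAMES` are invented, `loss_weights` and `rank_teams` are names I made up, the damping factor of 0.85 is just the commonly used value, and letting an undefeated team spread its score evenly is one convention among several.

```python
from collections import defaultdict

# Hypothetical results as (winner, loser, margin) tuples.
# Team A loses 5 times, twice to B, by a combined 30 points.
GAMES = [
    ("B", "A", 11), ("B", "A", 3), ("C", "A", 7), ("D", "A", 5), ("E", "A", 4),
    ("B", "C", 2), ("C", "D", 10), ("D", "E", 6), ("E", "B", 1),
]


def loss_weights(games, use_margin=False):
    """Weights on edges loser -> winner; each loser's outgoing weights sum to 1.

    use_margin=False: share of the loser's losses that came against that winner.
    use_margin=True:  share of the loser's total losing margin owed to that winner.
    """
    raw = defaultdict(lambda: defaultdict(float))
    for winner, loser, margin in games:
        raw[loser][winner] += margin if use_margin else 1.0
    weights = {}
    for loser, row in raw.items():
        total = sum(row.values())
        weights[loser] = {winner: value / total for winner, value in row.items()}
    return weights


def rank_teams(teams, weights, damping=0.85, iterations=200):
    """Page-rank-style scores by power iteration; returns (ordered teams, scores)."""
    n = len(teams)
    score = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        new_score = {team: (1.0 - damping) / n for team in teams}
        for team in teams:
            row = weights.get(team)
            if row:
                # A team passes its score to the teams that beat it.
                for winner, weight in row.items():
                    new_score[winner] += damping * score[team] * weight
            else:
                # Assumed convention: an undefeated team spreads its score evenly.
                for other in teams:
                    new_score[other] += damping * score[team] / n
        score = new_score
    ordered = sorted(teams, key=lambda team: score[team], reverse=True)
    return ordered, score


TEAMS = sorted({team for winner, loser, _ in GAMES for team in (winner, loser)})
COUNT_W = loss_weights(GAMES)
MARGIN_W = loss_weights(GAMES, use_margin=True)
ORDER, SCORE = rank_teams(TEAMS, COUNT_W)

print("A -> B by loss count:", COUNT_W["A"]["B"])
print("A -> B by margin:", MARGIN_W["A"]["B"])
print("Ranking by loss count:", ORDER)
print("Ranking by margin:", rank_teams(TEAMS, MARGIN_W)[0])
```

With these made-up results, the loss-count weight from A to B is 2 of 5 losses, matching the 0.4 example above, while the margin weight is 14 of 30 points because B beat A twice. If you would rather not write the loop yourself, libraries such as NetworkX ship a `pagerank` function that accepts weighted directed graphs.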
Conclusions
The page rank sports ranking system uses ideas borrowed from Google’s favorite algorithm. It takes into account win/loss record and strength of schedule together: a win counts for more when it comes against a team that has itself beaten good teams.
While the results are not necessarily state-of-the-art in general, the page rank sports algorithm might find applications in sports where rankings prove difficult.