Shot Charts for the NHL and Hockey Analytics

There is a surprisingly large (at least to me) online community devoted to hockey analytics. While I am far from a hockey expert, there is no denying how much room there is to contribute to an advanced statistical understanding of how the game works. The first question that came to my unconditioned mind was who the best shooters in the league are, and whether I could emulate the NBA community by creating NHL shot charts. The approach I will take is nearly identical to the one I took when studying the NBA’s best shooters. The idea is simple: for each shot a player takes, compare that player’s accuracy to the league-average accuracy from the same location. In this way, players whose shot selection naturally leads to a higher percentage aren’t unfairly recognized as great shooters just because they’re always shooting layups.

The meat of this article, as is typical on our site, is explaining how I get to my results. Along the way, we’ll pick up some interesting graphics showing shooting percentages from various spots on the ice. While I always like to focus on the mathematics involved, today I’ll switch lanes just a bit and focus more on the programming that goes into each of these projects. If mathematics is the engine that drives data science and analysis, coding and computer programming are the tires. While an engine is enough to move in theory, without a set of tires all you’re left with are daydreams of cruising down the road.

This story will come in two parts. In the first part (this article), we’ll study shooting accuracy from various spots on the ice. In the follow-up article, we’ll turn this basic idea into a metric.

If you would like to skip ahead to the results, please feel free to follow these links to our NHL shot chart and the Boston Bruins Shot Chart.

Web Scraping with Beautiful Soup and Selenium

I almost always gather my data from sports-reference.com because their data is trustworthy and, more importantly, every page on their website is in a predictable format. With just a little bit of coding knowledge (I strongly prefer Python for data harvesting, even though I use R and Matlab for all my analysis), we can gather all the data we need.

For almost all my projects, the web scraping step is accomplished by just using Beautiful Soup to parse the HTML for a webpage and extract the table entries that I need. However, in this project I encountered a new wrinkle that made my data harvesting more difficult. I wanted shot chart data from hockey-reference. Unfortunately, the data exists online in a hard-to-use format. It looks like this:

single game NHL shot chart

However, these charts are only for a single game. To determine trends, I need to analyze all the games. So, I need to go through each game’s box score, somehow harvest the raw data from these charts, and aggregate each of them into one file. Luckily, if you look at the HTML (highlight something on the page, right click, and select inspect) that generates these images, you can get something resembling raw shot chart data. We have this:

This HTML tells us where each shot originated from, who took the shot, and whether or not it was made. If we can simply record this data for each team on the webpage, then our dataset will have been built. Then, once we’ve harvested the raw pixel values, we simply need to convert from pixels to feet…somehow. Seems easy enough, but as always JavaScript exists to make life harder.
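To make the idea concrete, here is a minimal Python sketch of pulling shot records out of markup like this with Beautiful Soup. The markup in SAMPLE_HTML is hypothetical: the class names (shot, goal), the inline top/left pixel offsets, and the use of the title attribute for the shooter are assumptions made for illustration, and the real page may be structured differently.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one <div> per shot, positioned with inline pixel offsets.
SAMPLE_HTML = """
<div class="chart">
  <div class="shot goal" style="top: 120px; left: 340px;" title="Player A"></div>
  <div class="shot miss" style="top: 80px; left: 410px;" title="Player B"></div>
</div>
"""

# Matches "top: 120px" or "left: 340px", but not "margin-top: 5px".
PIXEL_OFFSET = re.compile(r"(?<![\w-])(top|left)\s*:\s*(-?\d+(?:\.\d+)?)px")


def parse_shots(html):
    """Return one dict per shot marker found in the rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    shots = []
    for div in soup.find_all("div", class_="shot"):
        offsets = {name: float(value)
                   for name, value in PIXEL_OFFSET.findall(div.get("style", ""))}
        if "top" not in offsets or "left" not in offsets:
            continue  # skip markers without a usable position
        shots.append({
            "x_px": offsets["left"],
            "y_px": offsets["top"],
            "player": div.get("title", ""),
            "goal": "goal" in div.get("class", []),
        })
    return shots


if __name__ == "__main__":
    for shot in parse_shots(SAMPLE_HTML):
        print(shot)
```

Keeping the parser as a function of an HTML string means it does not care how the page was fetched, which matters once a browser has to render the page first.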

JavaScript is the code that makes a web page interactive. In our case, the JavaScript dynamically renders these images and lets us hover over individual shots to see who took each one and whether it was successful. JavaScript makes a website look nice, but it makes scraping harder. It is nothing more than code that your browser (Chrome, Firefox, etc.) runs on your own machine. This means that if you simply ask a server for the HTML of a page, you won’t get any of the content that the JavaScript would have generated, because that code only runs locally in a browser.

For us, this means we need to do more than simply request the raw HTML for the page containing every hockey game’s box score. In fact, we need to have our Python script actually render each page and run the JavaScript so that we can harvest the pixel values we so desperately desire.

Here is the point: the Python package Selenium does precisely what we want. It allows automated rendering of the on-page JavaScript so that we can access the data we want. All in all, our code stack performs the following tasks:

  • Uses Beautiful Soup on this page, searching for ‘href’ attributes, to generate a list of URLs for each individual game’s box score
  • Uses Selenium on each of those pages to run the JavaScript that generates the shot chart images we need
  • Uses Beautiful Soup to parse the resultant HTML to find the pixel values for every shot in every game
  • Converts from pixels to feet (don’t ask how, it took way longer than it should have but it was late and I was tired…)
  • Aggregates all this data into one source for analysis in R and Matlab
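The steps above can be sketched roughly as follows. This is not the exact script used for this article. The "/boxscores/" link pattern, the function names, and especially the linear pixel-to-feet mapping are assumptions: the mapping supposes the chart image covers a full 200 ft by 85 ft rink with its origin at the top-left corner, which may not match the real charts.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def boxscore_urls(schedule_html, base_url):
    """Collect absolute URLs of links that look like box score pages."""
    soup = BeautifulSoup(schedule_html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        # Assumed pattern: box scores live under /boxscores/ and end in .html.
        if "/boxscores/" in href and href.endswith(".html"):
            urls.append(urljoin(base_url, href))
    return list(dict.fromkeys(urls))  # drop duplicates, keep order


def render_page(url):
    """Load a page in headless Chrome so its JavaScript runs, then return the HTML."""
    # Imported here so the helpers above and below work without a browser set up.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def pixels_to_feet(x_px, y_px, width_px, height_px,
                   rink_length_ft=200.0, rink_width_ft=85.0):
    """Linearly rescale chart pixels to feet (assumes the image spans the whole rink)."""
    return (x_px / width_px * rink_length_ft,
            y_px / height_px * rink_width_ft)
```

In a full run, each URL from boxscore_urls would be passed to render_page, the returned HTML parsed for shot markers, and every marker converted with pixels_to_feet before being written out as a row. Starting a new browser for every game is slow, so reusing one driver across pages would be a sensible refinement.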

The result of all this work is a .csv that looks like:

NHL Shot Charts – Where do the Best Shots Originate?

Going through every single row of our dataset and aggregating which shots were made from each location, we can begin to get an idea of league-average shooting percentage from various locations around the net. If we just plot our raw data, we get the graphic below. The red box represents where the net is on the ice. The color of each pixel tells us the league-average shooting percentage from precisely that location on the ice. The x- and y-axis units are in feet (not in pixels!). This data is aggregated only from 2020 data and constitutes a total of about 40,000 shots.
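The analysis for this article was done in R and Matlab, but the aggregation step is simple enough to sketch in Python with NumPy. The function name, the argument names, and the bin edges are illustrative assumptions, not the columns of the actual .csv.

```python
import numpy as np


def shooting_percentage_grid(x_ft, y_ft, is_goal, x_edges, y_edges):
    """Bin shots on a grid and return (shots, goals, goals / shots) per cell."""
    x_ft = np.asarray(x_ft, dtype=float)
    y_ft = np.asarray(y_ft, dtype=float)
    is_goal = np.asarray(is_goal, dtype=bool)

    # Count all shots, then only the goals, using the same bins.
    shots, _, _ = np.histogram2d(x_ft, y_ft, bins=[x_edges, y_edges])
    goals, _, _ = np.histogram2d(x_ft[is_goal], y_ft[is_goal],
                                 bins=[x_edges, y_edges])

    # Cells with no shots have no defined percentage, so mark them NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        pct = np.where(shots > 0, goals / shots, np.nan)
    return shots, goals, pct


if __name__ == "__main__":
    # Four made-up shots: three from one cell (one scored), one from another.
    shots, goals, pct = shooting_percentage_grid(
        x_ft=[1, 1, 1, 6], y_ft=[1, 1, 1, 1],
        is_goal=[True, False, False, False],
        x_edges=[0, 5, 10], y_edges=[0, 5],
    )
    print(pct)
```

Returning the raw shot and goal counts alongside the percentage is deliberate: the counts are what any later smoothing or weighting needs.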

aggregate raw NHL shotchart

This looks about right. Shots near the goal tend to be very accurate, and accuracy drops off as we move further out. At around 75 feet (the blue line!), though, accuracy jumps back up. Overall, however, this data is noisy.

In mathematics, there are two main ways to deal with noisy data. The first is to apply some local smoothing operations. The second is to gather more data. For no particular reason, we’ll apply the first idea here. If we apply a Gaussian smoothing filter (to those of you in the photography business, blurring) to the above image, we get what is likely a better representation of shooting percentage versus location in hockey. Below, I show two smoothed images. The filter applied to the first is much more local than in the second – the first image is less blurred.

And our heavily-smoothed shot chart:

smoothed aggregate NHL shot chart
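For readers who want to try the smoothing step, here is a hedged Python sketch using SciPy's gaussian_filter. It is a variant of what is described above rather than the exact procedure: instead of blurring the percentage image directly, it blurs the goal counts and the shot counts separately and then divides, so that empty cells do not have to be filled with zeros first. The function name and the min_shots threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smoothed_percentage(goals, shots, sigma, min_shots=1e-9):
    """Gaussian-blur goal and shot counts, then take their ratio per cell."""
    # mode="constant" treats everything outside the grid as having no shots.
    goals_s = gaussian_filter(np.asarray(goals, dtype=float),
                              sigma=sigma, mode="constant")
    shots_s = gaussian_filter(np.asarray(shots, dtype=float),
                              sigma=sigma, mode="constant")

    # Leave cells with (effectively) no nearby shots undefined.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(shots_s > min_shots, goals_s / shots_s, np.nan)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shots = rng.poisson(5, size=(40, 20)).astype(float)
    goals = rng.binomial(shots.astype(int), 0.1).astype(float)
    light = smoothed_percentage(goals, shots, sigma=1.0)  # more local
    heavy = smoothed_percentage(goals, shots, sigma=4.0)  # more blurred
    print(np.nanstd(light), np.nanstd(heavy))
```

Here sigma is measured in grid cells, so a larger value corresponds to the more heavily blurred chart.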

It is totally up to an individual’s discretion which of these charts to use for their ‘NHL shot chart’. For me, I’ll use the smoothest image because to my untrained eye, it looks like it appropriately smooths the data while still retaining the underlying structure. For example, we can still tell that shots taken about 15 feet from the net go in at a slightly higher rate if they are taken from directly in front than if they are taken from the side.

Let’s see how we can use the above graphic to make conclusions about a single team in particular. For no particular reason, let’s consider the Boston Bruins as a case study.

Boston Bruins Shot Chart 2021

For the following charts, we compared the Boston Bruins’ shooting percentage to the league average from each spot on the ice. Again: I don’t follow hockey and I have no preconceived opinions on the Bruins’ offense. However, the following chart tells me that the Bruins’ offense is probably below league average. Why? Because I see more red than green. Sometimes data analysis is simple if you look at it the right way.

Why does more red than green mean Boston’s offense is likely not stellar? Because it means Boston converts fewer of its shot opportunities, which leads to fewer goals. This is especially evident around the net and from the middle of the ice around the blue line. What about their defense? The following chart shows how well Boston’s opponents shoot from various spots on the ice.

Boston Bruins Shot Chart

While the main functionality of this chart is that it looks nice, it could also be used to draw conclusions if you look hard enough. If my arm were twisted and I were forced to say anything, I would guess that the defenders on the right side of their defense (the left side of this graphic) play better.
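The comparison behind both team charts (a team's percentage minus the league's, cell by cell) can be sketched in Python as below. The function names are illustrative, and the "share of cells below league average" summary is just one assumed way to put a number on "more red than green".

```python
import numpy as np


def relative_to_league(team_goals, team_shots, league_pct):
    """Per-cell team shooting percentage minus league-average percentage."""
    team_goals = np.asarray(team_goals, dtype=float)
    team_shots = np.asarray(team_shots, dtype=float)
    league_pct = np.asarray(league_pct, dtype=float)

    # Cells where the team never shot stay undefined (NaN).
    with np.errstate(divide="ignore", invalid="ignore"):
        team_pct = np.where(team_shots > 0, team_goals / team_shots, np.nan)
    return team_pct - league_pct


def share_below_league(diff):
    """Fraction of cells with data where the team is below league average."""
    valid = ~np.isnan(diff)
    return float((diff[valid] < 0).mean())


if __name__ == "__main__":
    diff = relative_to_league(
        team_goals=[[1, 0], [0, 0]],
        team_shots=[[4, 2], [0, 1]],
        league_pct=[[0.1, 0.1], [0.1, 0.1]],
    )
    print(diff)
    print(share_below_league(diff))
```

The same function works for the defensive chart by passing in the goals and shots of Boston's opponents instead of Boston's own.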

Going Forward

There are many improvements that can be made to this analysis. For instance, we may wish to weight these graphics by shot frequency. We notice that the Bruins have a gigantic red area at the bottom center of the graphic, indicating that they don’t make very many shots from that spot on the ice. In fact, this area immediately draws the eye and kind of dominates the graphic. However, if the Bruins only went 0/1 from that area, then the graphic would actually be quite misleading. Weighting by shot frequency would give us a better idea of where Boston is gaining or losing the most by virtue of shooting accuracy.
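One possible weighting, sketched here in Python as an assumption rather than as the metric the follow-up article will define, is to multiply each cell's percentage difference by the number of shots taken there. That product equals the team's goals minus the goals a league-average shooter would be expected to score on the same shots.

```python
import numpy as np


def goals_above_expected(team_goals, team_shots, league_pct):
    """Shot-weighted difference: actual goals minus league-average expected goals."""
    team_goals = np.asarray(team_goals, dtype=float)
    team_shots = np.asarray(team_shots, dtype=float)
    league_pct = np.asarray(league_pct, dtype=float)
    # Equivalent to (team_pct - league_pct) * team_shots, without dividing by zero.
    return team_goals - team_shots * league_pct


if __name__ == "__main__":
    # Cell 1: 0 for 1. Cell 2: 5 for 100. League average is 10% in both.
    weighted = goals_above_expected([0, 5], [1, 100], [0.1, 0.1])
    print(weighted)
```

In this made-up example the 0/1 cell looks worst on an unweighted chart (10 percentage points below league average, versus 5 for the other cell), but after weighting it costs only a tenth of a goal while the 5/100 cell costs five goals.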

In the next article, we’ll develop those ideas further and continue onward to produce our metric for the NHL’s best shooters.