Analyzing Chess Elo: Does Ian Nepomniachtchi Tilt?

In honor of the recently completed 2021 World Chess Championship, I wanted to take the opportunity to analyze Chess Elo ratings to see if one of the main storylines has statistical backing. The reigning champion – now a five-time champion at that! – Magnus Carlsen won the match quite convincingly after a slow start. Once his opponent Ian (pronounced like yawn) Nepomniachtchi lost for the first time, things shifted quickly and squarely into Magnus’ favor. Many grandmaster commentators said this was inevitable: Ian Nepomniachtchi has a history of going on tilt after a particularly demoralizing loss.

To be clear, I am not talking about Ian throwing things or getting angry; he could not have possibly been more graceful in defeat. His quality of play, however, decreased quite significantly. Before his loss, Ian drew five games in a row with the reigning world champion. After it, he quickly lost three of the next five games.

Is there any statistical backing to support the claim that Ian has been a swingy player historically? In this article, I want to use Chess Elo ratings and Ian’s historical tournament performances to see if there is a trend of this happening. To start, let’s take some time to discuss how Chess Elo ratings work.

What is Chess Elo: High Level

Perhaps the biggest misconception about Elo ratings is that Elo is an acronym. It isn’t; it is named after the system’s inventor, Arpad Elo. It is pronounced as if it were a brand of sugar-free sweetener and not like a ’70s rock band.

Elo is a very simple system that is extremely helpful in games with head-to-head matchups. It was originally invented for Chess, but has been adapted to other sports, for example by Nate Silver’s FiveThirtyEight for NBA team ratings. Elo assigns every player a numerical rating by letting the winning player ‘take’ rating points from the losing player. The amount of Elo that is transferred is a function of the difference in the rating (skill) of the two players and the actual outcome.

If a lower-rated player beats a higher-rated player, a lot of points will be transferred. This is because the upset is evidence that the lower-rated player should be rated higher, that the higher-rated player should be rated lower, or both. On the other hand, suppose a higher-rated player beats a lower-rated player. In this case, the higher-rated player will gain only a small number of points because the result just confirms what we already expected to be true.

In Chess there is a third option: a draw. If you draw with someone rated higher than you, you’ll gain a few (not as many as a win) points because you were ‘expected’ to lose. You did better than you should have so your Chess Elo will respond accordingly. If you draw with someone lower rated than yourself, you’ll lose a few points.

Elo works because after playing many, many games you’ll eventually reach a rating where your skill level matches opponents at a similar rating. If you are playing worse (lower rated) players, you’ll consistently win and consistently gain rating. If you are playing better (higher rated) players, you’ll consistently lose and consistently lose rating. Only when your rating reflects your true skill level do you tend to win as many points as you lose on average.

Chess Elo is pretty fascinating. It is wildly different from how most other sports rate their teams. From AP Polls to real plus-minus, so much of sports analytics is about ranking things. But Elo has been around for decades, is very simple, and works extremely well.

What is Chess Elo: Mathematical Details

Chess Elo ratings are crucially dependent on the concept of an expected performance. Expected performance relies on assigning a numerical outcome to the result of a game (in true statistical parlance, we need to define a random variable for which we want to compute the expected value). If a player wins, we give them 1 point; if they draw, half a point; if they lose, 0 points. The expected performance of player A against player B is the average number of points they’ll score per game if they play many, many games.

For example, if player A has a 60% chance of winning, 30% chance of drawing, and 10% chance of losing against player B, then their expected points is 0.6*1 + 0.3*0.5 + 0.1*0 = 0.75.
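The same arithmetic can be written as a tiny helper. This is only a sketch: the function name `expected_points` is made up for illustration, and the probabilities are the ones from the example above.

```python
def expected_points(p_win: float, p_draw: float, p_loss: float) -> float:
    """Expected points per game with win = 1, draw = 0.5, loss = 0."""
    total = p_win + p_draw + p_loss
    if abs(total - 1.0) > 1e-9:
        raise ValueError("win, draw and loss probabilities must sum to 1")
    return 1.0 * p_win + 0.5 * p_draw + 0.0 * p_loss


# Player A from the example: 60% win, 30% draw, 10% loss.
print(expected_points(0.6, 0.3, 0.1))
```

Note that a loss contributes nothing to the total, so only the win and draw probabilities actually matter.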

Chess Elo ratings naturally lead to a formula for expected points in a matchup; the formula is a logistic function of the difference between the two ratings, the same curve that appears in logistic regression. While I won’t give the exact formulas, you can find them on, for example, Wikipedia’s article about the topic. Most simply, player A’s expected points approach 1 as their rating grows larger than player B’s, with the remaining gap shrinking exponentially fast. At the same time, player B’s expected points go exponentially fast to 0.
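For readers who would rather see it than look it up, here is a minimal sketch of the standard logistic form of the Elo expected score (base 10, scale 400), as given in the Wikipedia article. The function name is an assumption of mine, and official federation calculations may differ in details such as lookup tables or caps on the rating difference.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected points for player A against player B (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give 0.5; a 400-point edge gives 10/11, about 0.91.
print(elo_expected_score(2700, 2700))
print(elo_expected_score(2800, 2400))
```

The two players’ expected scores always add up to 1, which is why whatever one player gains in expectation the other loses.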

Finally, to update a player’s Elo rating, we compare their actual points scored against their expected points in a matchup. If actual points is larger than expected points, they gain rating. If actual points is smaller than expected points, that player loses rating. In this way, Elo rating can be calibrated to accurately reflect a player’s talent and is able to respond quickly to a player’s change in ability.
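As a rough sketch, the update described above is usually written as new rating = old rating + K * (actual points - expected points). The K-factor is not discussed in this article; K = 10 below is an assumption (it is the value FIDE applies to players who have reached a 2400 rating), and the function names and ratings are illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected points for player A against player B (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, opponent: float, actual: float, k: float = 10.0) -> float:
    """Rating after one game; `actual` is 1 for a win, 0.5 for a draw, 0 for a loss.

    k is an assumed K-factor, not a value taken from the article.
    """
    expected = elo_expected_score(rating, opponent)
    return rating + k * (actual - expected)


# Hypothetical ratings: a 2780 player facing a 2850 player.
print(updated_rating(2780, 2850, 0.5))  # draw against a stronger player: small gain
print(updated_rating(2780, 2850, 0.0))  # loss: rating drops
```

A larger K makes ratings respond faster to recent results, at the cost of being noisier.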

Our Methodology

We’re going to use the concept of expected performance or expected points to see if Ian Nepomniachtchi consistently underperforms his expectation in tournament games immediately following a loss in the same tournament. So, we’ve gathered a list of all his games from 365chess.com and computed his expected performance and actual performance. The results of this study are contained in the next section.

Does Ian Nepomniachtchi Tilt?

We went through every Ian Nepomniachtchi game since 2007 and selected specifically those games coming immediately after a loss in the same tournament. We computed how he performed relative to his expected value in each of these games, and computed his average performance above or below expected. The result, at first glance, may be extremely surprising. In games immediately following a loss in the same tournament, Ian Nepomniachtchi scores, on average, 0.023 points better than expected.
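The selection step can be sketched roughly as follows. This is not the actual analysis code: the `Game` record, its field names, and the toy games are all made up for illustration, and the games are assumed to be in chronological order.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Game:
    tournament: str         # hypothetical tournament label
    player_rating: float    # the player's rating at the time of the game
    opponent_rating: float
    score: float            # 1 = win, 0.5 = draw, 0 = loss


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def residual(game: Game) -> float:
    """Actual points minus expected points for one game."""
    return game.score - elo_expected_score(game.player_rating, game.opponent_rating)


def post_loss_residuals(games: list[Game]) -> list[float]:
    """Residuals of games played right after a loss in the same tournament."""
    return [
        residual(cur)
        for prev, cur in zip(games, games[1:])
        if prev.tournament == cur.tournament and prev.score == 0.0
    ]


# Invented toy data, not real results.
games = [
    Game("Event A", 2700, 2750, 0.0),
    Game("Event A", 2700, 2650, 1.0),  # follows a loss in the same event
    Game("Event A", 2700, 2720, 0.0),
    Game("Event B", 2705, 2700, 0.5),  # previous loss was in another event
]
after_loss = post_loss_residuals(games)
print(len(after_loss), mean(after_loss))
```

Averaging `residual` over every game instead of only the post-loss games gives the overall baseline to compare against.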

Better.

Now, the story doesn’t quite end there. This is a really good example of statistics lying to you if you don’t know what you’re talking about. In 2007, Ian was rated in the high 2500s, while his rating since has climbed into the high 2700s. In order to gain that much rating, he needed to consistently outperform his rating. That is, he had to score more points than his rating predicted. In fact, if you look at every game Ian played since 2007 he, on average, outperformed his expectation by 0.035 points.

Now things are starting to come together. While Ian did tend to outperform his expectation after losses, he did so by less than after a randomly selected game. To put that more simply: Ian performed worse after a loss than he did overall.

Does that settle the story?

Of course not. We need to sprinkle in a little bit of statistics. There is certainly a difference in Ian’s performance after a loss. But is that difference statistically significant? That is, is the difference big enough to not just be a coincidence?

A z-test at the 5% significance level (95% confidence) does not indicate that this difference is statistically significant. In fact, the z-score for this data is only about -0.6, which wouldn’t indicate a statistically significant difference at any reasonable confidence level.
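The exact form of the test is not spelled out here, so the following is only one plausible version: a two-sample z-test comparing the per-game residuals (actual minus expected points) of post-loss games against those of the remaining games. The function name and the sample values are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in means of two samples."""
    std_err = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / std_err
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return z, p_value


# Invented residuals, not the real data.
post_loss = [0.4, -0.5, 0.1, -0.4, 0.5, -0.1]
other = [0.5, -0.4, 0.1, 0.4, -0.5, 0.2, 0.0, 0.3]
z, p = two_sample_z(post_loss, other)
print(z, p)
```

For reference, a z-score of about -0.6 corresponds to a two-sided p-value of roughly 0.55 under the standard normal, far above the usual 0.05 cutoff.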

The long and short of it is that while Ian Nepomniachtchi does not perform statistically significantly worse after a loss, he has actually performed a little bit worse in that setting.

Is Tilting a Recent Effect?

Maybe things have changed since Ian Nepomniachtchi reached super-GM level? I wanted to attempt to determine if maybe Ian’s tilting was a more recent phenomenon. So, I looked at his performance relative to expectation after a loss over different periods of time. The graphic below shows a smoothed time-series estimating Ian’s performance relative to expectation after a loss over time. A few notable tournaments over the last few years are highlighted for orientation.

[Figure: Using Chess Elo to decide if Ian Nepomniachtchi tilts]

How can we interpret this graphic? Honestly, it is difficult to say. Since the ‘2020 St. Louis Blitz’ tournament, Ian has performed about 0.025 points worse than expected after a loss. Before that, though, things were very cyclical – likely as Ian went through periods of significant improvement followed by stagnation over and over again. While his recent negative average relative to expectation perhaps indicates that Ian does have a tilting problem, I find it hard to believe that the effect is statistically significant.
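The smoothing method behind the graphic is not stated. One simple way to produce a comparable curve, assuming a chronological list of post-loss residuals (actual minus expected points), is a trailing moving average. The function name, window size, and values below are illustrative assumptions only.

```python
def rolling_mean(values: list[float], window: int) -> list[float]:
    """Trailing moving average; early entries average over the games seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the game that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Invented residuals, not the real data.
residuals = [0.4, -0.5, 0.1, -0.4, 0.5, -0.1, -0.3]
print(rolling_mean(residuals, window=3))
```

A wider window gives a smoother line but hides short slumps, so the apparent cycles in a plot like this depend heavily on that choice.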

Conclusions

To me, it seems that the claim that Ian Nepomniachtchi goes on tilt and underperforms relative to expectation after a loss is perhaps blown out of proportion. I think the results of this World Chess Championship were more a result of going down early to a decidedly better opponent and Ian having to play aggressively to catch back up. So, give my guy a break. The data does not strongly support the claim that Ian goes on tilt. Instead, let’s focus on how impressive it is that Magnus Carlsen has gone on to handily win his 5th straight World Chess Championship.