Underneath the athletic facade, sports is all about optimization. And perhaps the most commonly employed tactic is greedy optimization. Some might call the strategy “win…
We all remember the measures of central tendency from middle school: mean, median, and mode. All we heard about was mean, median, and mode, but…
The Sparse Impacts Model (SIM) is the second iteration of The Data Jocks’ NBA model. Some NBA models assign ratings to teams. An entirely different…
Nearly every data scientist or sports analyst has learned a painful first-hand lesson about data leakage. Let me paint a picture. You come up with…
Developing a PageRank sports ranking system is not a new idea. In fact, it gets to the heart of a central question in…
In machine learning and predictive statistics, the name of the game is maximizing prediction accuracy (true positives/negatives) while minimizing error (false positives/negatives). Both ROC curves…
If neural networks dominated the early 2010s, no algorithm dominates modern machine learning discussions like XGBoost. XGBoost is a wildly powerful variant on decision trees…
Linear regression is the first, easiest, and most versatile tool that statisticians and data scientists will learn. The scikit-learn linear regression Python class provides the…
College football debates can be particularly divisive when weighing the resume of one team against another. One of the tools that ESPN uses to…
Everyone loves filling out brackets. It’s a way to flex your knowledge and dunk on your friends. It gives you rooting interests in games you…