Data
Below you’ll find links to many of the datasets we use in our analyses, all in .csv format. Unless otherwise specified, the data are gathered from sports-reference.com using the Python packages BeautifulSoup and pandas.
Our Data Sets
- NBA 2019 Basic Box Scores. The ‘suffix’ column is the path to append to basketball-reference.com to find the game’s information.
- NFL Quarter-by-Quarter Points Scored in: 2015, 2016, 2017, 2018, 2019, and 2020.
- The ‘Final’ column is the score at the end of the game, regardless of whether the game ended in the fourth quarter or in overtime.
- The ‘suffix’ column is a unique identifier for each game. Appending it to ‘https://www.pro-football-reference.com/’ gives the URL of that game’s page.
- NFL Individual Player Season Totals (since 1990 for rushing and receiving, and since 1996 for passing, for technical reasons)
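As a rough illustration of how the ‘suffix’ and ‘Final’ columns in the NFL files can be used, here is a minimal sketch that reads a CSV and builds each game’s page URL. Only ‘Final’ and ‘suffix’ are column names taken from the descriptions above. The other column names, the sample rows, the placeholder suffix, and the helper names `game_url` and `load_games` are assumptions made for the example, so adjust them to match the actual file.

```python
import csv
import io

BASE_URL = "https://www.pro-football-reference.com/"


def game_url(suffix, base=BASE_URL):
    """Join the base URL and a game's suffix with exactly one slash between them."""
    return base.rstrip("/") + "/" + suffix.strip().lstrip("/")


# Made-up sample rows for illustration only. The 'team' and 'Q1'..'Q4' column
# names and the placeholder suffix are assumptions, not the real file layout.
SAMPLE_CSV = """team,Q1,Q2,Q3,Q4,Final,suffix
Team A,7,3,0,7,20,/boxscores/EXAMPLE.htm
Team B,0,10,7,0,17,/boxscores/EXAMPLE.htm
"""


def load_games(csv_file):
    """Read rows from an open CSV file, adding a 'url' key and an integer 'Final'."""
    rows = list(csv.DictReader(csv_file))
    for row in rows:
        row["url"] = game_url(row["suffix"])
        row["Final"] = int(row["Final"])
    return rows


games = load_games(io.StringIO(SAMPLE_CSV))
for game in games:
    print(game["team"], game["Final"], game["url"])
```

To use it on a downloaded file, pass an open file handle instead of the in-memory sample, for example `load_games(open("your_file.csv", newline=""))`, where the filename is whatever you saved the download as.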
Other Excellent Data Sources
Here are some of our most frequently used data sources:
- https://www.sports-reference.com/ – For nearly any box score in any sport
- https://www.retrosheet.org/ – The ultimate baseball reference. Game logs are compact and require parsing, but they are extremely rich.