NFL Quarterback Performance Analysis

Statistical analysis of nullified interceptions in the NFL

Completed: December 2024
Version: 1.2
Technologies: R, Python, pandas, scipy, matplotlib

Distribution of nullified interceptions across NFL quarterbacks (2018-2023)

The Hidden Side of NFL Statistics

As football fans, we're all familiar with the standard NFL statistics that dominate discussion every Monday morning: passing yards, touchdowns, and interceptions. But what about the plays that don't make it into the official record? What about those moments when a quarterback throws an interception, only to have it erased by a penalty flag?

These "nullified interceptions" exist in a statistical shadow realm: they happened on the field but disappear from the record books. In this data analysis project, I set out to shine a light on this overlooked aspect of quarterback performance.

The Mahomes Mystery

The inspiration for this project came from watching Kansas City Chiefs games over the past few seasons. As a data scientist and football fan, I couldn't help but notice what seemed like an unusual pattern: Patrick Mahomes appeared to benefit from an uncommonly high number of nullified interceptions.

Was this just confirmation bias on my part, or was something statistically significant happening? To answer that question, I needed data, and lots of it.

Technologies & Tools

This project uses the following data science tools:

  • R with nflfastR and tidyverse: The core of the data collection process, allowing me to access comprehensive play-by-play NFL data without manual review
  • Python: Used for the statistical analysis and visualization components
  • pandas: For efficient data manipulation and analysis
  • scipy: For conducting statistical tests like the Mann-Whitney U Test
  • matplotlib: For creating visualizations of the findings
  • numpy: For numerical calculations and bootstrap resampling

Data Collection Process

The backbone of this project is the nflfastR package in R, which provides comprehensive play-by-play data for all NFL games. My data collection workflow included:

  • Extracting play-by-play data for NFL seasons 2018-2023
  • Programmatically identifying nullified interceptions by searching for play descriptions containing "INTERCEPTED" alongside penalty flags
  • Algorithmically matching interceptions with starting quarterbacks when the passer information was incomplete
  • Merging in pass attempt counts so nullified interceptions can be expressed as a rate per attempt
  • Outputting a cleaned dataset as nullified_interceptions_with_attempts_2018_2023.csv
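The quarterback-matching step is not shown in the project's code excerpts. One plausible approach, sketched here in Python with pandas, fills a missing passer with the quarterback who threw most often for that team in that game. The helper name fill_missing_passer is hypothetical and the author's R script may use a different rule. The column names follow the nflfastR fields selected in the R snippet.

```python
import pandas as pd


def fill_missing_passer(plays: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing passer names with the team's
    most frequent passer in the same game (a stand-in for the
    "match with starting quarterback" step, not the project's code)."""
    plays = plays.copy()
    known = plays.dropna(subset=["passer_player_name"])
    # Most frequent passer per (game, offense) among plays with a known passer
    starters = (
        known.groupby(["game_id", "posteam"])["passer_player_name"]
        .agg(lambda names: names.value_counts().idxmax())
        .rename("likely_starter")
        .reset_index()
    )
    # Left merge keeps every play (and its order), even if no starter was found
    plays = plays.merge(starters, on=["game_id", "posteam"], how="left")
    plays["passer_player_name"] = plays["passer_player_name"].fillna(plays["likely_starter"])
    return plays.drop(columns="likely_starter")
```

Applied to the full play-by-play frame before filtering, this fills gaps only where the same team had at least one play with a named passer in that game. Any other rows keep a missing value for manual review.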

Here's a glimpse at the R code I used to collect the data:

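library(nflfastR)
library(tidyverse)

# Play-by-play for 2018-2023 (assumed setup; the original excerpt begins with pbp_data)
pbp_data <- load_pbp(2018:2023)

# Collect nullified interceptions (interceptions negated by penalties).
# penalty == 1 may also match declined penalties, so results may need review.
nullified_ints <- pbp_data %>%
  filter(str_detect(desc, "INTERCEPTED"), penalty == 1) %>%
  select(season, game_id, week, posteam, defteam,
         passer_player_name, desc, penalty_team, penalty_type)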
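For readers working in Python, the same filter can be written in pandas. This is a sketch, not part of the original project. find_nullified_ints is a hypothetical name. The optional require_no_play check assumes that plays wiped out by an accepted penalty include the text "No Play" in their description. That would help exclude plays where the penalty was declined or the interception stood.

```python
import pandas as pd

SELECTED_COLUMNS = ["season", "game_id", "week", "posteam", "defteam",
                    "passer_player_name", "desc", "penalty_team", "penalty_type"]


def find_nullified_ints(pbp: pd.DataFrame, require_no_play: bool = False) -> pd.DataFrame:
    """Hypothetical pandas port of the R filter above."""
    # Same rule as the R code: description mentions an interception and a penalty was flagged
    mask = pbp["desc"].str.contains("INTERCEPTED", na=False) & (pbp["penalty"] == 1)
    if require_no_play:
        # Assumption: nullified plays carry "No Play" in their description text
        mask &= pbp["desc"].str.contains("No Play", case=False, na=False)
    return pbp.loc[mask, SELECTED_COLUMNS]
```

With require_no_play=False the result matches the R filter's logic. The stricter variant trades some recall for fewer false positives, so it is worth spot-checking a sample of descriptions either way.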

Analysis Techniques

After collecting the data, I analyzed it using Python to identify statistical outliers and differences in nullified interception rates. The analysis included:

  • Mann-Whitney U Test: A non-parametric test to check if Mahomes' nullified interception rate was statistically different from other quarterbacks
  • Bootstrapping: A resampling technique to estimate confidence intervals for nullified interception rates
  • Z-Score Analysis: To identify statistical outliers in the dataset
  • Data Visualization: To compare Mahomes' nullified interception rate against the distribution for other quarterbacks
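The bootstrapping step can be illustrated with a minimal percentile bootstrap in NumPy. The function name, the 10,000 resamples, and the 95% level are illustrative assumptions, not the project's actual settings.

```python
import numpy as np


def bootstrap_ci(rates, n_boot=10_000, level=0.95, seed=0):
    """Hypothetical helper: percentile bootstrap CI for the mean of `rates`."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    # Draw n_boot resamples (with replacement), each the same size as the data
    idx = rng.integers(0, len(rates), size=(n_boot, len(rates)))
    boot_means = rates[idx].mean(axis=1)
    # Take the middle `level` share of the resampled means
    alpha = (1 - level) / 2
    lo, hi = np.quantile(boot_means, [alpha, 1 - alpha])
    return float(lo), float(hi)
```

Running it separately on Mahomes' rates and on the other quarterbacks' rates gives two intervals that can be compared directly. With only a handful of values per player, the interval will be wide.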

Here is the Mann-Whitney U test step from the Python analysis:

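import numpy as np
from scipy import stats

# mahomes_rate and other_qbs_rate hold nullified interception rates
# (built earlier in the script) for Mahomes and for all other quarterbacks.
# Perform Mann-Whitney U test (to check if Mahomes is statistically different)
if len(mahomes_rate) > 0 and len(other_qbs_rate) > 0:
    u_stat, p_value = stats.mannwhitneyu(mahomes_rate, other_qbs_rate, alternative="two-sided")
else:
    # Not enough data to run the comparison
    u_stat, p_value = np.nan, np.nan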

Key Findings

The headline result is that Patrick Mahomes stood out from his peers. The statistical tests supported what I had suspected from watching games:

  • Mahomes had a Z-score of 4.80 for nullified interceptions
  • This is well beyond 3.0, a common rule-of-thumb threshold for flagging statistical outliers
  • The Mann-Whitney U test indicated a statistically significant difference (p < 0.001)
  • Bootstrap resampling supported the robustness of these findings
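To give a rough sense of scale for a Z-score of 4.80, the snippet below computes the one-sided tail area of a standard normal distribution at 3.0 and at 4.80. At 3.0 the tail area is about 0.13%. At 4.80 it is under one in a million. This assumes normality, which per-quarterback counts only approximate, so treat it as a calibration aid rather than a p-value for the finding. normal_tail is a hypothetical helper name.

```python
from scipy import stats


def normal_tail(z: float) -> float:
    """Hypothetical helper: one-sided tail area P(Z > z) for a standard normal Z."""
    return float(stats.norm.sf(z))  # survival function = 1 - CDF


# Compare the usual 3.0 outlier cutoff with the reported Z-score
for z in (3.0, 4.80):
    print(f"z = {z:.2f}: P(Z > z) = {normal_tail(z):.2e}")
```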

What This Means for How We Evaluate Quarterbacks

These findings have important implications for how we evaluate quarterback performance in the NFL. Standard interception statistics tell only part of the story. When a quarterback consistently benefits from nullified interceptions:

  • Their official interception totals make their decision-making look better than it was on the field
  • The team's offensive output may be boosted, since drives continue after what would otherwise have been turnovers
  • Risk-taking behavior is effectively rewarded without the statistical penalty

This doesn't necessarily mean anything nefarious is happening. It could be the result of offensive line play, penalty tendencies, or even coaching strategies. What it does mean is that the traditional box score doesn't capture the full picture.
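One way to quantify the gap between the box score and the field is to compare the official interception rate with an "on-field" rate that counts nullified interceptions as if they had stood. This is a hypothetical sketch. The interceptions and attempts column names are assumptions, and counting every nullified pick overstates things somewhat, because some fouls (for example, contact on the receiver) may have influenced the throw itself.

```python
import pandas as pd


def add_int_rates(qbs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: official vs. on-field interception rates per 100 attempts.

    Assumed columns: 'interceptions', 'nullified_int', 'attempts'.
    """
    qbs = qbs.copy()
    qbs["official_int_rate"] = 100 * qbs["interceptions"] / qbs["attempts"]
    # Treat nullified interceptions as if they had counted
    qbs["onfield_int_rate"] = 100 * (qbs["interceptions"] + qbs["nullified_int"]) / qbs["attempts"]
    # How much the penalties improved the quarterback's official numbers
    qbs["rate_gap"] = qbs["onfield_int_rate"] - qbs["official_int_rate"]
    return qbs
```

For example, a quarterback with 10 official interceptions, 5 nullified ones, and 500 attempts would show an official rate of 2.0 per 100 attempts against an on-field rate of 3.0.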

Implementation Details

The project structure consists of two main components:

  • get_nullified_interceptions.R: An R script that collects and processes the NFL play-by-play data using nflfastR
  • analyze_nullified_interceptions.py: A Python script that performs the statistical analysis

Here's an example of how I calculated Z-scores to identify outliers:

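import pandas as pd

# Assumed: one row per quarterback, loaded from the dataset produced by the R script
df = pd.read_csv("nullified_interceptions_with_attempts_2018_2023.csv")

# Calculate Z-scores for nullified interceptions across quarterbacks
mean_nullified = df['nullified_int'].mean()
std_nullified = df['nullified_int'].std()  # sample standard deviation (ddof=1)
df['z_score'] = (df['nullified_int'] - mean_nullified) / std_nullified

# Identify outliers: quarterbacks more than 3 standard deviations from the mean
outliers = df[df['z_score'].abs() > 3.0]
print("Quarterbacks with Z-scores beyond ±3.0 (statistical outliers):")
print(outliers[['player_name', 'nullified_int', 'z_score']])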

Challenges and Solutions

During this project, I encountered several challenges:

  • Data Integration: Combining the play-by-play data with quarterback information required careful matching algorithms
  • Sample Size Concerns: To address potential sample size issues, I implemented bootstrap resampling techniques
  • Controlling for Variables: I needed to account for factors like total passing attempts and offensive style when analyzing the data
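The Z-score snippet in Implementation Details operates on the nullified_int column directly. If that column holds raw counts, quarterbacks with more attempts will tend to score higher. Controlling for attempts can be sketched by computing Z-scores on nullified interceptions per 100 attempts instead. This is an illustrative variant, not the project's code. rate_z_scores, the attempts column name, and the 200-attempt minimum are all assumptions.

```python
import pandas as pd


def rate_z_scores(df: pd.DataFrame, min_attempts: int = 200) -> pd.DataFrame:
    """Hypothetical helper: Z-scores of nullified interceptions per 100 attempts.

    Assumed columns: 'player_name', 'nullified_int', 'attempts'.
    """
    # Drop low-volume passers, whose rates are too noisy to compare
    out = df[df["attempts"] >= min_attempts].copy()
    out["null_int_per_100"] = 100 * out["nullified_int"] / out["attempts"]
    mean = out["null_int_per_100"].mean()
    std = out["null_int_per_100"].std()  # sample standard deviation (ddof=1)
    out["rate_z"] = (out["null_int_per_100"] - mean) / std
    return out.sort_values("rate_z", ascending=False)
```

The attempt threshold is a judgment call. Lower values let in backups whose rates swing wildly on one or two plays.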

Future Work

This project could be extended in several ways:

  • Expanding analysis to include the impact of nullified interceptions on game outcomes
  • Comparing nullified interception rates across different eras of the NFL
  • Refining methods to distinguish intentional penalties from incidental fouls that lead to nullified interceptions
  • Investigating the types of penalties that lead to nullified interceptions
  • Analyzing game situations (score, down, distance) when nullified interceptions occur
  • Developing a predictive model for nullified interceptions based on quarterback style and team factors
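As a starting point for investigating penalty types, the nullified_ints table from the R script already carries penalty_type, penalty_team, and defteam. A hypothetical pandas helper (penalty_breakdown is an illustrative name) can count nullified interceptions by penalty type. It also flags whether the foul was on the defense, which is the case in which an interception would normally be wiped out.

```python
import pandas as pd


def penalty_breakdown(nullified: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count nullified interceptions by penalty type,
    split by whether the defense committed the foul."""
    on_defense = nullified["penalty_team"] == nullified["defteam"]
    return (
        nullified.assign(on_defense=on_defense)
        .groupby(["penalty_type", "on_defense"])
        .size()
        .rename("plays")
        .reset_index()
        .sort_values("plays", ascending=False, ignore_index=True)
    )
```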

How to Use This Project

If you're interested in reproducing or building on this analysis, the process is straightforward:

  1. Run get_nullified_interceptions.R in R to generate the dataset (requires nflfastR and tidyverse packages)
  2. Run analyze_nullified_interceptions.py in Python to analyze the data and visualize results (requires pandas, numpy, scipy, and matplotlib)

Conclusion

This analysis provides statistical evidence that Patrick Mahomes benefits from an unusually high number of nullified interceptions. The findings suggest that standard interception statistics may not fully capture quarterback performance and risk-taking behavior.

The next time you watch an NFL game and see an interception wiped away by a penalty flag, remember: that play may not count in the official statistics, but it still tells us something important about quarterback performance. In Patrick Mahomes' case, it tells us quite a lot.