PIR

College Basketball Analytics

Player Impact Rating

A data-driven framework for evaluating college basketball players and identifying undervalued transfer portal targets in the NIL era.

Motivation

The transfer portal is the modern draft

College basketball has been quietly undergoing one of the most significant structural transformations in the history of American sports. The combination of NIL (Name, Image, and Likeness) rights and an unrestricted transfer portal has dismantled the old model of roster building, where a program recruited a class, developed players over four years, and competed with what it grew. That model is gone.

What replaced it looks a lot more like professional sports free agency. Programs now treat the transfer portal as an annual draft. Stars leave for money and playing time. Role players find better fits. And the coaches thriving in this environment are the ones operating like front offices: scouting broadly, identifying players whose value the market is underpricing, and moving fast when they find one.

“The teams winning championships in this era aren’t just recruiting better; they’re evaluating the portal more intelligently than everyone else.”

Michigan’s 2026 championship, won with a starting five made up entirely of transfers. UConn’s back-to-back titles. San Diego State’s Final Four appearance. Florida Atlantic’s Cinderella run. Program after program has cracked the code not only through blue-chip high school recruiting but also through intelligent portal assembly: finding players who fit their system and whose production in a prior context didn’t fully reflect their actual ability.

The problem is that college basketball is dramatically underserved by public analytics compared to the NBA and even college football. Most portal evaluation relies on recruiting rankings, eye tests, and social media buzz, the same inputs everyone else is using. If every program is looking at the same information, nobody has an edge.

That gap is what this project was built to close. The goal was simple: build a statistical framework rigorous enough to surface players the market is sleeping on, before consensus catches up.

Methodology

How PIR is built

PIR — Player Impact Rating — is a composite metric that blends four independent signals about player quality into a single, interpretable score. No single data source tells the full story. Box score data is widely available but misses context. Play-by-play models capture context but have coverage gaps. The insight behind PIR is that combining sources with known, different failure modes produces a more reliable estimate than any single source alone.

Data sources: 4
Players rated: 1,100+
CV R² (offense): 0.605
PIR–BPR correlation: 0.84

Data Pipeline

The model draws from three statistical sources and one portal-specific source, each contributing what the others cannot:

01

Sports-Reference CBB — Box Score Foundation

Advanced per-player statistics for every qualifying Division I player: effective field goal percentage, turnover rate, free throw rate, rebound rates, assists, steals, and blocks. Scraped directly from each team’s season page. This is the raw material — granular, widely available, and the starting point for feature engineering.

02

EvanMiya BPR — Play-by-Play Impact

Bayesian Performance Rating (BPR) uses play-by-play lineup data to estimate each player’s impact on team efficiency per 100 possessions. It captures things the box score misses entirely: screen assists, off-ball defense, transition impact. BPR is the closest college equivalent to NBA RPM and carries 30% of the composite weight.

03

Bart Torvik PRPG — Independent Efficiency Estimate

Torvik’s PRPG! (Points Over Replacement Per Adjusted Game) provides a second estimate of player value, one the project treats as independent of box score arithmetic because it draws on lineup efficiency data. PRPG also supplies individually adjusted per-possession rates used in the defensive composite, a critical correction over team-confounded DRTG.

04

VerbalCommits — Transfer Portal Overlay

The full transfer portal population is matched against the player database using fuzzy string matching — the fix for the most common failure mode in naive CBB data pipelines. Every portal player is flagged and re-ranked within the portal cohort, enabling rank-gap analysis against BPR consensus.
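
The matching step described above can be sketched with Python's standard-library difflib. This is a minimal illustration, not the project's actual pipeline: the helper names, the 0.85 similarity cutoff, the suffix list, and the player names are all made up for the example.

```python
import difflib
import unicodedata

SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}  # assumed list of suffixes to ignore

def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, drop generational suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(tok for tok in text.split() if tok not in SUFFIXES)

def match_portal_players(portal_names, database_names, cutoff=0.85):
    """Map each portal name to its closest database name, or None if no match."""
    normalized_db = {}
    for name in database_names:
        normalized_db.setdefault(normalize_name(name), name)
    matches = {}
    for name in portal_names:
        candidates = difflib.get_close_matches(
            normalize_name(name), list(normalized_db), n=1, cutoff=cutoff
        )
        matches[name] = normalized_db[candidates[0]] if candidates else None
    return matches

# Made-up names that differ in the ways source sites commonly differ.
portal = ["TJ Harmon Jr.", "José Pérez", "Johnathan Smith", "Unknown Walk-On"]
database = ["T.J. Harmon", "Jose Perez", "Jonathan Smith"]
matches = match_portal_players(portal, database)
```

Unmatched names come back as None so they can be reviewed by hand rather than silently dropped. In practice a match would usually also be checked against team or class year, since two players can share a name.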

The Offensive Model

The offensive component uses Ridge regression (a regularized linear model) targeting Torvik’s PRPG rather than Sports-Reference’s ORTG. This is a deliberate design choice: ORTG and True Shooting Percentage share arithmetic construction (both derive from points and shot attempts), which creates structural leakage that inflates R² without adding genuine predictive signal. PRPG is built from play-by-play lineup data, so the relationship between box score features and the target is earned, not definitional.

The features (True Shooting %, Free Throw Rate, Three-Point Attempt Rate, Turnover %, Assists per minute, and Offensive Rebounds per minute) are all z-scored within position groups before entry. A center’s offensive rebound rate is not compared to a guard’s. Players are evaluated against their positional peers.
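
As a rough sketch of position-group standardization, the snippet below z-scores one feature against same-position peers only. The field names, position labels, and numbers are hypothetical, and the population standard deviation is an assumed choice.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscore_within_position(players, feature):
    """Return {name: z-score of `feature` among players at the same position}."""
    by_position = defaultdict(list)
    for p in players:
        by_position[p["position"]].append(p[feature])
    zscores = {}
    for p in players:
        values = by_position[p["position"]]
        mu, sigma = mean(values), pstdev(values)
        # A group with no spread carries no information, so score it as average.
        zscores[p["name"]] = 0.0 if sigma == 0 else (p[feature] - mu) / sigma
    return zscores

# Made-up offensive rebounds per minute for three guards and three centers.
players = [
    {"name": "G1", "position": "G", "orb_per_min": 0.01},
    {"name": "G2", "position": "G", "orb_per_min": 0.02},
    {"name": "G3", "position": "G", "orb_per_min": 0.03},
    {"name": "C1", "position": "C", "orb_per_min": 0.08},
    {"name": "C2", "position": "C", "orb_per_min": 0.10},
    {"name": "C3", "position": "C", "orb_per_min": 0.12},
]
z = zscore_within_position(players, "orb_per_min")
```

Here the best guard and the best center receive the same z-score even though the center's raw rate is four times higher, which is the point of comparing players only to positional peers.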

Model evaluation uses five-fold cross-validation with out-of-fold predictions. The CV R² of 0.605 represents genuine out-of-sample predictive accuracy — not in-sample memorization.
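
The evaluation idea can be illustrated as follows. This sketch assumes NumPy is available and uses a hand-rolled closed-form ridge fit in place of whatever library the project actually uses. The data are synthetic, and the penalty strength `alpha`, the fold seed, and all function names are assumptions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data: solve (X'X + alpha*I) w = X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w  # weights and unpenalized intercept

def out_of_fold_predictions(X, y, alpha=1.0, n_splits=5, seed=0):
    """Predict every row using a model that never saw that row in training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    oof = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        w, b = ridge_fit(X[train_mask], y[train_mask], alpha)
        oof[test_idx] = X[test_idx] @ w + b
    return oof

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in: six z-scored features and a noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
true_w = np.array([0.8, 0.3, 0.1, -0.5, 0.4, 0.2])
y = X @ true_w + rng.normal(scale=0.8, size=300)

oof = out_of_fold_predictions(X, y)
cv_r2 = r_squared(y, oof)
```

Because each prediction comes from a fold that excluded the player being predicted, `cv_r2` is an out-of-sample figure and will typically sit below the R² obtained by fitting and scoring on the same rows.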

The Defensive Composite

The defensive component makes a deliberate choice: no regression, no team-confounded target. Defensive Rating (DRTG), the most commonly used defensive metric, measures points allowed per 100 possessions while a player is on the floor. It is largely determined by the other four players on the court and the defensive scheme. A player on a top-5 defensive team will tend to look better than the same player would on a weak team, regardless of individual contribution.

Instead, the defensive score is a weighted composite of individually attributable rates: Steals per minute (35%), Blocks per minute (30%), Defensive Rebounds per minute (25%), and Personal Fouls per minute (−10%). The model is intentionally conservative: claiming more precision than the available data supports would be misleading.
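
A minimal sketch of that weighted composite is shown below. The weights come from the text; standardizing each rate across the player pool before weighting is an assumption, as are the field names and the made-up players.

```python
from statistics import mean, pstdev

# Weights as stated in the text; fouls enter with a negative sign.
DEFENSE_WEIGHTS = {
    "stl_per_min": 0.35,
    "blk_per_min": 0.30,
    "drb_per_min": 0.25,
    "pf_per_min": -0.10,
}

def standardize(values):
    """Z-score a list of values (assumed preprocessing, population std dev)."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def defensive_scores(players):
    """Weighted sum of standardized per-minute rates for each player."""
    z_by_stat = {
        stat: standardize([p[stat] for p in players]) for stat in DEFENSE_WEIGHTS
    }
    return {
        p["name"]: sum(w * z_by_stat[stat][i] for stat, w in DEFENSE_WEIGHTS.items())
        for i, p in enumerate(players)
    }

# Made-up per-minute rates; the last two players differ only in fouls.
players = [
    {"name": "Rim Protector", "stl_per_min": 0.02, "blk_per_min": 0.08,
     "drb_per_min": 0.20, "pf_per_min": 0.09},
    {"name": "Ball Hawk", "stl_per_min": 0.06, "blk_per_min": 0.01,
     "drb_per_min": 0.10, "pf_per_min": 0.07},
    {"name": "Foul-Prone Twin", "stl_per_min": 0.06, "blk_per_min": 0.01,
     "drb_per_min": 0.10, "pf_per_min": 0.12},
]
scores = defensive_scores(players)
```

With everything else equal, the higher foul rate pulls "Foul-Prone Twin" below "Ball Hawk", which is the only role the negative weight plays.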

The PIR Composite

The four components are combined with weights calibrated to reflect both predictive strength and data source independence:

EvanMiya BPR
30%
Offensive Model (Ridge → PRPG)
30%
Torvik BPM
20%
Defensive Composite
20%

BPR and BPM together carry 50% because they incorporate information the box score structurally cannot: lineup context, off-ball movement, shot quality. A completeness penalty automatically scales down players missing data sources, so a player with only Sports-Reference data is never ranked above a fully-covered player of equal apparent quality. Scores are stabilized using a square-root minutes weight — high-efficiency players in smaller roles remain visible rather than being buried by sample size.
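
The text gives the component weights but not the exact form of the completeness penalty or the minutes weight, so the sketch below fills those in with assumptions: components are z-scores, a missing component contributes zero (the pool average), and the minutes weight is the square root of minutes over a hypothetical 1,000-minute cap. The function and parameter names are illustrative only.

```python
import math

# Component weights as listed in the text.
PIR_WEIGHTS = {"bpr": 0.30, "offense": 0.30, "bpm": 0.20, "defense": 0.20}

def pir_score(components, minutes, full_minutes=1000.0):
    """Blend available z-scored components, then apply two assumed shrinkages.

    components: dict mapping component name to z-score, or None if missing.
    full_minutes: hypothetical cap at which the minutes weight reaches 1.0.
    """
    available = {k: v for k, v in components.items() if v is not None}
    if not available:
        return 0.0
    covered = sum(PIR_WEIGHTS[k] for k in available)  # completeness in (0, 1]
    blended = sum(PIR_WEIGHTS[k] * v for k, v in available.items()) / covered
    minutes_weight = math.sqrt(min(minutes, full_minutes) / full_minutes)
    # Multiplying by `covered` shrinks partially covered players toward average.
    return blended * covered * minutes_weight

full = {"bpr": 1.0, "offense": 1.0, "bpm": 1.0, "defense": 1.0}
box_score_only = {"bpr": None, "offense": 1.0, "bpm": None, "defense": 1.0}

full_score = pir_score(full, minutes=1000)
partial_score = pir_score(box_score_only, minutes=1000)
role_player_score = pir_score(full, minutes=640)
```

Under these assumptions an above-average player with only box score coverage scores half of what a fully covered player of equal apparent quality does, while a 640-minute player keeps 80% of the full score instead of 64%, which is the sense in which a square-root weight keeps smaller roles visible. Shrinking toward zero only penalizes above-average players, so a real implementation might handle below-average cases differently.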

Differentiation

What PIR does differently

There are excellent public models for evaluating college basketball players. The goal of PIR is not to replace them: BPR and BPM are explicitly included in the composite because they are the best single-source estimates available. The goal is to identify where those models disagree with box score evidence and whether those disagreements represent genuine market inefficiencies.

Feature comparison (columns: BPR only, BPM only, PIR)
Play-by-play signal
Box score feature regression
Position-adjusted z-scores
Cross-validated model evaluation
Avoids DRTG team noise
Multi-source data blending
Completeness confidence scoring
Transfer portal rank-gap analysis
PIR composite score

The most practically valuable output is the rank gap: the difference between a player’s PIR rank within the transfer portal cohort and their BPR rank within the same group. A player ranked 8th by PIR but 22nd by BPR is one the model values significantly more than public consensus — worth watching film on, not because the model is certainly right, but because the disagreement itself is a signal worth investigating.
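
The rank-gap calculation is simple enough to sketch directly. The sign convention (positive means PIR ranks the player higher than BPR does), the function names, and the scores below are assumptions for illustration.

```python
def rank_desc(scores):
    """Rank names by score, best first (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def rank_gaps(pir_scores, bpr_scores):
    """BPR rank minus PIR rank within the same cohort.

    Positive values mark players PIR likes more than BPR does.
    """
    pir_rank, bpr_rank = rank_desc(pir_scores), rank_desc(bpr_scores)
    return {
        name: bpr_rank[name] - pir_rank[name]
        for name in pir_scores
        if name in bpr_rank
    }

# Made-up portal cohort of four players.
pir = {"A": 2.1, "B": 1.5, "C": 1.9, "D": 0.4}
bpr = {"A": 3.0, "B": 2.5, "C": 0.9, "D": 1.2}
gaps = rank_gaps(pir, bpr)
flagged = sorted(gaps, key=gaps.get, reverse=True)  # biggest disagreements first
```

Under this convention the player from the example in the text, ranked 8th by PIR and 22nd by BPR, would carry a gap of +14.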

The model doesn’t tell you who to recruit. It tells you who deserves a closer look — and where the market might be wrong.

Limitations

No model replaces film. PIR is a prioritization tool, not a verdict. It does not account for position fit within a specific system, injury history, or how a player handles a role change. The defensive component is the weakest signal – steals, blocks, and rebounds are individually attributable but still miss shot deterrence and help defense. Mid-major players may be slightly discounted due to lower Torvik play-by-play coverage in smaller conferences. Use it as a first filter, not a final answer.

Explore the data

Open the Dashboard

Search and filter 1,100+ evaluated players. View the full transfer portal board, rank-gap analysis, and side-by-side player radar comparisons.

Free to use.

Data: Sports-Reference CBB · EvanMiya.com · Barttorvik.com · VerbalCommits.com  ·  Season 2025–26  ·  600-min qualifying threshold