Two-Deep Strength of Schedule: Opponents' Opponents in Python

Average opponent win percentage is a fine first cut at strength of schedule (we built it in the SoS explainer). But it has a blind spot: it treats a 10-2 opponent the same whether that opponent earned its record against giants or cupcakes. The fix is to go one level deeper — fold in your opponents' opponents. Here's the two-deep version in Python. Full code: scripts/cfb-two-deep-sos-python.py.

One-deep, then two-deep

One-deep is just the mean of opponents' win percentages. Two-deep blends each opponent's own record with that opponent's schedule strength (its one-deep value):

one_deep(t) = mean( winpct(o)                       for o in opponents(t) )
two_deep(t) = mean( 0.67*winpct(o) + 0.33*one_deep(o)  for o in opponents(t) )

The 0.67/0.33 split weights an opponent's own record more than its schedule; tune to taste.
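To see the two formulas produce numbers, here is a minimal sketch on a made-up four-team schedule. The team names, win percentages, and matchups are invented for illustration and don't come from a consistent set of real games; only the 0.67/0.33 weights come from the formula above.

```python
from statistics import mean

W_OWN, W_SCHED = 0.67, 0.33  # the article's blend; tune to taste

# Made-up toy inputs (not real teams): each team's win percentage
# and the list of opponents it played.
WINPCT = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.2}
OPPONENTS = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["A", "C"],
}


def one_deep(team):
    """Mean of the opponents' own win percentages."""
    return mean(WINPCT[o] for o in OPPONENTS[team])


def two_deep(team):
    """Mean over opponents of a blend of their record and their one-deep SoS."""
    return mean(W_OWN * WINPCT[o] + W_SCHED * one_deep(o) for o in OPPONENTS[team])


one = {t: one_deep(t) for t in OPPONENTS}
two = {t: two_deep(t) for t in OPPONENTS}

for t in sorted(OPPONENTS):
    print(f"{t}  1-deep {one[t]:.3f}  2-deep {two[t]:.3f}")
```

One-deep spans 0.400 to 0.600 here, while two-deep spans 0.466 to 0.534: the same compression the real results show, on a scale small enough to check by hand.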

The two-deep term asks: was your opponent's record built against a tough slate (raising its value) or a soft one (lowering it)? It's the same recursive instinct that, taken all the way, becomes an iterative ratings system.
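Taking the recursion "all the way" can be sketched by repeating the blend: depth 1 is one-deep, depth 2 reproduces the two-deep formula, and each further level feeds the previous level back in. This is one plausible generalization under the same 0.67/0.33 weights, run on made-up toy data; it is not necessarily how any particular ratings system is built.

```python
from statistics import mean

W_OWN, W_SCHED = 0.67, 0.33  # same blend as the two-deep formula

# Made-up toy inputs (not real teams).
WINPCT = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.2}
OPPONENTS = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["A", "C"],
}


def n_deep(depth):
    """Return {team: SoS} at a given depth (1 = one-deep, 2 = two-deep, ...)."""
    # Depth 1: plain average of opponents' win percentages.
    sos = {t: mean(WINPCT[o] for o in opps) for t, opps in OPPONENTS.items()}
    # Each extra level re-blends opponents' records with the previous level's SoS.
    for _ in range(depth - 1):
        sos = {
            t: mean(W_OWN * WINPCT[o] + W_SCHED * sos[o] for o in opps)
            for t, opps in OPPONENTS.items()
        }
    return sos


for d in (1, 2, 3, 10):
    print(d, {t: round(v, 4) for t, v in sorted(n_deep(d).items())})
```

Because each new level enters with weight 0.33, the change from one depth to the next is at most 0.33 times the previous change, so the values settle quickly (here near 0.450 and 0.550). That is the intuition behind two-deep capturing most of the benefit.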

Compute it

Using the shared season helper (ESPN results), build the records, take the team list from them, then run the two passes:

from statistics import mean
from _cfb_season import season_games, records, winpct, opponents
games = season_games(2024)
rec = records(games)
teams = list(rec)  # assumes records() returns a dict keyed by team
one = {t: mean(winpct(rec, o) for o in opponents(games, t)) for t in teams}  # pass 1: one-deep
two = {t: mean(0.67*winpct(rec, o) + 0.33*one[o] for o in opponents(games, t)) for t in one}  # pass 2: two-deep
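scripts/_cfb_season.py isn't reproduced here. To run the two passes without the ESPN pull, the sketch below substitutes hypothetical stand-ins with the same call shapes as the snippet above: season_games(year), records(games), winpct(rec, team), opponents(games, team). The (winner, loser) game format, the toy schedule, and the helpers' bodies are all assumptions, not the real helper's internals.

```python
from statistics import mean

# Hypothetical stand-ins for scripts/_cfb_season.py (not the real module).
# Assumption: a game is a (winner, loser) tuple; the real helper's format may differ.


def season_games(year):
    """Toy schedule in place of the ESPN pull (year is ignored here)."""
    return [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "D")]


def records(games):
    """Map team -> [wins, losses]."""
    rec = {}
    for winner, loser in games:
        rec.setdefault(winner, [0, 0])[0] += 1
        rec.setdefault(loser, [0, 0])[1] += 1
    return rec


def winpct(rec, team):
    """Wins divided by games played."""
    wins, losses = rec[team]
    return wins / (wins + losses)


def opponents(games, team):
    """Every opponent a team faced, once per game."""
    return [loser if winner == team else winner
            for winner, loser in games if team in (winner, loser)]


games = season_games(2024)
rec = records(games)
teams = list(rec)
one = {t: mean(winpct(rec, o) for o in opponents(games, t)) for t in teams}
two = {t: mean(0.67 * winpct(rec, o) + 0.33 * one[o] for o in opponents(games, t)) for t in one}

for t in teams:
    print(t, round(one[t], 3), round(two[t], 3))
```

Swapping these stand-ins for the real import should leave the two comprehensions unchanged, which is the point of keeping the passes separate from the data plumbing.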

The result

Team    1-deep  2-deep
MICH     64.7%   60.1%
UCLA     64.0%   59.1%
FSU      64.2%   57.6%
VT       60.0%   57.2%
KU       62.9%   56.8%
OSU      57.9%   55.8%

Actual output, 2024 season (ESPN results), retrieved June 2026.

One-deep vs two-deep SoS, 2024 (grouped bars for the toughest schedules). Two-deep pulls the extremes toward the middle. Data: ESPN public API; calculation by the code above. Retrieved June 2026.

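Notice what two-deep does: it compresses the extremes. Across these six schedules, the gap between toughest and softest shrinks from 6.8 points (64.7% vs 57.9%) to 4.3 points (60.1% vs 55.8%). Michigan's schedule still rates toughest, but its number drops from 64.7% to 60.1% because some of its opponents' gaudy records were themselves built against softer competition, and the second level discounts that. Part of the squeeze is mechanical: the second level averages over many more teams, which pulls every value toward the league-wide norm. Two-deep is more conservative and, in general, a more reliable read; the values cluster tighter because real schedules are more similar than one-deep makes them look.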
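The compression, and the reordering that comes with it, can be checked straight from the table above. This sketch only copies the six published values; it does not recompute anything from game data.

```python
# 1-deep and 2-deep values copied from the 2024 table above (percent).
one_deep = {"MICH": 64.7, "UCLA": 64.0, "FSU": 64.2, "VT": 60.0, "KU": 62.9, "OSU": 57.9}
two_deep = {"MICH": 60.1, "UCLA": 59.1, "FSU": 57.6, "VT": 57.2, "KU": 56.8, "OSU": 55.8}


def spread(values):
    """Gap between the toughest and softest schedule, in percentage points."""
    return max(values) - min(values)


print(f"1-deep spread: {spread(one_deep.values()):.1f} pts")
print(f"2-deep spread: {spread(two_deep.values()):.1f} pts")

# Rank each list from toughest to softest and report teams whose rank moves.
rank1 = sorted(one_deep, key=one_deep.get, reverse=True)
rank2 = sorted(two_deep, key=two_deep.get, reverse=True)
for team in rank2:
    r1, r2 = rank1.index(team) + 1, rank2.index(team) + 1
    if r1 != r2:
        print(f"{team}: #{r1} -> #{r2}")
```

Besides the narrower spread, two pairs swap places once opponents' own schedules count: UCLA moves ahead of FSU, and VT moves ahead of KU.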

Where to take it

Keep going. Three-deep, four-deep… in the limit you've reinvented an iterative rating (our spreadsheet ranking does exactly that). Two-deep captures most of the benefit cheaply.
Weight by location. A road game against a good team is tougher than the same game at home.
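Drop FCS games or handle them explicitly. An FCS opponent's win percentage was earned against a different pool of teams, so averaging it alongside FBS records distorts the math.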
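The last two suggestions can be sketched together: FCS games are filtered out before any win percentage is computed, and each opponent's win percentage is nudged up for road games and down for home games before averaging. The (home, away, home_won) game format, the FBS set, and the 0.05 bump are all hypothetical and uncalibrated; a real version would need a defensible home-field estimate and a neutral-site flag.

```python
from statistics import mean

# Hypothetical game format: (home, away, home_won). Neutral sites are left out.
GAMES = [
    ("A", "B", True),     # A beats B at home
    ("C", "A", False),    # A wins on the road at C
    ("B", "C", True),     # B beats C at home
    ("A", "FCS1", True),  # FCS opponent: dropped below
]
FBS = {"A", "B", "C"}     # assumption: you supply the FBS membership list
LOCATION_BUMP = 0.05      # hypothetical, uncalibrated home/road adjustment


def fbs_only(games):
    """Keep only games where both teams are FBS."""
    return [g for g in games if g[0] in FBS and g[1] in FBS]


def winpct(games, team):
    """Share of a team's games that it won."""
    won = [(home == team) == home_won
           for home, away, home_won in games if team in (home, away)]
    return sum(won) / len(won)


def located_one_deep(games, team):
    """One-deep SoS where road opponents count a bit tougher and home ones a bit easier."""
    values = []
    for home, away, _ in games:
        if team not in (home, away):
            continue
        on_road = away == team
        opp = home if on_road else away
        adj = winpct(games, opp) + (LOCATION_BUMP if on_road else -LOCATION_BUMP)
        values.append(min(1.0, max(0.0, adj)))  # keep within [0, 1]
    return mean(values)


fbs_games = fbs_only(GAMES)
for t in sorted(FBS):
    print(t, round(located_one_deep(fbs_games, t), 3))
```

Filtering first matters: if the FCS game stayed in, it would count toward A's own record and drag an FCS team's win percentage into A's schedule average.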

Sources & further reading

Free textbook: Chapter 18: Game Outcome Prediction — the theory behind this, at DataField.dev.
ESPN public API (results) — via scripts/_cfb_season.py
Companion code: scripts/cfb-two-deep-sos-python.py
Related: Strength of schedule explained · Adjusted ranking in a spreadsheet

C. B. Zakarian

C. B. Zakarian is an independent analyst who writes about what he can measure: ball sports and the player-run economies inside Roblox. He builds every model, chart, and calculator here himself from public data, shows the working, and never invents a number. When the data can't answer a question, he says so. On CollegeAthleteInsider, that means college football and basketball by the numbers, plus a plain-English read on the NIL-era rules.

