College football has CFBD; college basketball has sportsdataverse. It's an open-source family of tools that bundles public, ESPN-derived data into clean tables you can load in one line — and, best of all for our purposes, it covers both the men's game (via hoopR) and the women's game (via wehoop) with an identical structure. That symmetry is the secret weapon: write your analysis once, run it on both. No API key required. The full script is in scripts/sportsdataverse-basketball-tutorial.py.

Step 1: Install

pip install sportsdataverse pyarrow
pyarrow lets the library read the cached data files efficiently.

sportsdataverse pulls pre-built season files (so you're not hammering any live API), then hands you a dataframe. The first load of a season downloads and caches it; after that it's instant.
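
If one script asks for the same season in several places, a small in-process memo keeps that to a single load per run. This is a sketch, not part of sportsdataverse: `load_season` and `fake_loader` are hypothetical names, and the stand-in loader exists only so the example runs without a download. It is separate from whatever caching the library does itself.

```python
from functools import lru_cache

calls = []  # records every "real" load so the memo's effect is visible


def fake_loader(season):
    """Hypothetical stand-in for a real call such as
    mbb.load_mbb_team_boxscore(seasons=[season])."""
    calls.append(season)
    return {"season": season, "rows": []}


@lru_cache(maxsize=None)
def load_season(season):
    """Load a season once per process; repeat calls reuse the result."""
    return fake_loader(season)


first = load_season(2025)
second = load_season(2025)  # served from the memo, no second load
print(calls)
```

In a real script you would swap `fake_loader` for the library loader; the memo returns the same object each time, so copy it before modifying it in place.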

Step 2: Load a season of team box scores

The men's module is sportsdataverse.mbb, the women's is sportsdataverse.wbb. Their loaders mirror each other:

import sportsdataverse.mbb as mbb
import sportsdataverse.wbb as wbb

season = 2025  # the season-ending year

men   = mbb.load_mbb_team_boxscore(seasons=[season]).to_pandas()
women = wbb.load_wbb_team_boxscore(seasons=[season]).to_pandas()
.to_pandas() converts the result to a familiar pandas DataFrame.

Each row is one team's stat line from one game: team_score, field_goals_attempted, offensive_rebounds, total_turnovers, free_throws_attempted, three_point_field_goals_attempted, and dozens more. Crucially, the column names are the same for men and women, which is why the next function works on either.
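
Before trusting that symmetry, it can be worth checking it. The sketch below is one way to confirm that a table has the columns the possession math needs; `missing_columns` and `REQUIRED` are illustrative names, and the toy column lists stand in for `men.columns` and `women.columns`.

```python
def missing_columns(columns, required):
    """Return the required names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Columns the possession estimate in this tutorial relies on.
REQUIRED = [
    "team_score",
    "field_goals_attempted",
    "offensive_rebounds",
    "total_turnovers",
    "free_throws_attempted",
]

# Toy column lists standing in for men.columns and women.columns.
men_cols = REQUIRED + ["three_point_field_goals_attempted"]
women_cols = [c for c in men_cols if c != "total_turnovers"]  # deliberately broken

men_missing = missing_columns(men_cols, REQUIRED)
women_missing = missing_columns(women_cols, REQUIRED)
print(men_missing, women_missing)
```

With real data, an empty list from both `missing_columns(men.columns, REQUIRED)` and `missing_columns(women.columns, REQUIRED)` means the shared function is safe to run on either frame.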

Step 3: Compute scoring and pace yourself

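Let's turn raw box scores into the possession-based numbers from our tempo and efficiency guide. Possessions are estimated for each team-game as field goal attempts minus offensive rebounds, plus turnovers, plus 0.475 times free throw attempts; the 0.475 weight approximates the share of free throw attempts that end a possession. One function, reused for both leagues: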

def league_averages(df, label):
    """Print per-team league averages for one season of team box scores."""
    df = df[df["team_score"] > 0]   # drop blank rows (zero or missing scores)
    pts  = df["team_score"]
    # one possession estimate per team-game
    poss = (df["field_goals_attempted"] - df["offensive_rebounds"]
            + df["total_turnovers"] + 0.475 * df["free_throws_attempted"])
    print(f"{label}: {len(df):,} team-games | "
          f"{pts.mean():.1f} pts/team | "
          f"{poss.mean():.1f} possessions | "
          f"{df['three_point_field_goals_attempted'].mean():.1f} 3PA")

league_averages(men,   "Men's 2025")
league_averages(women, "Women's 2025")
The same code runs on both dataframes because the schemas match.
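
To see what the possession estimate does on a single row, here is the same arithmetic applied to one made-up stat line. The numbers are invented for illustration, and `estimate_possessions` is a hypothetical helper rather than a library function.

```python
def estimate_possessions(fga, orb, tov, fta, ft_weight=0.475):
    """Box-score possession estimate: FGA - ORB + TO + ft_weight * FTA."""
    return fga - orb + tov + ft_weight * fta


# Invented stat line: 60 shots, 10 offensive boards, 12 turnovers, 20 free throws.
poss = estimate_possessions(fga=60, orb=10, tov=12, fta=20)

# 60 - 10 + 12 + 0.475 * 20 = 71.5 estimated possessions
points_per_possession = 75 / poss  # for a team that scored 75
print(round(poss, 1), round(points_per_possession, 3))
```

Offensive rebounds are subtracted because they extend a possession instead of starting a new one.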

Step 4: Read the output

Run it and you get real, league-wide averages computed from every Division I game in the season:

Men's 2025:   12,572 team-games | 73.2 pts/team | 69.1 possessions | 22.9 3PA
Women's 2025: 11,252 team-games | 65.4 pts/team | 71.2 possessions | 19.7 3PA
Actual output, sportsdataverse data retrieved June 2026.

Look what fell out of one short function: the women's game is played at a higher pace (71.2 possessions to the men's 69.1) yet produces fewer points (65.4 to 73.2). The gap is the three-pointer: men attempt nearly 23 a game to the women's ~20 and, though this output shows attempts only, convert them at a higher rate. That's a genuine, sourced insight you generated yourself, and it's the backbone of our women's game analysis. This is the entire promise of the toolkit: real conclusions, from public data, in minutes.
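
The pace-versus-scoring contrast is easier to read on one scale. As a rough sketch, dividing the printed averages gives points per 100 possessions for each league. This is a ratio of league averages, so it only approximates what you would get by computing efficiency game by game and then averaging.

```python
# Averages copied from the printed output above.
league = {
    "Men's 2025": {"pts": 73.2, "poss": 69.1},
    "Women's 2025": {"pts": 65.4, "poss": 71.2},
}

# Points per 100 possessions, rounded to one decimal place.
per_100 = {
    label: round(100 * row["pts"] / row["poss"], 1)
    for label, row in league.items()
}

for label, value in per_100.items():
    print(f"{label}: {value} points per 100 possessions")
```

On these figures the men score roughly 106 points per 100 possessions and the women roughly 92, so the scoring gap is about efficiency, not tempo.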

What else is in the box

The same modules expose much more than team box scores:

  • load_mbb_player_boxscore() / load_wbb_player_boxscore() — player-level lines for leaderboards and usage analysis.
  • load_mbb_schedule() / load_wbb_schedule() — full schedules and results, perfect for strength-of-schedule work.
  • load_mbb_pbp() / load_wbb_pbp() — play-by-play, for possession-level and lineup analysis (these files are large).
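
As an illustration of the player-level use case, here is a minimal scoring leaderboard. The column names `athlete_display_name` and `points` are assumptions about the player box score schema, so check them against the loaded frame's columns. The toy rows are invented so the sketch runs without a download, and `scoring_leaders` is a hypothetical helper.

```python
import pandas as pd

# Invented player-game rows; the real table has one row per player per game.
players = pd.DataFrame({
    "athlete_display_name": ["A. Guard", "A. Guard", "B. Center", "B. Center", "C. Wing"],
    "points": [20, 30, 10, 14, 18],
})


def scoring_leaders(df, min_games=2, top=10):
    """Rank players by points per game, ignoring anyone below min_games."""
    per_player = df.groupby("athlete_display_name")["points"].agg(
        games="count", ppg="mean"
    )
    qualified = per_player[per_player["games"] >= min_games]
    return qualified.sort_values("ppg", ascending=False).head(top)


leaders = scoring_leaders(players)
print(leaders)
```

The `min_games` floor keeps a one-game outlier from topping the list; on a full season a higher threshold would make more sense.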

Good habits

  • Let the cache work. Load a season once; the library stores it locally. Don't re-download in a loop.
  • Filter junk rows. Drop rows where team_score is zero or missing before averaging, as we did above.
  • Mind the season convention. "2025" means the 2024-25 season (the ending year). Off-by-one here is the most common beginner mistake.
  • Credit the source. sportsdataverse aggregates public data; cite it (and respect that some underlying providers have their own terms).
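
One way to guard against the off-by-one season mistake is to make the convention explicit in code. Both helpers below are hypothetical, and the July cutoff in `season_for_game` is an assumption about where the offseason boundary falls, not something the library defines.

```python
def season_label(end_year):
    """Turn a season-ending year into a readable label, e.g. 2025 -> '2024-25'."""
    return f"{end_year - 1}-{end_year % 100:02d}"


def season_for_game(year, month, cutoff_month=7):
    """Map a game's calendar year and month to its season-ending year.

    Assumption: games played in or after cutoff_month belong to the
    season that ends the following calendar year.
    """
    return year + 1 if month >= cutoff_month else year


print(season_label(2025))         # 2024-25
print(season_for_game(2024, 11))  # 2025 (a November 2024 game)
print(season_for_game(2025, 3))   # 2025 (a March 2025 game)
```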

Where to go next

Try computing the same averages across several seasons to build a trend (that's exactly how we charted the women's game over time), or join the box scores to schedules to make your own opponent-adjusted ratings. Because the men's and women's data share a schema, every tool you build works on both halves of the sport for free — which, frankly, is how all of college basketball analysis should work.
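
A multi-season trend could be sketched like this. It assumes the box score table has a `season` column (check the loaded frame), and it uses invented rows so the example runs offline. `season_trend` is a hypothetical helper that reuses the tutorial's possession estimate.

```python
import pandas as pd

# Invented team-game rows for two seasons, including one blank (zero-score) row.
games = pd.DataFrame({
    "season": [2024, 2024, 2025, 2025],
    "team_score": [70, 80, 0, 66],
    "field_goals_attempted": [60, 62, 0, 58],
    "offensive_rebounds": [10, 8, 0, 9],
    "total_turnovers": [12, 10, 0, 14],
    "free_throws_attempted": [20, 20, 0, 20],
})


def season_trend(df):
    """Per-season averages of points and estimated possessions."""
    df = df[df["team_score"] > 0].copy()  # drop blank rows first
    df["poss"] = (df["field_goals_attempted"] - df["offensive_rebounds"]
                  + df["total_turnovers"] + 0.475 * df["free_throws_attempted"])
    return df.groupby("season").agg(
        team_games=("team_score", "size"),
        pts=("team_score", "mean"),
        poss=("poss", "mean"),
    )


trend = season_trend(games)
print(trend)
```

With real data, the input would be several seasons of box scores in one frame. The loaders above take a list of seasons, so passing more than one year is presumably the way to build it.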

The CollegeAthleteInsider Analyst

I'm an independent analyst covering college football and basketball through public data. Every number here traces to a script in /scripts.