High School to College Enrollment: Feeder School Data Sources

Source: data_feeder_schools.md



National Student Clearinghouse

The National Student Clearinghouse (NSC) is the most comprehensive source of student-level postsecondary enrollment data in the U.S., covering ~3,600 institutions enrolling 97%+ of all students.

What's Publicly Available

What's NOT Publicly Available

Simulation Relevance

NSC aggregate reports provide national baselines (e.g., what % of HS grads enroll in 4-year vs 2-year, persistence rates by school type) but do NOT provide school-to-school linkage data publicly. The HS-to-college mapping lives inside StudentTracker, which is paywalled.


State-Level Data Sources

Several states publish high-school-level college-going rates that can serve as proxies for feeder patterns.

California Department of Education (CDE)

Texas Education Agency (TEA)

Illinois

Key Limitation

State databases generally track whether HS graduates go to college, not which specific college. The HS-to-specific-college linkage requires NSC StudentTracker or institution-specific data.
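
As a rough illustration of what a state file can and cannot support, the sketch below computes a per-school college-going rate from graduate and enrollee counts. The column names (`school_name`, `graduates`, `enrolled_in_college`) and the three sample rows are placeholders invented for this example, not the actual layout or contents of any CDE, TEA, or Illinois file. Real downloads would need their own column mapping.

```python
import csv
import io

# Placeholder rows for illustration only; not real school data.
SAMPLE_CSV = """school_name,graduates,enrolled_in_college
Example High A,400,372
Example High B,520,338
Example High C,310,124
"""


def college_going_rates(csv_text):
    """Return {school_name: share of graduates enrolled in any college}."""
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        graduates = int(row["graduates"])
        if graduates == 0:
            continue  # skip schools with no graduating cohort
        rates[row["school_name"]] = int(row["enrolled_in_college"]) / graduates
    return rates


rates = college_going_rates(SAMPLE_CSV)
for school, rate in rates.items():
    print(f"{school}: {rate:.0%}")
```

The result is a single yes/no enrollment share per school. Nothing in this kind of file identifies the destination college, which is the limitation described above.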


Research Papers with Feeder Data

Chetty, Deming, Friedman (2023) — "Diversifying Society's Leaders?"

Arcidiacono, Kinsler, Ransom — "Legacy and Athlete Preferences at Harvard"

Arcidiacono — "What the SFFA Cases Reveal About Racial Preferences"

Chetty et al. — "Mobility Report Cards" (2017)

Mulhern — "Changing College Choices with Personalized Admissions Information at Scale"

Glasener — "Shaping Elite College Pathways: Mapping the Field of Feeder Schools"


Public Datasets

Opportunity Insights (Best Available)

IPEDS (Integrated Postsecondary Education Data System)

College Scorecard

Common Data Set (CDS)

Kaggle Datasets

| Dataset | URL | Notes |
|---|---|---|
| Elite College Admissions | kaggle.com/datasets/mexwell/elite-college-admissions | Admissions data for selective colleges |
| College Admissions (Qian) | kaggle.com/datasets/samsonqian/college-admissions | General college admissions data |
| US College Data | kaggle.com/datasets/yashgpt/us-college-data | Institutional characteristics |
| US Schools Dataset | kaggle.com/datasets/andrewmvd/us-schools-dataset | K-12 school data |

Note: None of the Kaggle datasets provide direct HS-to-college linkage. They are primarily college-side or student-attribute datasets.

GitHub Repositories


Journalism Sources

Harvard Crimson Feeder School Investigation (2024) — Most Detailed Source

Harvard Crimson (2013) — Historical Feeder Analysis

Prep Review — Multi-University Feeder Rankings

Chicardgo School — LA Private School Elite College Placement Index

Crimson Education / Rise

NPR (2023)


Most Useful for Simulation

Tier 1: Directly Usable Data

| Source | What It Provides | Format | Access |
|---|---|---|---|
| Harvard Crimson 2024 feeder data | Named schools, counts (100+ per school over 15 years), public/private split, tuition | Structured article with data widget | Free, web |
| Prep Review rankings | Top 30 feeders per HYPSM university, % matriculation rates | Web tables | Free, web |
| Chicardgo ECPI | LA private school placement rates into T25 colleges | Web tables | Free, web |
| California CDE CGR files | Per-school college-going rates, demographic breakdowns | CSV download | Free, govt |
| Opportunity Insights data | College-level parent income distributions, mobility rates | CSV download | Free |

Tier 2: Contextual / Calibration Data

| Source | What It Provides | Format | Access |
|---|---|---|---|
| Arcidiacono SFFA data (via papers) | ALDC admit rates, HS type effects on ratings | Published tables/figures | Free (papers) |
| Chetty "Diversifying" paper | Private HS advantage quantified, non-academic rating gaps | Published tables/figures | Free (paper) |
| CDS / IPEDS / College Scorecard | College-side acceptance rates, SAT/GPA ranges, enrollment | CSV download | Free, govt |
| NSC High School Benchmarks | National HS-to-college enrollment baselines | Report/dashboard | Free |

Tier 3: Restricted but Valuable

| Source | What It Provides | Why Restricted |
|---|---|---|
| NSC StudentTracker | Actual HS-to-college enrollment linkage | Subscription required |
| Naviance | Per-HS scattergrams of admits/denies at specific colleges | School login required |
| SFFA trial microdata | Harvard admissions records with HS identifiers | Court records, not easily accessible |

1. Use Harvard Crimson 2024 data to define feeder school archetypes: elite boarding (Exeter, Andover), selective magnet (Stuyvesant, Boston Latin, TJHSST), affluent suburban (Lexington, Scarsdale, Brookline), elite day school (Noble & Greenough, Trinity).
2. Use Prep Review to cross-reference feeder patterns across HYPSM. Some schools are Harvard feeders but not MIT feeders, and so on.
3. Use the Chetty and Arcidiacono papers to calibrate the private-school advantage multiplier: private school students have roughly 2x the admission probability at comparable test scores, driven by non-academic ratings, legacy status, and athletic recruitment.
4. Use CDE CGR data to set realistic college-going rate baselines for different school types (affluent suburban ~90%+, average public ~65%, low-income ~40%).
5. Use Opportunity Insights data to calibrate income-to-enrollment relationships: children from top-1% families are about 77x as likely to attend an Ivy-Plus college as children from bottom-20% families.
6. Derive feeder school parameters. For the simulation's 20 high schools, map each to an archetype with calibrated feeder rates:
   - Elite boarding school: ~15-20% of grads to HYPSM, ~40-50% to T20
   - Selective magnet: ~8-12% to HYPSM, ~30-40% to T20
   - Affluent suburban: ~5-8% to HYPSM, ~20-30% to T20
   - Average public: ~0.5-1% to HYPSM, ~5-10% to T20
   - Under-resourced public: ~0.1% to HYPSM, ~2-5% to T20
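
The archetype rates listed above could be encoded for the simulation roughly as follows. This is a minimal sketch, not a prescribed implementation: the names `Archetype`, `ARCHETYPES`, `draw_rates`, and `simulate_class` are hypothetical, the rate ranges are copied from the list, and two modeling choices are assumptions of this example. Those are that the T20 share includes HYPSM, and that a school's rate is drawn uniformly within its range.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Archetype:
    name: str
    hypsm: tuple  # (low, high) share of graduates enrolling at HYPSM
    t20: tuple    # (low, high) share enrolling at T20 (assumed to include HYPSM)


# Ranges taken from the archetype list; the keys are illustrative labels.
ARCHETYPES = {
    "elite_boarding": Archetype("elite_boarding", (0.15, 0.20), (0.40, 0.50)),
    "selective_magnet": Archetype("selective_magnet", (0.08, 0.12), (0.30, 0.40)),
    "affluent_suburban": Archetype("affluent_suburban", (0.05, 0.08), (0.20, 0.30)),
    "average_public": Archetype("average_public", (0.005, 0.01), (0.05, 0.10)),
    "under_resourced_public": Archetype(
        "under_resourced_public", (0.001, 0.001), (0.02, 0.05)
    ),
}


def draw_rates(arch, rng):
    """Draw one (hypsm_rate, t20_rate) pair for a school of this archetype."""
    hypsm_rate = rng.uniform(*arch.hypsm)
    t20_rate = rng.uniform(*arch.t20)
    # Keep the T20 share at least as large as the HYPSM share it contains.
    return hypsm_rate, max(t20_rate, hypsm_rate)


def simulate_class(arch, class_size, rng):
    """Assign each graduate to HYPSM, another T20 college, or elsewhere."""
    hypsm_rate, t20_rate = draw_rates(arch, rng)
    counts = {"HYPSM": 0, "other_T20": 0, "other": 0}
    for _ in range(class_size):
        u = rng.random()
        if u < hypsm_rate:
            counts["HYPSM"] += 1
        elif u < t20_rate:
            counts["other_T20"] += 1
        else:
            counts["other"] += 1
    return counts


rng = random.Random(42)  # fixed seed so runs are repeatable
for arch in ARCHETYPES.values():
    print(arch.name, simulate_class(arch, class_size=400, rng=rng))
```

The "other" bucket lumps together all remaining outcomes, including non-enrollment. Splitting it by the college-going baselines from step 4 would be a natural extension, but is left out here to keep the sketch short.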