Source: data_feeder_schools.md
The National Student Clearinghouse (NSC) is the most comprehensive source of student-level postsecondary enrollment data in the U.S., covering ~3,600 institutions enrolling 97%+ of all students.
High School Benchmarks Report: Annual report on HS graduates' postsecondary enrollment, persistence, and completion. Tracks first-fall college enrollment by graduating class. Available as downloadable data dashboards.
Current Term Enrollment Estimates: Aggregate enrollment trends released 3x/year (November preliminary, January and June comprehensive).
Research Reports: Published analyses on enrollment patterns, persistence, transfer rates.
StudentTracker: The core product that links individual HS students to their college enrollment outcomes. Available only to subscribing high schools and districts — not to the general public. Schools upload student rosters and receive back which colleges their graduates enrolled in.
Student-level microdata: Not available for download. Researchers can apply for access through the NSC Research Center.
NSC aggregate reports provide national baselines (e.g., what % of HS grads enroll in 4-year vs 2-year, persistence rates by school type) but do NOT publicly provide HS-to-college linkage data. The HS-to-college mapping lives inside StudentTracker, which is paywalled.
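A minimal sketch of what the HS-to-college linkage looks like once a return file is in hand. The record layout (`EnrollmentRecord` with `high_school`, `student_id`, `college`) is hypothetical, not the actual StudentTracker file format. The sketch assumes a student can appear on several rows (e.g., one per enrollment term) and counts each student once per college.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class EnrollmentRecord(NamedTuple):
    """One row of a hypothetical StudentTracker-style return file."""
    high_school: str          # hypothetical HS identifier
    student_id: str           # roster ID the school uploaded
    college: Optional[str]    # college on this enrollment record, None if no match


def feeder_counts(records: Iterable[EnrollmentRecord]) -> Counter:
    """Count distinct students per (high school, college) pair, skipping non-matches."""
    seen = set()
    counts: Counter = Counter()
    for rec in records:
        key = (rec.high_school, rec.student_id, rec.college)
        if rec.college is None or key in seen:
            continue  # no enrollment found, or this student/college already counted
        seen.add(key)
        counts[(rec.high_school, rec.college)] += 1
    return counts


records = [
    EnrollmentRecord("HS-A", "s1", "Harvard"),
    EnrollmentRecord("HS-A", "s1", "Harvard"),  # second term, same student
    EnrollmentRecord("HS-A", "s2", "MIT"),
    EnrollmentRecord("HS-A", "s3", "Harvard"),
    EnrollmentRecord("HS-B", "s4", None),
    EnrollmentRecord("HS-B", "s5", "MIT"),
]
counts = feeder_counts(records)
print(counts.most_common())
```

A transfer student would be counted at both colleges here; restricting to each student's first enrollment would need term dates, which this sketch omits.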
Several states publish high-school-level college-going rates that can serve as proxies for feeder patterns.
California Department of Education (CDE) College-Going Rate (CGR) Data: Downloadable CSV files showing college-going rates at state, county, district, and individual school level, disaggregated by race/ethnicity and student group.
12-Month CGR Files: Track % of HS completers enrolling in postsecondary within 12 months.
16-Month CGR Files: Same measure over an extended 16-month tracking window.
Limitation: Shows aggregate college-going rates per HS, but does NOT break down which specific colleges students enrolled in.
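A sketch of turning a CGR-style extract into per-school rates. The column names and rows below are placeholders, not the actual CDE file layout, and the sketch assumes the file carries completer and enrollee counts. Keeping counts rather than rates makes it possible to pool schools into an archetype weighted by cohort size.

```python
import csv
import io
from collections import defaultdict

# Placeholder extract; real CDE column names and student groups will differ.
SAMPLE = """school,group,completers,enrolled_12mo
Lincoln HS,All,400,260
Lincoln HS,Low-income,150,75
Oak Ridge HS,All,300,276
"""


def school_totals(text: str, group: str = "All") -> dict:
    """Sum completers and 12-month enrollees per school for one student group."""
    totals = defaultdict(lambda: [0, 0])  # school -> [completers, enrolled]
    for row in csv.DictReader(io.StringIO(text)):
        if row["group"] == group:
            totals[row["school"]][0] += int(row["completers"])
            totals[row["school"]][1] += int(row["enrolled_12mo"])
    return dict(totals)


def rates(totals: dict) -> dict:
    """College-going rate = enrolled / completers, per school."""
    return {s: e / c for s, (c, e) in totals.items() if c > 0}


def pooled_rate(totals: dict, schools: list) -> float:
    """Cohort-weighted rate for a group of schools (not a mean of school rates)."""
    completers = sum(totals[s][0] for s in schools)
    enrolled = sum(totals[s][1] for s in schools)
    return enrolled / completers


totals = school_totals(SAMPLE)
print(rates(totals))
print(pooled_rate(totals, ["Lincoln HS", "Oak Ridge HS"]))
```

Here the pooled rate (536/700, about 0.766) is lower than the simple mean of the two school rates (0.785) because the lower-rate school has the larger cohort. The 16-month files would be handled the same way with a different enrollee column.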
Student Data: Statewide enrollment by grade, race/ethnicity, gender, economic status, program participation.
College-going data is available through the Texas Education Agency (TEA) accountability system but is less granular than California's.
State databases generally track whether HS graduates go to college, not which specific college. The HS-to-specific-college linkage requires NSC StudentTracker or institution-specific data.
Chetty, Deming & Friedman (2023), "Diversifying Society's Leaders?", used anonymized admissions data from Ivy-Plus colleges (Ivy League + Stanford, MIT, Duke, UChicago) linked to tax records and SAT/ACT scores.
Key findings on HS type:
Children from top-1% families are 2x as likely to attend Ivy-Plus as middle-class students with comparable test scores.
The rich-kid advantage in non-academic ratings is "almost entirely driven by the fact that they are much more likely to attend elite private high schools."
Three drivers of high-income admissions advantage: (1) legacy preferences, (2) non-academic credential weighting, (3) athletic recruitment.
Children from high-income families have no admissions advantage at flagship public colleges.
Data availability: Opportunity Insights data portal provides downloadable CSV files with college mobility statistics by institution and birth cohort.
Arcidiacono, Kinsler & Ransom, "Legacy and Athlete Preferences at Harvard", used Harvard admissions microdata from the SFFA v. Harvard trial (Classes of 2014-2019).
Key findings:
43% of white admits were ALDCs (recruited Athletes, Legacies, Dean's interest list, Children of faculty/staff).
~75% of ALDC admits would have been rejected without those preferences.
68%+ of recruited athletes, legacies, and dean's interest list applicants are white vs. <41% of typical applicants.
Admit rates: athletes 86%, legacy 33%, dean's interest list 42%, faculty children 47%, compared with ~6% overall.
Data: Harvard admissions data includes demographic, geographic, and academic measures, internal Harvard ratings (academic, extracurricular, athletic, personal), plus HS counselor/teacher letter ratings. It is not publicly downloadable and is available only through court records.
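Expressing these admit rates as multipliers over the ~6% figure is a quick way to turn them into simulation parameters. This sketch computes both the rate ratio and the odds ratio; they diverge most for recruited athletes (about 14x the rate but about 96x the odds) because 86% is close to the ceiling. Which one to use depends on whether a model applies preferences on the probability or the log-odds scale. The ~6% baseline is the approximate overall figure from the text, not a matched comparison group.

```python
# Admit rates from the list above (Harvard, Classes of 2014-2019).
ALDC_ADMIT_RATES = {
    "recruited athletes": 0.86,
    "legacies": 0.33,
    "dean's interest list": 0.42,
    "children of faculty/staff": 0.47,
}
BASELINE = 0.06  # "~6% overall"; approximate


def rate_ratio(p: float, base: float) -> float:
    """How many times the baseline admit probability."""
    return p / base


def odds_ratio(p: float, base: float) -> float:
    """Ratio of admission odds p/(1-p) to baseline odds."""
    return (p / (1 - p)) / (base / (1 - base))


for group, p in ALDC_ADMIT_RATES.items():
    print(f"{group}: {rate_ratio(p, BASELINE):.1f}x rate, {odds_ratio(p, BASELINE):.1f}x odds")
```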
Companion analysis of the SFFA trial data showing admit rate differentials by race.
African American applicants' admit rates are ~4x those of comparable white applicants; Hispanic applicants' are ~2.4x.
Children with parents in top 1% are 77x more likely to attend Ivy-Plus than children from bottom 20%.
Income segregation across colleges comparable to income segregation across census tracts.
Downloadable data: College-level mobility statistics (parent income distributions, student earnings outcomes) available as CSV at opportunityinsights.org/data.
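The 77x figure compares per-child attendance rates. Because the bottom quintile holds 20 times as many children as the top 1% by construction, the ratio of total Ivy-Plus students supplied by the two groups is much smaller. A short arithmetic sketch:

```python
def headcount_ratio(per_child_ratio: float, pop_share_a: float, pop_share_b: float) -> float:
    """Ratio of total students supplied by group A vs. group B.

    total_A / total_B = (rate_A * pop_A) / (rate_B * pop_B)
                      = per_child_ratio * pop_share_a / pop_share_b
    """
    return per_child_ratio * pop_share_a / pop_share_b


# Top 1% vs. bottom 20% of parent income: population shares are fixed by definition.
ratio = headcount_ratio(77, 0.01, 0.20)
print(f"{ratio:.2f}")  # prints 3.85
```

So per child the gap is 77x, but in headcount the top 1% supplies only about 3.85x as many Ivy-Plus students as the entire bottom quintile.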
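Mulhern, "Changing College Choices with Personalized Admissions Information at Scale: Evidence on Naviance", uses Naviance data to study how showing HS students past admission outcomes from their own school affects application behavior.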
Demonstrates that personalized HS-to-college outcome data reduces undermatching.
Systematic mapping of feeder school networks to elite colleges.
What's there: College-level data on parent income distributions, student earnings, mobility rates. Linkable to IPEDS college identifiers.
Format: CSV downloads with readme documentation.
Limitation: College-level aggregates, not HS-to-college linkage, but it does provide the income/mobility context that shapes feeder dynamics.
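A sketch of linking the college-level mobility statistics to IPEDS records. It assumes both tables carry an IPEDS institution ID in a column named `unitid`; the actual identifier column in the Opportunity Insights files should be checked against their readme, and the field names below (`share_parents_top1pct`, `name`, `enrollment`) are placeholders. Unmatched IDs are returned rather than silently dropped, since identifier mismatches (e.g., multi-campus systems) are a common source of lost rows.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def join_on_id(
    oi_rows: List[Row], ipeds_rows: List[Row], key: str = "unitid"
) -> Tuple[List[Row], List[object]]:
    """Inner-join two lists of dicts on an institution ID; also return unmatched OI IDs."""
    ipeds_by_id = {row[key]: row for row in ipeds_rows}
    joined: List[Row] = []
    unmatched: List[object] = []
    for row in oi_rows:
        match = ipeds_by_id.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**match, **row})  # OI fields win on name collisions
    return joined, unmatched


# Placeholder rows for illustration only.
oi_rows = [
    {"unitid": "100001", "share_parents_top1pct": 0.15},
    {"unitid": "999999", "share_parents_top1pct": 0.02},
]
ipeds_rows = [{"unitid": "100001", "name": "Example College", "enrollment": 6000}]
joined, unmatched = join_on_id(oi_rows, ipeds_rows)
print(joined, unmatched)
```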
URL: nces.ed.gov/ipeds
What's there: Annual survey data from every Title IV institution — enrollment, graduation rates, finances, student demographics. 12 interrelated survey components.
Limitation: College-side data only. No HS origin information.
What's there: Institution-level and field-of-study-level data going back to 1997. Includes earnings outcomes, debt, completion rates.
Format: Downloadable CSV files.
Limitation: No HS-to-college linkage.
URL: commondataset.org
What's there: Standardized self-reported data from colleges including acceptance rates, enrolled student profiles (SAT/ACT ranges, GPA distributions), financial aid.
Limitation: College-reported aggregates. No HS-level breakdown.
| Dataset | URL | Notes |
|---|---|---|
| Elite College Admissions | kaggle.com/datasets/mexwell/elite-college-admissions | Admissions data for selective colleges |
| College Admissions (Qian) | kaggle.com/datasets/samsonqian/college-admissions | General college admissions data |
| US College Data | kaggle.com/datasets/yashgpt/us-college-data | Institutional characteristics |
| US Schools Dataset | kaggle.com/datasets/andrewmvd/us-schools-dataset | K-12 school data |
Note: None of the Kaggle datasets provide direct HS-to-college linkage. They are primarily college-side or student-attribute datasets.
US Colleges Analysis — Kaggle-derived college admissions analysis
College Scorecard Analysis — Visualization of DOE College Scorecard data
No public GitHub repos found with direct feeder school data.
Used by 10,000+ high schools. Contains HS-specific scattergrams (GPA/test scores vs. admit/deny outcomes at specific colleges).
Not publicly accessible — requires authenticated school login.
Data threshold: a scattergram is shown only if the HS has 5+ applicants to that college.
Academic researchers have obtained access for studies (see Mulhern paper above).
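If the simulation generates Naviance-style scattergram data, the same suppression rule can be applied. A minimal sketch: the `Application` tuple layout and `visible_scattergrams` helper are hypothetical, and the threshold of 5 comes from the note above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (high school, college, GPA, test score, admitted?)
Application = Tuple[str, str, float, int, bool]
Point = Tuple[float, int, bool]

MIN_APPLICANTS = 5  # scattergram shown only with 5+ applicants from the HS


def visible_scattergrams(
    apps: Iterable[Application], min_applicants: int = MIN_APPLICANTS
) -> Dict[Tuple[str, str], List[Point]]:
    """Group applicant points by (HS, college) and drop groups below the threshold."""
    groups: Dict[Tuple[str, str], List[Point]] = defaultdict(list)
    for hs, college, gpa, score, admitted in apps:
        groups[(hs, college)].append((gpa, score, admitted))
    return {pair: pts for pair, pts in groups.items() if len(pts) >= min_applicants}


apps = [("HS-A", "Yale", 3.9 - 0.05 * i, 1500 - 10 * i, i < 2) for i in range(5)]
apps += [("HS-A", "MIT", 4.0, 1550, True)] * 3  # only 3 applicants: suppressed
shown = visible_scattergrams(apps)
print(sorted(shown))
```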
Interactive: "Most Schools Dream of Sending Students to Harvard. These 21 Expect To."
Methodology: Analyzed Freshman Register data for 15 matriculated classes (2009-2024).
Key data points:
21 schools sent 2,200+ students to Harvard over 15 years
1 in 11 accepted students comes from these 21 schools
Top feeders (100+ students each, 2009-2024): Boston Latin, Phillips Academy Andover, Stuyvesant, Phillips Exeter
5% of freshmen come from just 7 schools: Boston Latin, Phillips Andover, Stuyvesant, Noble & Greenough, Phillips Exeter, Trinity (NYC), Lexington HS
Of 21 schools: 12 private (avg tuition ~$64K), 9 public (4 selective magnet, 4 affluent suburban, 1 local)
Private school students make up roughly 25-38% of Ivy undergrad classes (25.5% Harvard, 27% Princeton, 32.4% Brown, 37.9% Cornell).
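A quick consistency check on the figures above. If 2,200 students are 1 in 11, the 15 classes total about 24,200, or roughly 1,600 per class, which is the right order of magnitude for a Harvard entering class. The check is rough: the 1-in-11 figure is stated for accepted students while the counts come from matriculation data. The same sketch lists the spread of the per-college private-school shares.

```python
FEEDER_STUDENTS = 2200   # "2,200+" from the 21 schools (a lower bound)
FEEDER_SHARE = 1 / 11    # "1 in 11"
CLASSES = 15

implied_total = FEEDER_STUDENTS / FEEDER_SHARE   # students across all 15 classes
implied_per_class = implied_total / CLASSES

# Private-school shares of undergrad classes, from the list above (percent).
private_share = {"Harvard": 25.5, "Princeton": 27.0, "Brown": 32.4, "Cornell": 37.9}

print(round(implied_total), round(implied_per_class))  # prints 24200 1613
print(min(private_share.values()), max(private_share.values()))  # prints 25.5 37.9
```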
Earlier analysis of historical feeder patterns and the institutional relationships behind them.
Per-university pages: Harvard Feeders, Yale Feeders, Princeton Feeders, MIT Feeders.
Methodology: Top 30 feeder schools per university, ranked by % of graduates matriculating to that university over the past 5 years.
Eligibility: College-prep schools with grade 12/PG, minimum 40-student average graduating class.
Provides per-university feeder rankings — useful for cross-referencing patterns.
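A sketch of the ranking rule as described: filter on a minimum 40-student average graduating class, then rank by the share of graduates matriculating to one university over five years. It assumes the share is pooled across the five years (total matriculants over total graduates) rather than an average of annual rates, which the methodology note does not specify. The college-prep and grade-12/PG criteria are not modeled, and the school data is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchoolRecord:
    name: str
    grads_by_year: List[int]         # graduating class size, each of the last 5 years
    matriculants_by_year: List[int]  # matriculants to the target university, same years

    @property
    def avg_class(self) -> float:
        return sum(self.grads_by_year) / len(self.grads_by_year)

    @property
    def matric_rate(self) -> float:
        # Pooled over the window (assumption): total matriculants / total graduates.
        return sum(self.matriculants_by_year) / sum(self.grads_by_year)


def top_feeders(
    schools: List[SchoolRecord], n: int = 30, min_avg_class: float = 40
) -> List[SchoolRecord]:
    """Keep schools meeting the class-size floor, then rank by matriculation rate."""
    eligible = [s for s in schools if s.avg_class >= min_avg_class]
    return sorted(eligible, key=lambda s: s.matric_rate, reverse=True)[:n]


schools = [
    SchoolRecord("School A", [100] * 5, [10] * 5),  # 10% rate
    SchoolRecord("School B", [30] * 5, [10] * 5),   # 33% rate but below the 40-student floor
    SchoolRecord("School C", [200] * 5, [4] * 5),   # 2% rate
]
print([s.name for s in top_feeders(schools)])  # prints ['School A', 'School C']
```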
Uses proprietary Elite College Placement Index (LA-ECPI) based on matriculation to T25 National and T15 Liberal Arts colleges (70% weighting).
Top LA schools: Harvard-Westlake (50% elite placement), Polytechnic (40%), Marlborough (40%).
Data modeled from publicly available school profiles and matriculation lists.
"Will Attending A Private High School Boost Your Chances of Ivy League Admission?"
Compiled data showing private school students are 2x as likely to be admitted as comparable public school students.
| Source | What It Provides | Format | Access |
|---|---|---|---|
| Harvard Crimson 2024 feeder data | Named schools, counts (top feeders 100+ each over 15 years), public/private split, tuition | Structured article with data widget | Free, web |
| Prep Review rankings | Top 30 feeders per HYPSM university, % matriculation rates | Web tables | Free, web |
| Chicardgo ECPI | LA private school placement rates into T25 colleges | Web tables | Free, web |
| California CDE CGR files | Per-school college-going rates, demographic breakdowns | CSV download | Free, govt |
| Opportunity Insights data | College-level parent income distributions, mobility rates | CSV download | Free |
| Source | What It Provides | Format | Access |
|---|---|---|---|
| Arcidiacono SFFA data (via papers) | ALDC admit rates, HS type effects on ratings | Published tables/figures | Free (papers) |
| Chetty "Diversifying" paper | Private HS advantage quantified, non-academic rating gaps | Published tables/figures | Free (paper) |
| CDS / IPEDS / College Scorecard | College-side acceptance rates, SAT/GPA ranges, enrollment | CSV download | Free, govt |
| NSC High School Benchmarks | National HS-to-college enrollment baselines | Report/dashboard | Free |
| Source | What It Provides | Why Restricted |
|---|---|---|
| NSC StudentTracker | Actual HS-to-college enrollment linkage | Subscription required |
| Naviance | Per-HS scattergrams of admits/denies at specific colleges | School login required |
| SFFA trial microdata | Harvard admissions records with HS identifiers | Court records, not easily accessible |
Use Harvard Crimson 2024 data to define feeder school archetypes: elite boarding (Exeter, Andover), selective magnet (Stuyvesant, Boston Latin, TJHSST), affluent suburban (Lexington, Scarsdale, Brookline), elite day school (Noble & Greenough, Trinity).
Use Prep Review to cross-reference feeder patterns across HYPSM — some schools are Harvard feeders but not MIT feeders, etc.
Use Chetty/Arcidiacono papers to calibrate the private-school advantage multiplier: private school students have ~2x the admission probability at comparable test scores, driven by non-academic ratings, legacy preferences, and athletics.
Use CDE CGR data to set realistic college-going rate baselines for different school types (affluent suburban ~90%+, average public ~65%, low-income ~40%).
Use Opportunity Insights data to calibrate income-to-enrollment relationships: a child from a top-1% family is 77x as likely to attend an Ivy-Plus college as a child from a bottom-20% family (a per-child rate ratio, not a ratio of total enrollment).
Derive feeder school parameters: For the simulation's 20 high schools, map each to an archetype with calibrated feeder rates:
Elite boarding school: ~15-20% of grads to HYPSM, ~40-50% to T20
Selective magnet: ~8-12% to HYPSM, ~30-40% to T20
Affluent suburban: ~5-8% to HYPSM, ~20-30% to T20
Average public: ~0.5-1% to HYPSM, ~5-10% to T20
Under-resourced public: ~0.1% to HYPSM, ~2-5% to T20
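A sketch of how the archetype ranges above could drive the simulation. Assumptions, all hypothetical: each school's rates are drawn uniformly within its archetype's range, the T20 share includes the HYPSM share, the ~0.1% under-resourced figure is treated as a fixed value, and the 20-school mix (`SCHOOL_MIX`) is invented for illustration since the notes do not specify it.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Archetype:
    hypsm: Tuple[float, float]  # (low, high) share of grads matriculating to HYPSM
    t20: Tuple[float, float]    # (low, high) share to T20, assumed to include HYPSM


ARCHETYPES: Dict[str, Archetype] = {
    "elite_boarding": Archetype((0.15, 0.20), (0.40, 0.50)),
    "selective_magnet": Archetype((0.08, 0.12), (0.30, 0.40)),
    "affluent_suburban": Archetype((0.05, 0.08), (0.20, 0.30)),
    "average_public": Archetype((0.005, 0.01), (0.05, 0.10)),
    "under_resourced": Archetype((0.001, 0.001), (0.02, 0.05)),  # ~0.1% treated as fixed
}

# Hypothetical 20-school mix; the notes do not specify one.
SCHOOL_MIX: Dict[str, int] = {
    "elite_boarding": 2,
    "selective_magnet": 2,
    "affluent_suburban": 4,
    "average_public": 8,
    "under_resourced": 4,
}


def draw_school_rates(rng: random.Random) -> List[Tuple[str, float, float]]:
    """Draw (school id, p_hypsm, p_t20) per school, uniformly within its archetype range."""
    schools = []
    for name, count in SCHOOL_MIX.items():
        arch = ARCHETYPES[name]
        for i in range(count):
            p_hypsm = rng.uniform(*arch.hypsm)
            p_t20 = max(rng.uniform(*arch.t20), p_hypsm)  # keep T20 >= HYPSM
            schools.append((f"{name}_{i + 1}", p_hypsm, p_t20))
    return schools


def simulate_outcomes(
    p_hypsm: float, p_t20: float, n_grads: int, rng: random.Random
) -> Dict[str, int]:
    """Assign each graduate to HYPSM, another T20 college, or elsewhere."""
    counts = {"HYPSM": 0, "other_T20": 0, "other": 0}
    for _ in range(n_grads):
        u = rng.random()
        if u < p_hypsm:
            counts["HYPSM"] += 1
        elif u < p_t20:
            counts["other_T20"] += 1
        else:
            counts["other"] += 1
    return counts


rng = random.Random(42)  # fixed seed for reproducibility
schools = draw_school_rates(rng)
school_id, p_hypsm, p_t20 = schools[0]
print(school_id, simulate_outcomes(p_hypsm, p_t20, n_grads=300, rng=rng))
```

Drawing a fixed rate per school and then sampling graduates lets two schools of the same archetype differ from each other and from year to year, which a single archetype-level rate would not.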