College Scorecard Deep-Dive Research

Source: kaggle_scorecard_deepdive.md


1. Dataset Overview

The US Department of Education College Scorecard is the most comprehensive federal dataset on higher education outcomes. It combines data from the Integrated Postsecondary Education Data System (IPEDS), federal financial aid records, and IRS tax data to create a unified view of institutional performance.

Most Recent Data Year


2. Key Fields Reference

CSV Variable Names and API Field Paths

The College Scorecard uses two naming conventions:

  1. CSV variable names (e.g., ADM_RATE) used in downloadable data files
  2. API dot-notation paths (e.g., latest.admissions.admission_rate.overall) used in REST API queries

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| UNITID | id | Unique institution identifier (IPEDS) |
| OPEID | ope8_id | 8-digit OPE ID |
| INSTNM | school.name | Institution name |
| STABBR | school.state | State abbreviation |
| CONTROL | school.ownership | 1=public, 2=private nonprofit, 3=private for-profit |
| PREDDEG | school.degrees_awarded.predominant | Predominant degree type (3=bachelor's) |
| HIGHDEG | school.degrees_awarded.highest | Highest degree awarded |
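
Because the same field goes by two names, a small lookup table can keep CSV-based and API-based code in sync. The following is a minimal sketch using a handful of rows from the reference tables in this section; the names `CSV_TO_API`, `OWNERSHIP_LABELS`, and `fields_param` are illustrative and not part of the Scorecard itself.

```python
# Subset of the CSV-to-API mapping listed in the reference tables of this section.
CSV_TO_API = {
    "UNITID": "id",
    "INSTNM": "school.name",
    "CONTROL": "school.ownership",
    "ADM_RATE": "latest.admissions.admission_rate.overall",
    "SAT_AVG": "latest.admissions.sat_scores.average.overall",
    "UGDS": "latest.student.size",
}

# CONTROL / school.ownership codes as documented in the table above.
OWNERSHIP_LABELS = {1: "public", 2: "private nonprofit", 3: "private for-profit"}


def fields_param(csv_names):
    """Translate CSV variable names into a comma-separated API `fields` value."""
    missing = [name for name in csv_names if name not in CSV_TO_API]
    if missing:
        raise KeyError(f"no API path recorded for: {missing}")
    return ",".join(CSV_TO_API[name] for name in csv_names)


print(fields_param(["UNITID", "INSTNM", "ADM_RATE"]))
# id,school.name,latest.admissions.admission_rate.overall
```

Failing loudly on an unmapped name is a deliberate choice here: a silently dropped field would only show up later as a missing column.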

Admissions

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| ADM_RATE | latest.admissions.admission_rate.overall | Overall admission rate (admitted / applied) |
| ADM_RATE_ALL | latest.admissions.admission_rate.by_ope_id | Admission rate across all campuses |
| SAT_AVG | latest.admissions.sat_scores.average.overall | Average SAT equivalent score (combined) |
| SATVR25 | latest.admissions.sat_scores.25th_percentile.critical_reading | SAT reading 25th percentile |
| SATVR75 | latest.admissions.sat_scores.75th_percentile.critical_reading | SAT reading 75th percentile |
| SATVRMID | latest.admissions.sat_scores.midpoint.critical_reading | SAT reading midpoint |
| SATMT25 | latest.admissions.sat_scores.25th_percentile.math | SAT math 25th percentile |
| SATMT75 | latest.admissions.sat_scores.75th_percentile.math | SAT math 75th percentile |
| SATMTMID | latest.admissions.sat_scores.midpoint.math | SAT math midpoint |
| SATWR25 | latest.admissions.sat_scores.25th_percentile.writing | SAT writing 25th percentile |
| SATWR75 | latest.admissions.sat_scores.75th_percentile.writing | SAT writing 75th percentile |
| SATWRMID | latest.admissions.sat_scores.midpoint.writing | SAT writing midpoint |
| ACTCM25 | latest.admissions.act_scores.25th_percentile.cumulative | ACT composite 25th percentile |
| ACTCM75 | latest.admissions.act_scores.75th_percentile.cumulative | ACT composite 75th percentile |
| ACTCMMID | latest.admissions.act_scores.midpoint.cumulative | ACT composite midpoint |
| ACTEN25 | latest.admissions.act_scores.25th_percentile.english | ACT English 25th percentile |
| ACTMT25 | latest.admissions.act_scores.25th_percentile.math | ACT math 25th percentile |

Student Demographics & Enrollment

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| UGDS | latest.student.size | Total undergraduate degree-seeking enrollment |
| UGDS_WHITE | latest.student.demographics.race_ethnicity.white | Share of enrollment that is white |
| UGDS_BLACK | latest.student.demographics.race_ethnicity.black | Share that is Black |
| UGDS_HISP | latest.student.demographics.race_ethnicity.hispanic | Share that is Hispanic |
| UGDS_ASIAN | latest.student.demographics.race_ethnicity.asian | Share that is Asian |
| UGDS_AIAN | latest.student.demographics.race_ethnicity.aian | Share that is American Indian / Alaska Native |
| UGDS_NHPI | latest.student.demographics.race_ethnicity.nhpi | Share that is Native Hawaiian / Pacific Islander |
| UGDS_2MOR | latest.student.demographics.race_ethnicity.two_or_more | Share that is two or more races |
| UGDS_NRA | latest.student.demographics.race_ethnicity.non_resident_alien | Share that is non-resident alien |
| UGDS_UNKN | latest.student.demographics.race_ethnicity.unknown | Share that is unknown race/ethnicity |
| UG | latest.student.enrollment.all | Total undergraduate enrollment (all students) |

Completion & Retention

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| C150_4 | latest.completion.completion_rate_4yr_150nt | 6-year graduation rate (150% of normal time, 4-year institutions) |
| C150_4_WHITE | latest.completion.completion_rate_4yr_150nt_white | 6-year graduation rate for white students |
| C150_4_BLACK | latest.completion.completion_rate_4yr_150nt_black | 6-year graduation rate for Black students |
| C150_4_HISP | latest.completion.completion_rate_4yr_150nt_hisp | 6-year graduation rate for Hispanic students |
| C150_4_ASIAN | latest.completion.completion_rate_4yr_150nt_asian | 6-year graduation rate for Asian students |
| RET_FT4 | latest.student.retention_rate.four_year.full_time | First-year retention rate (full-time, 4-year institutions) |
| RET_PT4 | latest.student.retention_rate.four_year.part_time | First-year retention rate (part-time, 4-year institutions) |

Financial: Cost & Aid

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| NPT4_PUB | latest.cost.avg_net_price.public | Average net price (public institutions, Title IV recipients) |
| NPT4_PRIV | latest.cost.avg_net_price.private | Average net price (private institutions, Title IV recipients) |
| NPT41_PUB | latest.cost.net_price.public.by_income_level.0-30000 | Net price for family income $0-$30K (public) |
| NPT45_PUB | latest.cost.net_price.public.by_income_level.110001-plus | Net price for family income $110K+ (public) |
| COSTT4_A | latest.cost.attendance.academic_year | Average cost of attendance (academic year) |
| COSTT4_P | latest.cost.attendance.program_year | Average cost of attendance (program year) |
| TUITIONFEE_IN | latest.cost.tuition.in_state | In-state tuition and fees |
| TUITIONFEE_OUT | latest.cost.tuition.out_of_state | Out-of-state tuition and fees |
| PCTPELL | latest.aid.pell_grant_rate | Share of undergraduates receiving Pell grants |
| PCTFLOAN | latest.aid.federal_loan_rate | Share receiving federal student loans |
| GRAD_DEBT_MDN | latest.aid.median_debt.completers.overall | Median debt at graduation (completers) |
| GRAD_DEBT_MDN_SUPP | latest.aid.median_debt_suppressed.completers.overall | Median debt (suppressed for privacy) |

Earnings & Outcomes

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| MD_EARN_WNE_P6 | latest.earnings.6_yrs_after_entry.median | Median earnings 6 years after entry |
| MD_EARN_WNE_P10 | latest.earnings.10_yrs_after_entry.median | Median earnings 10 years after entry |
| MN_EARN_WNE_P6 | latest.earnings.6_yrs_after_entry.mean_earnings | Mean earnings 6 years after entry |
| MN_EARN_WNE_P10 | latest.earnings.10_yrs_after_entry.mean_earnings | Mean earnings 10 years after entry |
| COUNT_WNE_P6 | latest.earnings.6_yrs_after_entry.working_not_enrolled.earnings_count | Count of students in 6-year earnings cohort |
| RPY_3YR_RT_SUPP | latest.repayment.3_yr_repayment.overall | 3-year loan repayment rate |

Faculty & Institutional

| CSV Variable | API Field Path | Description |
| --- | --- | --- |
| PFTFAC | school.ft_faculty_rate | Share of faculty that is full-time |
| AVGFACSAL | school.faculty_salary | Average faculty salary |

3. API Access

Base URL

https://api.data.gov/ed/collegescorecard/v1/schools

Authentication

Query Structure

GET https://api.data.gov/ed/collegescorecard/v1/schools?
    api_key={KEY}
    &school.name=Harvard University
    &fields=id,school.name,latest.admissions.admission_rate.overall,latest.admissions.sat_scores.average.overall
    &per_page=100

Key Parameters

| Parameter | Description |
| --- | --- |
| fields | Comma-separated list of fields to return |
| per_page | Results per page (max 100) |
| page | Page number for pagination |
| sort | Sort by field (e.g., latest.admissions.admission_rate.overall:asc) |
| keys_nested=true | Return JSON objects instead of dotted strings |
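
As a rough illustration of how the base URL and the parameters above fit together, the sketch below assembles a query URL with the standard library and performs no network call. The helper name `build_query_url` is hypothetical, and the default `page=0` assumes zero-based page numbering, which should be checked against the API documentation.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"


def build_query_url(api_key, filters, fields, per_page=100, page=0):
    """Assemble a Scorecard query URL; `filters` maps field paths to values."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    params = {"api_key": api_key, **filters}
    params["fields"] = ",".join(fields)
    params["per_page"] = per_page
    params["page"] = page  # assumption: the first page is page 0
    # quote_via=quote encodes spaces as %20; safe="," keeps the field list readable.
    return f"{BASE_URL}?{urlencode(params, safe=',', quote_via=quote)}"


print(build_query_url(
    "YOUR_KEY",
    {"school.name": "Harvard University"},
    ["id", "school.name", "latest.admissions.admission_rate.overall"],
    per_page=1,
))
```

Building the URL through `urlencode` avoids the unescaped space that a hand-written `school.name=Harvard University` would put into the request.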

Filtering

Rate Limits

Example: Fetch All 30 Simulation Colleges

```bash
# Fetches the simulation fields for one school (Harvard); repeat per college.
# The space in the school name is URL-encoded as %20.
curl "https://api.data.gov/ed/collegescorecard/v1/schools?\
api_key=YOUR_KEY&\
fields=id,school.name,\
latest.admissions.admission_rate.overall,\
latest.admissions.sat_scores.average.overall,\
latest.admissions.sat_scores.25th_percentile.critical_reading,\
latest.admissions.sat_scores.75th_percentile.critical_reading,\
latest.admissions.sat_scores.25th_percentile.math,\
latest.admissions.sat_scores.75th_percentile.math,\
latest.student.size,\
latest.student.demographics.race_ethnicity.white,\
latest.student.demographics.race_ethnicity.black,\
latest.student.demographics.race_ethnicity.hispanic,\
latest.student.demographics.race_ethnicity.asian,\
latest.completion.completion_rate_4yr_150nt,\
latest.student.retention_rate.four_year.full_time,\
latest.cost.avg_net_price.private,\
latest.aid.pell_grant_rate,\
latest.aid.median_debt.completers.overall,\
latest.earnings.10_yrs_after_entry.median&\
school.name=Harvard%20University&\
per_page=1"
```


4. Admission Rates & SAT Scores for 30 Simulation Colleges

Data Sources

HYPSM Tier

| College | ADM_RATE (Scorecard) | ADM_RATE (Class 2029) | SAT Middle 50% | SAT Avg (est.) |
| --- | --- | --- | --- | --- |
| Harvard | 5% | 4.18% | 1460-1580 | 1520 |
| Yale | 5% | ~5% | 1460-1580 | 1520 |
| Princeton | 6% | 4.42% | 1450-1570 | 1510 |
| Stanford | 5% | ~4% (TBA) | 1420-1570 | 1500 |
| MIT | 7% | 4.56% | 1510-1580 | 1545 |

Ivy+ Tier

| College | ADM_RATE (Scorecard) | ADM_RATE (Class 2029) | SAT Middle 50% | SAT Avg (est.) |
| --- | --- | --- | --- | --- |
| Columbia | 6% | 4.94% | 1470-1570 | 1520 |
| UPenn | 9% | ~6% | 1450-1570 | 1510 |
| Brown | 8% | 5.65% | 1440-1570 | 1505 |
| Dartmouth | 9% | 6.02% | 1440-1560 | 1500 |
| Cornell | 11% | 8.38% | 1400-1540 | 1470 |
| Duke | 8% | 5.20% | 1510-1560 | 1535 |
| Northwestern | 9% | 7.00% | 1430-1550 | 1490 |
| UChicago | 7% | ~5% (est.) | 1500-1570 | 1535 |
| Caltech | 7% | 3.78% | 1530-1580 | 1555 |

Near-Ivy Tier

| College | ADM_RATE (Scorecard) | ADM_RATE (Class 2029) | SAT Middle 50% | SAT Avg (est.) |
| --- | --- | --- | --- | --- |
| Johns Hopkins | 9% | 5.14% | 1480-1570 | 1525 |
| Vanderbilt | 12% | 4.6% | 1460-1560 | 1510 |
| Rice | 11% | 8.01% | 1460-1570 | 1515 |
| Notre Dame | 19% | 9% | 1420-1560 | 1490 |
| Georgetown | 17% | 12% | 1380-1550 | 1465 |
| Carnegie Mellon | 17% | 11.07% | 1460-1560 | 1510 |
| WashU | 8% | ~12% | 1460-1560 | 1510 |

Selective Tier

| College | ADM_RATE (Scorecard) | ADM_RATE (Class 2029) | SAT Middle 50% | SAT Avg (est.) |
| --- | --- | --- | --- | --- |
| Emory | 19% | 10.30% | 1380-1530 | 1455 |
| Tufts | 16% | 10.81% | 1380-1530 | 1455 |
| Boston College | 26% | 13.85% | 1330-1500 | 1415 |
| UVA | 23% | 15.4% | 1320-1510 | 1415 |
| UCLA | 14% | 9.42% | 1290-1520 | 1405 |
| Michigan | 17% | ~16% | 1340-1560 | 1450 |

Top LACs

| College | ADM_RATE (Scorecard) | ADM_RATE (Class 2029) | SAT Middle 50% | SAT Avg (est.) |
| --- | --- | --- | --- | --- |
| Williams | 15% | 8.5% | 1410-1560 | 1485 |
| Amherst | 12% | 7.72% | 1410-1550 | 1480 |
| Middlebury | 22% | 12.77% | 1340-1520 | 1430 |

Key Observations

  1. Scorecard data lags: The Scorecard's ADM_RATE reflects IPEDS data from 1-2 years prior. Class of 2029 rates (2024-25 cycle) are significantly lower for most schools.
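  2. Class of 2029 rates run lower than Scorecard rates: In the tables above, the gap is roughly 0-3 percentage points at the HYPSM and Ivy+ tiers and about 1-12 points in the other tiers (e.g., Boston College, 26% vs 13.85%). WashU is the one exception, with a Scorecard rate (8%) below its Class of 2029 estimate (~12%).
  3. SAT ranges are compressed at the top: In the HYPSM tier every 25th percentile is 1420 or higher, and the middle-50% range is only 70-150 points wide. Ranges widen further down the list (e.g., UCLA, 1290-1520).
  4. Test-optional effects: Many schools have test-optional policies, so reported SAT scores may be inflated because applicants with stronger scores are more likely to submit them.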
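
The arithmetic behind these observations can be reproduced directly from the tier tables. The sketch below uses three HYPSM rows with exact (non-approximate) Class of 2029 figures; the function and key names are illustrative only.

```python
# Rows copied from the HYPSM table:
# (Scorecard ADM_RATE %, Class of 2029 ADM_RATE %, SAT 25th pct, SAT 75th pct)
ROWS = {
    "Harvard": (5.0, 4.18, 1460, 1580),
    "Princeton": (6.0, 4.42, 1450, 1570),
    "MIT": (7.0, 4.56, 1510, 1580),
}


def summarize(row):
    """Return the admit-rate gap in percentage points and the SAT range stats."""
    scorecard, class_2029, sat25, sat75 = row
    return {
        "gap_pp": round(scorecard - class_2029, 2),
        "sat_width": sat75 - sat25,
        "sat_midpoint": (sat25 + sat75) / 2,
    }


for college, row in ROWS.items():
    print(college, summarize(row))
```

For these three rows the midpoint equals the SAT Avg (est.) column, although the source does not state how that column was derived.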

5. Simulation-Relevant Insights

What College Scorecard Provides That Our Sim Needs

| Simulation Parameter | Scorecard Field | Notes |
| --- | --- | --- |
| Acceptance rate | ADM_RATE | Use as baseline; adjust with Class of 2029 actuals |
| SAT score thresholds | SAT_AVG, SATVR25/75, SATMT25/75 | Middle 50% ranges for scoring calibration |
| ACT thresholds | ACTCM25/75, ACTCMMID | Alternative test score data |
| Student body size | UGDS | For modeling class size and yield |
| Demographics mix | UGDS_WHITE/BLACK/HISP/ASIAN/2MOR/NRA | For diversity modeling and hook calibration |
| Graduation rate | C150_4, C150_4_WHITE/BLACK/HISP/ASIAN | 6-year completion rate, overall and by race, for outcome modeling |
| Retention rate | RET_FT4 | Proxy for student satisfaction / institutional quality |
| Pell grant rate | PCTPELL | Socioeconomic diversity indicator |
| Net price | NPT4_PUB/PRIV | Financial aid modeling |
| Median debt | GRAD_DEBT_MDN | Post-graduation outcome modeling |
| Earnings | MD_EARN_WNE_P10 | Long-term ROI by institution |

Fields Not in Scorecard That Our Sim Uses


6. Kaggle Availability

Primary Dataset

Alternative Kaggle Datasets

Recommendation for Simulation

For the most current data, use the API directly rather than Kaggle downloads. The Kaggle mirror is 8+ years stale. The API provides latest.* fields that always return the most current available data, and year-specific queries (e.g., 2023.admissions.*) for historical trends.
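
To pull a historical series with the same field list, the `latest.` prefix can be swapped for a data year. This is a minimal sketch under the prefix convention described above; `for_year` is a hypothetical helper, and which years actually carry data for a given field is not checked here.

```python
def for_year(path, year):
    """Swap the `latest.` prefix of an API field path for a data-year prefix."""
    prefix = "latest."
    if not path.startswith(prefix):
        # Paths such as `id` and `school.name` carry no year prefix.
        return path
    return f"{year}.{path[len(prefix):]}"


print(for_year("latest.admissions.admission_rate.overall", 2023))
# 2023.admissions.admission_rate.overall
```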


7. Data Integration Strategy for Simulation

  1. Use College Scorecard API to fetch current data for all 30 colleges (ADM_RATE, SAT ranges, UGDS demographics, C150_4, RET_FT4, PCTPELL)
  2. Override ADM_RATE with Class of 2029 actual rates where available (significantly more selective than Scorecard reports)
  3. Cross-reference with CDS (Common Data Set) for ED/EA acceptance rates, yield rates, and hook-specific data
  4. Supplement with CommonApp data for application volume trends and round-specific patterns

API Query for All 30 Colleges

The API doesn't support querying by a list of names in one call. Strategy:

Key UNITIDs for Simulation Colleges

| College | UNITID |
| --- | --- |
| Harvard | 166027 |
| Yale | 130794 |
| Princeton | 186131 |
| Stanford | 243744 |
| MIT | 166683 |
| Columbia | 190150 |
| UPenn | 215062 |
| Brown | 217156 |
| Dartmouth | 182670 |
| Cornell | 190415 |

(Additional UNITIDs can be looked up via school.name queries)
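
With UNITIDs in hand, one possible approach is to filter on `id` rather than on names. The sketch below only builds the request URL and makes no network call. It assumes the API accepts a comma-separated value list for the `id` filter, which should be confirmed against the API documentation; `batch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"

# UNITIDs from the table above.
UNITIDS = {
    "Harvard": 166027, "Yale": 130794, "Princeton": 186131,
    "Stanford": 243744, "MIT": 166683, "Columbia": 190150,
    "UPenn": 215062, "Brown": 217156, "Dartmouth": 182670,
    "Cornell": 190415,
}


def batch_url(api_key, unitids, fields):
    """Build one query URL covering several institutions by UNITID."""
    params = {
        "api_key": api_key,
        # Assumption: a comma-separated list matches any of the listed ids.
        "id": ",".join(str(unitid) for unitid in unitids),
        "fields": ",".join(fields),
        "per_page": 100,  # documented maximum; enough for all 30 colleges
    }
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(batch_url("YOUR_KEY", UNITIDS.values(), ["id", "school.name"]))
```

If the value-list assumption does not hold, the same dictionary can drive one request per UNITID instead.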