Kaggle & Web Survey: SAT/ACT Test Score Datasets

Source: kaggle_survey_test_scores.md


Purpose

Identify datasets that can validate and improve the simulation's SAT score distributions by school type (current model means: elite boarding ~1479, elite day ~1424, public magnet ~1380, etc.).


Tier 1: Most Useful for the Simulation

1. US School Scores (SAT + ACT, State-Level, Multi-Year)

2. Elite College Admissions (Opportunity Insights / Chetty Data)

3. SAT Test Results Over the Years -- California

4. College Board SAT Suite Annual Reports (Official, Non-Kaggle)


Tier 2: Useful Supporting Data

5. New York City SAT Results

6. Average SAT Scores for NYC Public Schools

7. SAT Score Data By State

8. SAT and GPA Data (OpenIntro)


Tier 3: Tangentially Useful / Lower Priority

9. College Board SAT Dataset

10. College Exam Results (SAT)

11. College Admissions (Samson Qian)

12. College Admission Data Set


ACT-Specific Resources

13. ACT Graduating Class Data (Official, Non-Kaggle)

14. NCES ACT Scores by State


Relevance Summary for Our Simulation

| Dataset | Granularity | SAT Data? | Years | Usefulness |
|---|---|---|---|---|
| US School Scores | State + Income | Yes (Math/Verbal) | 2005+ | HIGH |
| Elite College Admissions | Univ x Income | Yes (50-pt bands) | 2010-2015 | VERY HIGH |
| California SAT Results | School/District | Yes (CR/M/W) | 2015-16 | HIGH |
| College Board Annual Reports | National/State | Yes (full dist.) | 2023-2024 | VERY HIGH |
| NYC SAT Results (2012) | School | Yes (by section) | 2012 | MEDIUM |
| NYC High Schools (2014-15) | School | Yes + demographics | 2014-15 | MEDIUM |
| SAT by State (Kruschke) | State | Yes | Unknown | LOW-MEDIUM |
| OpenIntro SAT/GPA | Student | Percentiles + GPA | Unknown | MEDIUM |
| ACT Graduating Class | State/National | ACT only | 10 years | MEDIUM |
| NCES ACT by State | State | ACT only | 2013, 2017 | MEDIUM |

Key Validation Opportunities

1. School-Type SAT Distributions

Current model values: Elite boarding ~1479, Elite day ~1424, Public magnet ~1380, Competitive suburban ~1310, Average suburban ~1190, Average public ~1130, Rural ~1080, Under-resourced urban ~1010

Best datasets: California school-level data (#3) to compare top public magnets vs. average schools; NYC data (#5, #6) to compare specialized exam schools vs. neighborhood schools.
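The comparison described here can be sketched as follows. This is a minimal illustration, not the simulation's actual code: the dictionary keys and the `compare_to_model` helper are hypothetical names, and assigning real schools to a school type is a manual step that neither dataset provides. The California file is listed with CR/M/W sections, so its totals would presumably need to be put on the 1600 scale before use.

```python
# Model means copied from the "Current model values" line above.
# The snake_case keys are hypothetical labels chosen for this sketch.
MODEL_MEANS = {
    "elite_boarding": 1479,
    "elite_day": 1424,
    "public_magnet": 1380,
    "competitive_suburban": 1310,
    "average_suburban": 1190,
    "average_public": 1130,
    "rural": 1080,
    "under_resourced_urban": 1010,
}


def compare_to_model(observed, model=MODEL_MEANS):
    """Return {school_type: (model_mean, observed_mean, gap)}.

    `observed` maps a school-type label to a list of school-level average
    SAT totals on the 1600 scale. The gap is observed minus model.
    """
    report = {}
    for school_type, averages in observed.items():
        if school_type not in model or not averages:
            continue  # skip unknown labels and empty groups
        observed_mean = sum(averages) / len(averages)
        report[school_type] = (
            model[school_type],
            round(observed_mean, 1),
            round(observed_mean - model[school_type], 1),
        )
    return report
```

A large positive or negative gap for a school type would suggest revisiting that type's model mean, provided the schools assigned to it are representative.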

2. Income-Tier Score Stratification

Best datasets: US School Scores (#1) with income-bracket breakdowns; Elite College Admissions (#2) with 13 income percentile levels cross-tabbed with SAT bands.
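One way to turn state-level income breakdowns into a single score per bracket is a mean weighted by test-taker counts. The sketch below assumes each record can be reduced to an `(income_bracket, test_takers, mean_score)` tuple; that layout and the function name are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict


def pooled_mean_by_bracket(rows):
    """Pool per-state bracket means into one weighted mean per bracket.

    Each row is (income_bracket, test_takers, mean_score). Means are
    weighted by test-taker count so large states are not under-counted.
    """
    weighted_sum = defaultdict(float)
    total_takers = defaultdict(int)
    for bracket, test_takers, mean_score in rows:
        if test_takers <= 0:
            continue  # a bracket with no test takers carries no information
        weighted_sum[bracket] += test_takers * mean_score
        total_takers[bracket] += test_takers
    return {b: weighted_sum[b] / total_takers[b] for b in total_takers}
```

The resulting per-bracket means can then be compared with the income mix the simulation assumes for each school type.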

3. Overall Distribution Shape

Best datasets: College Board Annual Reports (#4), the ground truth for national SAT percentile curves. Use them to check whether our simulated score distribution matches the real-world shape.
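A shape check of this kind could be sketched as below: compute the simulation's percentile rank at each score in a published table and report the gaps. The published values must be transcribed from the report; none are hard-coded here. The sketch treats a percentile as the share of scores at or below a given score, which is an assumption to verify against the definition the report uses. Function names are hypothetical.

```python
from bisect import bisect_right


def simulated_percentile(sorted_scores, score):
    """Percent of simulated scores at or below `score`."""
    return 100.0 * bisect_right(sorted_scores, score) / len(sorted_scores)


def percentile_gaps(simulated_scores, published):
    """Compare simulated percentiles with a published percentile table.

    `published` maps a total score to its published percentile rank.
    Returns (per-score gaps, largest absolute gap), in percentile points,
    where each gap is simulated minus published.
    """
    if not simulated_scores:
        raise ValueError("simulated_scores must not be empty")
    ordered = sorted(simulated_scores)
    gaps = {
        score: simulated_percentile(ordered, score) - rank
        for score, rank in published.items()
    }
    worst = max((abs(g) for g in gaps.values()), default=0.0)
    return gaps, worst
```

The largest absolute gap gives a single number to track as the simulated distribution is tuned; the per-score gaps show where along the curve the mismatch sits.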

4. SAT-to-Admission Mapping

Best datasets: Elite College Admissions (#2), which shows which SAT score bands map to admission/attendance at each college tier (Ivy Plus, other elite, selective) and is directly comparable to our tier structure.
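To put simulation output in the same form as a band-by-tier table, simulated students can be binned into 50-point SAT bands and counted per college tier. The sketch below is illustrative only: the function names are hypothetical, and the band edges (multiples of 50) are an assumption that should be aligned with however the dataset defines its bands.

```python
from collections import Counter


def sat_band(score, width=50):
    """Return the (low, high) band containing `score`, e.g. 1430 -> (1400, 1449)."""
    low = (score // width) * width
    return (low, low + width - 1)


def band_by_tier(students, width=50):
    """Count simulated students per (SAT band, college tier) cell.

    `students` is an iterable of (sat_score, tier) pairs from the simulation.
    """
    return Counter((sat_band(score, width), tier) for score, tier in students)


def tier_shares(counts):
    """Convert cell counts to within-band shares, so each band sums to 1."""
    band_totals = Counter()
    for (band, _tier), n in counts.items():
        band_totals[band] += n
    return {cell: n / band_totals[cell[0]] for cell, n in counts.items()}
```

Within-band shares are the quantity to set beside the dataset's figures, since raw counts depend on how many students the simulation generates.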


Recommended Next Steps

1. Download the College Board 2024 Annual Report: extract percentile tables and score distribution curves for the current 1600-scale SAT.
2. Download the Elite College Admissions dataset: cross-tab SAT bands x college tier to validate our scoring/admission model.
3. Download US School Scores: use income-bracket SAT breakdowns to refine school-type mean scores.
4. Download California school-level SAT data: compare individual school averages against our school-type categories.
5. Download NYC SAT datasets: spot-check the spread between specialized exam schools and average schools.