Kaggle & Web Survey: Application Volumes, CommonApp Trends, and Strategy

Source: kaggle_survey_applications.md


1. CommonApp Official Data (Primary Source)

Applications Per Applicant - Historical Trend

| Cycle | Apps/Applicant | Applicants | Total Apps | Notes |
|---|---|---|---|---|
| 2013-14 | 4.63 | ~820K | ~3.8M | Pre-pandemic baseline |
| 2014-15 | ~4.8 | ~850K | ~4.1M | Steady growth |
| 2015-16 | ~4.9 | ~880K | ~4.3M | Steady growth |
| 2016-17 | ~5.0 | ~900K | ~4.5M | Crossed 5.0 threshold |
| 2017-18 | ~5.1 | ~920K | ~4.7M | Continued upward |
| 2018-19 | ~5.2 | ~950K | ~4.9M | Pre-COVID |
| 2019-20 | ~5.4 | ~960K | ~5.2M | COVID begins (spring) |
| 2020-21 | ~5.6 | ~1.0M | ~5.6M | Test-optional boom |
| 2021-22 | 6.22 | ~1.1M | ~6.8M | Sharp post-COVID jump |
| 2022-23 | 6.41 | ~1.2M | ~7.7M | Continued acceleration |
| 2023-24 | 6.65 | 1.42M | ~9.4M | 9.4M+ applications |
| 2024-25 | 6.80 | ~1.50M | 10.19M | Surpassed 10M for first time |

Source: CommonApp End-of-Season Reports (https://www.commonapp.org/about/reports-and-insights/)
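
As a rough check on the trend in the table above, the compound annual growth rate between the first and last cycles can be computed directly. The sketch below uses only the table's endpoint values, several of which are approximate, and a hypothetical helper named `cagr`; it is an illustration rather than anything published by CommonApp.

```python
def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate between two values over `periods` steps."""
    return (end / start) ** (1 / periods) - 1


# Endpoints from the table above: 2013-14 and 2024-25 (11 cycle-to-cycle steps).
PERIODS = 11

apps_per_applicant_growth = cagr(4.63, 6.80, PERIODS)
applicant_growth = cagr(0.82e6, 1.50e6, PERIODS)    # both endpoints approximate
total_apps_growth = cagr(3.8e6, 10.19e6, PERIODS)   # start value approximate

print(f"Apps per applicant: {apps_per_applicant_growth:.1%} per year")
print(f"Applicants:         {applicant_growth:.1%} per year")
print(f"Total applications: {total_apps_growth:.1%} per year")
```

With these endpoints, apps per applicant grow by roughly 3.6% per year, while total applications grow by roughly 9.4% per year because the applicant pool expands at the same time.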

Key Takeaways for Simulation

Growth by Selectivity Tier (2024-25)

Implication: Students are "applying wider" -- adding more match/safety schools while maintaining reach applications.


2. Early Decision / Early Action Statistics

ED Acceptance Rates vs. Overall Rates (2024-25 Cycle, Class of 2029)

| School | ED Rate | EA Rate | Overall Rate | ED Advantage |
|---|---|---|---|---|
| Harvard | -- | ~9% (REA) | 3.6% | 2.5x |
| Yale | -- | 10.8% (SCEA) | 4.5% | 2.4x |
| Princeton | -- | ~10% (SCEA) | ~4% | ~2.5x |
| Stanford | -- | ~8% (REA) | ~3.9% | ~2.1x |
| MIT | -- | 5.2% (EA) | 4.5% | 1.2x |
| Columbia | 13.2% | -- | 3.9% | 3.4x |
| UPenn | 14.2% | -- | 5.4% | 2.6x |
| Brown | 14.4% | -- | 5.4% | 2.7x |
| Dartmouth | 19.1% | -- | 5.4% | 3.5x |
| Cornell | ~18% | -- | ~8% | ~2.3x |
| Duke | 19.7% | -- | 6.7% | 2.9x |
| Northwestern | 23% | -- | 7.7% | 3.0x |
| UChicago | ~20% | ~6% (EA) | ~5% | ~4x ED |
| Rice | 16.8% | -- | 7.9% | 2.1x |
| Vanderbilt | ~20% | -- | ~6% | ~3.3x |
| Johns Hopkins | 11% | -- | ~7% | 1.6x |
| Notre Dame | -- | 12.9% (REA) | 11.2% | 1.2x |
| Georgetown | -- | ~15% (EA) | 12.9% | 1.2x |
| Emory | 23.2% | -- | 10.2% | 2.3x |
| WashU | 25.2% | -- | 12% | 2.1x |
| UVA | 27.9% | ~16% | 16.8% | 1.7x ED |
| USC | -- | 7.1% | 9.8% | 0.7x (EA lower) |
| Williams | 23.3% | -- | 8.3% | 2.8x |
| Amherst | 29.3% | -- | 9% | 3.3x |
| Middlebury | 30.5% | -- | 10.7% | 2.9x |

Source: CollegeVine (https://blog.collegevine.com/ed-and-ea-acceptance-rates), Spark Admissions, College Kickstart
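
The "ED Advantage" column above appears to be the early-round acceptance rate divided by the overall rate. A minimal sketch of that calculation, using a few rows copied from the table; the function name `early_advantage` and the choice of rows are illustrative assumptions:

```python
def early_advantage(early_rate_pct: float, overall_rate_pct: float) -> float:
    """Ratio of the early-round acceptance rate to the overall acceptance rate."""
    return early_rate_pct / overall_rate_pct


# (early-round rate %, overall rate %) copied from the table above.
SAMPLE_ROWS = {
    "Columbia": (13.2, 3.9),   # ED
    "Dartmouth": (19.1, 5.4),  # ED
    "MIT": (5.2, 4.5),         # EA
    "USC": (7.1, 9.8),         # EA, lower than the overall rate
}

advantages = {
    school: round(early_advantage(early, overall), 1)
    for school, (early, overall) in SAMPLE_ROWS.items()
}

for school, ratio in advantages.items():
    print(f"{school}: {ratio}x")
```

A ratio below 1.0, as for USC, means the early round is more selective than the overall pool, so a simulation would need to allow multipliers under 1 for such schools.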

Average ED Advantage

Percentage of Class Filled by ED

| School | % of Class Filled by ED | Source |
|---|---|---|
| Duke | 49-51% | Duke Chronicle, Ivy Coach |
| UPenn | ~50% | Common knowledge, multiple sources |
| Brown | ~45% | Multiple sources |
| Cornell | ~49% | Multiple sources |
| Dartmouth | ~45% | Multiple sources |
| Northwestern | ~50% | Estimates |
| WashU | ~55% | Estimates |
| Emory | ~40% | Estimates |
| Vanderbilt | ~50% | Estimates |

Simulation uses 40-60% -- well aligned with real data.
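
The alignment claim can be checked mechanically. The sketch below compares the table's values against the 40-60% simulation range; representing Duke's 49-51% range by its midpoint and treating the approximate figures as point values are both simplifying assumptions.

```python
# Share of class filled by ED, in percent, from the table above.
# Duke's 49-51% range is represented by its midpoint (an assumption).
ED_FILL_PCT = {
    "Duke": 50, "UPenn": 50, "Brown": 45, "Cornell": 49, "Dartmouth": 45,
    "Northwestern": 50, "WashU": 55, "Emory": 40, "Vanderbilt": 50,
}
SIM_RANGE = (40, 60)  # range used by the simulation, per the text above


def outside_range(values: dict, low: float, high: float) -> dict:
    """Return the entries whose value falls outside [low, high]."""
    return {name: v for name, v in values.items() if not low <= v <= high}


out_of_range = outside_range(ED_FILL_PCT, *SIM_RANGE)
mean_fill = sum(ED_FILL_PCT.values()) / len(ED_FILL_PCT)

print("Schools outside simulation range:", out_of_range or "none")
print(f"Mean ED fill share: {mean_fill:.1f}%")
```

Under these assumptions no listed school falls outside the range, and the mean of about 48% sits slightly below the range's midpoint.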


3. Application Strategy: Reach/Match/Safety Ratios

Expert Recommendations

Selectivity Categories (CollegeVine / Appily / Princeton Review)

Observed Patterns

Source: Appily (https://www.appily.com/guidance/articles/finding-your-college/what-are-safety-reach-and-match-schools), CollegeVine, Princeton Review


4. Kaggle Datasets Surveyed

4a. College Admissions (Samson Qian)

4b. US College Data (yashgpt)

4c. Post Secondary Education Data - IPEDS (hark99)

4d. College Admission Data Set (pandanup)

4e. Student Admission Dataset (amanace)

4f. College Admission Dataset (darkhorse3141)

4g. University Admission Dataset (farhansadeek)


5. Self-Reported Outcome Datasets

5a. r/collegeresults (Reddit)

5b. College Kickstart

5c. ChanceyNN (GitHub)

5d. CollegeData.com Scraper


6. Alternative Official Sources

6a. College Scorecard (U.S. Department of Education)

6b. NACAC State of College Admission

6c. CommonApp Reports & Insights


7. Simulation Parameter Validation

Current Simulation Parameters vs. Real Data

| Parameter | Simulation Value | Real Data | Status |
|---|---|---|---|
| Avg apps/student | 6.8 | 6.80 (CommonApp 2024-25) | EXACT MATCH |
| ED fills % of class | 40-60% | 40-55% for Ivy+, up to 60% for some | WELL CALIBRATED |
| ED acceptance multiplier | (varies by college) | 1.6-3.5x vs. overall rate | CHECK: may need per-school tuning |
| Hook: athlete multiplier | 3.5x | Not directly comparable (recruited vs. non-recruited) | REASONABLE |
| Hook: donor multiplier | 4.0x | No public data; anecdotal support | PLAUSIBLE |
| Hook: legacy multiplier | 2.5x | Some schools phasing out legacy; historically ~2-3x | REASONABLE |
| Hook: first-gen multiplier | 1.4x | Growing institutional priority; modest boost | REASONABLE |
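
These notes do not say how the simulation combines a hook multiplier with a school's base acceptance rate. One simple possibility, assumed here purely for illustration, is to scale the base probability and cap the result so it stays a valid probability. The function name `boosted_rate` and the 0.95 cap are hypothetical.

```python
# Hook multipliers from the parameter table above.
HOOK_MULTIPLIERS = {
    "athlete": 3.5,
    "donor": 4.0,
    "legacy": 2.5,
    "first_gen": 1.4,
}


def boosted_rate(base_rate: float, hook: str, cap: float = 0.95) -> float:
    """Scale a base acceptance probability by a hook multiplier.

    ASSUMPTION: the multiplier acts on the probability itself and the result
    is capped at `cap`; the actual simulation may combine them differently.
    """
    return min(base_rate * HOOK_MULTIPLIERS[hook], cap)


# Example: a 5.4% overall rate with the legacy multiplier.
legacy_example = boosted_rate(0.054, "legacy")
# Example: a high base rate hits the cap instead of exceeding 1.0.
capped_example = boosted_rate(0.50, "donor")

print(f"Legacy at 5.4% base: {legacy_example:.1%}")
print(f"Donor at 50% base:   {capped_example:.1%}")
```

The cap matters mainly at less selective schools, where multiplying the raw rate would otherwise push the probability past 1.0.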

Suggested Improvements

  1. Application count distribution: Currently flat 6.8 average. Consider modeling:
     - High-achievers: 8-12 apps (heavily weighted to reaches)
     - Average students: 5-8 apps (balanced reach/match/safety)
     - Low-income/first-gen: potentially fewer apps (4-6) due to fee waiver limits
  2. Application round distribution: Model based on school-level data:
     - ~12-15% of all applicants use ED somewhere
     - ~30-40% use EA at one or more schools
     - ~60-70% submit at least one RD application
  3. Selectivity-aware list building: Students should apply to 30% reaches, 40% matches, 30% safeties
  4. Year-over-year growth: Could model application inflation at ~3-4% per year
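
The first and third suggestions can be sketched together: draw an application count from a per-segment range, then split the list into reaches, matches, and safeties. The application ranges and the 30/40/30 split come from the list above; the segment shares (20/60/20), the uniform draw within each range, and all names in the code are placeholder assumptions.

```python
import random

# (segment name, assumed share of students, (min apps, max apps))
# Ranges come from the suggestions above; the shares are illustrative guesses.
SEGMENTS = [
    ("high_achiever", 0.20, (8, 12)),
    ("average", 0.60, (5, 8)),
    ("low_income_first_gen", 0.20, (4, 6)),
]


def sample_app_count(rng: random.Random) -> int:
    """Pick a segment by its share, then draw a uniform count in its range."""
    weights = [share for _, share, _ in SEGMENTS]
    _, _, (low, high) = rng.choices(SEGMENTS, weights=weights, k=1)[0]
    return rng.randint(low, high)


def split_list(n_apps: int, reach_share: float = 0.30, match_share: float = 0.40):
    """Split a list into (reach, match, safety) counts that sum to n_apps."""
    n_reach = round(n_apps * reach_share)
    n_match = round(n_apps * match_share)
    n_safety = n_apps - n_reach - n_match  # remainder goes to safeties
    return n_reach, n_match, n_safety


rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = [sample_app_count(rng) for _ in range(10_000)]
mean_apps = sum(counts) / len(counts)

print(f"Mean apps per student: {mean_apps:.2f}")
print("Split for a 10-school list:", split_list(10))
```

Under these assumed shares the expected mean is 6.9 applications per student, close to the 6.8 target; the shares would need tuning against real data before use.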