Kaggle Survey: Financial Aid, Net Price, Yield Rates & Enrollment Decisions

Source: kaggle_survey_financial_aid.md



Overview

Survey of Kaggle datasets and related research relevant to modeling student yield decisions, financial aid elasticity, and net price impacts on enrollment in the college admissions simulation.


1. Primary Kaggle Datasets

1.1 US Dept of Education: College Scorecard

Key Financial Aid & Net Price Fields (from College Scorecard data dictionary):

| Variable | Description |
|---|---|
| cost_tuition_in | In-state tuition and fees |
| cost_tuition_out | Out-of-state tuition and fees |
| cost_books | Estimated books and supplies |
| cost_room_board_on | On-campus room and board |
| cost_room_board_off | Off-campus room and board |
| cost_avg (NPT4) | Average net price for Title IV institutions |
| cost_avg_income_0_30k (NPT41) | Average net price, family income $0-$30K |
| cost_avg_income_30_48k (NPT42) | Average net price, family income $30K-$48K |
| cost_avg_income_48_75k (NPT43) | Average net price, family income $48K-$75K |
| cost_avg_income_75_110k (NPT44) | Average net price, family income $75K-$110K |
| cost_avg_income_110k_plus (NPT45) | Average net price, family income $110K+ |
| rate_admissions | Admission rate |
| n_undergrads | Undergraduate enrollment |
| rate_completion | Completion rate (4-year, first-time, full-time) |
| amnt_earnings_med_10y | Median earnings 10 years after entry |

Relevance to Simulation: Net price by income bracket directly maps to financial aid modeling. The NPT41-NPT45 breakdown shows how aid varies by family income tier, useful for calibrating yield differences across income groups.

Limitations: The simplified R dataset does not include per-institution yield rates, Pell grant counts, or institutional grant breakdowns. The raw College Scorecard files (589 MB) contain 3,000+ columns with much more detail.
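
The income-bracket mapping described above can be sketched as a simple lookup. This is a minimal illustration, not part of the simulation code: the function names are hypothetical, the column names are taken from the field table, and treating each bracket's upper bound as inclusive is an assumption.

```python
from bisect import bisect_left
from typing import Mapping, Optional

# Upper bounds (USD) of the first four income brackets.
# Assumption: each upper bound is inclusive (e.g. exactly $30,000 falls in $0-$30K).
BRACKET_UPPER_BOUNDS = [30_000, 48_000, 75_000, 110_000]

# Net-price columns in bracket order, named as in the field table.
BRACKET_COLUMNS = [
    "cost_avg_income_0_30k",      # NPT41
    "cost_avg_income_30_48k",     # NPT42
    "cost_avg_income_48_75k",     # NPT43
    "cost_avg_income_75_110k",    # NPT44
    "cost_avg_income_110k_plus",  # NPT45
]


def net_price_column(family_income: float) -> str:
    """Return the net-price column that covers the given family income."""
    if family_income < 0:
        raise ValueError("family_income must be non-negative")
    # bisect_left counts how many upper bounds lie strictly below the income,
    # which is the index of the bracket the income falls into.
    return BRACKET_COLUMNS[bisect_left(BRACKET_UPPER_BOUNDS, family_income)]


def net_price_for(row: Mapping[str, float], family_income: float) -> Optional[float]:
    """Look up the bracket-specific net price in one institution's record.

    Returns None when the institution does not report that bracket.
    """
    return row.get(net_price_column(family_income))
```

A student with a $60,000 family income would be matched to `cost_avg_income_48_75k`, so the simulation can compare schools on the net price that student would plausibly face rather than on the overall average.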


1.2 College Tuition, Diversity, and Pay

Key Fields:

Relevance to Simulation: Net cost by income bracket is directly useful for yield modeling. Price trends from 1985-2016 provide historical context. Salary data can inform return-on-investment calculations that affect student choice.


1.3 Elite College Admissions (Opportunity Insights)

Key Fields:

| Field | Description |
|---|---|
| Attendance rate | Raw and test-score-reweighted |
| Application rate | By income bracket |
| Admission rates | By income bracket |
| Matriculation rates | By income bracket (equivalent to yield rate) |
| Parental income | Via tax records, 13 income percentile bins |
| SAT/ACT scores | By income bracket |
| Earnings | Post-graduation by income bracket |

College Tiers (6 categories):

  1. Ivy Plus
  2. Other elite schools (public & private)
  3. Highly selective public/private
  4. Selective public/private

Relevance to Simulation: This is the MOST directly relevant dataset. Contains:

Key Insight: The 13 income bins (up to 99th-99.9th percentile and top 1%) allow modeling how wealthy families respond differently to admissions offers than low-income families -- critical for yield modeling.


1.4 US College Data

Key Fields:

| Field | Description |
|---|---|
| Apps | Number of applications received |
| Accept | Number accepted |
| Enroll | Number enrolled |
| Private | Public vs private indicator |
| Outstate | Out-of-state tuition |
| Room.Board | Room and board costs |
| Expend | Instructional expenditure per student |
| Grad.Rate | Graduation rate |
| Top10perc | % students from top 10% of HS class |
| Top25perc | % students from top 25% of HS class |
| perc.alumni | % of alumni who donate |

Yield Rate: Calculable as Enroll / Accept for each institution.

Relevance to Simulation: Simple dataset with raw admit/enroll numbers. Yield can be computed directly. Covers 777 schools including elite ones.
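
The yield calculation above (Enroll / Accept) can be sketched as follows. The function names are hypothetical and the guard against zero admits is an added assumption; the counts correspond to the `Apps`, `Accept`, and `Enroll` fields.

```python
from typing import Optional


def yield_rate(accepted: int, enrolled: int) -> Optional[float]:
    """Yield = Enroll / Accept; None when no students were accepted."""
    if accepted <= 0:
        return None
    return enrolled / accepted


def admit_rate(apps: int, accepted: int) -> Optional[float]:
    """Admit rate = Accept / Apps; None when no applications were received."""
    if apps <= 0:
        return None
    return accepted / apps


# Hypothetical counts for illustration only, not taken from the dataset.
example = {"Apps": 5000, "Accept": 2000, "Enroll": 800}
example_yield = yield_rate(example["Accept"], example["Enroll"])
example_admit = admit_rate(example["Apps"], example["Accept"])
```

With these made-up counts, both the yield and the admit rate come out at 40%. Applying the same two functions per row gives a yield column for all 777 schools.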


1.5 College Common Data Sets

Key CDS Sections (standard format):

Colleges included: Cornell, Carnegie Mellon, Pomona, Smith, Wellesley, Colby, Rensselaer, Michigan Tech, and ~165 others.

Relevance to Simulation: CDS Section H is the gold standard for financial aid data. Contains need-based vs merit aid breakdowns, percentage of need met, and average aid packages -- exactly what we need for yield modeling. Section C provides admit/enrolled for yield calculation.


1.6 College Performance, Debt and Earnings

Key Fields: Cost of attendance, average salary after graduation, loan repayment rates, gainful employment rates, student demographics, faculty diversity, campus cultural climate.

Relevance to Simulation: Student debt and earnings outcomes affect perceived value of attendance, which influences yield decisions.


1.7 Post Secondary Education Data (IPEDS)

Relevance to Simulation: IPEDS is the comprehensive federal data source. Contains admissions, enrollment, financial aid, and institutional characteristics. The Kaggle version may not have all 250+ IPEDS variables but provides a cleaned subset.


1.8 American University Data (IPEDS)

Relevance to Simulation: Designed for predicting enrollment rates, directly applicable to yield modeling.


1.9 College Enrollment Demographics 2021

Key Fields: UNITID (IPEDS ID), enrollment by level, full-time/part-time, first-time/transfer/continuing, gender, 9 race/ethnicity categories.

Limitations: Enrollment counts only, no financial aid data.


1.10 College Scorecard (Devastator version)

Key Fields: UNITID, INSTNM, location fields, NPCURL (Net Price Calculator URL), enrollment totals, retention rate, degree info, Carnegie Classification, minority-serving institution flags.


2. IPEDS Data (Primary Federal Source)

IPEDS Admissions Component (ADM)

IPEDS Student Financial Aid Component (SFA)

Access


3. Research Findings: Financial Aid Elasticity

Avery & Hoxby (2003) - "Do Financial Aid Packages Affect College Choices?"

Key Findings:

Price Elasticity Estimates (Multiple Studies)

| Study/Context | Price Elasticity | Notes |
|---|---|---|
| Aggregate (all 4-year) | -0.44 | Own-price elasticity |
| Public universities | -1.058 | More price-sensitive |
| Private institutions | -0.6414 | Less price-sensitive |
| Full-paying students (selective) | -0.76 | At selective colleges |
| Financial-aid students (selective) | -1.18 | More responsive to price changes |
| Occidental College (individual) | -0.72 | Single institution study |
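
To make these estimates concrete, a price elasticity can be read as the percent change in enrollment per one percent change in price. The sketch below applies that reading as a linear approximation, which is an assumption that holds best for small price changes. The dictionary keys and function name are hypothetical; the values are the ones in the table.

```python
# Price elasticities from the table above, keyed by hypothetical short labels.
ELASTICITIES = {
    "aggregate_4yr": -0.44,
    "public": -1.058,
    "private": -0.6414,
    "selective_full_pay": -0.76,
    "selective_aid": -1.18,
    "occidental": -0.72,
}


def pct_enrollment_change(elasticity: float, pct_price_change: float) -> float:
    """Approximate percent change in enrollment for a percent change in price.

    Uses the point-elasticity relation: %dQ ~= elasticity * %dP.
    """
    return elasticity * pct_price_change


def projected_enrollment(current: float, elasticity: float, pct_price_change: float) -> float:
    """Apply the approximate percent change to a current enrollment level."""
    return current * (1.0 + pct_enrollment_change(elasticity, pct_price_change) / 100.0)
```

For example, under the aggregate estimate of -0.44, a 10% price increase implies roughly a 4.4% drop in enrollment, while the same increase for financial-aid students at selective colleges (-1.18) implies roughly an 11.8% drop.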

Income Elasticity


4. Implications for Simulation Parameters

Current Simulation Parameters (for reference)

Calibration Recommendations Based on Data

Yield Rate Sources:

Financial Aid Elasticity:

Net Price by Income:

Income-Differentiated Yield:

Suggested Parameter Updates

  1. Differentiate yield by income bracket: High-income yield ~65-75% at HYPSM, low-income yield ~85-95% at HYPSM
  2. Aid elasticity should vary by student type:
     - Full-pay families: $1K = ~1-2pp yield change (low sensitivity)
     - Aid-receiving families: $1K = ~3-5pp yield change (moderate sensitivity)
     - Low-income families at schools not meeting full need: $1K = ~5-8pp yield change
  3. Merit aid matters more at non-HYPSM schools: Named scholarships have psychological value beyond dollar amount
  4. Consider gross tuition signal: Students perceive "expensive" schools differently even when net price is equal
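
The per-$1K sensitivities suggested above could be applied as in the sketch below. The names are hypothetical and this is not the simulation's actual parameter code. It assumes that the effect scales linearly with the dollar amount, that additional aid raises yield (and reduced aid lowers it), and that the result is clamped to the 0-1 range.

```python
from typing import Tuple

# Suggested yield sensitivity in percentage points per $1K of aid, as (low, high).
SENSITIVITY_PP_PER_1K = {
    "full_pay": (1.0, 2.0),
    "aid_receiving": (3.0, 5.0),
    "low_income_unmet_need": (5.0, 8.0),
}


def _clamp(value: float) -> float:
    """Keep a probability-like value inside [0, 1]."""
    return max(0.0, min(1.0, value))


def adjusted_yield_range(
    base_yield: float, aid_change_usd: float, student_type: str
) -> Tuple[float, float]:
    """Return (lower, upper) adjusted yield for a change in aid.

    base_yield is a fraction (0.50 means 50%). Positive aid_change_usd means
    more aid; linear scaling in dollars is an assumption of this sketch.
    """
    low_pp, high_pp = SENSITIVITY_PP_PER_1K[student_type]
    thousands = aid_change_usd / 1000.0
    # Convert percentage points to a fraction and apply both ends of the range.
    candidates = [_clamp(base_yield + pp * thousands / 100.0) for pp in (low_pp, high_pp)]
    return (min(candidates), max(candidates))
```

For instance, an extra $2,000 of aid for an aid-receiving family with a 50% baseline yield would give an adjusted yield of roughly 56-60% under these assumptions.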

5. Dataset Comparison Matrix

| Dataset | Yield Data | Financial Aid | Net Price by Income | Elite Colleges | Size |
|---|---|---|---|---|---|
| College Scorecard | No | Partial | Yes (5 brackets) | Yes | 589 MB |
| Tuition/Diversity/Pay | No | Net cost | Yes (by bracket) | Yes | 2 MB |
| Elite College Admissions | Yes (by income) | No | No | Yes (139) | 371 KB |
| US College Data | Calculable | No | No | Partial (777) | 32 KB |
| College Common Data Sets | Yes (Section C) | Yes (Section H) | Partial | Partial (173) | 220 MB |
| IPEDS (direct) | Yes (ADM) | Yes (SFA) | Yes | Yes (7,000+) | Custom |
| College Perf/Debt/Earnings | No | Partial | No | Yes | 17 MB |
| Post-Secondary (IPEDS) | Unknown | Unknown | Unknown | Yes | 19 MB |

For the simulation, the most actionable approach combines the following sources:

  1. Elite College Admissions (Opportunity Insights) -- yield by income at elite schools
  2. College Common Data Sets -- financial aid breakdown (need vs merit, % need met)
  3. College Scorecard (NPT4 fields) -- net price by income bracket
  4. IPEDS direct download -- admission yield + financial aid for all 30 simulation colleges
  5. Avery/Hoxby research -- elasticity parameters for aid-enrollment relationship