Source: student_welfare_matching.md
The Gale-Shapley Deferred Acceptance (DA) algorithm (1962) produces a student-optimal stable matching when students propose: each student receives the most-preferred placement attainable in any stable matching. The key properties:
Stability: No student-college pair mutually prefers each other over their assigned match
Strategy-proofness: Truthful preference reporting is a dominant strategy for the proposing side (students)
Optimality within stability: The student-proposing DA yields the best possible stable matching for students; every student weakly prefers it to every other stable matching
Lattice structure: The set of stable matchings forms a lattice, with student-optimal and college-optimal matchings at opposite extremes
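The student-proposing round structure described above can be sketched in a few lines (a minimal illustration; the data-structure names are my own, and colleges hold tentative offers that can be bumped until the loop terminates):

```python
from collections import deque

def student_proposing_da(s_prefs, c_rank, capacity):
    """Student-proposing deferred acceptance.

    s_prefs:  {student: [college, ...]} in preference order
    c_rank:   {college: {student: rank}} (lower rank = more preferred)
    capacity: {college: seats}
    Returns {student: college or None}.
    """
    next_choice = {s: 0 for s in s_prefs}        # next list index to propose to
    held = {c: [] for c in c_rank}               # tentatively held students
    free = deque(s_prefs)
    match = {s: None for s in s_prefs}

    while free:
        s = free.popleft()
        if next_choice[s] >= len(s_prefs[s]):
            continue                             # s exhausted their list; stays unmatched
        c = s_prefs[s][next_choice[s]]
        next_choice[s] += 1
        held[c].append(s)
        held[c].sort(key=lambda x: c_rank[c][x])  # best-ranked first
        if len(held[c]) > capacity[c]:
            rejected = held[c].pop()             # worst tentatively-held student is bumped
            match[rejected] = None
            free.append(rejected)                # rejected student proposes again later
        match[s] = c if s in held[c] else None
    return match
```

Because acceptances are deferred rather than permanent, no student can ever benefit from skipping a school on their true list, which is the intuition behind strategy-proofness for the proposing side.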
NYC High School Match (2003)
Replaced an uncoordinated system where ~30,000 students were unassigned annually
Adopted student-proposing DA with single tiebreaking
Reduced unassigned students from 30,000 to ~3,000
Abdulkadiroglu, Pathak, and Roth found that simulations with field data favor single tiebreaking (breaking ties the same way at every school) for efficiency
Boston School Choice (2005)
Boston School Committee replaced the "Boston mechanism" (immediate acceptance) with DA
Under the old Boston mechanism, sophisticated parents strategically misrepresented preferences while unsophisticated parents (disproportionately low-income and minority) reported truthfully and were penalized
The switch to strategy-proof DA eliminated the "gaming advantage" of informed families
Abdulkadiroglu, Pathak, Roth, and Sonmez documented the coexistence of sophisticated and sincere participants, establishing fairness as a rationale for strategy-proof mechanisms
NRMP Medical Residency Match
Roth (1984) showed that NRMP had independently converged on a DA-equivalent algorithm
The match has operated stably since 1952, with periodic refinements (including the 1998 adoption of the Roth-Peranson algorithm, which improved the handling of couples)
U.S. college admissions does not use DA. Instead, it operates as a decentralized market with:
Students applying to multiple colleges simultaneously
Colleges making independent admission decisions
Multiple rounds (ED, EA, RD) creating a sequential matching structure
No centralized clearinghouse
This decentralized structure introduces information frictions, strategic complexity, and welfare losses that a centralized DA mechanism would partially address.
| Mechanism | Strategy-Proof | Stable | Pareto Efficient | Used Where |
|---|---|---|---|---|
| Student-proposing DA | Yes (for students) | Yes | No | NYC schools, Boston, NRMP |
| College-proposing DA | No (for students) | Yes | No | Theoretical |
| Top Trading Cycles (TTC) | Yes | No | Yes | Theoretical; some kidney exchange variants |
| Boston/Immediate Acceptance | No | No | No | Pre-2005 Boston, China (variants) |
| Serial Dictatorship | Yes | N/A | Yes | Simple assignment problems |
| Decentralized (current U.S.) | N/A | No | No | U.S. college admissions |
Top Trading Cycles (TTC)
Pareto efficient and strategy-proof for students
Students can form "trading cycles" to swap assignments, leading to efficiency gains over DA
Not stable: can produce justified envy (a student prefers another school that would prefer them)
Abdulkadiroglu and Sonmez (2003) proposed TTC for school choice; it was considered but not adopted in Boston or NYC due to perceived fairness concerns about justified envy
When the priority structure satisfies Kesten's acyclicity condition, TTC and DA produce the same matching (Kesten, 2006)
Boston Mechanism (Immediate Acceptance)
Students rank schools; in each round, schools permanently accept top applicants up to capacity
Not strategy-proof: Parents must strategically rank "realistic" choices first, not true preferences
Sophisticated families game the system; unsophisticated families are harmed
Research on China's parallel college admissions (a Boston mechanism variant) found significant gender, rural-urban, and ethnic gaps in mismatching explained by risk aversion and information disadvantage
Some theoretical work suggests the Boston mechanism may produce higher aggregate welfare when all agents are fully strategic, but this assumption fails empirically
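For contrast with DA, the immediate-acceptance round structure described above can be sketched as follows (same hypothetical data shapes as a DA implementation: ordered preference lists and per-college rank tables):

```python
def boston_mechanism(s_prefs, c_rank, capacity):
    """Immediate acceptance: in round k, each college permanently admits the
    best remaining students who ranked it k-th, up to its leftover capacity."""
    match = {s: None for s in s_prefs}
    seats_left = dict(capacity)
    max_rounds = max(len(p) for p in s_prefs.values())
    for k in range(max_rounds):
        for c in c_rank:
            # still-unmatched students whose k-th choice is c
            applicants = [s for s in s_prefs
                          if match[s] is None
                          and k < len(s_prefs[s]) and s_prefs[s][k] == c]
            applicants.sort(key=lambda s: c_rank[c][s])   # best-ranked first
            for s in applicants[:seats_left[c]]:
                match[s] = c                              # permanent: never displaced
            seats_left[c] -= min(len(applicants), seats_left[c])
    return match
```

The instability is visible in small examples: a student who "wastes" round 1 on a school that fills up can be shut out of a school that actually prefers them, producing exactly the justified envy (blocking pair) that DA rules out.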
Recent theoretical work (Erdil and Ergin, 2008; Tang and Yu, 2014) proposes mechanisms that achieve Pareto improvements over student-optimal DA, though such gains necessarily come at some cost to strategy-proofness: no strategy-proof mechanism Pareto dominates DA (Abdulkadiroglu, Pathak, and Roth, 2009)
These involve finding "stable improvement cycles" -- groups of students who can swap assignments while maintaining stability
Practical significance: even small efficiency gains can matter at scale
The decentralized U.S. college admissions market is none of these mechanisms -- it lacks strategy-proofness, stability, and efficiency. This creates space for modeling:
How much welfare is lost vs. a centralized DA mechanism?
How does information asymmetry compound these losses?
Which students bear disproportionate welfare costs?
"Agent-Based Simulation Models of the College Sorting Process" Published in Journal of Artificial Societies and Social Simulation (JASSS), Vol. 19, Issue 1.
Model Architecture:
8,000 students, 40 colleges, 150 seats per college (6,000 seats in total, i.e., capacity for 75% of students)
Two student attributes: "resources" (socioeconomic capital) and "caliber" (academic achievement), bivariate normal with correlation 0.3
One college attribute: "quality" (running average of enrolled student caliber)
Three-stage annual cycle: application, admission, enrollment
Key Parameters:
| Parameter | Value | Source |
|---|---|---|
| Resource-caliber correlation | 0.3 | ELS:2002 |
| Quality reliability | 0.7 + 0.1 x resources | Plausible estimate |
| Caliber enhancement | +0.1 x resources | Test prep literature |
| Application count | 4 + 0.5 x resources | ELS:2002 |
Information Model:
Students observe college quality with noise; noise decreases with resources
Students observe their own caliber with some error
Information quality = 0.7 + 0.1 x resources (wealthy students have better information)
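One way to operationalize the reliability form 0.7 + 0.1 x resources (a sketch; treating reliability as the signal-truth correlation for standardized quality scores is my assumption, not a detail from the paper):

```python
import random

def perceived_quality(true_quality, resources, rng=random):
    """Noisy perception of a (standardized) college quality score.

    reliability = 0.7 + 0.1 * resources, capped at 1.0; higher-resource
    students see a signal that correlates more strongly with the truth.
    The noise scale is chosen so the signal variance stays ~1."""
    reliability = min(0.7 + 0.1 * resources, 1.0)
    noise_sd = (1.0 - reliability ** 2) ** 0.5
    return reliability * true_quality + noise_sd * rng.gauss(0.0, 1.0)
```

At `resources = 3` the cap binds and perception becomes exact, which is one reason to keep the resources scale bounded in calibration.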
Admission Model:
Colleges rank by observed caliber and admit based on expected yield
Yield estimated from 3-year running average
Colleges adjust admission volume to fill seats
Key Findings:
Relevance: This is the closest published model to the college-sim project architecture. Key differences from our simulator: Reardon et al. use continuous distributions rather than archetype-based student generation, and a simpler two-attribute student model.
"Agent-Based Simulation for University Students Admission: Medical Colleges in Jordan Universities"
Built in NetLogo v6.3
Two agents: high school students, medical colleges
Parameters: family income, high school GPA
Focused on seat allocation fairness
Found that high-ranking universities consistently set high GPA cutoffs
Simulated both partially centralized (each university sets cutoffs) and fully centralized (central authority allocates) scenarios
"Student Behaviors in College Admissions: A Survey of Agent-Based Models" Published in International Journal of Emerging Multidisciplinaries.
Comprehensive survey of ABM approaches to college admissions
Identified common patterns: two agent types (students, colleges), three-stage matching (application, admission, enrollment)
Highlighted how family resources impact application strategy and outcomes
Emphasized the role of ABM in studying fairness and equity
"A Toy Model of College Admissions"
50 colleges, 100 seats each
Students modeled with normally distributed ability W ~ N(0,1)
Noisy signals sent to colleges
Utility function: u_i(k) = I_k^(-beta) + gamma(K - k)
Students solve portfolio optimization: maximize expected utility minus application costs
Found application volume concentrates at selective colleges; information cascades amplify competitive pressure
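Given admission probabilities and per-college utilities, the portfolio-optimization step can be approximated with a greedy heuristic (a sketch, not the paper's solution method; the exact simultaneous-search problem is combinatorial):

```python
def portfolio_eu(apps):
    """Expected utility when the student enrolls at the best admitting college.
    apps: list of (admit_probability, utility) pairs."""
    eu, p_all_reject = 0.0, 1.0
    for p, u in sorted(apps, key=lambda a: -a[1]):   # most-preferred first
        eu += p_all_reject * p * u                   # reached only if all better reject
        p_all_reject *= (1.0 - p)
    return eu

def greedy_portfolio(options, cost):
    """Greedy heuristic: repeatedly add the application with the largest
    marginal expected-utility gain, while the gain exceeds the per-app cost."""
    chosen, remaining = [], list(options)
    while remaining:
        base = portfolio_eu(chosen)
        gain, best = max(((portfolio_eu(chosen + [o]) - base, o) for o in remaining),
                         key=lambda t: t[0])
        if gain <= cost:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Raising `cost` shrinks the portfolio from the margin, which is one concrete way application counts end up increasing in resources.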
Reardon et al. (2015) extended the base ABM to study affirmative action policy effects, simulating race-based and socioeconomic-based policies
Matching Impacts of School Admission Mechanisms (ResearchGate, 2016): compared DA, Boston, and TTC mechanisms using agent-based simulation, measuring mismatch and welfare outcomes
Lee et al. (2023, Cornell): used learned admission-prediction models as replacement for standardized tests; calibration-focused approach
"The Missing 'One-Offs': The Hidden Supply of High-Achieving, Low-Income Students" NBER Working Paper 18586.
Key Findings:
25,000-35,000 low-income students annually have SAT/ACT scores and GPAs in the top 10% nationally
The vast majority do not apply to any selective college, despite being admissible
These students are geographically dispersed ("one-offs") in small towns, not concentrated in urban areas where selective colleges recruit
Selective institutions would often cost them less than non-selective alternatives due to generous financial aid
High schools serving these students have overworked counselors unfamiliar with selective admissions
Student Typology:
"Achievement-typical" low-income students: application behavior mirrors high-income peers with similar achievement (only 8% of high-achieving low-income students)
"Income-typical" low-income students: application behavior mirrors other low-income students regardless of achievement (together with intermediate cases, the remaining ~92%)
"Expanding College Opportunities for High-Achieving, Low-Income Students"
Intervention Design:
Low-cost information packet sent to 39,682 high-achieving, low-income students (2010-2012)
Included: application guidance, financial aid information, fee waivers, college resource/graduation data
Cost: approximately $6 per student
Results:
Treated students were 46% more likely to enroll at peer-quality institutions matching their abilities
Institutions attended had graduation rates 15.1% higher on average
Instructional spending was 21.5% higher at enrolled institutions
Benefit-to-cost ratio was "extremely high, even under the most conservative assumptions"
Impact was 275x greater than equivalent spending on in-person counseling
Implication: Information intervention alone dramatically reduces undermatching. The problem is primarily informational, not financial or academic.
Key Findings:
Mismatch is driven primarily by student application and enrollment decisions, not college admission decisions
Most mismatched students either never applied to well-matched schools or were accepted but chose differently
Financial constraints, information access, and public college options all affect mismatch probability
More information predicts less mismatch; students from lower socioeconomic backgrounds have less information and consequently undermatch more
"Match or Mismatch? Automatic Admissions and College Preferences of Low- and High-Income Students" NBER Working Paper 22559.
Studied Texas top 10% automatic admissions policy
Low-income students still undermatch even with guaranteed admission
Preferences, not access, drive much of the remaining mismatch
"Conceptual and Methodological Problems in Research on College Undermatch"
Challenged assumptions in undermatching research
Argued that definitions of "match" are often arbitrary
Questioned whether attending a more selective institution is always welfare-improving
Important caveat for simulation design: how we define "optimal match" matters
"Bright but Poor: Undermatching in the Access to Postsecondary Education" American Educational Research Journal.
Extended undermatching analysis to international contexts
Confirmed that socioeconomic status is a persistent predictor of undermatching across different educational systems
Empirical evidence on outcomes:
Based on the literature, these are the critical parameters for modeling student welfare in a college admissions simulation:
| Parameter | Literature Value | Source |
|---|---|---|
| Resource-caliber correlation | 0.3 | Reardon et al. (ELS:2002) |
| Information quality (low-resource) | 0.7 base | Reardon et al. |
| Information quality (high-resource) | 0.7 + 0.1 x resources | Reardon et al. |
| Application count (low-resource) | 4 applications | Reardon et al. (ELS:2002) |
| Application count (high-resource) | 4 + 0.5 x resources (up to ~7) | Reardon et al. (ELS:2002) |
| Caliber enhancement from resources | +0.1 x resources | Test prep literature |
| Undermatching rate (low-income, high-achieving) | ~92% income-typical behavior | Hoxby & Avery (2012) |
| Information intervention effect | +46% peer enrollment | Hoxby & Turner (2013) |
| Parameter | Literature Value | Source |
|---|---|---|
| Yield estimation window | 3-year running average | Reardon et al. |
| Admission volume adjustment | Based on prior year fill rate | Reardon et al. |
| Quality metric | Weighted average enrolled caliber | Reardon et al. |
| ED yield boost | Binding commitment ~90%+ yield | Common knowledge |
| Parameter | Description | Typical Range |
|---|---|---|
| Stability | % of matched pairs with no blocking pair | 85-95% in decentralized markets |
| Pareto efficiency | % of students who could improve without harming others | DA achieves ~85-90% of optimal |
| Undermatching rate | % of students at institutions below their caliber | 20-40% depending on definition |
| Strategic behavior prevalence | % of students who misrepresent preferences | 10-30% under non-strategy-proof mechanisms |
The current simulator uses deterministic scoring. The literature strongly suggests adding:
Student perception noise: Students should have imperfect knowledge of their admission probability at each college, with noise inversely correlated with socioeconomic status
Application portfolio optimization: Students should choose where to apply based on perceived probability x perceived utility, not perfect knowledge
Counselor quality: High school counselor quality (varying by school type) should influence which colleges students consider
Implementation suggestion: Add a perceptionNoise parameter to each student archetype. Elite prep school students get low noise (0.05-0.1); rural/under-resourced students get high noise (0.3-0.5). This single parameter captures much of the Reardon et al. information asymmetry finding.
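A minimal version of the suggestion (the archetype names and noise values below are illustrative, following the ranges suggested above):

```python
import random

# Hypothetical archetype -> perceptionNoise table; names are illustrative.
ARCHETYPE_NOISE = {
    "elite_prep": 0.05,
    "suburban_public": 0.15,
    "rural_underresourced": 0.40,
}

def perceived_admit_chance(true_prob, archetype, rng=random):
    """True admission probability distorted by archetype-specific Gaussian
    noise, clamped to the valid [0, 1] range."""
    sigma = ARCHETYPE_NOISE[archetype]
    return min(1.0, max(0.0, true_prob + rng.gauss(0.0, sigma)))
```

Downstream, the student's application-list logic would consume `perceived_admit_chance` instead of the true probability, so all information-asymmetry effects flow through this one parameter.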
Based on Hoxby and Avery:
Income-typical behavior: 92% of high-achieving low-income students should exhibit application patterns matching their income cohort, not their achievement cohort
Achievement-typical behavior: Only 8% of such students apply like high-achieving high-income peers
Geographic isolation: Students at rural or under-resourced high schools should have shorter college consideration lists biased toward local/state options
Implementation suggestion: When generating application lists for students from under-resourced high schools, apply a "consideration set filter" that removes colleges the student has never heard of (probability based on distance, marketing reach, and school counselor quality).
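The consideration-set filter might look like this (field names and the awareness weights are illustrative assumptions, not literature values):

```python
import random

def consideration_set(colleges, student, rng=random):
    """Probabilistic awareness filter: each college survives with probability
    rising in counselor quality and marketing reach and falling in distance.
    In-state options are always known (an assumption for this sketch)."""
    kept = []
    for c in colleges:
        awareness = (0.3 * student["counselor_quality"]
                     + 0.4 * c["marketing_reach"]
                     + 0.3 * max(0.0, 1.0 - c["distance_miles"] / 500.0))
        if c["in_state"] or rng.random() < awareness:
            kept.append(c)
    return kept
```

Applied before portfolio construction, this reproduces the shorter, locally biased lists the Hoxby-Avery evidence describes without changing the student's preference model at all.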
Add post-simulation welfare analysis:
Match quality: For each student, compute the gap between their enrolled college's tier and their "optimal" placement based on academic index
Undermatching rate: Percentage of students enrolled at colleges 1+ tiers below their academic qualification
Overmatching rate: Percentage enrolled 1+ tiers above (these students face academic mismatch risk)
Welfare by demographic: Break down match quality by student archetype, high school type, hook status
Counterfactual DA comparison: Run the same student population through a centralized DA mechanism and compare aggregate welfare
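The first three metrics can be computed in one pass over enrolled students (a sketch; the field names and tier convention are assumptions, with tier 1 = most selective):

```python
def welfare_report(students, tier_of, optimal_tier):
    """Post-simulation match-quality summary.

    tier_of:      {college: tier}
    optimal_tier: student -> tier implied by academic index
    A positive gap means the student enrolled below their qualification."""
    gaps, under, over = [], 0, 0
    for s in students:
        gap = tier_of[s["enrolled"]] - optimal_tier(s)
        gaps.append(gap)
        if gap >= 1:
            under += 1          # 1+ tiers below qualification
        elif gap <= -1:
            over += 1           # 1+ tiers above (academic mismatch risk)
    n = len(students)
    return {"mean_gap": sum(gaps) / n,
            "undermatch_rate": under / n,
            "overmatch_rate": over / n}
```

Grouping the same computation by archetype, high school type, or hook status gives the demographic breakdown directly.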
Colleges should adjust behavior over simulation runs:
Track acceptance rate vs. target enrollment
Adjust number of offers based on historical yield
This creates the dynamic feedback loop that Reardon et al. found drives equilibrium convergence (10-20 iterations)
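A minimal version of yield-adjusted offer volume, using the 3-year running average reported for Reardon et al. (the fallback yield for a college with no history is an arbitrary placeholder):

```python
def offers_to_extend(target_enrollment, yield_history, window=3):
    """Offer volume from a running-average yield estimate.

    yield_history: list of past enroll/admit ratios, oldest first."""
    recent = yield_history[-window:] if yield_history else [0.33]  # placeholder prior
    expected_yield = sum(recent) / len(recent)
    return round(target_enrollment / max(expected_yield, 0.01))    # guard divide-by-zero
```

Feeding each simulated year's realized yield back into `yield_history` is what produces the dynamic feedback loop and eventual equilibrium convergence.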
Not all students are equally strategic:
Sophisticated applicants (high-SES, well-counseled): optimize application portfolios, use ED strategically, apply to safety/target/reach spread
Naive applicants (low-SES, poorly counseled): apply to too few schools, skip safeties, miss ED advantages, or apply only to local/familiar options
The Boston mechanism literature shows this heterogeneity causes the most welfare damage under non-strategy-proof mechanisms
For research validity, implement an optional mode where:
All students submit truthful preference rankings
All colleges submit preference rankings
A centralized DA algorithm produces the student-optimal stable matching
Compare this benchmark to the decentralized simulation outcome
This would allow measuring the "price of decentralization" in student welfare terms.
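Given the two outcome assignments, the comparison can be summarized with rank-based statistics (a sketch; treating an unmatched student as ranked below every listed college is my convention):

```python
def price_of_decentralization(s_prefs, decentralized, centralized_da):
    """Share of students strictly better off under the DA benchmark, plus the
    mean improvement in preference rank (lower rank = more preferred)."""
    def rank(s, c):
        return s_prefs[s].index(c) if c in s_prefs[s] else len(s_prefs[s])
    better, rank_gain = 0, 0
    for s in s_prefs:
        d = rank(s, decentralized[s])
        m = rank(s, centralized_da[s])
        if m < d:
            better += 1
        rank_gain += d - m
    n = len(s_prefs)
    return {"share_better_off": better / n, "mean_rank_gain": rank_gain / n}
```

Because DA is run on truthful preferences while the decentralized outcome embeds strategic and informational distortions, these two numbers are a direct welfare reading of the "price of decentralization."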
Validate the simulation against known empirical patterns:
Acceptance rate vs. yield rate correlation should match IPEDS data
Proportion of students within 1 tier of their "match" should be 60-80%
Low-SES undermatching rate should be 2-4x higher than high-SES
ED acceptance rate advantage should be 2-3x regular admission at top schools
Hook multiplier effects should produce demographic compositions matching published CDS data
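The quantitative targets above can be wired into a post-run check (a sketch; the `stats` keys are assumptions about the simulator's output, and returning failures rather than raising lets a run report list every miss):

```python
def run_validation(stats):
    """Compare simulated aggregates to the empirical target ranges above.
    stats: dict of simulation outputs. Returns names of failed checks."""
    checks = [
        # 60-80% of students within 1 tier of their match
        ("match_within_1_tier", 0.60 <= stats["match_within_1_tier"] <= 0.80),
        # low-SES undermatching 2-4x the high-SES rate
        ("undermatch_ratio_low_vs_high_ses",
         2.0 <= stats["undermatch_low_ses"] / stats["undermatch_high_ses"] <= 4.0),
        # ED acceptance rate 2-3x regular admission
        ("ed_advantage",
         2.0 <= stats["ed_accept_rate"] / stats["rd_accept_rate"] <= 3.0),
    ]
    return [name for name, ok in checks if not ok]
```

The IPEDS and CDS comparisons would slot in the same way once those aggregates are computed from external data.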