Methodology

Data sources, sample definitions, estimation approaches, and robustness checks.

1. Data Sources

Dataset | Agency | Years | Role | N per year
ACS PUMS | Census Bureau | 2015–2019, 2021–2023 | Primary wage estimates | ~900K
CPS ASEC | Census/BLS | 2015–2023 | Cross-check + selection | ~50–60K
SIPP | Census Bureau | 2023 | Additional cross-check | 182,658
ATUS | BLS | Pooled | Time-use mechanisms | Various
SCE | NY Fed | Public chart series | Wage expectations | Various
NLSY79/97 | BLS | Cohort | Background + skills | ~2,500

2. Sample Definition

3. Estimation Approaches

Method | Description
Sequential OLS | Progressively adds control blocks to measure how much each absorbs; primary approach
Gelbach decomposition | Order-invariant attribution of how much each covariate block moves the female coefficient (Gelbach 2016); resolves the sequential ladder’s order-sensitivity
Oaxaca-Blinder | Supplemental decomposition of the gap into explained and unexplained components; not the headline reporting series
Double/debiased ML | Flexible nuisance-model adjustment with elastic net; sensitivity check
Employment-selection correction | Inverse probability weighting to separate worker wage gaps from total earnings gaps
Survey uncertainty | Successive-difference replication using 80 ACS replicate weights
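The successive-difference replication (SDR) row can be made concrete with a minimal Python sketch. It assumes the statistic of interest is a weighted raw log-wage gap and applies the variance formula from the Census Bureau's ACS documentation, Var = (4/80) Σᵣ (Xᵣ − X₀)², where X₀ is computed with the main person weight (PWGTP) and Xᵣ with replicate weight PWGTPr. The data and replicate weights below are simulated stand-ins rather than ACS records, so the printed SE is illustrative only, and the function names are hypothetical.

```python
import numpy as np

N_REPLICATES = 80  # ACS PUMS ships replicate weights PWGTP1..PWGTP80


def sdr_standard_error(full_estimate, replicate_estimates):
    """Successive-difference replication SE: sqrt(4/80 * sum_r (X_r - X_0)^2)."""
    reps = np.asarray(replicate_estimates, dtype=float)
    if reps.shape != (N_REPLICATES,):
        raise ValueError(f"expected {N_REPLICATES} replicate estimates")
    variance = 4.0 / N_REPLICATES * np.sum((reps - full_estimate) ** 2)
    return float(np.sqrt(variance))


def raw_log_wage_gap(log_wage, female, weights):
    """Weighted mean log wage of women minus that of men."""
    is_f = np.asarray(female, dtype=bool)
    return float(np.average(log_wage[is_f], weights=weights[is_f])
                 - np.average(log_wage[~is_f], weights=weights[~is_f]))


# Simulated stand-in data; a real run would read PWGTP and PWGTP1..PWGTP80.
rng = np.random.default_rng(0)
n = 5_000
female = rng.integers(0, 2, n)
log_wage = 3.0 - 0.15 * female + rng.normal(0.0, 0.5, n)
main_weight = rng.uniform(50, 150, n)
# Hypothetical perturbed replicates; these do NOT reproduce the real SDR design.
replicate_weights = main_weight[:, None] * rng.uniform(0.5, 1.5, (n, N_REPLICATES))

gap = raw_log_wage_gap(log_wage, female, main_weight)
replicate_gaps = [raw_log_wage_gap(log_wage, female, replicate_weights[:, r])
                  for r in range(N_REPLICATES)]
se = sdr_standard_error(gap, replicate_gaps)
print(f"raw gap = {gap:.3f} log points, SDR SE = {se:.4f}")
```

The same pattern wraps any estimator: refit it once per replicate weight and pass the 80 results to sdr_standard_error.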

4. Sequential OLS Control Blocks

Each step adds a block of controls and records how much of the raw gap that block absorbs. Because the increments depend on the order in which blocks enter, the Gelbach decomposition (Section 5c) provides the order-invariant version.

Step | Question | Controls added
Raw gap | How large is the unadjusted gap? | Female indicator only
Demographics | Does the gap survive basic demographic controls? | + age, age², race/ethnicity, education
Geography | Are women concentrated in lower-paying states? | + state
Job sorting | How much does occupation/industry choice explain? | + occupation, industry, class of worker
Schedule | Do differences in hours and work arrangements matter? | + usual hours, work from home, commute
Family | Do marital and parental status absorb more? | + marital status, children, children under 5
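The ladder can be sketched in Python with plain least squares. This is a minimal illustration under stated assumptions, not the study's code: the data are simulated, the effect sizes are made up, a single hypothetical low-pay-occupation dummy stands in for the occupation/industry block, and the sketch stops at the schedule step. Each step keeps all earlier controls and re-reads the female coefficient.

```python
import numpy as np


def female_coefficient(y, female, controls):
    """OLS coefficient on the female indicator given a list of control columns."""
    X = np.column_stack([np.ones_like(y), female, *controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])


# Simulated data with hypothetical effect sizes (not estimates from this study).
rng = np.random.default_rng(1)
n = 20_000
female = rng.integers(0, 2, n).astype(float)
age = rng.uniform(25, 60, n)
low_pay_occupation = (rng.random(n) < 0.3 + 0.3 * female).astype(float)
usual_hours = 40 - 4 * female + rng.normal(0, 5, n)
log_wage = (2.5 + 0.02 * age - 0.30 * low_pay_occupation + 0.01 * usual_hours
            - 0.10 * female + rng.normal(0, 0.4, n))

ladder = [
    ("Raw gap", []),
    ("Demographics", [age, age ** 2]),
    ("Job sorting", [low_pay_occupation]),
    ("Schedule", [usual_hours]),
]
controls = []
results = {}
for step, new_block in ladder:
    controls = controls + new_block  # each step keeps every earlier block
    results[step] = female_coefficient(log_wage, female, controls)
    print(f"{step:<13} female coef = {results[step]:+.3f}")
```

With these simulated inputs the female coefficient shrinks toward the built-in −0.10 as the sorting and hours blocks enter. Reordering the list would change how much each step appears to absorb, which is the order-sensitivity the Gelbach decomposition addresses.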

5. Survey Uncertainty

5b. Reproductive Extension

Extends the baseline ladder with reproductive-stage, job-context, interaction, and household-sensitivity controls.

Step | Question | Controls added
Reproductive stage | Does the gap vary by parenthood timing and couple type? | + recent birth, recent marriage, own child under 6, own child 6–17, couple type, reproductive stage
Job context | Do schedule rigidity and autonomy matter independently? | + O*NET autonomy, schedule unpredictability, time pressure, coordination responsibility, physical proximity, job rigidity
Interactions | Where specifically does the gap concentrate? | + female × recent birth, female × child under 6, female × recent marriage, female × same-sex, female × autonomy, female × job rigidity, female × child under 6 × job rigidity. (Because autonomy enters the job-rigidity index with a negative weight, the standalone female × autonomy term is a residual conditional slope: the female-specific autonomy gradient that remains once job rigidity is held fixed.)
Household sensitivity | How much more do linked household-composition and partner-resource variables absorb? | + multigenerational household, other adults present, then partnered-sample checks with partner employment and partner wage
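One way to build the interaction step is sketched below in Python. The rigidity index is hypothetical (an equal-weight average of z-scores with autonomy entering negatively, matching only the sign described above), and the child-under-6 × rigidity column is an assumption added so the three-way term sits on top of all its lower-order pieces; the table does not say whether the study includes it. Main effects such as recent birth, child under 6, autonomy, and job rigidity come from earlier steps, so only interaction columns are built here, on simulated data.

```python
import numpy as np


def zscore(x):
    return (x - x.mean()) / x.std()


def job_rigidity_index(autonomy, unpredictability, time_pressure):
    """Hypothetical composite (higher = more rigid); autonomy gets a negative weight."""
    return (zscore(unpredictability) + zscore(time_pressure) - zscore(autonomy)) / 3.0


def female_interactions(female, recent_birth, child_u6, recent_marriage,
                        same_sex, autonomy, rigidity):
    """Named interaction columns for the 'Interactions' step."""
    return {
        "female_x_recent_birth": female * recent_birth,
        "female_x_child_u6": female * child_u6,
        "female_x_recent_marriage": female * recent_marriage,
        "female_x_same_sex": female * same_sex,
        "female_x_autonomy": female * autonomy,
        "female_x_rigidity": female * rigidity,
        # Assumption: lower-order term included so the three-way term is hierarchical.
        "child_u6_x_rigidity": child_u6 * rigidity,
        "female_x_child_u6_x_rigidity": female * child_u6 * rigidity,
    }


# Simulated binary indicators and continuous job characteristics.
rng = np.random.default_rng(2)
n = 1_000
female, recent_birth, child_u6, recent_marriage, same_sex = (
    rng.integers(0, 2, n).astype(float) for _ in range(5))
autonomy, unpredictability, time_pressure = (rng.normal(size=n) for _ in range(3))

rigidity = job_rigidity_index(autonomy, unpredictability, time_pressure)
cols = female_interactions(female, recent_birth, child_u6, recent_marriage,
                           same_sex, autonomy, rigidity)
X_interactions = np.column_stack(list(cols.values()))
```

Because the index already loads negatively on autonomy, a regression containing both female × rigidity and female × autonomy identifies the autonomy interaction only from autonomy variation that the index does not absorb.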

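Household composition is descriptive, and partner-resource variables are partly post-market: partner employment and partner wages can themselves respond to the respondent’s labor-market outcomes. Both are therefore reported as sensitivity checks rather than folded into the headline sequential ladder.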

Sensitivity panel | Added controls | Baseline gap | Augmented gap | Change
Household composition | + other adults present | 10.8% | 11.0% | +0.1 pp
Partner resources | + partner employed, partner wage | 15.1% | 17.9% | +2.8 pp

In the current public ACS extract, MULTG (the multigenerational-household flag) is unavailable, so the fitted household-composition row uses other_adults_present only. The relative_earnings variable is excluded because it is constructed from the respondent’s own wage, the outcome being modeled.

5c. Gelbach Decomposition

The sequential ladder is order-sensitive: the amount each block “explains” depends on when it enters. The Gelbach (2016) decomposition removes this dependence by attributing the change in the female coefficient to each block in an order-invariant way. The identity is exact: base coefficient − full coefficient = ∑ block contributions.
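The exactness of the identity follows from OLS algebra. In the notation below (ours, not the source's), X₁ holds the base regressors including the female indicator, X₂ₖ holds the covariates in block k, and û is the full-model residual, which is orthogonal to every regressor:

```latex
\begin{aligned}
\hat\beta_1^{\text{base}} &= (X_1'X_1)^{-1}X_1'y, \qquad
y = X_1\hat\beta_1^{\text{full}} + \sum_k X_{2k}\hat\beta_{2k} + \hat u, \qquad X_1'\hat u = 0 \\
\Rightarrow\quad \hat\beta_1^{\text{base}} - \hat\beta_1^{\text{full}}
&= \sum_k \underbrace{(X_1'X_1)^{-1}X_1'X_{2k}}_{\hat\Gamma_k}\,\hat\beta_{2k}
 = \sum_k \hat\delta_k
\end{aligned}
```

The female element of δ̂ₖ is block k's contribution. Each Γ̂ₖ comes from regressing block k on the base regressors alone, so no ordering of the blocks enters the calculation.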

Block | What it measures | Mean Δ (2015–2023) | Share of explained
Job sorting | How much do occupation/industry choices move the female coefficient? | −0.084 | ~70%
Reproductive burden | How much do parenthood timing, couple type, and reproductive stage move it? | −0.036 | ~30%
Job context | How much do schedule rigidity and autonomy move it, independently of sorting? | −0.016 | ~13%
Schedule | Do hours and work arrangements absorb or reveal more of the gap? | +0.017 | −14%
Geography | Are women concentrated in lower-paying states? | −0.002 | ~2%
Family | Does generic marital/child status add anything beyond reproductive stage? | +0.002 | ~0%

Reproductive decomposition: demographics form the base model, and the full model adds the reproductive-stage and job-context blocks alongside the other blocks listed. Means are taken across the 8 ACS years (2015–2019 and 2021–2023). The positive shares sum to more than 100% because the schedule block works in the opposite direction (controlling for hours reveals a larger penalty). SEs are approximate (delta method, ignoring cross-term covariance). Identity residuals are below 10⁻¹³ in every year.
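A minimal numpy sketch of the decomposition follows, assuming simulated data with two hypothetical blocks (a low-pay-occupation dummy for job sorting and usual hours for schedule). It computes each block's contribution as in Gelbach (2016), checks the identity residual, and computes shares as contribution divided by the total change, which is our reading of the "Share of explained" column. Function and variable names are illustrative.

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def gelbach(y, X_base, blocks, focal=1):
    """Gelbach (2016) decomposition of the change in coefficient X_base[:, focal].

    Returns (base coef, full coef, {block name: contribution}); contributions
    satisfy base - full == sum(contributions) up to floating-point error.
    """
    names = list(blocks)
    mats = [np.asarray(blocks[k], dtype=float).reshape(len(y), -1) for k in names]
    X_added = np.column_stack(mats)
    b_base = ols(X_base, y)
    b_full = ols(np.column_stack([X_base, X_added]), y)
    beta_added = b_full[X_base.shape[1]:]
    # Gamma: coefficients from regressing every added covariate on X_base.
    gamma = ols(X_base, X_added)  # shape (p_base, p_added)
    per_column = gamma[focal] * beta_added
    contributions, start = {}, 0
    for name, m in zip(names, mats):
        contributions[name] = float(per_column[start:start + m.shape[1]].sum())
        start += m.shape[1]
    return float(b_base[focal]), float(b_full[focal]), contributions


# Simulated data with hypothetical effect sizes.
rng = np.random.default_rng(3)
n = 20_000
female = rng.integers(0, 2, n).astype(float)
age = rng.uniform(25, 60, n)
low_pay_occupation = (rng.random(n) < 0.3 + 0.3 * female).astype(float)
usual_hours = 40 - 4 * female + rng.normal(0, 5, n)
log_wage = (2.5 + 0.02 * age - 0.30 * low_pay_occupation + 0.01 * usual_hours
            - 0.10 * female + rng.normal(0, 0.4, n))

X_base = np.column_stack([np.ones(n), female, age])  # focal=1 is the female column
base, full, contrib = gelbach(
    log_wage, X_base, {"job_sorting": low_pay_occupation, "schedule": usual_hours})
explained = base - full
shares = {k: v / explained for k, v in contrib.items()}
residual = explained - sum(contrib.values())
print(f"base {base:+.3f}  full {full:+.3f}  identity residual {residual:.1e}")
for k in contrib:
    print(f"{k:<12} delta {contrib[k]:+.3f}  share {shares[k]:.0%}")
```

Swapping the order of the blocks in the dictionary leaves every contribution unchanged, unlike the sequential ladder. Under this sign convention a negative contribution means the block moves a negative female coefficient toward zero, matching the table, where schedule's positive value means controlling for hours reveals a larger penalty.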

5d. Variance & Distributional Analysis

Analysis | Question
Raw and residual dispersion | How much of the variance gap survives full controls? (variance ratio, P90/P10, P95/P50, top-earner shares)
Selection-corrected variance | Does employment selection explain the dispersion gap? (IPW reweighting)
Dispersion by reproductive stage | Does variance compress for mothers relative to childless workers?
Dispersion by job rigidity | Does the variance ratio flip (cross 1) in rigid-schedule occupations?
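The dispersion measures in the first row can be computed with a small weighted-statistics helper, sketched in Python under stated assumptions: simulated log wages, a step-function weighted quantile (no interpolation), and quantile ratios reported in wage levels by exponentiating log-wage quantile differences. For residual dispersion one would pass full-model regression residuals instead of log wages, and for the selection-corrected rows the weights would be survey weights multiplied by IPW factors; both inputs are assumptions and are not shown. Top-earner shares are not implemented.

```python
import numpy as np


def weighted_quantile(x, q, w):
    """Smallest x whose weighted CDF reaches q (step-function quantile)."""
    order = np.argsort(x)
    x_sorted, w_sorted = x[order], w[order]
    cdf = np.cumsum(w_sorted) / w_sorted.sum()
    idx = min(int(np.searchsorted(cdf, q)), len(x_sorted) - 1)
    return float(x_sorted[idx])


def weighted_variance(x, w):
    mean = np.average(x, weights=w)
    return float(np.average((x - mean) ** 2, weights=w))


def dispersion(log_wage, w):
    """Variance and wage-level quantile ratios from log wages (or residuals)."""
    p10, p50, p90, p95 = (weighted_quantile(log_wage, q, w)
                          for q in (0.10, 0.50, 0.90, 0.95))
    return {
        "variance": weighted_variance(log_wage, w),
        # exp is monotone, so exp of a log-wage quantile is the wage quantile.
        "p90_p10": float(np.exp(p90 - p10)),
        "p95_p50": float(np.exp(p95 - p50)),
    }


# Simulated log wages; hypothetical weights stand in for survey (x IPW) weights.
rng = np.random.default_rng(4)
n = 20_000
log_wage_m = rng.normal(3.2, 0.6, n)
log_wage_f = rng.normal(3.0, 0.5, n)
w_m, w_f = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)

men, women = dispersion(log_wage_m, w_m), dispersion(log_wage_f, w_f)
variance_ratio = women["variance"] / men["variance"]
print(f"female/male variance ratio = {variance_ratio:.2f}")
```

Splitting the inputs by reproductive stage or by a rigidity threshold and recomputing variance_ratio per cell gives the last two rows' comparisons.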

6. Robustness Checks

7. Limitations