Methodology
Data sources, sample definitions, estimation approaches, and robustness checks.
1. Data Sources
| Dataset | Agency | Years | Role | N per Year |
|---|---|---|---|---|
| ACS PUMS | Census Bureau | 2015-2019, 2021-2023 | Primary wage estimates | ~900K |
| CPS ASEC | Census/BLS | 2015-2023 | Cross-check + selection | ~50-60K |
| SIPP | Census Bureau | 2023 | Additional cross-check | 182,658 |
| ATUS | BLS | Pooled | Time-use mechanisms | Various |
| SCE | NY Fed | Public chart series | Wage expectations | Various |
| NLSY79/97 | BLS | Cohort | Background + skills | ~2,500 |
2. Sample Definition
- Age 25-54 (prime working age)
- Wage/salary workers only
- Positive hours and positive pay
- Excludes active-duty military and self-employed
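The sample restrictions above can be expressed as a single row filter. The sketch below is illustrative only: the column names (`AGEP`, `COW`, `ESR`, `WKHP`, `WAGP`) and the code sets are assumptions based on common ACS PUMS person-record conventions, not names taken from this page, and should be checked against the data dictionary for each survey year.

```python
import pandas as pd

# Assumed ACS PUMS person-record columns (not named in the text above):
#   AGEP = age, COW = class of worker, ESR = employment status,
#   WKHP = usual weekly hours, WAGP = wage/salary income.
WAGE_SALARY_COW = {1, 2, 3, 4, 5}   # assumed codes: private and government employees
ARMED_FORCES_ESR = {4, 5}           # assumed codes: armed forces, at work / with a job


def apply_sample_filters(df: pd.DataFrame) -> pd.DataFrame:
    """Keep prime-age civilian wage/salary workers with positive hours and pay."""
    keep = (
        df["AGEP"].between(25, 54)            # inclusive on both ends
        & df["COW"].isin(WAGE_SALARY_COW)     # drops self-employed and unpaid workers
        & ~df["ESR"].isin(ARMED_FORCES_ESR)   # drops active-duty military
        & (df["WKHP"] > 0)
        & (df["WAGP"] > 0)
    )
    return df.loc[keep].copy()


# Toy records: only the first row satisfies every restriction.
toy = pd.DataFrame({
    "AGEP": [30, 60, 40, 35, 28],
    "COW":  [1, 1, 6, 5, 3],
    "ESR":  [1, 1, 1, 4, 1],
    "WKHP": [40, 40, 50, 40, 0],
    "WAGP": [52000, 61000, 80000, 45000, 0],
})
sample = apply_sample_filters(toy)
```

Rows with missing class-of-worker or hours values fail the comparisons and are dropped, which matches the "positive hours and positive pay" rule.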
3. Estimation Approaches
| Method | Description |
|---|---|
| Sequential OLS | Progressively adds control blocks to measure how much of the raw gap each block absorbs; primary approach |
| Gelbach decomposition | Order-invariant attribution of how much each covariate block moves the female coefficient (Gelbach 2016); resolves the sequential ladder’s order-sensitivity |
| Oaxaca-Blinder | Supplemental decomposition of the gap into explained and unexplained components; not the headline reporting series |
| Double/debiased ML | Flexible nuisance-model adjustment with elastic net; sensitivity check |
| Employment-selection correction | Inverse probability weighting to separate worker wage gaps from total earnings gaps |
| Survey uncertainty | Successive-difference replication using 80 ACS replicate weights |
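As an illustration of the Oaxaca-Blinder row, here is a minimal unweighted sketch of a twofold decomposition on synthetic data. It assumes male coefficients as the reference structure; the reference choice, the single `hours` covariate, and the simulated numbers are all assumptions for illustration and are not taken from the analysis described here, which would also need survey weights.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients; X must already contain a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def oaxaca_twofold(X_m, y_m, X_f, y_f):
    """Split mean(y_m) - mean(y_f) into explained and unexplained parts.

    Uses the male coefficients as the reference (an assumption of this sketch).
    """
    b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    explained = (xbar_m - xbar_f) @ b_m      # differences in characteristics
    unexplained = xbar_f @ (b_m - b_f)       # differences in coefficients
    return explained, unexplained


# Synthetic log wages with one hypothetical covariate (weekly hours).
rng = np.random.default_rng(0)
n = 1000
hours_m = rng.normal(42, 5, n)
hours_f = rng.normal(38, 5, n)
X_m = np.column_stack([np.ones(n), hours_m])
X_f = np.column_stack([np.ones(n), hours_f])
y_m = 2.0 + 0.020 * hours_m + rng.normal(0, 0.3, n)
y_f = 1.9 + 0.020 * hours_f + rng.normal(0, 0.3, n)

explained, unexplained = oaxaca_twofold(X_m, y_m, X_f, y_f)
raw_gap = y_m.mean() - y_f.mean()
```

Because each group regression includes a constant, the two components add up to the raw mean gap exactly (up to floating-point error).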
4. Sequential OLS Control Blocks
Each step adds a block of controls to measure how much of the raw gap it absorbs. The order matters; the Gelbach decomposition (Section 5c) provides the order-invariant version.
| Step | Question | Controls Added |
|---|---|---|
| Raw gap | How large is the unadjusted gap? | Female indicator only |
| Demographics | Does the gap survive basic demographic controls? | + age, age², race/ethnicity, education |
| Geography | Are women concentrated in lower-paying states? | + state |
| Job sorting | How much does occupation/industry choice explain? | + occupation, industry, class of worker |
| Schedule | Do hours and work arrangement differences matter? | + usual hours, work from home, commute |
| Family | Do marital and parenthood status absorb more? | + marital status, children, children under 5 |
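The ladder logic in the table above can be sketched on synthetic data: refit the regression after each block is added and record the female coefficient. Everything in this sketch (the variable names, the two toy blocks, the simulated effect sizes, the absence of survey weights) is a hypothetical stand-in, not the actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
female = rng.integers(0, 2, n).astype(float)
age = rng.uniform(25, 54, n)
# Hypothetical "job sorting" variable that is correlated with sex.
occ_pay = 0.5 * (1 - female) + rng.normal(0, 1, n)
log_wage = (3.0 + 0.01 * age + 0.2 * occ_pay
            - 0.05 * female + rng.normal(0, 0.3, n))

# Control blocks in the order they enter the ladder.
blocks = [
    ("Demographics", np.column_stack([age, age ** 2])),
    ("Job sorting", occ_pay[:, None]),
]


def female_coef(controls):
    """OLS coefficient on the female indicator given a list of control arrays."""
    X = np.column_stack([np.ones(n), female] + controls)
    beta = np.linalg.lstsq(X, log_wage, rcond=None)[0]
    return beta[1]


ladder = [("Raw gap", female_coef([]))]
included = []
for name, block in blocks:
    included.append(block)                 # blocks accumulate step by step
    ladder.append((name, female_coef(included)))

for name, coef in ladder:
    print(f"{name:<14} female coefficient = {coef:+.4f}")
```

In this toy setup the coefficient shrinks toward the simulated direct effect of -0.05 once the sorting variable enters. Swapping the order of the blocks would change how much each step appears to absorb, which is the order-sensitivity that motivates the Gelbach decomposition.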
5. Survey Uncertainty
- ACS: 80 replicate weights, successive-difference replication
- Average 90% margin of error: 0.29 percentage points
- Raw gap SE: ~0.19 pp (2023)
- Fully-adjusted female coefficient SE: 0.0017
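Successive-difference replication with 80 replicate weights uses the standard ACS variance formula: 4/80 times the sum of squared deviations of the replicate estimates from the full-sample estimate. The sketch below assumes the estimate has already been recomputed once per replicate weight; the function names are illustrative.

```python
import math


def sdr_standard_error(full_estimate, replicate_estimates):
    """Successive-difference replication SE: sqrt((4 / R) * sum((rep - full)^2))."""
    r = len(replicate_estimates)
    variance = (4.0 / r) * sum(
        (rep - full_estimate) ** 2 for rep in replicate_estimates
    )
    return math.sqrt(variance)


def margin_of_error_90(standard_error):
    """90% margin of error under a normal approximation."""
    return 1.645 * standard_error


# Toy check: 80 replicates that each differ from the full estimate by 0.1.
se = sdr_standard_error(0.0, [0.1] * 80)
moe = margin_of_error_90(se)
```

With the 1.645 multiplier, a standard error of 0.19 pp corresponds to a 90% margin of roughly 0.31 pp, which is in line with the average margin reported above.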
5b. Reproductive Extension
Extends the baseline ladder with reproductive-stage, job-context, interaction, and household-sensitivity controls.
| Step | Question | Controls Added |
|---|---|---|
| Reproductive stage | Does the gap vary by parenthood timing and couple type? | + recent birth, recent marriage, own child under 6, own child 6–17, couple type, reproductive stage |
| Job context | Do schedule rigidity and autonomy matter independently? | + O*NET autonomy, schedule unpredictability, time pressure, coordination responsibility, physical proximity, job rigidity |
| Interactions | Where specifically does the gap concentrate? | + female × recent birth, female × child under 6, female × recent marriage, female × same-sex, female × autonomy, female × job rigidity, female × child under 6 × job rigidity. (Because autonomy enters job rigidity with a negative weight, the standalone female × autonomy term is a residual conditional slope.) |
| Household sensitivity | How much more do linked household-composition and partner-resource variables absorb? | + multigenerational household, other adults present, then partnered-sample checks with partner employment and partner wage |
Household composition is descriptive, and partner-resource variables are partly post-market. They are reported as sensitivity checks rather than folded into the headline sequential ladder.
| Sensitivity panel | Added controls | Baseline Gap | Augmented Gap | Change |
|---|---|---|---|---|
| Household composition | + other adults present | 10.8% | 11.0% | +0.1 pp |
| Partner resources | + partner employed, partner wage | 15.1% | 17.9% | +2.8 pp |
In the current public ACS extract, MULTG is unavailable, so the fitted household-composition row uses other_adults_present only. The relative_earnings variable is excluded because it is built from the respondent’s own wage and is therefore mechanically related to the outcome.
5c. Gelbach Decomposition
The sequential ladder is order-sensitive: the amount each block “explains” depends on
when it enters. The Gelbach (2016) decomposition solves this by computing each block’s
contribution in an order-invariant way. The identity is exact:
base coef − full coef = ∑ block contributions.
| Block | What it measures | Mean Δ (2015–2023) | Share of explained |
|---|---|---|---|
| Job sorting | How much do occupation/industry choices move the female coefficient? | −0.084 | ~70% |
| Reproductive burden | How much do parenthood timing, couple type, and reproductive stage move it? | −0.036 | ~30% |
| Job context | How much do schedule rigidity and autonomy move it, independently of sorting? | −0.016 | ~13% |
| Schedule | Do hours and work arrangements absorb or reveal more of the gap? | +0.017 | −14% |
| Geography | Are women concentrated in lower-paying states? | −0.002 | ~2% |
| Family | Does generic marital/child status add anything beyond reproductive stage? | +0.002 | ~0% |
Reproductive decomposition (demographics as base; the full model adds reproductive stage and job context). Means are taken across the 8 ACS years (2015–2019 and 2021–2023). The shares of the gap-reducing blocks sum to more than 100% because the schedule block works in the opposite direction (controlling for hours reveals a larger penalty). SEs are approximate (delta method, ignoring cross-term covariance). Identity residuals are below 10⁻¹³ in every year.
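The identity behind the table can be checked numerically. The sketch below follows the usual form of the Gelbach decomposition: each added block is regressed on the base regressors, and those auxiliary coefficients are multiplied by the block's coefficients from the full model. The data, the block names, and the lack of survey weights are assumptions for illustration only.

```python
import numpy as np


def gelbach_decomposition(y, X1, blocks, focal=1):
    """Split (base coef - full coef) for column `focal` of X1 across blocks.

    X1: base regressors including a constant; blocks: dict of name -> 2-D array.
    """
    names = list(blocks)
    X_full = np.column_stack([X1] + [blocks[k] for k in names])
    b_base = np.linalg.lstsq(X1, y, rcond=None)[0]
    b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

    contributions, start = {}, X1.shape[1]
    for k in names:
        width = blocks[k].shape[1]
        beta_k = b_full[start:start + width]          # block coefs, full model
        gamma_k = np.linalg.lstsq(X1, blocks[k], rcond=None)[0]  # block on base
        contributions[k] = float((gamma_k @ beta_k)[focal])
        start += width
    total = float(b_base[focal] - b_full[focal])
    return contributions, total


# Synthetic example with two hypothetical one-column blocks.
rng = np.random.default_rng(1)
n = 2000
female = rng.integers(0, 2, n).astype(float)
age = rng.uniform(25, 54, n)
occ = 0.6 * (1 - female) + rng.normal(size=n)
hours = 40 - 3 * female + rng.normal(scale=5, size=n)
y = (2.5 + 0.01 * age - 0.05 * female + 0.15 * occ
     + 0.01 * hours + rng.normal(scale=0.3, size=n))

X1 = np.column_stack([np.ones(n), female, age])   # base: constant, female, age
blocks = {"job_sorting": occ[:, None], "schedule": hours[:, None]}
contributions, total = gelbach_decomposition(y, X1, blocks)
residual = total - sum(contributions.values())
```

The result does not depend on the order of the entries in `blocks`, and `residual` should be zero up to floating-point error, mirroring the identity residuals reported above.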
5d. Variance & Distributional Analysis
| Analysis | Question |
|---|---|
| Raw and residual dispersion | How much of the variance gap survives full controls? (Ratio, P90/P10, P95/P50, top-earner shares) |
| Selection-corrected variance | Does employment selection explain the dispersion gap? (IPW reweighting) |
| Dispersion by reproductive stage | Does variance compress for mothers vs. childless workers? |
| Dispersion by job rigidity | Does the variance ratio flip in rigid-schedule occupations? |
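The dispersion measures named in the first row can be computed with a few lines of NumPy. This is an unweighted sketch: the direction of the variance ratio (group A over group B), the use of log wages for the variance, and the function names are assumptions, and the residual version would apply the same summary to regression residuals instead of raw wages.

```python
import numpy as np


def dispersion_summary(wages):
    """Log-wage variance plus P90/P10 and P95/P50 ratios for one group."""
    w = np.asarray(wages, dtype=float)
    p10, p50, p90, p95 = np.percentile(w, [10, 50, 90, 95])
    return {
        "log_var": float(np.log(w).var(ddof=1)),
        "p90_p10": float(p90 / p10),
        "p95_p50": float(p95 / p50),
    }


def variance_ratio(wages_a, wages_b):
    """Ratio of log-wage variances, group A over group B."""
    return dispersion_summary(wages_a)["log_var"] / dispersion_summary(wages_b)["log_var"]


# Toy data: group B is group A scaled by 2, so dispersion is identical.
rng = np.random.default_rng(0)
wages_a = np.exp(rng.normal(3.0, 0.5, 1000))
wages_b = 2.0 * wages_a
ratio = variance_ratio(wages_a, wages_b)
```

All three measures are invariant to a pure level shift in pay, so they isolate differences in spread from differences in average wages.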
6. Robustness Checks
- Employment selection (IPW reweighting)
- DML with elastic net
- Oaxaca decomposition
- SIPP validation (including reproductive stage gaps)
- CPS cross-check
- ACS family field correction
- Year-by-year stability (8 ACS years)
- Gelbach decomposition (order-invariant covariate attribution)
- Quantile regression
- Same-sex placebo / lesbian married premium (negative control)
- Fertility-risk gradient among childless women
- NLSY79/97 cohort validation of reproductive controls
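The employment-selection check at the top of this list relies on inverse probability weighting. A minimal sketch of the idea on simulated data follows: fit a model for the probability of being employed, then weight each worker by the inverse of that probability so that workers resemble the full population. The single covariate `z`, the hand-rolled logistic fit, and the selection-on-observables setup are assumptions of this sketch, not the project's actual selection model.

```python
import numpy as np


def fit_logit(Z, d, iters=25):
    """Logistic regression by Newton's method; Z must include a constant column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        gradient = Z.T @ (d - p)
        hessian = (Z * (p * (1.0 - p))[:, None]).T @ Z
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Simulated population: z raises both wages and the chance of being employed.
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)
log_wage = 3.0 + 0.3 * z + rng.normal(0, 0.3, n)   # latent wage for everyone
p_employed = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * z)))
employed = (rng.uniform(size=n) < p_employed).astype(float)

# Estimated employment propensities from the observable covariate.
Z = np.column_stack([np.ones(n), z])
p_hat = 1.0 / (1.0 + np.exp(-Z @ fit_logit(Z, employed)))

workers = employed == 1
naive_mean = log_wage[workers].mean()                          # workers only
ipw_mean = np.average(log_wage[workers], weights=1.0 / p_hat[workers])
population_mean = log_wage.mean()                              # known only in simulation
```

In the simulation the unweighted worker mean overstates the population mean, while the reweighted mean lands close to it. With real data the population mean is unobserved, and the correction is only as good as the covariates in the employment model.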
7. Limitations
- No firm/employer effects
- No tenure or work-history interruption data
- No direct bargaining/negotiation variables
- No within-occupation task specialization
- No schedule flexibility measures
- ACS 2020 unavailable due to COVID collection issues