Methodology
Technical details of the measurement pipeline, statistical models, and robustness testing.
| Cohort | Battery | Subtests | N |
|---|---|---|---|
| NLSY79 | ASVAB | 10: GS, AR, WK, PC, NO, CS, AS, MK, MC, EI | 12,686 |
| NLSY97 | CAT-ASVAB | 12: GS, AR, WK, PC, NO, CS, AS, MK, MC, EI, AI, SI | 8,984 |
| CNLSY | PIAT | 3: Math, Reading Recognition, Reading Comprehension | 11,545 |
- Quadratic OLS residualization: each subtest regressed on age + age² within each cohort
- Residuals standardized to mean=0, SD=1
- Multi-group CFA (males vs females)
- Estimator: MLR (robust maximum likelihood)
- Missing data: FIML (full information maximum likelihood)
- Standardization: std.lv = TRUE, reference group = female
- Software: R lavaan package via subprocess
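The age-adjustment step listed above (regress each subtest on age and age², then standardize the residuals) can be sketched in plain Python. This is an illustrative sketch only, not the pipeline's own code: the function names `residualize_on_age` and `_solve` are hypothetical, complete data are assumed (no missing scores or ages), and the sample SD (n − 1 denominator) is an assumption because the source does not say which SD is used. It would be applied separately to each subtest within each cohort.

```python
from statistics import mean, stdev


def _solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualize_on_age(scores, ages):
    """Return z-scored residuals from an OLS fit of score ~ 1 + age + age^2."""
    # Centring age leaves the residuals unchanged but improves conditioning
    centre = mean(ages)
    rows = [[1.0, a - centre, (a - centre) ** 2] for a in ages]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(3)]
    beta = _solve(xtx, xty)
    resid = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, scores)]
    # Standardize to mean 0, SD 1 (sample SD assumed)
    mu, sd = mean(resid), stdev(resid)
    return [(e - mu) / sd for e in resid]
```

A production version would more likely use a regression library and handle missing values explicitly; the normal-equations solver here only keeps the sketch dependency-free.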
| Level | What’s Constrained | Gate Thresholds |
|---|---|---|
| Configural | Factor structure only | Baseline |
| Metric | + Factor loadings | ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, ΔSRMR ≤ 0.030 |
| Scalar | + Intercepts | ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, ΔSRMR ≤ 0.010 |
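One way to read the gate thresholds is as a comparison between each model and the next less constrained one. The sketch below assumes each Δ is measured in the direction of worsening fit (a drop in CFI, a rise in RMSEA or SRMR); the source does not state the sign convention. The `Fit` class, the `failed_gates` function, and the small floating-point tolerance are hypothetical, and any fit values passed in are for illustration, not results from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    cfi: float
    rmsea: float
    srmr: float


# Thresholds from the table above: (ΔCFI, ΔRMSEA, ΔSRMR)
GATES = {"metric": (0.010, 0.015, 0.030), "scalar": (0.010, 0.015, 0.010)}
TOL = 1e-9  # guards against floating-point noise exactly at a threshold


def failed_gates(level, less_constrained, more_constrained):
    """Return the fit indices that worsened by more than the gate for `level`."""
    max_cfi, max_rmsea, max_srmr = GATES[level]
    worsening = {
        "ΔCFI": (less_constrained.cfi - more_constrained.cfi, max_cfi),
        "ΔRMSEA": (more_constrained.rmsea - less_constrained.rmsea, max_rmsea),
        "ΔSRMR": (more_constrained.srmr - less_constrained.srmr, max_srmr),
    }
    return [name for name, (delta, limit) in worsening.items() if delta > limit + TOL]
```

An empty list means the level passes. The metric model is compared against the configural model, and the scalar model against the metric model.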
Invariance Results
| Cohort | Metric | Scalar | dg Status |
|---|---|---|---|
| NLSY79 | Pass | Pass | Confirmatory |
| NLSY97 | Pass | Fail (ΔCFI) | Gated (exploratory only) |
| CNLSY | Pass | Fail (ΔCFI, ΔRMSEA) | Gated (exploratory only) |
- 499 family-clustered bootstrap replicates
- Full SEM refit on each replicate
- Percentile CIs from bootstrap distribution
- g_proxy (observed composite) used for all exploratory analyses
- Outcome associations via OLS, logistic regression, within-family fixed effects
- Cross-cohort comparisons at age-matched windows
| Variant | Description |
|---|---|
| Sampling | One pair per family deduplication |
| Age adjustment | Linear vs quadratic vs none |
| Model form | Single-factor vs bifactor |
| Survey weights | Weighted vs unweighted |
| Dedup seed | 5 different random seeds |
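The sampling and dedup-seed variants can be illustrated with a seeded selection of one sibling pair per family. This is a hedged sketch: the `(family_id, person_a, person_b)` tuple layout and the function name `one_pair_per_family` are hypothetical, and uniform random choice within a family is an assumption about how the deduplication is done.

```python
import random
from collections import defaultdict


def one_pair_per_family(pairs, seed):
    """Keep one randomly chosen sibling pair per family.

    pairs: list of (family_id, person_a, person_b) tuples (hypothetical layout).
    """
    by_family = defaultdict(list)
    for pair in pairs:
        by_family[pair[0]].append(pair)
    rng = random.Random(seed)
    # Sorted family order makes the selection reproducible for a given seed
    return [rng.choice(by_family[fam]) for fam in sorted(by_family)]
```

Rerunning the downstream analysis on `one_pair_per_family(pairs, seed)` for five different seeds would show how much the estimates depend on which pair happened to be kept.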
- Raw data from NLS Investigator (Bureau of Labor Statistics)
- No microdata redistributed; only aggregate outputs in repository
- Variable mapping documented in data/raw/manifest.json