Skip to content

efficient-did: thread vcov_type as narrow {hc1} contract per Chen-Sant'Anna-Xie 2025 (Phase 1b #4)#495

Merged
igerber merged 3 commits into
mainfrom
feature/efficient-did-vcov-type-phase1b
May 26, 2026
Merged

efficient-did: thread vcov_type as narrow {hc1} contract per Chen-Sant'Anna-Xie 2025 (Phase 1b #4)#495
igerber merged 3 commits into
mainfrom
feature/efficient-did-vcov-type-phase1b

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 26, 2026

Summary

  • Phase 1b interstitial Claude/setup pip install k2 m4j #4: thread vcov_type through EfficientDiD as permanently narrow to {"hc1"} per Chen-Sant'Anna-Xie (2025) EIF-based variance achieving the semiparametric efficiency bound. Analytical-sandwich families {classical, hc2, hc2_bm} and conley are rejected at __init__ / set_params with methodology-rooted messages (the per-unit EIF aggregation has no equivalent single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined).
  • BC change on EfficientDiDResults: cluster field renamed to cluster_name; new n_clusters + vcov_type fields added; new to_dict() method (mirrors TripleDifferenceResults / ImputationDiDResults). DiagnosticReport._pt_hausman updated to read the renamed field.
  • Summary block harmonized: emits Variance estimator: <HC1 / CR1 cluster-robust at <col>, G=<n>> after the survey block; suppressed under bootstrap (replaced with Inference method: bootstrap + Bootstrap replications: <n>). Default cluster=None renders "HC1 heteroskedasticity-robust" — methodologically correct (per-unit EIF SE is HC1-style with no Liang-Zeger G/(G-1) correction); diverges from ImputationDiD which auto-clusters at unit per BJS Theorem 3.
  • Defensive fix: survey-PSU multiplier bootstrap with <2 PSUs now returns NaN SE (was ≈0 from BLAS roundoff). Single-site guard in the survey-PSU branch + new _build_nan_bootstrap_results helper that keys NaN dicts to the same (g,t) / event-time / group reductions the downstream override loop expects.
  • Eager set_params validation: EfficientDiD.set_params(vcov_type=bad) raises immediately (matches existing pattern for pt_assumption/control_group/etc); intentional deviation from sibling IF-based estimators which use sklearn mutate-then-validate-at-use. Documented in REGISTRY.md.

Methodology references (required if estimator / math changes)

  • Method name(s): EfficientDiD (Chen-Sant'Anna-Xie 2025) IF-based variance for ATT(g,t); Liang-Zeger CR1 on cluster-aggregated EIF; Taylor Series Linearization (Binder 1983) on the combined IF under survey designs.
  • Paper / source link(s): Chen, J., Sant'Anna, P.H.C., & Xie, X. (2025). Efficient Difference-in-Differences with Multiple Time Periods; Liang, K.Y., & Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models; Binder, D.A. (1983). On the variances of asymptotically normal estimators from complex surveys.
  • Any intentional deviations from the source (and why):
    • Permanently narrow vcov_type={"hc1"} contract — structural property of the IF-based variance derivation (no global design matrix on which HC2 leverage or Bell-McCaffrey DOF can be defined). Documented in docs/methodology/REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" section + EfficientDiD per-estimator Notes. Matches the established pattern across CallawaySantAnna / TripleDifference / ImputationDiD.
    • Default cluster=None renders HC1, NOT CR1-at-unit — per-unit EIF SE sqrt(mean(EIF²)/n) is HC1-style (no Liang-Zeger G/(G-1) finite-sample correction). Diverges from ImputationDiD which auto-clusters at unit per BJS Theorem 3.
    • Eager set_params(vcov_type) validationEfficientDiD.set_params calls _validate_params() which now invokes _validate_vcov_type, so bad vcov raises immediately rather than at fit-time. Matches EfficientDiD's existing eager-validation pattern; diverges from sibling IF-based estimators (sklearn mutate-then-validate-at-use). Documented in REGISTRY.md.

Validation

  • Tests added/updated:
    • tests/test_efficient_did.py — new TestEfficientDiDVcovType class with 40 tests across 7 surfaces (default / cluster / TSL-survey / replicate-survey bit-equal parametrized over aggregate ∈ {None, event_study, group, all}; bootstrap × cluster + bootstrap × survey bit-equal with per-horizon/per-group SE override branches asserted; set_params eager revalidation; bootstrap n_psu<2 NaN propagation; DR (covariates=) path bit-equality; 5 input-rejection pins; cluster + replicate_weights rejection via blanket guard; 7 introspection tests).
    • tests/test_business_report.py::TestHausmanPretestPropagatesCluster — assertion updated from getattr(edid, "cluster", None)getattr(edid, "cluster_name", None) to track the field rename.
  • Backtest / simulation / notebook evidence (if applicable): N/A — vcov_type="hc1" is bit-equal to current default across all aggregate modes, the DR covariates path, and all bootstrap branches; no R parity needed.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

igerber added 2 commits May 26, 2026 10:18
…t'Anna-Xie 2025 (Phase 1b interstitial #4)

Mirrors the IF-based narrow-contract template from CallawaySantAnna #487,
TripleDifference #488, and ImputationDiD #492. EfficientDiD uses
influence-function-based variance per Chen-Sant'Anna-Xie (2025) achieving
the semiparametric efficiency bound; the per-unit EIF aggregation has no
equivalent single design matrix, so analytical-sandwich families cannot
be defined and vcov_type is permanently narrow to {"hc1"}.

Input contract:
- vcov_type kwarg threaded through __init__ / get_params / set_params
- _validate_vcov_type rejects {classical, hc2, hc2_bm} with methodology-rooted
  message citing Chen-Sant'Anna-Xie (2025) and the missing design matrix
- conley rejected as deferred (TODO.md follow-up row)
- set_params(vcov_type=bad) raises immediately via _validate_params chain
  (intentional eager-validation; diverges from sibling IF-based estimators
  which defer to fit-time per sklearn mutate-then-validate-at-use)

EfficientDiDResults harmonization (BC):
- cluster field renamed to cluster_name (matches IF-based estimator naming)
- New n_clusters field for the G=<n> suffix on the variance summary line
- New vcov_type field
- New to_dict() method (mirrors TripleDifferenceResults / ImputationDiDResults)
- summary() now renders "Variance estimator: <label>" after the survey block,
  suppressed under bootstrap (renders "Inference method: bootstrap" instead)
- Default cluster=None renders "HC1 heteroskedasticity-robust" — methodologically
  correct (per-unit EIF SE is HC1-style, no Liang-Zeger G/(G-1) correction);
  diverges from ImputationDiD which auto-clusters at unit per BJS Theorem 3
- DiagnosticReport._pt_hausman updated to read renamed cluster_name field

Bootstrap defensive fix:
- Survey-PSU multiplier bootstrap with G<2 PSUs collapses to constant draws
  → BLAS roundoff produces ≈0 SE (not NaN). Single-site guard at the
  survey-PSU branch returns NaN dicts keyed to the same (g,t)/event-time/
  group reductions the downstream override loop expects (cluster path
  already guards n_clusters≥2 at fit-time; unit path guarded upstream by
  balanced-panel validator).
- New _build_nan_bootstrap_results helper constructs the all-NaN object.

Tests (40 new in TestEfficientDiDVcovType across 7 surfaces):
- Default / cluster / TSL-survey / replicate-survey bit-equal parametrized
  over aggregate ∈ {None, event_study, group, all}
- Bootstrap × cluster + bootstrap × survey bit-equal (per-horizon + per-group
  SE override branches asserted)
- set_params eager revalidation
- Bootstrap n_psu<2 NaN propagation (regression on defensive fix)
- DR (covariates=) path bit-equal under explicit vcov_type="hc1"
- 5 input-rejection pins (classical/hc2/hc2_bm/conley/unknown)
- cluster + replicate_weights rejection (validates blanket guard at L357)
- 7 introspection tests (default attr, get_params, Results carries,
  to_dict default vs cluster, summary HC1 default vs CR1 cluster vs
  bootstrap-suppressed, cluster_name suppression under both analytical and
  replicate survey, fit-clone idempotence)

Documentation:
- REGISTRY.md: IF-based taxonomy "Enforced today" expanded to four-way
  (CS+TD+ImputationDiD+EfficientDiD); 3 EfficientDiD Notes added
- llms-full.txt: vcov_type added to EfficientDiD signature; shared
  staggered-results variance metadata section updated
- CHANGELOG [Unreleased] entry surfaces both vcov_type threading AND the
  Results-field BC break
- TODO.md Phase 1b row collapses to TwoStageDiD only (last remaining);
  new EfficientDiD Conley follow-up row added
- EfficientDiD class + module + Results docstrings updated

210 tests pass across test_efficient_did.py + test_efficient_did_validation.py
+ test_diagnostics.py. black + ruff clean.
…plicate Bootstrap summary line

Two follow-up fixes:

1. cluster → cluster_name propagation. The original PR renamed the
   EfficientDiDResults.cluster field to cluster_name but left two
   consumers still referencing the old name:
   - tests/test_business_report.py:1267 (TestHausmanPretestPropagatesCluster
     regression test asserting the field is persisted for the Hausman
     pretest replay) — updated assertion + class docstring.
   - docs/api/_autosummary/diff_diff.EfficientDiDResults.rst — the
     checked-in Sphinx autosummary still listed cluster and was missing
     cluster_name, n_clusters, vcov_type, and the new to_dict() method.

2. Suppress duplicate Bootstrap summary header. EfficientDiDResults.summary()
   previously emitted both the legacy "Bootstrap: <n> (<weights>)" header
   line (L250-L251 pre-PR) and the new "Inference method: bootstrap" +
   "Bootstrap replications: <n>" block. The legacy header is now gated on
   `bootstrap_results is None` so analytical-only fits keep the original
   render and bootstrap-overwritten fits use the canonical inference-method
   block alone (matches the sibling result-container convention).

211 tests pass across test_efficient_did.py, test_efficient_did_validation.py,
test_diagnostics.py, and test_business_report.py::TestHausmanPretestPropagatesCluster.
black + ruff clean on touched Python files.
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good

Executive Summary

  • No P0/P1 methodology defects found. The new vcov_type="hc1"-only contract is consistent with EfficientDiD’s EIF-based source framing, and the PR documents the estimator-specific variance behavior instead of changing it silently in code. docs/methodology/REGISTRY.md:L306-L367, docs/methodology/REGISTRY.md:L1052-L1059. citeturn1academia12
  • The new survey-bootstrap NaN override is wired through overall, group-time, event-study, and group outputs, so the changed path does not introduce the misleading finite-inference pattern from degenerate bootstrap draws. diff_diff/efficient_did_bootstrap.py:L116-L137, diff_diff/efficient_did.py:L1089-L1138
  • vcov_type propagation is otherwise complete across constructor params, get_params(), results metadata, summary()/to_dict(), and the visible Hausman replay call site.
  • One non-blocking issue remains: rejected set_params(vcov_type=...) calls mutate the estimator before validation fails.

Methodology

Code Quality

  • Severity P2; Impact: EfficientDiD.set_params() assigns attributes first and validates afterward in diff_diff/efficient_did.py:L353-L360. A rejected call such as set_params(vcov_type="classical", alpha=0.1) therefore raises only after leaving self.vcov_type and any other batched kwargs partially applied. That weakens the new eager-validation surface for callers that catch ValueError and continue using the object. The added tests in tests/test_efficient_did.py:L2510-L2522 only pin the raise, not post-error state. Concrete fix: stage updates in a copy/snapshot and commit only on success, or rollback on exception; add a regression test mirroring the project’s existing atomic set_params() guards.

Performance

  • No findings.

Maintainability

  • Severity P3; Impact: the public results surface is now a hard rename from cluster to cluster_name in diff_diff/efficient_did_results.py:L161-L174. The visible internal caller was updated in diff_diff/diagnostic_report.py:L2437-L2446, so I do not see an internal break in this diff, but external consumers of results.cluster will break on upgrade. Concrete fix: none required if this release is intentionally breaking; otherwise add a one-release deprecated alias.

Tech Debt

  • No findings. The deferred vcov_type="conley" work is explicitly tracked in TODO.md:L107-L115, so it is not untracked blocker debt in this PR.

Security

  • No findings.

Documentation/Tests

  • No findings in the changed docs/tests themselves.
  • I could not execute the targeted tests in this review environment because pytest and numpy are not installed, so runtime validation here was static-only.

…ons on validation failure

Previously, EfficientDiD.set_params iterated over the kwargs and called
setattr() before invoking _validate_params(). A rejected batched call
such as set_params(vcov_type="classical", alpha=0.1, anticipation=2)
raised on validation but left every kwarg in the partially-mutated
state — the rejected vcov_type plus the otherwise-valid alpha and
anticipation. Callers that catch ValueError and continue using the
estimator would then operate on a silently corrupted parameter
configuration, defeating the eager-validation contract introduced when
_validate_vcov_type was wired into _validate_params.

Fix: snapshot original attribute values for every kwarg before applying
mutations, run validation, and on exception restore the snapshot before
re-raising. The snapshot pass also moves the "Unknown parameter" check
ahead of any mutation so that even unknown-name rejections leave the
estimator untouched.

Regression test added: test_set_params_rollback_on_validation_failure
fires the 3-kwarg batched call and pins all three attribute values to
their pre-call snapshot after the raise.
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f808ae69a18363104e65bf3bb92932e084a172c3


Overall Assessment

✅ Looks good

Executive Summary

Methodology

  • No findings. The paper’s current framing is an efficient staggered-DiD/event-study estimator under multiple-period parallel-trends assumptions, and the PR’s narrow {"hc1"} contract plus cluster/survey notes are documented in the repository rather than constituting an undocumented methodology change. docs/methodology/REGISTRY.md:L1054-L1059, diff_diff/efficient_did.py:L14-L21. citeturn0view0

Code Quality

Performance

  • No findings.

Maintainability

  • Severity P3; Impact: EfficientDiDResults.cluster was renamed to cluster_name without a compatibility alias, so out-of-tree consumers of result.cluster will break even though in-tree callers were updated. diff_diff/efficient_did_results.py:L163-L174, diff_diff/diagnostic_report.py:L2438-L2446. Concrete fix: if this release is intended to be non-breaking, add a deprecated read-only cluster alias for one release; otherwise no action is required.

Tech Debt

  • No findings. The deferred vcov_type="conley" work is explicitly tracked in TODO.md:L107-L115, so it is properly recorded rather than silently deferred.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 26, 2026
@igerber igerber merged commit 2656e44 into main May 26, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/efficient-did-vcov-type-phase1b branch May 26, 2026 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant