diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
index 4752747..65c0818 100644
--- a/METHODOLOGY_REVIEW.md
+++ b/METHODOLOGY_REVIEW.md
@@ -153,14 +153,69 @@ Each estimator in diff-diff should be periodically reviewed to ensure:
 | Module | `twfe.py` |
 | Primary Reference | Wooldridge (2010), Ch. 10 |
 | R Reference | `fixest::feols()` |
-| Status | Not Started |
-| Last Review | - |
+| Status | **Complete** |
+| Last Review | 2026-02-08 |
+
+**Verified Components:**
+- [x] Within-transformation algebra: `y_it - ȳ_i - ȳ_t + ȳ` matches hand calculation (rtol=1e-12)
+- [x] ATT matches manual demeaned OLS (rtol=1e-10)
+- [x] ATT matches `DifferenceInDifferences` on 2-period data (rtol=1e-10)
+- [x] Covariates are also within-transformed (sum to zero within unit/time groups)
+- [x] R comparison: ATT matches `fixest::feols(y ~ treated:post | unit + post, cluster=~unit)` (rtol<0.1%)
+- [x] R comparison: Cluster-robust SE match (rtol<1%)
+- [x] R comparison: P-value match (atol<0.01)
+- [x] R comparison: CI bounds match (rtol<1%)
+- [x] R comparison: ATT and SE match with covariate (same tolerances)
+- [x] Edge case: Staggered treatment triggers `UserWarning`
+- [x] Edge case: Auto-clusters at unit level (SE matches explicit `cluster="unit"`)
+- [x] Edge case: DF adjustment for absorbed FE matches manual `solve_ols()` with `df_adjustment`
+- [x] Edge case: Covariate collinear with interaction raises `ValueError` ("cannot be identified")
+- [x] Edge case: Covariate collinearity warns but ATT remains finite
+- [x] Edge case: `rank_deficient_action="error"` raises `ValueError`
+- [x] Edge case: `rank_deficient_action="silent"` emits no warnings
+- [x] Edge case: Unbalanced panel produces valid results (finite ATT, positive SE)
+- [x] Edge case: Missing unit column raises `ValueError`
+- [x] Integration: `decompose()` returns `BaconDecompositionResults`
+- [x] SE: Cluster-robust SE >= HC1 SE
+- [x] SE: VCoV positive semi-definite
+- [x] Wild bootstrap: Valid inference (finite SE, p-value in [0,1])
+- [x] Wild bootstrap: All weight types (rademacher, mammen, webb) produce valid inference
+- [x] Wild bootstrap: `inference="wild_bootstrap"` routes correctly
+- [x] Params: `get_params()` returns all inherited parameters
+- [x] Params: `set_params()` modifies attributes
+- [x] Results: `summary()` contains "ATT"
+- [x] Results: `to_dict()` contains att, se, t_stat, p_value, n_obs
+- [x] Results: residuals + fitted = demeaned outcome (not raw)
+- [x] Edge case: Multi-period time emits UserWarning advising binary post indicator
+- [x] Edge case: Non-{0,1} binary time emits UserWarning (ATT still correct)
+- [x] Edge case: ATT invariant to time encoding ({0,1} vs {2020,2021} produces identical results)
+
+**Key Implementation Detail:**
+The interaction term `D_i × Post_t` must be within-transformed (demeaned) alongside the outcome,
+consistent with the Frisch-Waugh-Lovell (FWL) theorem: all regressors and the outcome must be
+projected out of the fixed effects space. R's `fixest::feols()` does this automatically when
+variables appear to the left of the `|` separator.
 
 **Corrections Made:**
-- (None yet)
+- **Bug fix: interaction term must be within-transformed** (found during review). The previous
+  implementation used raw (un-demeaned) `D_i × Post_t` in the demeaned regression. This gave
+  correct results only for 2-period panels where `post == period`. For multi-period panels
+  (e.g., 4 periods with binary `post`), the raw interaction had incorrect correlation with
+  demeaned Y, producing ATT approximately 1/3 of the true value. Fixed by applying the same
+  within-transformation to the interaction term before regression. This matches R's
+  `fixest::feols()` behavior. (`twfe.py` lines 99-113)
 
 **Outstanding Concerns:**
-- (None yet)
+- **Multi-period `time` parameter**: Multi-period time values (e.g., 1,2,3,4) produce
+  `treated × period_number` instead of `treated × post_indicator`, which is not the standard
+  D_it treatment indicator. A `UserWarning` is emitted when `time` has >2 unique values.
+  For binary time with non-{0,1} values (e.g., {2020, 2021}), the ATT is mathematically
+  correct (the within-transformation absorbs the scaling), but a warning recommends 0/1
+  encoding for clarity. Users with multi-period data should create a binary `post` column.
+- **Staggered treatment warning**: The warning only fires when `time` has >2 unique values
+  (i.e., actual period numbers). With binary `time="post"`, all treated units appear to start
+  treatment at `time=1`, making staggering undetectable. Users with staggered designs should
+  use `decompose()` or `CallawaySantAnna` directly for proper diagnostics.
 
 ---
 
diff --git a/benchmarks/R/benchmark_twfe.R b/benchmarks/R/benchmark_twfe.R
new file mode 100644
index 0000000..e1c9771
--- /dev/null
+++ b/benchmarks/R/benchmark_twfe.R
@@ -0,0 +1,132 @@
+#!/usr/bin/env Rscript
+# Benchmark: Two-Way Fixed Effects (R `fixest` package with absorbed FE)
+#
+# This uses fixest::feols() with absorbed unit + post FE and unit-level clustering,
+# matching the Python TwoWayFixedEffects estimator's approach.
+#
+# Usage:
+#   Rscript benchmark_twfe.R --data path/to/data.csv --output path/to/results.json
+
+library(fixest)
+library(jsonlite)
+library(data.table)
+
+# Parse command line arguments
+args <- commandArgs(trailingOnly = TRUE)
+
+parse_args <- function(args) {
+  result <- list(
+    data = NULL,
+    output = NULL
+  )
+
+  i <- 1
+  while (i <= length(args)) {
+    if (args[i] == "--data") {
+      result$data <- args[i + 1]
+      i <- i + 2
+    } else if (args[i] == "--output") {
+      result$output <- args[i + 1]
+      i <- i + 2
+    } else {
+      i <- i + 1
+    }
+  }
+
+  if (is.null(result$data) || is.null(result$output)) {
+    stop("Usage: Rscript benchmark_twfe.R --data <path> --output <path>")
+  }
+
+  return(result)
+}
+
+config <- parse_args(args)
+
+# Load data
+message(sprintf("Loading data from: %s", config$data))
+data <- fread(config$data)
+
+# Run benchmark
+message("Running TWFE estimation with absorbed FE...")
+start_time <- Sys.time()
+
+# TWFE with absorbed unit + post fixed effects, clustered at unit level
+# This matches Python's TwoWayFixedEffects:
+#   - Within-transformation removes unit and time (post) FE
+#   - Cluster-robust SE at unit level (automatic)
+model <- feols(
+  outcome ~ treated:post | unit + post,
+  data = data,
+  cluster = ~unit
+)
+
+estimation_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
+
+# Extract results
+coef_name <- "treated:post"
+coefs <- coef(model)
+ses <- se(model)
+pvals <- pvalue(model)
+ci <- confint(model)
+
+# Find the treatment effect coefficient
+if (coef_name %in% names(coefs)) {
+  att <- coefs[coef_name]
+  att_se <- ses[coef_name]
+  att_pval <- pvals[coef_name]
+  att_ci <- ci[coef_name, ]
+} else {
+  # Try alternative name formats
+  idx <- grep("treated.*post|post.*treated", names(coefs))
+  if (length(idx) > 0) {
+    att <- coefs[idx[1]]
+    att_se <- ses[idx[1]]
+    att_pval <- pvals[idx[1]]
+    att_ci <- ci[idx[1], ]
+    coef_name <- names(coefs)[idx[1]]
+  } else {
+    stop("Could not find treatment effect coefficient")
+  }
+}
+
+# Format output
+results <- list(
+  estimator = "fixest::feols (absorbed FE)",
+  cluster = "unit",
+
+  # Treatment effect
+  att = unname(att),
+  se = unname(att_se),
+  pvalue = unname(att_pval),
+  ci_lower = unname(att_ci[1]),
+  ci_upper = unname(att_ci[2]),
+  coef_name = coef_name,
+
+  # Model statistics
+  model_stats = list(
+    r_squared = summary(model)$r2,
+    adj_r_squared = summary(model)$adj.r2,
+    n_obs = model$nobs
+  ),
+
+  # Timing
+  timing = list(
+    estimation_seconds = estimation_time,
+    total_seconds = estimation_time
+  ),
+
+  # Metadata
+  metadata = list(
+    r_version = R.version.string,
+    fixest_version = as.character(packageVersion("fixest")),
+    n_units = length(unique(data$unit)),
+    n_periods = length(unique(data$post)),
+    n_obs = nrow(data)
+  )
+)
+
+# Write output
+message(sprintf("Writing results to: %s", config$output))
+write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 15)
+
+message(sprintf("Completed in %.3f seconds", estimation_time))
diff --git a/benchmarks/python/benchmark_twfe.py b/benchmarks/python/benchmark_twfe.py
new file mode 100644
index 0000000..22cadac
--- /dev/null
+++ b/benchmarks/python/benchmark_twfe.py
@@ -0,0 +1,129 @@
+#!/usr/bin/env python3
+"""
+Benchmark: TwoWayFixedEffects (diff-diff TwoWayFixedEffects class).
+
+This benchmarks the actual TwoWayFixedEffects class with within-transformation,
+as opposed to benchmark_basic.py which uses DifferenceInDifferences with formula.
+
+Usage:
+    python benchmark_twfe.py --data path/to/data.csv --output path/to/results.json
+"""
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+
+# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
+def _get_backend_from_args():
+    """Parse --backend argument without importing diff_diff."""
+    parser = argparse.ArgumentParser(add_help=False)
+    parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
+    args, _ = parser.parse_known_args()
+    return args.backend
+
+_requested_backend = _get_backend_from_args()
+if _requested_backend in ("python", "rust"):
+    os.environ["DIFF_DIFF_BACKEND"] = _requested_backend
+
+# NOW import diff_diff and other dependencies (will see the env var)
+import pandas as pd
+
+# Add parent to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent.parent))
+
+from diff_diff import TwoWayFixedEffects, HAS_RUST_BACKEND
+from benchmarks.python.utils import Timer
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="Benchmark TwoWayFixedEffects estimator")
+    parser.add_argument("--data", required=True, help="Path to input CSV data")
+    parser.add_argument("--output", required=True, help="Path to output JSON results")
+    parser.add_argument(
+        "--backend", default="auto", choices=["auto", "python", "rust"],
+        help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
+    )
+    return parser.parse_args()
+
+
+def get_actual_backend() -> str:
+    """Return the actual backend being used based on HAS_RUST_BACKEND."""
+    return "rust" if HAS_RUST_BACKEND else "python"
+
+
+def main():
+    args = parse_args()
+
+    actual_backend = get_actual_backend()
+    print(f"Using backend: {actual_backend}")
+
+    # Load data
+    print(f"Loading data from: {args.data}")
+    data = pd.read_csv(args.data)
+
+    # Run benchmark using TwoWayFixedEffects (within-transformation approach)
+    print("Running TWFE estimation...")
+
+    twfe = TwoWayFixedEffects(robust=True)  # auto-clusters at unit level
+
+    with Timer() as timer:
+        results = twfe.fit(
+            data,
+            outcome="outcome",
+            treatment="treated",
+            time="post",
+            unit="unit",
+        )
+
+    att = results.att
+    se = results.se
+    pvalue = results.p_value
+    ci = results.conf_int
+
+    total_time = timer.elapsed
+
+    # Build output
+    output = {
+        "estimator": "diff_diff.TwoWayFixedEffects",
+        "backend": actual_backend,
+        "cluster": "unit",
+        # Treatment effect
+        "att": float(att),
+        "se": float(se),
+        "pvalue": float(pvalue),
+        "ci_lower": float(ci[0]),
+        "ci_upper": float(ci[1]),
+        # Model statistics
+        "model_stats": {
+            "n_obs": len(data),
+            "n_units": len(data["unit"].unique()),
+            "n_periods": len(data["post"].unique()),
+        },
+        # Timing
+        "timing": {
+            "estimation_seconds": total_time,
+            "total_seconds": total_time,
+        },
+        # Metadata
+        "metadata": {
+            "n_units": len(data["unit"].unique()),
+            "n_periods": len(data["post"].unique()),
+            "n_obs": len(data),
+        },
+    }
+
+    # Write output
+    print(f"Writing results to: {args.output}")
+    output_path = Path(args.output)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    with open(output_path, "w") as f:
+        json.dump(output, f, indent=2)
+
+    print(f"Completed in {total_time:.3f} seconds")
+    return output
+
+
+if __name__ == "__main__":
+    main()
diff --git a/benchmarks/run_benchmarks.py b/benchmarks/run_benchmarks.py
index 7501ee7..b4d3d41 100644
--- a/benchmarks/run_benchmarks.py
+++ b/benchmarks/run_benchmarks.py
@@ -695,6 +695,130 @@ def run_basic_did_benchmark(
     return results
 
 
+def run_twfe_benchmark(
+    data_path: Path,
+    name: str = "twfe",
+    scale: str = "small",
+    n_replications: int = 1,
+    backends: Optional[List[str]] = None,
+) -> Dict[str, Any]:
+    """Run TwoWayFixedEffects benchmarks (Python and R) with replications."""
+    print(f"\n{'='*60}")
+    print(f"TWFE BENCHMARK ({scale})")
+    print(f"{'='*60}")
+
+    if backends is None:
+        backends = ["python", "rust"]
+
+    timeouts = TIMEOUT_CONFIGS.get(scale, TIMEOUT_CONFIGS["small"])
+    results = {
+        "name": name,
+        "scale": scale,
+        "n_replications": n_replications,
+        "python_pure": None,
+        "python_rust": None,
+        "r": None,
+        "comparison": None,
+    }
+
+    # Run Python benchmark for each backend
+    for backend in backends:
+        backend_label = f"python_{'pure' if backend == 'python' else backend}"
+        print(f"\nRunning Python (diff_diff.TwoWayFixedEffects, backend={backend}) - {n_replications} replications...")
+        py_output = RESULTS_DIR / "accuracy" / f"{backend_label}_{name}_{scale}.json"
+        py_output.parent.mkdir(parents=True, exist_ok=True)
+
+        py_timings = []
+        py_result = None
+        for rep in range(n_replications):
+            try:
+                py_result = run_python_benchmark(
+                    "benchmark_twfe.py", data_path, py_output,
+                    timeout=timeouts["python"],
+                    backend=backend,
+                )
+                py_timings.append(py_result["timing"]["total_seconds"])
+                if rep == 0:
+                    print(f"  ATT: {py_result['att']:.4f}")
+                    print(f"  SE:  {py_result['se']:.4f}")
+                print(f"  Rep {rep+1}/{n_replications}: {py_timings[-1]:.3f}s")
+            except Exception as e:
+                print(f"  Rep {rep+1} failed: {e}")
+
+        if py_result and py_timings:
+            timing_stats = compute_timing_stats(py_timings)
+            py_result["timing"] = timing_stats
+            results[backend_label] = py_result
+            print(f"  Mean time: {timing_stats['stats']['mean']:.3f}s ± {timing_stats['stats']['std']:.3f}s")
+
+    # For backward compatibility, also store as "python" (use rust if available)
+    if results.get("python_rust"):
+        results["python"] = results["python_rust"]
+    elif results.get("python_pure"):
+        results["python"] = results["python_pure"]
+
+    # R benchmark with replications
+    print(f"\nRunning R (fixest::feols with absorbed FE) - {n_replications} replications...")
+    r_output = RESULTS_DIR / "accuracy" / f"r_{name}_{scale}.json"
+
+    r_timings = []
+    r_result = None
+    for rep in range(n_replications):
+        try:
+            r_result = run_r_benchmark(
+                "benchmark_twfe.R", data_path, r_output,
+                timeout=timeouts["r"]
+            )
+            r_timings.append(r_result["timing"]["total_seconds"])
+            if rep == 0:
+                print(f"  ATT: {r_result['att']:.4f}")
+                print(f"  SE:  {r_result['se']:.4f}")
+            print(f"  Rep {rep+1}/{n_replications}: {r_timings[-1]:.3f}s")
+        except Exception as e:
+            print(f"  Rep {rep+1} failed: {e}")
+
+    if r_result and r_timings:
+        timing_stats = compute_timing_stats(r_timings)
+        r_result["timing"] = timing_stats
+        results["r"] = r_result
+        print(f"  Mean time: {timing_stats['stats']['mean']:.3f}s ± {timing_stats['stats']['std']:.3f}s")
+
+    # Compare results
+    if results["python"] and results["r"]:
+        print("\nComparison (Python vs R):")
+        comparison = compare_estimates(
+            results["python"], results["r"], "TWFE", scale=scale,
+            se_rtol=0.01,
+            python_pure_results=results.get("python_pure"),
+            python_rust_results=results.get("python_rust"),
+        )
+        results["comparison"] = comparison
+        print(f"  ATT diff: {comparison.att_diff:.2e}")
+        print(f"  SE rel diff: {comparison.se_rel_diff:.1%}")
+        print(f"  Status: {'PASS' if comparison.passed else 'FAIL'}")
+
+    # Print timing comparison table
+    print("\nTiming Comparison:")
+    print(f"  {'Backend':<15} {'Time (s)':<12} {'vs R':<12} {'vs Pure Python':<15}")
+    print(f"  {'-'*54}")
+
+    r_mean = results["r"]["timing"]["stats"]["mean"] if results["r"] else None
+    pure_mean = results["python_pure"]["timing"]["stats"]["mean"] if results.get("python_pure") else None
+    rust_mean = results["python_rust"]["timing"]["stats"]["mean"] if results.get("python_rust") else None
+
+    if r_mean:
+        print(f"  {'R':<15} {r_mean:<12.3f} {'1.00x':<12} {'-':<15}")
+    if pure_mean:
+        r_speedup = f"{r_mean/pure_mean:.2f}x" if r_mean else "-"
+        print(f"  {'Python (pure)':<15} {pure_mean:<12.3f} {r_speedup:<12} {'1.00x':<15}")
+    if rust_mean:
+        r_speedup = f"{r_mean/rust_mean:.2f}x" if r_mean else "-"
+        pure_speedup = f"{pure_mean/rust_mean:.2f}x" if pure_mean else "-"
+        print(f"  {'Python (rust)':<15} {rust_mean:<12.3f} {r_speedup:<12} {pure_speedup:<15}")
+
+    return results
+
+
 def run_multiperiod_benchmark(
     data_path: Path,
     n_pre: int,
@@ -844,7 +968,7 @@ def main():
     )
     parser.add_argument(
         "--estimator",
-        choices=["callaway", "synthdid", "basic", "multiperiod"],
+        choices=["callaway", "synthdid", "basic", "twfe", "multiperiod"],
         help="Run specific estimator benchmark",
     )
     parser.add_argument(
@@ -932,6 +1056,16 @@ def main():
                 )
                 all_results.append(results)
 
+        if args.all or args.estimator == "twfe":
+            basic_key = f"basic_{scale}"
+            if basic_key in datasets:
+                results = run_twfe_benchmark(
+                    datasets[basic_key],
+                    scale=scale,
+                    n_replications=args.replications,
+                )
+                all_results.append(results)
+
         if args.all or args.estimator == "multiperiod":
             mp_key = f"multiperiod_{scale}"
             if mp_key in datasets:
diff --git a/diff_diff/twfe.py b/diff_diff/twfe.py
index 45b08d6..bf9a493 100644
--- a/diff_diff/twfe.py
+++ b/diff_diff/twfe.py
@@ -93,22 +93,49 @@ def fit(  # type: ignore[override]
         # Check for staggered treatment timing and warn if detected
         self._check_staggered_treatment(data, treatment, time, unit)
 
+        # Warn if time has more than 2 unique values (not a binary post indicator)
+        n_unique_time = data[time].nunique()
+        if n_unique_time > 2:
+            warnings.warn(
+                f"The '{time}' column has {n_unique_time} unique values. "
+                f"TwoWayFixedEffects expects a binary (0/1) post indicator. "
+                f"Multi-period time values produce 'treated * period_number' instead of "
+                f"'treated * post_indicator', which may not estimate the standard DiD ATT. "
+                f"Consider creating a binary post column: "
+                f"df['post'] = (df['{time}'] >= cutoff).astype(int)",
+                UserWarning,
+                stacklevel=2,
+            )
+        elif n_unique_time == 2:
+            unique_vals = set(data[time].unique())
+            if unique_vals != {0, 1} and unique_vals != {False, True}:
+                warnings.warn(
+                    f"The '{time}' column has values {sorted(unique_vals)} instead of {{0, 1}}. "
+                    f"The ATT estimate is mathematically correct (within-transformation "
+                    f"absorbs the scaling), but 0/1 encoding is recommended for clarity. "
+                    f"Consider: df['{time}'] = (df['{time}'] == {max(unique_vals)}).astype(int)",
+                    UserWarning,
+                    stacklevel=2,
+                )
+
         # Use unit-level clustering if not specified (use local variable to avoid mutation)
         cluster_var = self.cluster if self.cluster is not None else unit
 
-        # Demean data (within transformation for fixed effects)
-        data_demeaned = self._within_transform(data, outcome, unit, time, covariates)
+        # Create treatment × post interaction from raw data before demeaning.
+        # This must be within-transformed alongside the outcome and covariates
+        # so that the regression uses demeaned regressors (FWL theorem).
+        data = data.copy()
+        data["_treatment_post"] = data[treatment] * data[time]
 
-        # Create treatment × post interaction
-        # For staggered designs, we'd need to identify treatment timing per unit
-        # For now, assume standard 2-period design
-        data_demeaned["_treatment_post"] = (
-            data_demeaned[treatment] * data_demeaned[time]
+        # Demean outcome, covariates, AND interaction in a single pass
+        all_vars = [outcome] + (covariates or []) + ["_treatment_post"]
+        data_demeaned = _within_transform_util(
+            data, all_vars, unit, time, suffix="_demeaned"
         )
 
         # Extract variables for regression
         y = data_demeaned[f"{outcome}_demeaned"].values
-        X_list = [data_demeaned["_treatment_post"].values]
+        X_list = [data_demeaned["_treatment_post_demeaned"].values]
 
         if covariates:
             for cov in covariates:
@@ -292,6 +319,10 @@ def _check_staggered_treatment(
 
         Identifies if different units start treatment at different times,
         which can bias TWFE estimates when treatment effects are heterogeneous.
+
+        Note: This check requires ``time`` to have actual period values (not
+        binary 0/1). With binary time, all treated units appear to start at
+        time=1, so staggering is undetectable.
         """
         # Find first treatment time for each unit
         treated_obs = data[data[treatment] == 1]
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 5b594de..97897ea 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -231,9 +231,14 @@ Estimated via within-transformation (demeaning):
 ```
 where tildes denote demeaned variables.
 
+**Note:** The interaction term `D_i × Post_t` is within-transformed (demeaned) alongside the
+outcome and covariates before regression. This is required by the Frisch-Waugh-Lovell theorem:
+all regressors must be projected out of the same fixed effects space as the dependent variable.
+This matches the behavior of R's `fixest::feols()` with absorbed FE.
+
 *Standard errors:*
 - Default: Cluster-robust at unit level (accounts for serial correlation)
-- Degrees of freedom adjusted for absorbed fixed effects
+- Degrees of freedom adjusted for absorbed fixed effects: `df_adjustment = n_units + n_times - 2`
 
 *Edge cases:*
 - Singleton units/periods are automatically dropped
@@ -241,16 +246,28 @@ where tildes denote demeaned variables.
 - Covariate collinearity emits warning but estimation continues (ATT still identified)
 - Rank-deficient design matrix: warns and sets NA for dropped coefficients (R-style, matches `lm()`)
 - Unbalanced panels handled via proper demeaning
+- Multi-period `time` parameter: only binary (0/1) post indicator is recommended; multi-period values
+  produce `treated × period_number` rather than `treated × post_indicator`. A `UserWarning` is
+  emitted when `time` has >2 unique values, advising users to create a binary post column.
+  Non-{0,1} binary time (e.g., {2020, 2021}) also emits a warning, though the ATT is mathematically
+  correct — the within-transformation absorbs the scaling.
+- Staggered warning limitation: requires `time` to have actual period values (not binary 0/1)
+  so that different cohort first-treatment times can be distinguished. With binary `time="post"`,
+  all treated units appear to start at `time=1`, making staggering undetectable. Users with
+  staggered designs should use `decompose()` or `CallawaySantAnna` directly.
 
 **Reference implementation(s):**
-- R: `fixest::feols(y ~ treat | unit + time, data)`
+- R: `fixest::feols(y ~ treat:post | unit + post, data, cluster = ~unit)`
 - Stata: `reghdfe y treat, absorb(unit time) cluster(unit)`
 
 **Requirements checklist:**
-- [ ] Staggered treatment automatically triggers warning
-- [ ] Auto-clusters standard errors at unit level
-- [ ] `decompose()` method returns BaconDecompositionResults
-- [ ] Within-transformation correctly handles unbalanced panels
+- [ ] Staggered adoption detection warning (only fires when `time` has >2 unique values; with binary `time`, staggering is undetectable)
+- [x] Multi-period time warning (fires when `time` has >2 unique values)
+- [x] Auto-clusters standard errors at unit level
+- [x] `decompose()` method returns BaconDecompositionResults
+- [x] Within-transformation correctly handles unbalanced panels
+- [x] Non-{0,1} binary time warning (fires when time has 2 unique values not in {0,1})
+- [x] ATT invariance to time encoding (verified by test)
 
 ---
 
diff --git a/tests/test_estimators.py b/tests/test_estimators.py
index 507309e..3a05510 100644
--- a/tests/test_estimators.py
+++ b/tests/test_estimators.py
@@ -3475,7 +3475,12 @@ def test_did_with_near_collinear_covariates(self):
         assert np.isfinite(results.att)
 
     def test_twfe_with_absorbed_covariate(self):
-        """Test TWFE handles covariate absorbed by fixed effects."""
+        """Test TWFE handles covariate absorbed by fixed effects.
+
+        A unit-level covariate (constant within unit) becomes zero after
+        within-transformation, causing rank deficiency. TWFE should handle
+        this gracefully (warn but still estimate ATT).
+        """
         from diff_diff import TwoWayFixedEffects
 
         np.random.seed(42)
@@ -3506,9 +3511,14 @@ def test_twfe_with_absorbed_covariate(self):
 
         df = pd.DataFrame(data)
 
-        twfe = TwoWayFixedEffects()
-        # unit_covariate is absorbed by unit fixed effects
-        results = twfe.fit(df, outcome="outcome", treatment="post", unit="unit", time="period")
+        # Use correct TWFE specification: treatment="treated", time="post"
+        # Include unit_covariate which is constant within unit and will be
+        # absorbed by unit FE (becomes zero after within-transformation)
+        twfe = TwoWayFixedEffects(rank_deficient_action="silent")
+        results = twfe.fit(
+            df, outcome="outcome", treatment="treated", unit="unit",
+            time="post", covariates=["unit_covariate"],
+        )
 
         assert np.isfinite(results.att)
         assert results.se > 0
diff --git a/tests/test_methodology_twfe.py b/tests/test_methodology_twfe.py
new file mode 100644
index 0000000..5734db9
--- /dev/null
+++ b/tests/test_methodology_twfe.py
@@ -0,0 +1,1111 @@
+"""
+Comprehensive methodology verification tests for TwoWayFixedEffects estimator.
+
+This module verifies that the TwoWayFixedEffects implementation matches:
+1. The theoretical formulas from within-transformation algebra
+2. The behavior of R's fixest::feols() with absorbed unit+time FE
+3. All documented edge cases in docs/methodology/REGISTRY.md
+
+References:
+- Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed.
+  MIT Press, Chapter 10.
+- Goodman-Bacon, A. (2021). Difference-in-Differences with variation in treatment timing.
+  Journal of Econometrics, 225(2), 254-277.
+"""
+
+import json
+import os
+import subprocess
+import warnings
+from typing import Any, Dict
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from diff_diff import TwoWayFixedEffects
+from diff_diff.linalg import LinearRegression
+from diff_diff.utils import within_transform
+
+
+# =============================================================================
+# R Availability Fixtures
+# =============================================================================
+
+_fixest_available_cache = None
+
+
+def _check_fixest_available() -> bool:
+    """Check if R and fixest package are available (cached)."""
+    global _fixest_available_cache
+    if _fixest_available_cache is None:
+        r_env = os.environ.get("DIFF_DIFF_R", "auto").lower()
+        if r_env == "skip":
+            _fixest_available_cache = False
+        else:
+            try:
+                result = subprocess.run(
+                    ["Rscript", "-e", "library(fixest); library(jsonlite); cat('OK')"],
+                    capture_output=True,
+                    text=True,
+                    timeout=30,
+                )
+                _fixest_available_cache = result.returncode == 0 and "OK" in result.stdout
+            except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+                _fixest_available_cache = False
+    return _fixest_available_cache
+
+
+@pytest.fixture(scope="session")
+def fixest_available():
+    """Lazy check for R/fixest availability."""
+    return _check_fixest_available()
+
+
+@pytest.fixture
+def require_fixest(fixest_available):
+    """Skip test if R/fixest is not available."""
+    if not fixest_available:
+        pytest.skip("R or fixest package not available")
+
+
+# =============================================================================
+# Data Generation Helpers
+# =============================================================================
+
+
+def generate_twfe_panel(
+    n_units: int = 20,
+    n_periods: int = 4,
+    treatment_effect: float = 3.0,
+    noise_sd: float = 0.5,
+    seed: int = 42,
+) -> pd.DataFrame:
+    """Generate panel data for TWFE testing with known ATT."""
+    np.random.seed(seed)
+    n_treated = n_units // 2
+    data = []
+
+    for unit in range(n_units):
+        is_treated = unit < n_treated
+        unit_effect = np.random.normal(0, 2)
+
+        for period in range(n_periods):
+            post = 1 if period >= n_periods // 2 else 0
+            time_effect = period * 1.0
+
+            y = 10.0 + unit_effect + time_effect
+            if is_treated and post:
+                y += treatment_effect
+            y += np.random.normal(0, noise_sd)
+
+            data.append({
+                "unit": unit,
+                "period": period,
+                "treated": int(is_treated),
+                "post": post,
+                "outcome": y,
+            })
+
+    return pd.DataFrame(data)
+
+
+def generate_hand_calculable_panel() -> pd.DataFrame:
+    """
+    Generate a minimal 2-period panel with exact hand-calculable values.
+
+    4 units (2 treated, 2 control) × 2 periods = 8 observations.
+    No noise, so ATT is exactly 3.0.
+    """
+    return pd.DataFrame({
+        "unit": [0, 0, 1, 1, 2, 2, 3, 3],
+        "period": [0, 1, 0, 1, 0, 1, 0, 1],
+        "treated": [1, 1, 1, 1, 0, 0, 0, 0],
+        "post": [0, 1, 0, 1, 0, 1, 0, 1],
+        "outcome": [
+            10.0, 15.0,  # Unit 0 (treated): pre=10, post=15 (diff=5)
+            12.0, 17.0,  # Unit 1 (treated): pre=12, post=17 (diff=5)
+            8.0, 10.0,   # Unit 2 (control): pre=8, post=10 (diff=2)
+            6.0, 8.0,    # Unit 3 (control): pre=6, post=8 (diff=2)
+        ],
+    })
+    # ATT = (mean treated diff) - (mean control diff) = 5.0 - 2.0 = 3.0
+
+
+# =============================================================================
+# Phase 1: Within-Transformation Algebra
+# =============================================================================
+
+
+class TestWithinTransformationAlgebra:
+    """Verify the within-transformation (two-way demeaning) is correct."""
+
+    def test_within_transform_hand_calculation(self):
+        """Verify within-transformation matches hand calculation: y_it - ȳ_i - ȳ_t + ȳ."""
+        data = generate_hand_calculable_panel()
+
+        # Hand-calculate within-transformed outcome
+        # Unit means: unit 0 = 12.5, unit 1 = 14.5, unit 2 = 9.0, unit 3 = 7.0
+        # Time means: period 0 = (10+12+8+6)/4 = 9.0, period 1 = (15+17+10+8)/4 = 12.5
+        # Grand mean = (10+15+12+17+8+10+6+8)/8 = 86/8 = 10.75
+        unit_means = data.groupby("unit")["outcome"].transform("mean")
+        time_means = data.groupby("period")["outcome"].transform("mean")
+        grand_mean = data["outcome"].mean()
+        expected_demeaned = data["outcome"] - unit_means - time_means + grand_mean
+
+        # Use the library function
+        result = within_transform(data, ["outcome"], "unit", "period")
+
+        np.testing.assert_allclose(
+            result["outcome_demeaned"].values,
+            expected_demeaned.values,
+            rtol=1e-12,
+        )
+
+    def test_within_transform_covariates_also_demeaned(self):
+        """Verify covariates are demeaned (not just outcome)."""
+        data = generate_twfe_panel(n_units=10, n_periods=4, seed=123)
+        data["x1"] = np.random.default_rng(42).normal(0, 1, len(data))
+
+        result = within_transform(data, ["outcome", "x1"], "unit", "period")
+
+        # Demeaned covariates should sum to ~0 within each unit and time group
+        for var in ["outcome_demeaned", "x1_demeaned"]:
+            unit_sums = result.groupby("unit")[var].sum()
+            time_sums = result.groupby("period")[var].sum()
+            np.testing.assert_allclose(unit_sums.values, 0, atol=1e-10)
+            np.testing.assert_allclose(time_sums.values, 0, atol=1e-10)
+
+    def test_twfe_att_matches_hand_calculated_demeaned_ols(self):
+        """
+        Verify TWFE ATT matches manual demeaned OLS on a small panel.
+
+        By FWL theorem, regressing demeaned Y on demeaned (D_i * Post_t) gives ATT.
+        Both outcome and regressors must be within-transformed.
+        """
+        data = generate_hand_calculable_panel()
+
+        # Run TWFE
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Manual demeaned OLS: demean both y and the interaction term
+        data_with_tp = data.copy()
+        data_with_tp["tp"] = data["treated"] * data["post"]
+        demeaned = within_transform(data_with_tp, ["outcome", "tp"], "unit", "period")
+        y = demeaned["outcome_demeaned"].values
+        tp = demeaned["tp_demeaned"].values
+        X = np.column_stack([np.ones(len(y)), tp])
+        coeffs = np.linalg.lstsq(X, y, rcond=None)[0]
+        manual_att = coeffs[1]
+
+        np.testing.assert_allclose(results.att, manual_att, rtol=1e-10)
+
+    def test_twfe_att_matches_basic_did_for_two_period_design(self):
+        """TWFE and basic DiD should agree on 2-period data."""
+        from diff_diff import DifferenceInDifferences
+
+        data = generate_hand_calculable_panel()
+
+        # TWFE
+        twfe = TwoWayFixedEffects(robust=True)
+        twfe_results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Basic DiD
+        did = DifferenceInDifferences(robust=True, cluster="unit")
+        did_results = did.fit(
+            data, outcome="outcome", treatment="treated", time="post"
+        )
+
+        np.testing.assert_allclose(twfe_results.att, did_results.att, rtol=1e-10)
+
+    def test_demeaned_outcome_sums_to_zero(self):
+        """Within-transformed outcome sums to zero within each unit and time group."""
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=99)
+
+        result = within_transform(data, ["outcome"], "unit", "period")
+
+        unit_sums = result.groupby("unit")["outcome_demeaned"].sum()
+        time_sums = result.groupby("period")["outcome_demeaned"].sum()
+
+        np.testing.assert_allclose(unit_sums.values, 0, atol=1e-10)
+        np.testing.assert_allclose(time_sums.values, 0, atol=1e-10)
+
+
+# =============================================================================
+# Phase 2: R Comparison
+# =============================================================================
+
+
+def _run_r_feols_twfe(data_path: str, covariates=None) -> Dict[str, Any]:
+    """Run R's fixest::feols() with absorbed unit+post FE, clustered at unit."""
+    escaped_path = data_path.replace("\\", "/")
+
+    if covariates:
+        cov_str = " + ".join(covariates)
+        formula = f"outcome ~ treated:post + {cov_str} | unit + post"
+    else:
+        formula = "outcome ~ treated:post | unit + post"
+
+    r_script = f'''
+    suppressMessages(library(fixest))
+    suppressMessages(library(jsonlite))
+
+    data <- read.csv("{escaped_path}")
+    data$treated <- as.numeric(data$treated)
+    data$post <- as.numeric(data$post)
+
+    result <- feols({formula}, data = data, cluster = ~unit)
+
+    # Use coeftable() to get fixest's own inference (SE, t-stat, p-value)
+    # This ensures we use fixest's df adjustment, not a manual pt() call
+    ct <- coeftable(result)
+    att_row <- which(rownames(ct) == "treated:post")
+    if (length(att_row) == 0) {{
+        att_row <- which(grepl("treated.*post", rownames(ct)))
+    }}
+
+    att <- ct[att_row, "Estimate"]
+    se_val <- ct[att_row, "Std. Error"]
+    tstat <- ct[att_row, "t value"]
+    pval <- ct[att_row, "Pr(>|t|)"]
+    ci <- confint(result)
+    ci_lower <- ci[att_row, 1]
+    ci_upper <- ci[att_row, 2]
+
+    output <- list(
+        att = unbox(att),
+        se = unbox(se_val),
+        t_stat = unbox(tstat),
+        p_value = unbox(pval),
+        ci_lower = unbox(ci_lower),
+        ci_upper = unbox(ci_upper),
+        n_obs = unbox(result$nobs)
+    )
+
+    cat(toJSON(output, pretty = TRUE, digits = 15))
+    '''
+
+    result = subprocess.run(
+        ["Rscript", "-e", r_script],
+        capture_output=True,
+        text=True,
+        timeout=60,
+    )
+
+    if result.returncode != 0:
+        raise RuntimeError(f"R script failed: {result.stderr}")
+
+    parsed = json.loads(result.stdout)
+    # Unwrap single-element lists from R's JSON encoding
+    for key in parsed:
+        if isinstance(parsed[key], list) and len(parsed[key]) == 1:
+            parsed[key] = parsed[key][0]
+
+    return parsed
+
+
+@pytest.fixture(scope="session")
+def r_benchmark_panel_data(tmp_path_factory):
+    """Session-scoped panel data + CSV for R comparison (no covariate)."""
+    np.random.seed(12345)
+    n_units = 50
+    n_periods = 4
+
+    data = []
+    for unit in range(n_units):
+        is_treated = unit < n_units // 2
+        unit_effect = unit * 0.2
+
+        for period in range(n_periods):
+            post = 1 if period >= 2 else 0
+            period_effect = period * 1.0
+
+            y = 10.0 + unit_effect + period_effect
+            if is_treated and post:
+                y += 3.0
+            y += np.random.normal(0, 0.5)
+
+            data.append({
+                "unit": unit,
+                "period": period,
+                "treated": int(is_treated),
+                "post": post,
+                "outcome": y,
+            })
+
+    df = pd.DataFrame(data)
+    tmp_dir = tmp_path_factory.mktemp("r_benchmark")
+    csv_path = tmp_dir / "panel_data.csv"
+    df.to_csv(csv_path, index=False)
+    return df, str(csv_path)
+
+
+@pytest.fixture(scope="session")
+def r_benchmark_panel_data_with_covariate(tmp_path_factory):
+    """Session-scoped panel data + CSV for R comparison (with covariate)."""
+    np.random.seed(12345)
+    n_units = 50
+    n_periods = 4
+
+    data = []
+    for unit in range(n_units):
+        is_treated = unit < n_units // 2
+        unit_effect = unit * 0.2
+
+        for period in range(n_periods):
+            post = 1 if period >= 2 else 0
+            period_effect = period * 1.0
+            x1 = np.random.normal(0, 1) + period * 0.3
+
+            y = 10.0 + unit_effect + period_effect + 1.5 * x1
+            if is_treated and post:
+                y += 3.0
+            y += np.random.normal(0, 0.5)
+
+            data.append({
+                "unit": unit,
+                "period": period,
+                "treated": int(is_treated),
+                "post": post,
+                "outcome": y,
+                "x1": x1,
+            })
+
+    df = pd.DataFrame(data)
+    tmp_dir = tmp_path_factory.mktemp("r_benchmark_cov")
+    csv_path = tmp_dir / "panel_data_cov.csv"
+    df.to_csv(csv_path, index=False)
+    return df, str(csv_path)
+
+
+@pytest.fixture(scope="session")
+def r_twfe_results(fixest_available, r_benchmark_panel_data):
+    """Cache R fixest results for the base panel (session-scoped)."""
+    if not fixest_available:
+        pytest.skip("R or fixest package not available")
+    _, csv_path = r_benchmark_panel_data
+    return _run_r_feols_twfe(csv_path)
+
+
+@pytest.fixture(scope="session")
+def r_twfe_results_with_covariate(fixest_available, r_benchmark_panel_data_with_covariate):
+    """Cache R fixest results for the covariate panel (session-scoped)."""
+    if not fixest_available:
+        pytest.skip("R or fixest package not available")
+    _, csv_path = r_benchmark_panel_data_with_covariate
+    return _run_r_feols_twfe(csv_path, covariates=["x1"])
+
+
+class TestRBenchmarkTWFE:
+    """Compare TWFE estimates against R's fixest::feols() with absorbed FE."""
+
+    def _run_python_twfe(self, data, covariates=None):
+        """Run Python TWFE estimator."""
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data,
+            outcome="outcome",
+            treatment="treated",
+            time="post",
+            unit="unit",
+            covariates=covariates,
+        )
+        return results
+
+    def test_att_matches_r_twfe(self, r_twfe_results, r_benchmark_panel_data):
+        """ATT within rtol=1e-3 (0.1%) of R's fixest."""
+        data, _ = r_benchmark_panel_data
+
+        py_results = self._run_python_twfe(data)
+
+        np.testing.assert_allclose(
+            py_results.att, r_twfe_results["att"], rtol=1e-3,
+            err_msg=f"ATT mismatch: Python={py_results.att:.6f}, R={r_twfe_results['att']:.6f}",
+        )
+
+    def test_se_matches_r_twfe(self, r_twfe_results, r_benchmark_panel_data):
+        """Cluster-robust SE within rtol=0.01 (1%) of R's fixest."""
+        data, _ = r_benchmark_panel_data
+
+        py_results = self._run_python_twfe(data)
+
+        np.testing.assert_allclose(
+            py_results.se, r_twfe_results["se"], rtol=0.01,
+            err_msg=f"SE mismatch: Python={py_results.se:.6f}, R={r_twfe_results['se']:.6f}",
+        )
+
+    def test_pvalue_matches_r_twfe(self, r_twfe_results, r_benchmark_panel_data):
+        """P-value within atol=0.01 of R's fixest."""
+        data, _ = r_benchmark_panel_data
+
+        py_results = self._run_python_twfe(data)
+
+        np.testing.assert_allclose(
+            py_results.p_value, r_twfe_results["p_value"], atol=0.01,
+            err_msg=f"P-value mismatch: Python={py_results.p_value:.6f}, R={r_twfe_results['p_value']:.6f}",
+        )
+
+    def test_ci_matches_r_twfe(self, r_twfe_results, r_benchmark_panel_data):
+        """CI bounds within rtol=0.01 (1%) of R's fixest."""
+        data, _ = r_benchmark_panel_data
+
+        py_results = self._run_python_twfe(data)
+
+        np.testing.assert_allclose(
+            py_results.conf_int[0], r_twfe_results["ci_lower"], rtol=0.01,
+            err_msg=f"CI lower mismatch: Python={py_results.conf_int[0]:.6f}, R={r_twfe_results['ci_lower']:.6f}",
+        )
+        np.testing.assert_allclose(
+            py_results.conf_int[1], r_twfe_results["ci_upper"], rtol=0.01,
+            err_msg=f"CI upper mismatch: Python={py_results.conf_int[1]:.6f}, R={r_twfe_results['ci_upper']:.6f}",
+        )
+
+    def test_att_matches_r_with_covariate(
+        self, r_twfe_results_with_covariate, r_benchmark_panel_data_with_covariate
+    ):
+        """ATT with demeaned covariate within rtol=1e-3 of R."""
+        data, _ = r_benchmark_panel_data_with_covariate
+
+        py_results = self._run_python_twfe(data, covariates=["x1"])
+
+        np.testing.assert_allclose(
+            py_results.att, r_twfe_results_with_covariate["att"], rtol=1e-3,
+            err_msg=f"ATT w/ cov mismatch: Python={py_results.att:.6f}, R={r_twfe_results_with_covariate['att']:.6f}",
+        )
+
+    def test_se_matches_r_with_covariate(
+        self, r_twfe_results_with_covariate, r_benchmark_panel_data_with_covariate
+    ):
+        """SE with covariate within rtol=0.01 of R."""
+        data, _ = r_benchmark_panel_data_with_covariate
+
+        py_results = self._run_python_twfe(data, covariates=["x1"])
+
+        np.testing.assert_allclose(
+            py_results.se, r_twfe_results_with_covariate["se"], rtol=0.01,
+            err_msg=f"SE w/ cov mismatch: Python={py_results.se:.6f}, R={r_twfe_results_with_covariate['se']:.6f}",
+        )
+
+
+# =============================================================================
+# Phase 3: Edge Cases (from REGISTRY.md)
+# =============================================================================
+
+
+class TestTWFEEdgeCases:
+    """Test all edge cases documented in docs/methodology/REGISTRY.md."""
+
+    def test_staggered_treatment_warning_multiperiod_time(self):
+        """Staggered treatment warning fires when `time` is multi-valued.
+
+        This tests the multi-period `time` scenario. When `time` has actual
+        period values (not binary 0/1), the staggered check can detect
+        different cohorts starting treatment at different periods. We use
+        `time="period"` here because the standard binary `time="post"`
+        configuration cannot detect staggering (see
+        test_staggered_warning_not_fired_with_binary_time).
+        """
+        np.random.seed(42)
+        data = []
+        for unit in range(20):
+            # Units 0-4: treated at period 2
+            # Units 5-9: treated at period 3
+            # Units 10-19: never treated
+            for period in range(5):
+                if unit < 5:
+                    treated = 1 if period >= 2 else 0
+                elif unit < 10:
+                    treated = 1 if period >= 3 else 0
+                else:
+                    treated = 0
+                y = 10.0 + unit * 0.1 + period * 0.5 + treated * 3.0 + np.random.normal(0, 0.5)
+                data.append({
+                    "unit": unit, "period": period, "treated": treated,
+                    "outcome": y,
+                })
+        df = pd.DataFrame(data)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            # Use time="period" so staggered detection sees different first-treat times
+            twfe.fit(df, outcome="outcome", treatment="treated", time="period", unit="unit")
+
+        staggered_warnings = [x for x in w if "Staggered treatment" in str(x.message)]
+        assert len(staggered_warnings) > 0, "Expected staggered treatment warning"
+
+        # Multi-period time warning also fires (time="period" has 5 unique values)
+        multiperiod_warnings = [x for x in w if "unique values" in str(x.message)]
+        assert len(multiperiod_warnings) > 0, (
+            "Expected multi-period time warning when time='period' with 5 values"
+        )
+
+    def test_staggered_warning_not_fired_with_binary_time(self):
+        """Staggered warning does NOT fire with binary time (known limitation).
+
+        When `time` is a binary post indicator (0/1), all treated units appear
+        to start treatment at time=1, so unique_treat_times=[1] and the
+        staggered check cannot distinguish cohorts. This is a documented
+        limitation — users with staggered designs should use `decompose()` or
+        `CallawaySantAnna` directly.
+        """
+        np.random.seed(42)
+        data = []
+        for unit in range(20):
+            # Units 0-4: treated at period 2 (early cohort)
+            # Units 5-9: treated at period 3 (late cohort)
+            # Units 10-19: never treated
+            for period in range(5):
+                if unit < 5:
+                    treated = 1 if period >= 2 else 0
+                elif unit < 10:
+                    treated = 1 if period >= 3 else 0
+                else:
+                    treated = 0
+                post = 1 if period >= 2 else 0
+                y = 10.0 + unit * 0.1 + period * 0.5 + treated * 3.0 + np.random.normal(0, 0.5)
+                data.append({
+                    "unit": unit, "period": period, "post": post,
+                    "treated": treated, "outcome": y,
+                })
+        df = pd.DataFrame(data)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            # With binary time="post", staggering is undetectable
+            twfe.fit(df, outcome="outcome", treatment="treated", time="post", unit="unit")
+
+        staggered_warnings = [x for x in w if "Staggered treatment" in str(x.message)]
+        assert len(staggered_warnings) == 0, (
+            "Staggered warning should NOT fire with binary time (known limitation)"
+        )
+
+    def test_multiperiod_time_warning(self):
+        """Multi-period time column triggers UserWarning advising binary post indicator."""
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=42)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            twfe.fit(data, outcome="outcome", treatment="treated", time="period", unit="unit")
+
+        multiperiod_warnings = [x for x in w if "unique values" in str(x.message)]
+        assert len(multiperiod_warnings) > 0, (
+            "Expected multi-period time warning when time has >2 unique values"
+        )
+        msg = str(multiperiod_warnings[0].message)
+        assert "binary" in msg, "Warning should mention binary post indicator"
+        assert "post" in msg, "Warning should mention post indicator"
+
+    def test_binary_time_no_multiperiod_warning(self):
+        """Binary time column does NOT trigger multi-period time warning."""
+        data = generate_hand_calculable_panel()
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            twfe.fit(data, outcome="outcome", treatment="treated", time="post", unit="unit")
+
+        multiperiod_warnings = [x for x in w if "unique values" in str(x.message)]
+        assert len(multiperiod_warnings) == 0, (
+            "Multi-period time warning should NOT fire with binary time"
+        )
+
+    def test_non_binary_time_values_warning(self):
+        """Non-{0,1} binary time values emit warning but ATT is correct."""
+        data = generate_hand_calculable_panel()
+        data["year"] = data["post"].map({0: 2020, 1: 2021})
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            results = twfe.fit(
+                data, outcome="outcome", treatment="treated", time="year", unit="unit"
+            )
+
+        non_binary_warnings = [x for x in w if "instead of {0, 1}" in str(x.message)]
+        assert len(non_binary_warnings) > 0, (
+            "Expected warning about non-{0,1} binary time values"
+        )
+        assert np.isfinite(results.att), "ATT should be finite"
+        np.testing.assert_allclose(results.att, 3.0, rtol=1e-10)
+
+    def test_boolean_time_no_warning(self):
+        """Boolean time values ({False, True}) do NOT emit non-{0,1} warning."""
+        data = generate_hand_calculable_panel()
+        data["post_bool"] = data["post"].astype(bool)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            twfe.fit(
+                data, outcome="outcome", treatment="treated",
+                time="post_bool", unit="unit",
+            )
+
+        non_binary_warnings = [x for x in w if "instead of {0, 1}" in str(x.message)]
+        assert len(non_binary_warnings) == 0, (
+            "Boolean time values should NOT trigger non-{0,1} warning"
+        )
+
+    def test_att_invariant_to_time_encoding(self):
+        """ATT, SE, and p-value are identical for {0,1} vs {2020,2021} time encoding."""
+        data = generate_hand_calculable_panel()
+
+        # Fit with binary {0,1}
+        twfe = TwoWayFixedEffects(robust=True)
+        results_binary = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Fit with year encoding {2020, 2021}
+        data["year"] = data["post"].map({0: 2020, 1: 2021})
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            results_year = twfe.fit(
+                data, outcome="outcome", treatment="treated", time="year", unit="unit"
+            )
+
+        np.testing.assert_allclose(
+            results_binary.att, results_year.att, rtol=1e-10,
+            err_msg="ATT should be invariant to time encoding",
+        )
+        np.testing.assert_allclose(
+            results_binary.se, results_year.se, rtol=1e-10,
+            err_msg="SE should be invariant to time encoding",
+        )
+        np.testing.assert_allclose(
+            results_binary.p_value, results_year.p_value, rtol=1e-10,
+            err_msg="P-value should be invariant to time encoding",
+        )
+
+    def test_auto_clusters_at_unit_level(self):
+        """SE with cluster=None (default) equals SE when explicitly passing cluster='unit'."""
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=42)
+
+        # Default (auto-clusters at unit)
+        twfe_default = TwoWayFixedEffects(robust=True)
+        results_default = twfe_default.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Explicit cluster at unit
+        twfe_explicit = TwoWayFixedEffects(robust=True, cluster="unit")
+        results_explicit = twfe_explicit.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        np.testing.assert_allclose(
+            results_default.se, results_explicit.se, rtol=1e-12,
+        )
+        # Config should be immutable
+        assert twfe_default.cluster is None
+
+    def test_df_adjustment_for_absorbed_fe(self):
+        """
+        Verify degrees-of-freedom adjustment for absorbed fixed effects.
+
+        TWFE applies df_adjustment = n_units + n_times - 2 to account for
+        absorbed FE. Verify the SE matches a manual LinearRegression with
+        the same df adjustment.
+        """
+        data = generate_twfe_panel(n_units=20, n_periods=2, noise_sd=0.5, seed=42)
+
+        # Run TWFE
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Manual: demean both y and the interaction, then run LinearRegression
+        data_with_tp = data.copy()
+        data_with_tp["tp"] = data["treated"] * data["post"]
+        demeaned = within_transform(data_with_tp, ["outcome", "tp"], "unit", "period")
+        y = demeaned["outcome_demeaned"].values
+        tp = demeaned["tp_demeaned"].values
+        X = np.column_stack([np.ones(len(y)), tp])
+
+        n_units = data["unit"].nunique()
+        n_times = data["period"].nunique()
+        df_adjustment = n_units + n_times - 2
+        cluster_ids = data["unit"].values
+
+        reg = LinearRegression(
+            include_intercept=False,
+            robust=True,
+            cluster_ids=cluster_ids,
+            rank_deficient_action="silent",
+        ).fit(X, y, df_adjustment=df_adjustment)
+        manual_se = reg.get_inference(1).se
+
+        np.testing.assert_allclose(
+            results.se, manual_se, rtol=1e-10,
+            err_msg=f"SE df-adjustment mismatch: TWFE={results.se:.8f}, manual={manual_se:.8f}",
+        )
+
+    def test_covariate_collinear_with_interaction_raises_error(self):
+        """Covariate identical to treatment*post interaction causes rank deficiency.
+
+        Adding bad_cov = treated * post duplicates the internal _treatment_post
+        variable, making the demeaned design matrix rank-deficient.
+        """
+        data = pd.DataFrame({
+            "unit": [0, 0, 1, 1, 2, 2, 3, 3],
+            "period": [0, 1, 0, 1, 0, 1, 0, 1],
+            "treated": [1, 1, 1, 1, 0, 0, 0, 0],
+            "post": [0, 1, 0, 1, 0, 1, 0, 1],
+            "outcome": [10.0, 11.0, 12.0, 13.0, 8.0, 9.0, 6.0, 7.0],
+        })
+
+        # bad_cov = treated * post duplicates the internal _treatment_post column
+        data["bad_cov"] = data["treated"] * data["post"]
+
+        twfe = TwoWayFixedEffects(robust=True, rank_deficient_action="error")
+        with pytest.raises(ValueError):
+            twfe.fit(
+                data, outcome="outcome", treatment="treated", time="post",
+                unit="unit", covariates=["bad_cov"],
+            )
+
+    def test_covariate_collinearity_warns_not_errors(self):
+        """Collinear covariate emits warning but ATT is still finite."""
+        data = generate_twfe_panel(n_units=20, n_periods=2, seed=42)
+        # Add a covariate that's collinear with treatment*post
+        data["bad_cov"] = data["treated"] * data["post"]
+
+        twfe = TwoWayFixedEffects(robust=True, rank_deficient_action="warn")
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            results = twfe.fit(
+                data,
+                outcome="outcome",
+                treatment="treated",
+                time="post",
+                unit="unit",
+                covariates=["bad_cov"],
+            )
+
+        collinear_warnings = [x for x in w if "collinear" in str(x.message).lower()]
+        assert len(collinear_warnings) > 0, "Expected collinearity warning"
+        assert np.isfinite(results.att), "ATT should be finite despite collinearity"
+        # ATT should be in reasonable range of true effect (3.0)
+        assert abs(results.att - 3.0) < 1.5, f"ATT={results.att} far from true effect 3.0"
+
+    def test_rank_deficient_action_error_raises(self):
+        """rank_deficient_action='error' raises ValueError on rank-deficient data."""
+        data = generate_twfe_panel(n_units=20, n_periods=2, seed=42)
+        data["bad_cov"] = data["treated"] * data["post"]
+
+        twfe = TwoWayFixedEffects(robust=True, rank_deficient_action="error")
+        with pytest.raises(ValueError):
+            twfe.fit(
+                data,
+                outcome="outcome",
+                treatment="treated",
+                time="post",
+                unit="unit",
+                covariates=["bad_cov"],
+            )
+
+    def test_rank_deficient_action_silent_no_warning(self):
+        """rank_deficient_action='silent' emits no warnings."""
+        data = generate_twfe_panel(n_units=20, n_periods=2, seed=42)
+        data["bad_cov"] = data["treated"] * data["post"]
+
+        twfe = TwoWayFixedEffects(robust=True, rank_deficient_action="silent")
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            results = twfe.fit(
+                data,
+                outcome="outcome",
+                treatment="treated",
+                time="post",
+                unit="unit",
+                covariates=["bad_cov"],
+            )
+
+        collinear_warnings = [x for x in w if "collinear" in str(x.message).lower()]
+        assert len(collinear_warnings) == 0, "Expected no collinearity warnings with silent"
+        assert np.isfinite(results.att)
+
+    def test_unbalanced_panel_produces_valid_results(self):
+        """Dropping some unit-period observations still gives valid results."""
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=42)
+
+        # Drop some observations to create unbalanced panel
+        drop_indices = [3, 7, 15, 22, 45, 60]
+        data = data.drop(index=drop_indices).reset_index(drop=True)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        assert np.isfinite(results.att), "ATT should be finite for unbalanced panel"
+        assert results.se > 0, "SE should be positive"
+        assert results.n_obs == len(data)
+
+    def test_unit_column_missing_raises_error(self):
+        """Missing unit column raises ValueError."""
+        data = generate_hand_calculable_panel()
+
+        twfe = TwoWayFixedEffects(robust=True)
+        with pytest.raises(ValueError, match="not found"):
+            twfe.fit(
+                data, outcome="outcome", treatment="treated",
+                time="post", unit="nonexistent_unit",
+            )
+
+    def test_decompose_integration(self):
+        """decompose() returns BaconDecompositionResults for staggered data."""
+        from diff_diff.bacon import BaconDecompositionResults
+
+        np.random.seed(42)
+        data = []
+        for unit in range(30):
+            if unit < 10:
+                first_treat = 3
+            elif unit < 20:
+                first_treat = 4
+            else:
+                first_treat = 0  # never treated
+
+            for period in range(1, 6):
+                treated = 1 if (first_treat > 0 and period >= first_treat) else 0
+                y = 10.0 + unit * 0.1 + period * 0.5 + treated * 2.0 + np.random.normal(0, 0.5)
+                data.append({
+                    "unit": unit,
+                    "period": period,
+                    "outcome": y,
+                    "first_treat": first_treat,
+                })
+
+        df = pd.DataFrame(data)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        decomp = twfe.decompose(
+            df, outcome="outcome", unit="unit", time="period", first_treat="first_treat"
+        )
+
+        assert isinstance(decomp, BaconDecompositionResults)
+        assert len(decomp.comparisons) > 0
+
+
+# =============================================================================
+# Phase 4: SE Verification
+# =============================================================================
+
+
+class TestTWFESEVerification:
+    """Verify standard error properties."""
+
+    def test_cluster_se_differs_from_hc1_se(self):
+        """
+        Cluster-robust SE differs from HC1 SE, verifying auto-clustering is active.
+
+        TWFE auto-clusters at unit level. We manually compute HC1 SE on the
+        same demeaned data (demeaned by unit + post, matching TWFE) and verify
+        the SEs are different, proving clustering changes inference.
+        """
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=42)
+
+        # TWFE: cluster-robust at unit (automatic)
+        twfe = TwoWayFixedEffects(robust=True)
+        twfe_results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Manual HC1 SE on same demeaned regression (no clustering)
+        # Demean by unit + post to match TWFE's within-transform
+        data_with_tp = data.copy()
+        data_with_tp["tp"] = data["treated"] * data["post"]
+        demeaned = within_transform(data_with_tp, ["outcome", "tp"], "unit", "post")
+        y = demeaned["outcome_demeaned"].values
+        tp = demeaned["tp_demeaned"].values
+        X = np.column_stack([np.ones(len(y)), tp])
+        n_units = data["unit"].nunique()
+        n_times = data["post"].nunique()
+        df_adjustment = n_units + n_times - 2
+
+        hc1_reg = LinearRegression(
+            include_intercept=False,
+            robust=True,
+            cluster_ids=None,  # HC1, no clustering
+            rank_deficient_action="silent",
+        ).fit(X, y, df_adjustment=df_adjustment)
+        hc1_se = hc1_reg.get_inference(1).se
+
+        # Verify SEs are different (auto-clustering is active)
+        assert twfe_results.se != hc1_se, (
+            f"Cluster SE ({twfe_results.se:.6f}) should differ from "
+            f"HC1 SE ({hc1_se:.6f}) — auto-clustering must be active"
+        )
+
+        # Also verify TWFE SE matches a manually computed cluster SE
+        cluster_reg = LinearRegression(
+            include_intercept=False,
+            robust=True,
+            cluster_ids=data["unit"].values,
+            rank_deficient_action="silent",
+        ).fit(X, y, df_adjustment=df_adjustment)
+        manual_cluster_se = cluster_reg.get_inference(1).se
+
+        np.testing.assert_allclose(
+            twfe_results.se, manual_cluster_se, rtol=1e-10,
+            err_msg="TWFE SE should match manually computed cluster SE"
+        )
+
+    def test_vcov_positive_semidefinite(self):
+        """VCoV matrix should be positive semi-definite."""
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=42)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        eigenvalues = np.linalg.eigvalsh(results.vcov)
+        assert np.all(eigenvalues >= -1e-10), (
+            f"VCoV has negative eigenvalues: {eigenvalues[eigenvalues < -1e-10]}"
+        )
+
+
+# =============================================================================
+# Phase 5: Wild Bootstrap
+# =============================================================================
+
+
+class TestTWFEWildBootstrap:
+    """Verify wild cluster bootstrap inference."""
+
+    def test_wild_bootstrap_produces_valid_inference(self, ci_params):
+        """Wild bootstrap produces finite SE and valid p-value."""
+        data = generate_twfe_panel(n_units=20, n_periods=2, seed=42)
+        n_boot = ci_params.bootstrap(999, min_n=199)
+
+        twfe = TwoWayFixedEffects(
+            robust=True, inference="wild_bootstrap", n_bootstrap=n_boot, seed=42
+        )
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        assert np.isfinite(results.se) and results.se > 0
+        assert 0 <= results.p_value <= 1
+        assert results.inference_method == "wild_bootstrap"
+
+    @pytest.mark.parametrize("weight_type", ["rademacher", "mammen", "webb"])
+    def test_wild_bootstrap_weight_types(self, ci_params, weight_type):
+        """Each bootstrap weight type produces valid inference."""
+        data = generate_twfe_panel(n_units=20, n_periods=2, seed=42)
+        n_boot = ci_params.bootstrap(199, min_n=99)
+
+        twfe = TwoWayFixedEffects(
+            robust=True,
+            inference="wild_bootstrap",
+            n_bootstrap=n_boot,
+            bootstrap_weights=weight_type,
+            seed=42,
+        )
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        assert np.isfinite(results.se) and results.se > 0
+        assert 0 <= results.p_value <= 1
+
+    def test_inference_parameter_routing(self):
+        """inference='wild_bootstrap' routes to wild bootstrap method."""
+        data = generate_twfe_panel(n_units=20, n_periods=2, seed=42)
+
+        twfe = TwoWayFixedEffects(
+            robust=True, inference="wild_bootstrap", n_bootstrap=99, seed=42
+        )
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        assert results.inference_method == "wild_bootstrap"
+
+
+# =============================================================================
+# Phase 6: Params & Results
+# =============================================================================
+
+
+class TestTWFEParamsAndResults:
+    """Verify sklearn-like parameter interface and results completeness."""
+
+    def test_get_params_returns_all_parameters(self):
+        """All inherited constructor params present in get_params()."""
+        twfe = TwoWayFixedEffects(robust=True)
+        params = twfe.get_params()
+
+        expected_keys = {
+            "robust", "cluster", "alpha", "inference",
+            "n_bootstrap", "bootstrap_weights", "seed",
+            "rank_deficient_action",
+        }
+        assert expected_keys.issubset(params.keys()), (
+            f"Missing params: {expected_keys - params.keys()}"
+        )
+
+    def test_set_params_modifies_attributes(self):
+        """set_params() modifies estimator attributes."""
+        twfe = TwoWayFixedEffects(robust=True)
+        twfe.set_params(alpha=0.10, robust=False)
+
+        assert twfe.alpha == 0.10
+        assert twfe.robust is False
+
+    def test_summary_contains_key_info(self):
+        """summary() output contains ATT."""
+        data = generate_hand_calculable_panel()
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        summary = results.summary()
+        assert "ATT" in summary
+
+    def test_to_dict_contains_all_fields(self):
+        """to_dict() contains required fields."""
+        data = generate_hand_calculable_panel()
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        d = results.to_dict()
+        for key in ["att", "se", "t_stat", "p_value", "n_obs"]:
+            assert key in d, f"Missing key '{key}' in to_dict()"
+
+    def test_residuals_plus_fitted_equals_demeaned_outcome(self):
+        """Check residuals + fitted = demeaned outcome (not raw outcome).
+
+        TWFE demeans by unit + time (where time is the `time` parameter).
+        The demeaned outcome is the within-transformed y.
+        """
+        data = generate_twfe_panel(n_units=20, n_periods=4, seed=42)
+
+        twfe = TwoWayFixedEffects(robust=True)
+        results = twfe.fit(
+            data, outcome="outcome", treatment="treated", time="post", unit="unit"
+        )
+
+        # Within-transform by unit + post (same as TWFE internally does)
+        demeaned = within_transform(data, ["outcome"], "unit", "post")
+        y_demeaned = demeaned["outcome_demeaned"].values
+
+        reconstructed = results.residuals + results.fitted_values
+        np.testing.assert_allclose(
+            reconstructed, y_demeaned, rtol=1e-10,
+            err_msg="residuals + fitted_values should equal demeaned outcome",
+        )