Here we show how to deal with confounding events, see Tsai (2024) for the full treatment. We first simulate some data with confounding events.
library(fastdid)
library(ggplot2)
library(data.table)
simdt <- sim_did(1e+04, 5, cov = "cont", hetero = "all", balanced = TRUE, seed = 1,
second_cohort = TRUE, second_het = "no") #comfounding event
dt <- simdt$dt #dataset
#ground truth att
att <- simdt$att |> merge(dt[,.(w = .N),by = "G"], by = "G")
att[, event_time := time-G]
att <- att[event == 1,.(att = weighted.mean(attgt, w)), by = "event_time"]
Using the default estimator, the estimates (black line) is biased from the ground truth (red line).
naive_result <-fastdid(data = dt,
timevar = "time", cohortvar = "G", unitvar = "unit",
outcomevar = "y", result_type = "dynamic")
plot_did_dynamics(naive_result) +
geom_line(aes(y = att, x = event_time),
data = att, color = "red") + theme_bw()
Diagnostics can be obtained by using time >= G2
as
outcome. As we set the effect of the confounding event at 10 constantly,
we can see that the bias of the default estimator is roughly 10 times
the diagnostics.
dt[, D2 := time >= G2]
diag <- fastdid(data = dt,
timevar = "time", cohortvar = "G", unitvar = "unit",
outcomevar = "D2", result_type = "dynamic")
plot_did_dynamics(diag) + theme_bw()
Using the double did estimator with cohortvar2
, the
estimator recovers the ground truth.
double_result <-fastdid(data = dt,
timevar = "time", cohortvar = "G", unitvar = "unit",
outcomevar = "y", result_type = "dynamic",
cohortvar2 = "G2", event_specific = TRUE)
plot_did_dynamics(double_result) +
geom_line(aes(y = att, x = event_time),
data = att, color = "red") + theme_bw()
Double DiD also allow for two additional aggregation scheme: group-group-time (“group_group_time”) and dynamic-staggered (“dynamic_stagger”, event time by event stagger, G1-G2).
double_result_ds <-fastdid(data = dt,
timevar = "time", cohortvar = "G", unitvar = "unit",
outcomevar = "y", result_type = "dynamic_stagger",
cohortvar2 = "G2", event_specific = TRUE)
double_result_ggt <-fastdid(data = dt,
timevar = "time", cohortvar = "G", unitvar = "unit",
outcomevar = "y", result_type = "group_group_time",
cohortvar2 = "G2", event_specific = TRUE)