zepid.causal.doublyrobust.AIPW.AIPTW

class zepid.causal.doublyrobust.AIPW.AIPTW(df, exposure, outcome, weights=None, alpha=0.05)

Augmented inverse probability of treatment weight estimator. This implementation calculates AIPTW for a time-fixed exposure and a single time-point outcome. AIPTW supports correcting for informative censoring (missing outcome data) through inverse probability of censoring/missingness weights.
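The following minimal NumPy sketch on simulated data shows why inverse probability of missingness weights help. It is not zEpid's implementation. To keep it short, missingness depends only on a single binary covariate `L` rather than on both exposure and covariates, the missingness model is a saturated (stratum-specific) estimate of Pr(M=0|L), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
L = rng.binomial(1, 0.5, size=n)                   # binary covariate
y = 1.0 + 2.0 * L + rng.normal(size=n)             # full-data mean of y is 2
pr_observed = np.where(L == 1, 0.3, 0.9)           # true Pr(M=0 | L)
observed = rng.random(n) < pr_observed             # M=0 means the outcome is observed

# Missingness model: saturated in L, i.e. the observed fraction in each stratum
pr_obs_hat = np.empty(n)
for s in (0, 1):
    pr_obs_hat[L == s] = observed[L == s].mean()

# Complete-case mean over-represents the L=0 stratum, which is observed more often
complete_case = y[observed].mean()
# Weighting each observed outcome by 1 / Pr(M=0 | L) restores the full-data balance
ipmw = np.average(y[observed], weights=1.0 / pr_obs_hat[observed])
print(complete_case, ipmw)
```

The complete-case mean is pulled toward the stratum with more observed outcomes, while the weighted mean recovers the full-data mean.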

AIPTW is a doubly robust estimator. The g-formula and IPTW each require that their parametric regression model is correctly specified. AIPTW instead gives us two 'chances' at getting the model correct: if either the outcome model or the treatment model is correctly specified, the estimate will be consistent (asymptotically unbiased). This property does not hold for the variance (i.e., the variance is not doubly robust).
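The double robustness property can be checked with a small simulation. This is a minimal NumPy sketch under stated assumptions, not zEpid's implementation: instead of fitting regression models, the "correct" nuisance models are the known data-generating functions, and the "misspecified" models simply ignore the confounder `L`. The helper names are hypothetical. The true average causal effect in the simulated data is 2.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def dr_mean(y, a, level, pr_a, yhat_a):
    """AIPW estimate of the mean outcome under exposure level `level`.

    pr_a:   predicted Pr(A=level | L) for each person
    yhat_a: predicted outcome for each person with A set to `level`
    """
    ind = (a == level).astype(float)
    return np.mean(y * ind / pr_a - yhat_a * (ind - pr_a) / pr_a)


rng = np.random.default_rng(2024)
n = 200_000
L = rng.normal(size=n)                          # confounder
true_ps = expit(L)                              # true Pr(A=1 | L)
A = rng.binomial(1, true_ps)
Y = 1.0 + 2.0 * A + L + rng.normal(size=n)      # true average causal effect is 2


def yhat_correct(level):
    return 1.0 + 2.0 * level + L                # true outcome regression


def yhat_wrong(level):
    return np.full(n, Y[A == level].mean())     # misspecified: ignores L


ps_wrong = np.full(n, A.mean())                 # misspecified: ignores L


def dr_difference(ps1, yhat):
    # Pr(A=0 | L) is taken as 1 - Pr(A=1 | L)
    return dr_mean(Y, A, 1, ps1, yhat(1)) - dr_mean(Y, A, 0, 1 - ps1, yhat(0))


naive = Y[A == 1].mean() - Y[A == 0].mean()
wrong_outcome_model = dr_difference(true_ps, yhat_wrong)
wrong_exposure_model = dr_difference(ps_wrong, yhat_correct)
print(naive, wrong_outcome_model, wrong_exposure_model)
```

Both AIPW estimates land close to 2 even though one nuisance model is wrong in each case, whereas the unadjusted difference in means is biased by confounding.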

The augmented inverse probability weight estimator of the mean outcome under exposure level a is calculated from the following formula:

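\[\widehat{DR}(a) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{Y_i I(A_i=a)}{\widehat{\Pr}(A_i=a|L_i)} - \frac{\hat{Y}^a_i\left(I(A_i=a)-\widehat{\Pr}(A_i=a|L_i)\right)}{\widehat{\Pr}(A_i=a|L_i)}\right]\]

where \(I(A_i=a)\) is 1 if individual \(i\) had exposure level \(a\) and 0 otherwise, and \(\hat{Y}^a_i\) is the outcome model's prediction for individual \(i\) with the exposure set to \(a\).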
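The formula can be evaluated directly from per-person predictions. The sketch below assumes NumPy, uses a hypothetical helper `dr_mean`, and plugs in made-up predictions for four individuals; Pr(A=0|L) is taken as 1 - Pr(A=1|L).

```python
import numpy as np


def dr_mean(y, a, level, pr1, yhat_a):
    """Evaluate the AIPW formula for exposure level `level` (hypothetical helper).

    pr1 holds Pr(A=1 | L); Pr(A=0 | L) is taken as 1 - pr1.
    """
    ind = (a == level).astype(float)             # I(A_i = a)
    pr_a = pr1 if level == 1 else 1.0 - pr1      # Pr(A_i = a | L_i)
    return np.mean(y * ind / pr_a - yhat_a * (ind - pr_a) / pr_a)


y = np.array([1, 0, 1, 0])                 # observed outcomes
a = np.array([1, 1, 0, 0])                 # observed exposures
pr1 = np.array([0.5, 0.5, 0.25, 0.25])     # predicted Pr(A=1 | L)
yhat1 = np.array([0.8, 0.4, 0.6, 0.2])     # predicted outcomes with A set to 1
yhat0 = np.array([0.3, 0.1, 0.7, 0.3])     # predicted outcomes with A set to 0

dr1 = dr_mean(y, a, 1, pr1, yhat1)
dr0 = dr_mean(y, a, 0, pr1, yhat0)
print(dr1, dr0)   # 0.40 and 0.35, up to floating-point rounding
```

Note that unexposed individuals still contribute to the a=1 estimate through the augmentation term, which is what distinguishes AIPTW from plain IPTW.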

The risk difference and risk ratio are calculated using the following formulas, respectively:

\[\widehat{RD} = \widehat{DR}(a=1) - \widehat{DR}(a=0)\]
\[\widehat{RR} = \frac{\widehat{DR}(a=1)}{\widehat{DR}(a=0)}\]
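As a worked example with hypothetical estimates of 0.40 under exposure and 0.35 under no exposure:

\[\widehat{RD} = 0.40 - 0.35 = 0.05\]
\[\widehat{RR} = \frac{0.40}{0.35} \approx 1.14\]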

Confidence intervals for the risk difference are calculated from the influence curve. Confidence intervals for the risk ratio are less straightforward; to obtain them, a bootstrap procedure should be used.
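A minimal sketch of both approaches follows, assuming NumPy and a single binary confounder `L` so that the nuisance models can be fit as saturated, stratum-specific proportions. The helper names are hypothetical and this is not zEpid's internal code. The risk difference interval uses the sample variance of the estimated influence curve values; the risk ratio interval is a percentile bootstrap that refits both nuisance models in every resample.

```python
import numpy as np
from statistics import NormalDist


def aipw_components(y, a, L):
    """Per-person AIPW contributions for a=1 and a=0 (hypothetical helper).

    Nuisance models are saturated in a binary confounder L, i.e. simple
    stratum-specific proportions, refit every time the function is called.
    """
    n = len(y)
    ps, yhat1, yhat0 = np.empty(n), np.empty(n), np.empty(n)
    for s in (0, 1):
        m = L == s
        ps[m] = a[m].mean()                   # Pr(A=1 | L=s)
        yhat1[m] = y[m & (a == 1)].mean()     # E[Y | A=1, L=s]
        yhat0[m] = y[m & (a == 0)].mean()     # E[Y | A=0, L=s]
    dr1 = y * a / ps - yhat1 * (a - ps) / ps
    dr0 = y * (1 - a) / (1 - ps) - yhat0 * ((1 - a) - (1 - ps)) / (1 - ps)
    return dr1, dr0


def rd_influence_ci(dr1, dr0, alpha=0.05):
    """Risk difference with a Wald interval from the influence curve."""
    rd = dr1.mean() - dr0.mean()
    ic = (dr1 - dr0) - rd                     # estimated influence curve values
    se = np.sqrt(np.var(ic, ddof=1) / len(ic))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rd, rd - z * se, rd + z * se


def rr_bootstrap_ci(y, a, L, n_boot=500, alpha=0.05, seed=0):
    """Risk ratio with a percentile bootstrap interval."""
    rng = np.random.default_rng(seed)
    dr1, dr0 = aipw_components(y, a, L)
    rr = dr1.mean() / dr0.mean()
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))        # resample rows
        b1, b0 = aipw_components(y[idx], a[idx], L[idx])  # refit nuisance models
        boot[b] = b1.mean() / b0.mean()
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return rr, lo, hi


# Simulated data: true risks are 0.28 (exposed) and 0.18 (unexposed)
rng = np.random.default_rng(7)
n = 5000
L = rng.binomial(1, 0.4, size=n)
a = rng.binomial(1, np.where(L == 1, 0.7, 0.3))
y = rng.binomial(1, 0.1 + 0.1 * a + 0.2 * L)

rd, rd_lo, rd_hi = rd_influence_ci(*aipw_components(y, a, L))
rr, rr_lo, rr_hi = rr_bootstrap_ci(y, a, L)
print(rd, rd_lo, rd_hi)
print(rr, rr_lo, rr_hi)
```

Refitting inside the bootstrap loop matters: resampling only the final per-person contributions would ignore the uncertainty from estimating the exposure and outcome models.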

Parameters:
  • df (DataFrame) – Pandas DataFrame object containing all variables of interest
  • exposure (str) – Column name of the exposure variable. Currently only binary is supported
  • outcome (str) – Column name of the outcome variable. Binary and continuous outcomes are supported (see the continuous outcome examples below)
  • weights (str, optional) – Column name of weights. Weights allow, for example, sampling weights to be used when estimating effects
  • alpha (float, optional) – Alpha for the confidence interval level. Default is 0.05, which returns 95% confidence limits
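The `weights` and `alpha` parameters can be illustrated with two small helpers. This sketch assumes that weights enter as a weighted average of the per-person AIPW contributions and that `alpha` maps to a two-sided standard normal critical value; both helper names are hypothetical and are not part of zEpid.

```python
import numpy as np
from statistics import NormalDist


def weighted_dr_mean(contributions, weights=None):
    """Average per-person AIPW contributions, optionally weighted (hypothetical)."""
    return float(np.average(contributions, weights=weights))


def critical_value(alpha=0.05):
    """Two-sided standard normal critical value for a (1 - alpha) interval."""
    return NormalDist().inv_cdf(1 - alpha / 2)


contributions = [1.2, -0.4, 0.6, 0.2]
print(weighted_dr_mean(contributions))                 # unweighted: about 0.4
print(weighted_dr_mean(contributions, [2, 1, 1, 1]))   # first person counts twice: about 0.56
print(critical_value(0.05))                            # about 1.96 for a 95% interval
```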

Examples

Set up the environment and the data set

>>> from zepid import load_sample_data, spline
>>> from zepid.causal.doublyrobust import AIPTW
>>> df = load_sample_data(timevary=False).drop(columns=['cd4_wk45'])
>>> df[['cd4_rs1','cd4_rs2']] = spline(df,'cd40',n_knots=3,term=2,restricted=True)
>>> df[['age_rs1','age_rs2']] = spline(df,'age0',n_knots=3,term=2,restricted=True)

Estimate the base AIPTW model

>>> aipw = AIPTW(df, exposure='art', outcome='dead')
>>> aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.fit()
>>> aipw.summary()

Estimate AIPTW accounting for missing outcome data

>>> aipw = AIPTW(df, exposure='art', outcome='dead')
>>> aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.missing_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.fit()
>>> aipw.summary()

AIPTW for continuous outcomes

>>> df = load_sample_data(timevary=False).drop(columns=['dead'])
>>> df[['cd4_rs1','cd4_rs2']] = spline(df,'cd40',n_knots=3,term=2,restricted=True)
>>> df[['age_rs1','age_rs2']] = spline(df,'age0',n_knots=3,term=2,restricted=True)
>>> aipw = AIPTW(df, exposure='art', outcome='cd4_wk45')
>>> aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.missing_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.fit()
>>> aipw.summary()

AIPTW for continuous outcomes, using a Poisson distribution for the outcome model

>>> aipw = AIPTW(df, exposure='art', outcome='cd4_wk45')
>>> ymodel = 'art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0'
>>> aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> aipw.missing_model(ymodel)
>>> aipw.outcome_model(ymodel, continuous_distribution='poisson')
>>> aipw.fit()
>>> aipw.summary()

Methods

__init__(df, exposure, outcome[, weights, alpha]) Initialize self.
exposure_model(model[, custom_model, bound, …]) Specify the propensity score / inverse probability weight model.
fit() Calculate the augmented inverse probability weights and effect measures from the predicted exposure probabilities and predicted outcome values.
missing_model(model[, custom_model, bound, …]) Estimation of Pr(M=0|A,L), which is the missing data mechanism for the outcome.
outcome_model(model[, custom_model, …]) Specify the outcome model.
plot_kde(to_plot[, bw_method, fill, color, …]) Generates density plots that can be used to check predictions qualitatively.
plot_love([color_unweighted, …]) Generates a Love-plot to detail covariate balance based on the IPTW weights.
positivity([decimal]) Use this to assess whether positivity is a valid assumption for the exposure model / calculated IPTW.
run_diagnostics([decimal]) Run all currently implemented diagnostics for the exposure and outcome models.
standardized_mean_differences() Calculates the standardized mean differences for all variables based on the inverse probability weights.
summary([decimal]) Prints a summary of the results for the doubly robust estimator.
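The balance diagnostics listed above (`standardized_mean_differences`, `plot_love`) compare covariate distributions between exposure groups before and after weighting. Below is a minimal NumPy sketch of one common definition of the standardized mean difference for a continuous covariate, using weighted means and variances; the simulated data and helper names are hypothetical, and zEpid's exact formula may differ.

```python
import numpy as np


def weighted_mean_var(x, w):
    """Weighted mean and (biased) weighted variance."""
    m = np.average(x, weights=w)
    v = np.average((x - m) ** 2, weights=w)
    return m, v


def smd(x, a, w=None):
    """Standardized mean difference of x between a == 1 and a == 0."""
    if w is None:
        w = np.ones_like(x, dtype=float)
    m1, v1 = weighted_mean_var(x[a == 1], w[a == 1])
    m0, v0 = weighted_mean_var(x[a == 0], w[a == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)


rng = np.random.default_rng(3)
n = 100_000
age = rng.normal(40, 10, size=n)
ps = 1 / (1 + np.exp(-(age - 40) / 10))          # older people more often exposed
a = rng.binomial(1, ps)
iptw = np.where(a == 1, 1 / ps, 1 / (1 - ps))    # inverse probability of treatment weights

unweighted = smd(age, a)
weighted = smd(age, a, iptw)
print(unweighted, weighted)
```

IPT weights built from the true propensity score shrink the standardized mean difference toward zero; with estimated weights, remaining imbalance is a sign the exposure model may be misspecified.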