zepid.causal.doublyrobust.crossfit.SingleCrossfitAIPTW

class zepid.causal.doublyrobust.crossfit.SingleCrossfitAIPTW(df, exposure, outcome, alpha=0.05)

Implementation of the Augmented Inverse Probability Weighting estimator with a cross-fit procedure. The purpose of the cross-fit procedure is to allow for non-Donsker nuisance function estimators. Some machine learning algorithms are non-Donsker. In practice, this means that confidence interval coverage can be incorrect, and bias can persist, when such nuisance function estimators are used without cross-fitting. Cross-fitting is meant to alleviate this issue, so cross-fitting with a doubly robust estimator is recommended when using machine learning.
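For orientation, the standard form of the augmented inverse probability weighting estimator of the mean potential outcome under exposure level a is given below. The notation is introduced here for illustration and does not come from the zEpid documentation: A is the exposure, Y the outcome, W the covariates, \hat{\pi}_a(W) the estimated probability of A = a given W, and \hat{m}(a, W) the estimated outcome regression.

```latex
\hat{\psi}_a = \frac{1}{n} \sum_{i=1}^{n}
  \left[ \frac{I(A_i = a)}{\hat{\pi}_a(W_i)} \bigl( Y_i - \hat{m}(a, W_i) \bigr)
         + \hat{m}(a, W_i) \right],
\qquad
\widehat{RD} = \hat{\psi}_1 - \hat{\psi}_0
```

The two nuisance functions, \hat{\pi}_a and \hat{m}, are the quantities that the cross-fit procedure estimates in one split and evaluates in another.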

SingleCrossfitAIPTW uses a single cross-fit procedure, where the data set is partitioned into at least two non-overlapping splits. The nuisance functions are estimated separately in each split. The estimated nuisance functions from one split are then used to predict values in a different, non-overlapping split. This decouples the nuisance function predictions from the data used to fit the nuisance functions.
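The single cross-fit idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not zEpid's implementation: the helper names (simulate, fit_nuisance, single_crossfit_aipw) are hypothetical, the nuisance "estimators" are simple stratum means for one binary covariate, and pairing each split with the models from the next split is just one possible way to pair non-overlapping splits.

```python
import random


def simulate(n, seed=1):
    """Hypothetical toy data: binary covariate w, exposure a, outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = int(rng.random() < 0.2 + 0.1 * a + 0.3 * w)  # true risk difference is 0.1
        rows.append((w, a, y))
    return rows


def fit_nuisance(rows):
    """Estimate Pr(A=1 | W) and E[Y | A, W] by stratum means."""
    def mean(values, default=0.5):
        return sum(values) / len(values) if values else default

    g = {w: mean([a_i for (w_i, a_i, _) in rows if w_i == w]) for w in (0, 1)}
    m = {(a, w): mean([y_i for (w_i, a_i, y_i) in rows if w_i == w and a_i == a])
         for a in (0, 1) for w in (0, 1)}
    return g, m


def single_crossfit_aipw(rows, n_splits=2, seed=0):
    """Risk difference from AIPW with nuisance models fit in a different split."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    # Partition the data into non-overlapping splits
    splits = [[rows[i] for i in idx[k::n_splits]] for k in range(n_splits)]
    # Estimate the nuisance functions separately in each split
    models = [fit_nuisance(split) for split in splits]
    pseudo = []
    for k, split in enumerate(splits):
        # Assumption: predict in split k using models fit in split k + 1
        g, m = models[(k + 1) % n_splits]
        for w, a, y in split:
            p1 = min(max(g[w], 0.01), 0.99)  # bound the propensity score
            y1 = a / p1 * (y - m[(1, w)]) + m[(1, w)]
            y0 = (1 - a) / (1 - p1) * (y - m[(0, w)]) + m[(0, w)]
            pseudo.append(y1 - y0)
    return sum(pseudo) / len(pseudo)


rows = simulate(4000)
rd = single_crossfit_aipw(rows, n_splits=2)
print(round(rd, 3))
```

No observation is ever scored by a model that saw it during fitting, which is the decoupling described above.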

Note

Because the procedure must be repeated over many different partitions to reduce the variance attributable to any particular partition, this code can take a long time to run.
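One common way to combine results across repeated partitions is the median-based summary described in the Chernozhukov et al. reference. The sketch below assumes that approach; the function name aggregate_partitions and the input numbers are hypothetical, and whether fit() combines partitions in exactly this way is not stated here.

```python
import statistics


def aggregate_partitions(estimates, variances):
    """Median summary of point estimates and variances across partitions."""
    point = statistics.median(estimates)
    # Each partition's variance is inflated by its squared distance from the
    # median estimate, so disagreement between partitions widens the interval
    adjusted = [v + (est - point) ** 2 for est, v in zip(estimates, variances)]
    return point, statistics.median(adjusted)


# Hypothetical results from three different partitions of the same data
point, variance = aggregate_partitions(
    estimates=[0.10, 0.12, 0.08],
    variances=[0.0004, 0.0004, 0.0004],
)
print(point, variance)
```

The median is less sensitive than the mean to a single unlucky partition, at the cost of needing enough repetitions to be stable.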

Parameters:
  • df (DataFrame) – Pandas dataframe containing all necessary variables
  • exposure (str) – Label for treatment column in the pandas data frame
  • outcome (str) – Label for outcome column in the pandas data frame
  • alpha (float, optional) – Alpha for confidence interval level. Default is 0.05
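To make the role of alpha concrete, the following sketch converts alpha into a two-sided interval. It assumes a Wald-type (normal approximation) confidence interval, which is an assumption made for illustration; wald_interval and the numbers passed to it are hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, std_err, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval under a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return estimate - z * std_err, estimate + z * std_err


# alpha=0.05 gives a 95% interval; alpha=0.10 gives a narrower 90% interval
lower, upper = wald_interval(0.10, 0.02, alpha=0.05)
lower90, upper90 = wald_interval(0.10, 0.02, alpha=0.10)
print(lower, upper)
```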

Examples

Setting up environment

>>> from sklearn.linear_model import LogisticRegression
>>> from zepid import load_sample_data
>>> from zepid.causal.doublyrobust import SingleCrossfitAIPTW
>>> df = load_sample_data(False).drop(columns='cd4_wk45').dropna()

Estimating the single cross-fit AIPTW

>>> scaipw = SingleCrossfitAIPTW(df, exposure='art', outcome='dead')
>>> scaipw.exposure_model("male + age0 + cd40 + dvl0", estimator=LogisticRegression(solver='lbfgs'))
>>> scaipw.outcome_model("art + male + age0 + cd40 + dvl0", estimator=LogisticRegression(solver='lbfgs'))
>>> scaipw.fit(n_splits=5, n_partitions=100)
>>> scaipw.summary()

References

Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, & Robins J. (2018). “Double/debiased machine learning for treatment and structural parameters”. The Econometrics Journal 21:1; pC1–C68

__init__(df, exposure, outcome, alpha=0.05)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(df, exposure, outcome[, alpha]) Initialize self.
exposure_model(covariates, estimator[, bound]) Specify the treatment nuisance model variables and estimator(s) to use.
fit([n_splits, n_partitions, method, …]) Runs the crossfit estimation procedure with the augmented inverse probability weighting estimator.
outcome_model(covariates, estimator) Specify the outcome nuisance model variables and estimator(s) to use.
run_diagnostics([color]) Runs available diagnostics for the plots.
summary([decimal]) Prints summary of model results.