zepid.causal.doublyrobust.crossfit.SingleCrossfitAIPTW
class zepid.causal.doublyrobust.crossfit.SingleCrossfitAIPTW(df, exposure, outcome, alpha=0.05)
Implementation of the augmented inverse probability weighting (AIPW) estimator with a cross-fit procedure. The purpose of the cross-fit procedure is to allow for non-Donsker nuisance function estimators, a class that includes some machine learning algorithms. In practice, when such nuisance function estimators are used without cross-fitting, confidence interval coverage can be incorrect and bias may persist. Cross-fitting is meant to alleviate these issues, so cross-fitting with a doubly robust estimator is recommended when using machine learning.
SingleCrossfitAIPTW uses a single cross-fit procedure, where the data set is partitioned into at least two non-overlapping splits. The nuisance functions are estimated in each split, and the estimated nuisance functions are then used to predict values in a different, non-overlapping split. This decouples the estimation of the nuisance functions from the data on which their predictions are used.
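The procedure described above can be illustrated with a minimal, standard-library sketch. It is not zEpid's implementation: the helper names (`simulate`, `fit_nuisance`, `aipw_rd`, `single_crossfit_rd`), the simulated data, the stratified-mean nuisance models, and the choice of pairing each split with the next one are all assumptions made for illustration.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Simulate (confounder, treatment, outcome) rows; the true risk difference is 0.2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if l else 0.3))
        y = int(rng.random() < 0.2 + 0.3 * l + 0.2 * a)
        rows.append((l, a, y))
    return rows


def fit_nuisance(rows):
    """Fit toy nuisance models (stratified means) using one split only."""
    ps, om = {}, {}
    for l in (0, 1):
        stratum = [r for r in rows if r[0] == l]
        ps[l] = mean(r[1] for r in stratum)  # Pr(A=1 | L=l)
        for a in (0, 1):
            om[(a, l)] = mean(r[2] for r in stratum if r[1] == a)  # E[Y | A=a, L=l]
    return ps, om


def aipw_rd(rows, ps, om):
    """AIPW risk difference in `rows`, using nuisance models fit elsewhere."""
    y1, y0 = [], []
    for l, a, y in rows:
        pi, m1, m0 = ps[l], om[(1, l)], om[(0, l)]
        y1.append(a * (y - m1) / pi + m1)
        y0.append((1 - a) * (y - m0) / (1 - pi) + m0)
    return mean(y1) - mean(y0)


def single_crossfit_rd(rows, n_splits=2, seed=0):
    """Fit nuisance models in each split, predict in a different split, then average."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    splits = [shuffled[i::n_splits] for i in range(n_splits)]
    estimates = []
    for i, fit_split in enumerate(splits):
        predict_split = splits[(i + 1) % n_splits]  # assumed pairing: the next split
        ps, om = fit_nuisance(fit_split)
        estimates.append(aipw_rd(predict_split, ps, om))
    return mean(estimates)
```

Calling `single_crossfit_rd(simulate(4000))` should return a value near the simulated risk difference of 0.2. No observation is ever scored by nuisance models that were fit on that same observation.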
Note
Because the procedure must be repeated over many different partitions to reduce the variance induced by any particular partition, this code can take a long time to run.
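The reason for the repetitions can be sketched with a toy example. Here `one_partition_estimate` is a hypothetical stand-in for a single cross-fit run (it simply averages a random half of the data), and summarizing the repetitions with the median is an assumption for illustration rather than a statement of what `fit` does internally.

```python
import random
from statistics import median


def one_partition_estimate(values, seed):
    """Toy partition-dependent estimate: the mean of one random half of the data."""
    rng = random.Random(seed)
    shuffled = values[:]
    rng.shuffle(shuffled)
    half = shuffled[: len(shuffled) // 2]
    return sum(half) / len(half)


def repeated_estimate(values, n_partitions=100, seed=0):
    """Repeat the estimate over many random partitions and summarize by the median."""
    rng = random.Random(seed)
    estimates = [
        one_partition_estimate(values, rng.randrange(2**32))
        for _ in range(n_partitions)
    ]
    return median(estimates), estimates
```

Each individual estimate varies with the random partition, while the median across partitions is much more stable. The cost grows linearly with the number of partitions, which is why the full procedure can be slow.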
Parameters:
- df (DataFrame) – Pandas dataframe containing all necessary variables
- exposure (str) – Label for treatment column in the pandas data frame
- outcome (str) – Label for outcome column in the pandas data frame
- alpha (float, optional) – Alpha for confidence interval level. Default is 0.05
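To show how `alpha` maps to a confidence level, here is a minimal sketch that assumes a Wald-type interval (estimate plus or minus a normal critical value times the standard error). The function name `wald_interval` and the example numbers are hypothetical; the class may compute its intervals differently.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) Wald confidence interval under a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return estimate - z * std_error, estimate + z * std_error
```

With `alpha=0.05` the interval is a 95% confidence interval; a larger `alpha` yields a narrower interval.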
Examples
Setting up environment
>>> from sklearn.linear_model import LogisticRegression
>>> from zepid import load_sample_data
>>> from zepid.causal.doublyrobust import SingleCrossfitAIPTW
>>> df = load_sample_data(False).drop(columns='cd4_wk45').dropna()
Estimating the single cross-fit AIPTW
>>> scaipw = SingleCrossfitAIPTW(df, exposure='art', outcome='dead')
>>> scaipw.exposure_model("male + age0 + cd40 + dvl0", estimator=LogisticRegression(solver='lbfgs'))
>>> scaipw.outcome_model("art + male + age0 + cd40 + dvl0", estimator=LogisticRegression(solver='lbfgs'))
>>> scaipw.fit(n_splits=5, n_partitions=100)
>>> scaipw.summary()
References
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, & Robins J. (2018). “Double/debiased machine learning for treatment and structural parameters”. The Econometrics Journal 21(1): C1–C68.
__init__(df, exposure, outcome, alpha=0.05)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(df, exposure, outcome[, alpha]) – Initialize self.
- exposure_model(covariates, estimator[, bound]) – Specify the treatment nuisance model variables and estimator(s) to use.
- fit([n_splits, n_partitions, method, …]) – Runs the crossfit estimation procedure with augmented inverse probability weighting estimator.
- outcome_model(covariates, estimator) – Specify the outcome nuisance model variables and estimator(s) to use.
- run_diagnostics([color]) – Runs available diagnostics for the plots.
- summary([decimal]) – Prints summary of model results.