zepid.causal.doublyrobust.crossfit.SingleCrossfitAIPTW

class zepid.causal.doublyrobust.crossfit.SingleCrossfitAIPTW(df, exposure, outcome, alpha=0.05)

Implementation of the Augmented Inverse Probability Weighting estimator with a cross-fit procedure. The purpose of the cross-fit procedure is to allow for non-Donsker nuisance function estimators. Some machine learning algorithms are non-Donsker. In practice, this means that confidence interval coverage can be incorrect when certain nuisance function estimators are used, and bias may persist as well. Cross-fitting is meant to alleviate these issues, so cross-fitting with a doubly-robust estimator is recommended when using machine learning.
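For orientation, the doubly-robust combination of the two nuisance functions can be sketched in plain Python. This is a minimal illustration of the standard augmented inverse probability weighting formula for a risk difference, not the library's internal code; the function name `aipw_risk_difference` and its arguments are hypothetical, and the predicted values are assumed to have been produced already.

```python
def aipw_risk_difference(y, a, pr_a, y1_hat, y0_hat):
    """Standard AIPW estimate of E[Y^1] - E[Y^0] (illustrative helper, not zepid API).

    y      : observed outcomes
    a      : observed binary exposures (0 or 1)
    pr_a   : predicted probability of exposure for each row
    y1_hat : predicted outcome for each row had it been exposed
    y0_hat : predicted outcome for each row had it been unexposed
    """
    n = len(y)
    mu1 = 0.0
    mu0 = 0.0
    for yi, ai, pi, m1, m0 in zip(y, a, pr_a, y1_hat, y0_hat):
        # Weighted outcome minus an augmentation term built from the outcome model
        mu1 += ai * yi / pi - (ai - pi) / pi * m1
        mu0 += (1 - ai) * yi / (1 - pi) + (ai - pi) / (1 - pi) * m0
    return mu1 / n - mu0 / n


# Toy inputs, made up purely for illustration
rd = aipw_risk_difference(
    y=[1, 0, 1, 0],
    a=[1, 1, 0, 0],
    pr_a=[0.5, 0.5, 0.5, 0.5],
    y1_hat=[0.5, 0.5, 0.5, 0.5],
    y0_hat=[0.5, 0.5, 0.5, 0.5],
)
print(rd)
```

With cross-fitting, the values passed as `pr_a`, `y1_hat`, and `y0_hat` for a given row come from models that were fit without that row.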

SingleCrossfitAIPTW uses a single cross-fit procedure, where the data set is partitioned into at least two non-overlapping splits. The nuisance functions are estimated within each split. The estimated nuisance functions are then used to predict values in a different, non-overlapping split. This decouples nuisance function estimation from the data on which predictions are made.
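The partitioning step can be sketched with the standard library alone. The helper names `make_splits` and `crossfit_pairs` are hypothetical, and the rotation used to pair an estimation split with a prediction split is an assumption made for illustration; the library may pair splits differently.

```python
import random


def make_splits(n_obs, n_splits, seed=None):
    """Randomly partition row indices 0..n_obs-1 into n_splits disjoint splits."""
    if n_splits < 2:
        raise ValueError("cross-fitting needs at least two splits")
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    # Striding by n_splits keeps split sizes within one row of each other
    return [sorted(indices[k::n_splits]) for k in range(n_splits)]


def crossfit_pairs(n_splits):
    """Pair each estimation split with a different prediction split (simple rotation)."""
    return [(k, (k + 1) % n_splits) for k in range(n_splits)]


splits = make_splits(n_obs=10, n_splits=3, seed=0)
for fit_k, predict_k in crossfit_pairs(len(splits)):
    # Nuisance models would be fit on rows splits[fit_k] and then used to
    # predict for rows splits[predict_k], which those models never saw.
    print(f"fit on split {fit_k} -> predict for split {predict_k}")
```

Because no split is ever paired with itself, every prediction comes from a model that was estimated on other rows.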

Note

Because the procedure must be repeated over many different partitions to reduce the variance attributable to any particular partition, this code can take a long time to run.
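One way to summarize results across repeated partitions is the median-based rule described by Chernozhukov et al. (2018). The sketch below follows that rule; the function name `combine_partitions` is hypothetical, and whether the library combines partitions in exactly this way is an assumption.

```python
import statistics


def combine_partitions(estimates, variances, method="median"):
    """Pool per-partition point estimates and variances (illustrative helper)."""
    if method not in ("median", "mean"):
        raise ValueError("method must be 'median' or 'mean'")
    summarize = statistics.median if method == "median" else statistics.mean
    point = summarize(estimates)
    # Inflate each partition's variance by its squared distance from the
    # pooled point estimate, so disagreement between partitions is counted
    adjusted = [v + (e - point) ** 2 for e, v in zip(estimates, variances)]
    return point, summarize(adjusted)


# Three made-up partition results, for illustration only
point, variance = combine_partitions([0.1, 0.2, 0.3], [0.01, 0.01, 0.01])
print(point, variance)
```

More partitions make the pooled estimate less sensitive to any one random split, at the cost of refitting every nuisance model for each partition.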

Parameters:
    df (DataFrame) – Pandas dataframe containing all necessary variables
    exposure (str) – Label for treatment column in the pandas data frame
    outcome (str) – Label for outcome column in the pandas data frame
    alpha (float, optional) – Alpha for confidence interval level. Default is 0.05
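To illustrate how alpha maps to a confidence level, the sketch below builds a two-sided Wald-type interval from a point estimate and its variance. The helper `wald_interval` is hypothetical, and the assumption that the interval is Wald-type is made only for this illustration.

```python
from statistics import NormalDist


def wald_interval(estimate, variance, alpha=0.05):
    """Two-sided (1 - alpha) Wald-type confidence interval (illustrative helper)."""
    # Critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width


# alpha=0.05 corresponds to a 95% interval; the inputs are made-up numbers
lower, upper = wald_interval(estimate=-0.08, variance=0.0004, alpha=0.05)
print(lower, upper)
```

A smaller alpha gives a wider interval around the same point estimate.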

Examples

Setting up environment

>>> from sklearn.linear_model import LogisticRegression
>>> from zepid import load_sample_data
>>> from zepid.causal.doublyrobust import SingleCrossfitAIPTW
>>> df = load_sample_data(False).drop(columns='cd4_wk45').dropna()

Estimating the single cross-fit AIPTW

>>> scaipw = SingleCrossfitAIPTW(df, exposure='art', outcome='dead')
>>> scaipw.exposure_model("male + age0 + cd40 + dvl0", estimator=LogisticRegression(solver='lbfgs'))
>>> scaipw.outcome_model("art + male + age0 + cd40 + dvl0", estimator=LogisticRegression(solver='lbfgs'))
>>> scaipw.fit(n_splits=5, n_partitions=100)
>>> scaipw.summary()

References

Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, & Robins J. (2018). “Double/debiased machine learning for treatment and structural parameters”. The Econometrics Journal 21:1; pC1–C6

__init__(df, exposure, outcome, alpha=0.05): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`(df, exposure, outcome[, alpha])	Initialize self.
`exposure_model`(covariates, estimator[, bound])	Specify the treatment nuisance model variables and estimator(s) to use.
`fit`([n_splits, n_partitions, method, …])	Runs the crossfit estimation procedure with augmented inverse probability weighting estimator.
`outcome_model`(covariates, estimator)	Specify the outcome nuisance model variables and estimator(s) to use.
`run_diagnostics`([color])	Runs available diagnostics for the plots.
`summary`([decimal])	Prints a summary of the model results.