zepid.causal.doublyrobust.TMLE.StochasticTMLE

class zepid.causal.doublyrobust.TMLE.StochasticTMLE(df, exposure, outcome, alpha=0.05, continuous_bound=0.0005, verbose=False)
Implementation of the targeted maximum likelihood estimator (TMLE) for stochastic treatment plans. This implementation calculates TMLE for a time-fixed exposure and a single time-point outcome under a stochastic treatment plan of interest. By default, standard parametric regression models are used to calculate the estimate of interest. The StochasticTMLE estimator also allows users to instead use machine learning algorithms from sklearn and PyGAM.
Note
Valid confidence intervals are only attainable with certain machine learning algorithms. These algorithms must be Donsker class for the confidence intervals to be valid. GAM and LASSO are examples of algorithms that are Donsker class.
Parameters:
 df (DataFrame) – Pandas dataframe containing the variables of interest
 exposure (str) – Column label for the exposure of interest
 outcome (str) – Column label for the outcome of interest
 alpha (float, optional) – Alpha for confidence interval level. Default is 0.05
 continuous_bound (float, optional) – Optional argument to control the bounding feature for continuous outcomes. The bounding process may result in values of exactly 0 or 1, which are undefined for logit(x). This parameter is added to values of 0 and subtracted from values of 1. Default value is 0.0005
 verbose (bool, optional) – Optional argument for verbose estimation. With verbose estimation, the model fits for each result are printed to the console. It is highly recommended to turn this parameter to True when conducting model diagnostics
Note
TMLE is a doubly-robust substitution estimator. TMLE obtains the target estimate in a single step. The single-step TMLE is described further by van der Laan. For further details, see the listed references.
Continuous outcomes must be bounded between 0 and 1. TMLE does this automatically for the user. Additionally, the average treatment effect estimate is back-converted to the original scale of Y. When scaling Y as Y*, some values may take the value of 0 or 1, which breaks a logit(Y*) transformation. To avoid this issue, Y* is bounded by the continuous_bound argument. The default is 0.0005, the same as R's tmle.
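The bounding described above can be sketched in a few lines of NumPy. This is an illustration only, not zepid's internal code: the helper names bound_outcome and unbound_outcome are hypothetical, and scaling by the observed minimum and maximum with clipping at continuous_bound is an assumption about how the bound is applied.

```python
import numpy as np


def bound_outcome(y, continuous_bound=0.0005):
    """Scale y onto [0, 1], then pull exact 0/1 values inward (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    # Min-max scaling: the smallest value maps to 0 and the largest to 1
    y_star = (y - y_min) / (y_max - y_min)
    # Assumption: clipping stands in for "add to 0, subtract from 1"
    y_star = np.clip(y_star, continuous_bound, 1 - continuous_bound)
    return y_star, y_min, y_max


def unbound_outcome(y_star, y_min, y_max):
    """Convert a value on the bounded scale back to the original scale of Y."""
    return y_star * (y_max - y_min) + y_min


y = np.array([2.0, 5.0, 11.0, 20.0])
y_star, y_lo, y_hi = bound_outcome(y)

# logit(Y*) is finite everywhere because no value is exactly 0 or 1
logit_y_star = np.log(y_star / (1 - y_star))

# Interior values round-trip to the original scale
y_back = unbound_outcome(y_star[1], y_lo, y_hi)
```

Without the clipping step, the minimum and maximum of Y would map to exactly 0 and 1, and the logit would be infinite at those two points.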
Following is a general narrative of the estimation procedure for TMLE with stochastic treatments:
1. Initial estimators for the g-model (IPTW) and Q-model (g-formula) are fit. By default these estimators are based on parametric regression models. Alternatively, machine learning algorithms can be used to estimate the g-model and Q-model.
2. The auxiliary covariate is calculated (i.e. IPTW).
\[H = \frac{p}{\widehat{\Pr}(A=a)}\]
where p is the probability of treatment a under the stochastic intervention of interest.
3. The targeting step occurs through estimation of ε via a logistic regression model. Briefly, a weighted logistic regression model (weighted by the auxiliary covariate) is fit with the observed outcome as the dependent variable and the Q-model predictions under the observed treatment (A) as an offset term.
\[\text{logit}(Y) = \text{logit}(Q(A, W)) + \epsilon\]
4. Stochastic interventions are evaluated through Monte Carlo integration for binary treatments. The different treatment plans are randomly applied and evaluated through the Q-model and then the targeting step via
\[E[\text{logit}(Q(A=a, W)) + \hat{\epsilon}]\]
This process is repeated a large number of times and the point estimate is the average of those individual treatment plans.
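Steps 2 and 4 can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not zepid's implementation: the function names are hypothetical, the g-model and Q-model predictions (g1, q1, q0) and the targeting-step estimate epsilon are taken as already computed, and applying the inverse logit to return to the probability scale is an assumption.

```python
import numpy as np


def expit(x):
    """Inverse of the logit function."""
    return 1 / (1 + np.exp(-x))


def logit(prob):
    return np.log(prob / (1 - prob))


def auxiliary_covariate(a, g1, p):
    """Step 2: H = p / Pr(A=a), evaluated at each individual's observed treatment.

    a  : observed binary treatment
    g1 : g-model predicted probability of A=1
    p  : probability of A=1 under the stochastic plan
    """
    numerator = np.where(a == 1, p, 1 - p)
    denominator = np.where(a == 1, g1, 1 - g1)
    return numerator / denominator


def monte_carlo_estimate(q1, q0, epsilon, p, samples=100, seed=0):
    """Step 4: average targeted predictions over randomly drawn treatment plans.

    q1, q0 : Q-model predictions with A set to 1 and to 0
    epsilon: estimate from the targeting step (assumed given here)
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(samples):
        # Randomly assign treatment according to the plan
        a_star = rng.binomial(n=1, p=p, size=q1.shape[0])
        q_star = np.where(a_star == 1, q1, q0)
        # Apply the targeting update, then return to the probability scale
        estimates.append(np.mean(expit(logit(q_star) + epsilon)))
    # Point estimate is the average over all drawn plans
    return float(np.mean(estimates))


a = np.array([1, 0])
g1 = np.array([0.5, 0.25])
h = auxiliary_covariate(a, g1, p=0.2)

q1 = np.array([0.2, 0.4])
q0 = np.array([0.3, 0.5])
all_treated = monte_carlo_estimate(q1, q0, epsilon=0.0, p=1.0)
none_treated = monte_carlo_estimate(q1, q0, epsilon=0.0, p=0.0)
```

With epsilon fixed at 0 and p set to 1 or 0, every draw assigns the same treatment, so the estimate reduces to the mean of q1 or q0. For intermediate values of p the estimate varies with the random draws, which is why the process is repeated many times.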
Examples
Setting up environment
>>> from zepid import load_sample_data, spline
>>> from zepid.causal.doublyrobust import StochasticTMLE
>>> df = load_sample_data(False).dropna()
>>> df[['cd4_rs1', 'cd4_rs2']] = spline(df, 'cd40', n_knots=3, term=2, restricted=True)
Estimating TMLE for a plan where each individual has a probability of 0.2 of being treated with ART
>>> tmle = StochasticTMLE(df, exposure='art', outcome='dead')
>>> tmle.exposure_model('male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.outcome_model('art + male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.fit(p=0.2)
>>> tmle.summary()
Estimating TMLE for conditional plan
>>> tmle = StochasticTMLE(df, exposure='art', outcome='dead')
>>> tmle.exposure_model('male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.outcome_model('art + male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.fit(p=[0.6, 0.4], conditional=["df['male']==1", "df['male']==0"])
>>> tmle.summary()
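To make the conditional plan concrete, the sketch below shows how a list of probabilities paired with a list of conditions could translate into one treatment probability per individual. It is an illustration only: the array male stands in for df['male'], the name p_star is hypothetical, and how StochasticTMLE evaluates the conditional strings internally is not shown here.

```python
import numpy as np

# Stand-in for df['male'] (assumed values for illustration)
male = np.array([1, 0, 1, 1, 0])

# Same plan as in the example: 0.6 for men, 0.4 for women
p = [0.6, 0.4]
conditions = [male == 1, male == 0]

# One treatment probability per individual, filled in condition by condition
p_star = np.full(male.shape[0], np.nan)
for prob, mask in zip(p, conditions):
    p_star[mask] = prob

# Every individual should be covered by exactly one condition
if np.isnan(p_star).any():
    raise ValueError("Some individuals are not covered by any condition")
```

The conditions here are mutually exclusive and together cover everyone, so each individual receives exactly one probability.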
Estimating TMLE with machine learning algorithm from sklearn
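>>> from sklearn.linear_model import LogisticRegression
>>> # solver='liblinear' supports the L1 penalty; the default solver in recent sklearn versions does not
>>> log1 = LogisticRegression(penalty='l1', solver='liblinear', random_state=201)
>>> tmle = StochasticTMLE(df, 'art', 'dead')
>>> tmle.exposure_model('male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', custom_model=log1)
>>> # The outcome model includes the exposure (art), as in the examples above
>>> tmle.outcome_model('art + male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', custom_model=log1)
>>> tmle.fit(p=0.75)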
References
Muñoz ID, and van der Laan MJ. Population intervention causal effects based on stochastic interventions. Biometrics 68.2 (2012): 541-549.
van der Laan MJ, and Rose S. Targeted learning in data science: causal inference for complex longitudinal studies. Springer Science & Business Media, 2011.

__init__(df, exposure, outcome, alpha=0.05, continuous_bound=0.0005, verbose=False)
Initialize self. See help(type(self)) for accurate signature.
Methods
 __init__(df, exposure, outcome[, alpha, …]) – Initialize self.
 est_conditional_variance(haw, y_obs, y_pred)
 est_marginal_variance(haw, y_obs, y_pred, …)
 exposure_model(model[, custom_model, bound]) – Estimation of Pr(A=1|L), which is termed g(A=1|L) in the literature.
 fit(p[, conditional, samples, seed]) – Calculate the effect from the predicted exposure probabilities and predicted outcome values using the TMLE procedure.
 outcome_model(model[, custom_model, bound, …]) – Estimation of E(Y|A,L), which is also sometimes written as Q(A,W) or Pr(Y=1|A,W).
 run_diagnostics([decimal]) – Provides some summary diagnostics for StochasticTMLE.
 summary([decimal]) – Prints a summary of the estimated incidence under the specified treatment plan.
 targeting_step(y, q_init, iptw, verbose)