class zepid.causal.doublyrobust.TMLE.StochasticTMLE(df, exposure, outcome, alpha=0.05, continuous_bound=0.0005, verbose=False)

Implementation of the targeted maximum likelihood estimator (TMLE) for stochastic treatment plans. This implementation estimates TMLE for a time-fixed exposure and a single time-point outcome under a stochastic treatment plan of interest. By default, standard parametric regression models are used to calculate the estimate of interest. StochasticTMLE also allows users to substitute machine learning algorithms from sklearn and PyGAM.


Valid confidence intervals are attainable only with certain machine learning algorithms: the algorithms must belong to a Donsker class. GAM and LASSO are examples of Donsker class algorithms.

  • df (DataFrame) – Pandas dataframe containing the variables of interest
  • exposure (str) – Column label for the exposure of interest
  • outcome (str) – Column label for the outcome of interest
  • alpha (float, optional) – Alpha for confidence interval level. Default is 0.05
  • continuous_bound (float, optional) – Optional argument to control the bounding of continuous outcomes. The bounding process may produce values of exactly 0 or 1, for which logit(x) is undefined. This value is added to 0 and subtracted from 1, respectively. Default is 0.0005
  • verbose (bool, optional) – Optional argument for verbose estimation. With verbose estimation, the model fits for each result are printed to the console. Setting this parameter to True is highly recommended when conducting model diagnostics


TMLE is a doubly robust substitution estimator that obtains the target estimate in a single targeting step. The single-step TMLE is described further by van der Laan and Rose. For further details, see the listed references.

Continuous outcomes must be bounded between 0 and 1, and StochasticTMLE rescales them automatically. The estimate is then back-transformed to the original scale of Y. When Y is rescaled to Y*, some values may equal exactly 0 or 1, which breaks the logit(Y*) transformation. To avoid this issue, Y* is bounded by the continuous_bound argument. The default, 0.0005, is the same as in R's tmle package.
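The rescaling, bounding, and back-transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes min-max rescaling of Y to [0, 1]; the helpers `scale_and_bound` and `back_transform` are hypothetical and not part of zEpid.

```python
import numpy as np


def scale_and_bound(y, bound=0.0005):
    # Hypothetical helper: min-max rescale Y to [0, 1] (assumed scaling),
    # then clip to [bound, 1 - bound] so that logit(Y*) stays finite
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    y_star = (y - y_min) / (y_max - y_min)
    return np.clip(y_star, bound, 1 - bound), y_min, y_max


def back_transform(psi_star, y_min, y_max):
    # Hypothetical helper: map an estimate on the Y* scale back to the scale of Y
    return psi_star * (y_max - y_min) + y_min


y = np.array([120.0, 350.0, 560.0, 800.0])
y_star, y_min, y_max = scale_and_bound(y)
print(y_star)                             # first value clipped to 0.0005, last to 0.9995
print(np.log(y_star / (1 - y_star)))      # all logit values are finite
print(back_transform(0.5, y_min, y_max))  # 460.0
```

Without the clipping step, the smallest and largest outcomes would map to exactly 0 and 1, and their logits would be infinite.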

The following is a general outline of the estimation procedure for TMLE with stochastic treatments:

1. Initial estimators for the g-model (IPTW) and the Q-model (g-formula) are fit. By default these estimators are based on parametric regression models. Alternatively, machine learning algorithms can be used to estimate the g-model and Q-model.

2. The auxiliary covariate is calculated (i.e., the IPTW).
\[H = \frac{p}{\widehat{\Pr}(A=a)}\]

where p is the probability of treatment a under the stochastic intervention of interest, and the denominator is the g-model's estimated probability of the observed treatment a.

3. The targeting step estimates ε via a logistic regression model. Briefly, a weighted logistic regression model (weighted by the auxiliary covariate) is fit with the observed outcome as the dependent variable and the logit of the Q-model predictions under the observed treatment (A) as an offset term.

\[\text{logit}(Y) = \text{logit}(Q(A, W)) + \epsilon\]

4. For binary treatments, stochastic interventions are evaluated through Monte Carlo integration. Treatments are randomly assigned according to the plan, evaluated through the Q-model, and then updated with the targeting step via

\[E\left[\text{expit}\left(\text{logit}(Q(A=a, W)) + \hat{\epsilon}\right)\right]\]

This process is repeated a large number of times, and the point estimate is the average across the sampled treatment plans.
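Steps 2 to 4 (auxiliary covariate, targeting step, and Monte Carlo integration) can be sketched with NumPy for a binary exposure and a binary outcome. This is a minimal illustration under stated assumptions, not zEpid's implementation. The initial g-model and Q-model predictions from step 1 are taken as given arrays. The targeting regression is solved with a hand-written Newton-Raphson loop instead of a GLM library. The names `stochastic_tmle` and `fit_epsilon` are hypothetical. The arguments `p`, `samples`, and `seed` mirror those of `fit`, but their handling here is assumed.

```python
import numpy as np


def logit(x):
    return np.log(x / (1 - x))


def expit(x):
    return 1 / (1 + np.exp(-x))


def fit_epsilon(y, q_obs, h, max_iter=50, tol=1e-10):
    # Step 3: intercept-only logistic regression of Y with offset logit(Q(A, W))
    # and weights H, solved by Newton-Raphson on the weighted score equation
    offset = logit(q_obs)
    eps = 0.0
    for _ in range(max_iter):
        mu = expit(offset + eps)
        step = np.sum(h * (y - mu)) / np.sum(h * mu * (1 - mu))
        eps += step
        if abs(step) < tol:
            break
    return eps


def stochastic_tmle(y, a, q1, q0, g1, p, samples=100, seed=None):
    # y, a   : observed binary outcome and binary exposure
    # q1, q0 : initial Q-model predictions of Pr(Y=1 | A=1, W) and Pr(Y=1 | A=0, W)
    # g1     : g-model predictions of Pr(A=1 | W)
    # p      : Pr(A=1) under the stochastic plan (scalar or one value per person)
    rng = np.random.default_rng(seed)
    p = np.broadcast_to(np.asarray(p, dtype=float), y.shape)

    # Step 2: H = (plan probability of the observed A) / (estimated Pr(A=a | W))
    g_obs = np.where(a == 1, g1, 1 - g1)
    p_obs = np.where(a == 1, p, 1 - p)
    h = p_obs / g_obs

    # Step 3: targeting step using Q-model predictions under the observed A
    eps = fit_epsilon(y, np.where(a == 1, q1, q0), h)

    # Step 4: Monte Carlo integration over randomly drawn treatment plans
    estimates = []
    for _ in range(samples):
        a_star = rng.binomial(1, p)
        q_star = expit(logit(np.where(a_star == 1, q1, q0)) + eps)
        estimates.append(q_star.mean())
    return float(np.mean(estimates)), eps


# Synthetic example with correctly specified g- and Q-models
rng = np.random.default_rng(0)
n = 2000
w = rng.normal(size=n)
g1 = expit(0.5 * w)
a = rng.binomial(1, g1)
q1 = expit(-1.5 + 0.3 * w)
q0 = expit(-1.0 + 0.3 * w)
y = rng.binomial(1, np.where(a == 1, q1, q0))

psi, eps = stochastic_tmle(y, a, q1, q0, g1, p=0.2, samples=200, seed=1)
print(f"psi = {psi:.4f}, epsilon = {eps:.4f}")
```

For a binary exposure, the Monte Carlo average converges to the mean of p Q*(1, W) + (1 - p) Q*(0, W), where Q* denotes the targeted predictions. This gives a quick check on the sampled estimate.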


Setting up the environment

>>> from zepid import load_sample_data, spline
>>> from zepid.causal.doublyrobust import StochasticTMLE
>>> df = load_sample_data(False).dropna()
>>> df[['cd4_rs1', 'cd4_rs2']] = spline(df, 'cd40', n_knots=3, term=2, restricted=True)

Estimating TMLE for a plan in which 20% of individuals are treated with ART

>>> tmle = StochasticTMLE(df, exposure='art', outcome='dead')
>>> tmle.exposure_model('male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.outcome_model('art + male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.fit(p=0.2)
>>> tmle.summary()

Estimating TMLE for a conditional plan (60% of men and 40% of women treated with ART)

>>> tmle = StochasticTMLE(df, exposure='art', outcome='dead')
>>> tmle.exposure_model('male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.outcome_model('art + male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> tmle.fit(p=[0.6, 0.4], conditional=["df['male']==1", "df['male']==0"])
>>> tmle.summary()
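The conditional plan in the example above gives each man a 60% probability and each woman a 40% probability of receiving ART. The sketch below turns such a plan into per-person probabilities and draws one random treatment plan from them. It is illustrative only: it uses boolean masks instead of the string conditions passed to `fit`, and `plan_probabilities` is a hypothetical helper, not a zEpid function.

```python
import numpy as np
import pandas as pd


def plan_probabilities(df, p, conditional):
    # Hypothetical helper: assign each row the treatment probability of the
    # (single) condition it satisfies; conditions must not overlap or leave gaps
    probs = np.full(len(df), np.nan)
    covered = np.zeros(len(df), dtype=bool)
    for prob, mask in zip(p, conditional):
        mask = np.asarray(mask, dtype=bool)
        if (covered & mask).any():
            raise ValueError("conditional masks overlap")
        probs[mask] = prob
        covered |= mask
    if not covered.all():
        raise ValueError("conditional masks must cover every row")
    return probs


df = pd.DataFrame({'male': [1, 0, 1, 0, 0]})
probs = plan_probabilities(df, [0.6, 0.4], [df['male'] == 1, df['male'] == 0])
print(probs)  # [0.6 0.4 0.6 0.4 0.4]

# One random draw of the treatment plan, as in a single Monte Carlo iteration
rng = np.random.default_rng(7)
a_star = rng.binomial(1, probs)
print(a_star)
```

Requiring the conditions to be exhaustive and non-overlapping ensures that every individual receives exactly one treatment probability.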

Estimating TMLE with a machine learning algorithm from sklearn

>>> from sklearn.linear_model import LogisticRegression
>>> log1 = LogisticRegression(penalty='l1', solver='liblinear', random_state=201)  # the L1 penalty requires a compatible solver
>>> tmle = StochasticTMLE(df, 'art', 'dead')
>>> tmle.exposure_model('male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', custom_model=log1)
>>> tmle.outcome_model('art + male + age0 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', custom_model=log1)
>>> tmle.fit(p=0.75)


Muñoz ID, van der Laan MJ. Population intervention causal effects based on stochastic interventions. Biometrics 68.2 (2012): 541-549.

van der Laan MJ, Rose S. Targeted learning in data science: causal inference for complex longitudinal studies. Springer, 2018.



  • __init__(df, exposure, outcome[, alpha, …]) – Initialize self.
  • est_conditional_variance(haw, y_obs, y_pred)
  • est_marginal_variance(haw, y_obs, y_pred, …)
  • exposure_model(model[, custom_model, bound]) – Estimation of Pr(A=1|L), which is termed g(A=1|L) in the literature.
  • fit(p[, conditional, samples, seed]) – Calculate the effect from the predicted exposure probabilities and predicted outcome values using the TMLE procedure.
  • outcome_model(model[, custom_model, bound, …]) – Estimation of E(Y|A,L), which is sometimes also written as Q(A,W) or Pr(Y=1|A,W).
  • run_diagnostics([decimal]) – Provides summary diagnostics for StochasticTMLE.
  • summary([decimal]) – Prints a summary of the estimated incidence under the specified treatment plan.
  • targeting_step(y, q_init, iptw, verbose)
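est_conditional_variance and est_marginal_variance are listed without descriptions. Their argument names (haw suggests the auxiliary covariate H(A, W)) point to influence-curve variance estimators. The sketch below shows the standard efficient influence curve for the mean outcome under a stochastic plan (Muñoz and van der Laan, 2012). The conditional form keeps only the H(A, W)(Y - Q*) term. The marginal form adds the plan-averaged prediction minus the estimate. Whether zEpid computes exactly these quantities is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def conditional_variance(h, y_obs, q_star_obs):
    # Hypothetical: IC term H(A, W) * (Y - Q*(A, W)) only, treating W as fixed.
    # After targeting, the IC has mean ~0, so its mean square estimates the variance
    ic = h * (y_obs - q_star_obs)
    return float(np.mean(ic ** 2))


def marginal_variance(h, y_obs, q_star_obs, q_star_plan, psi):
    # Hypothetical: adds the plan-averaged targeted prediction minus psi,
    # so that variability in W contributes to the variance
    ic = h * (y_obs - q_star_obs) + q_star_plan - psi
    return float(np.mean(ic ** 2))


def wald_ci(psi, var, n, alpha=0.05):
    # Wald interval with standard error sqrt(var / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(var / n)
    return psi - z * se, psi + z * se


# Toy inputs chosen only to show the arithmetic
h = np.array([1.0, 2.0])
y_obs = np.array([1.0, 0.0])
q_star_obs = np.array([0.5, 0.5])
q_star_plan = np.array([0.4, 0.6])  # p*Q*(1, W) + (1 - p)*Q*(0, W) per person
psi = 0.5

v_c = conditional_variance(h, y_obs, q_star_obs)                # 0.625
v_m = marginal_variance(h, y_obs, q_star_obs, q_star_plan, psi)  # approximately 0.485
lower, upper = wald_ci(psi, v_m, n=len(h))
print(v_c, v_m, (lower, upper))
```

With `alpha=0.05`, `wald_ci` uses z ≈ 1.96, matching the default 95% interval implied by the `alpha` parameter.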