class zepid.causal.gformula.TimeVary.IterativeCondGFormula(df, exposures, outcomes)

Iterative conditional g-formula estimator. This time-varying parametric g-formula uses the iterative conditional approach (also referred to as sequential regression). The iterative conditional estimator is useful for longitudinal data and requires less model specification than the Monte Carlo g-formula. It relies on iterated expectations: a sequence of nested outcome regressions, fit backward from the last time point, yields the marginal outcome distribution at the end of follow-up.

Unlike other implementations of the g-formula, IterativeCondGFormula takes input data in a wide format. Additionally, the treatment plan is passed explicitly as an array with one entry per time point. See the examples for details.
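To illustrate what a wide format means here, the sketch below pivots long records (one row per person and time point) into one row per person, with columns such as A1, A2, L1, L2. It is written in plain Python only to stay dependency-free; with a pandas DataFrame a pivot would be the usual tool. The helper long_to_wide, the column names, and the four toy records are all hypothetical and made up for illustration.

```python
from collections import defaultdict

def long_to_wide(records, id_key, time_key, value_keys):
    """Pivot long records (one per id and time) into one wide row per id.

    Hypothetical helper for illustration; not part of zEpid.
    """
    wide = defaultdict(dict)
    for rec in records:
        for key in value_keys:
            # Column name is the variable name followed by the time index, e.g. 'A1'
            wide[rec[id_key]][f"{key}{rec[time_key]}"] = rec[key]
    return [{id_key: i, **cols} for i, cols in sorted(wide.items())]

# Made-up long-format records: two people, two time points each
long_rows = [
    {"id": 1, "t": 1, "A": 1, "L": 0, "Y": 0},
    {"id": 1, "t": 2, "A": 1, "L": 1, "Y": 0},
    {"id": 2, "t": 1, "A": 0, "L": 1, "Y": 0},
    {"id": 2, "t": 2, "A": 0, "L": 1, "Y": 1},
]
wide_rows = long_to_wide(long_rows, "id", "t", ["A", "L", "Y"])
```

Each resulting row holds every time point for one person, which is the layout the exposures and outcomes column lists refer to.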

Currently, only binary exposures and binary outcomes are supported. Logistic regression models are used to predict exposures and outcomes via statsmodels. See Kreif et al. 2017 for a good description of the iterative conditional g-formula. See http://zepid.readthedocs.io/en/latest/ for an example.

  • df (DataFrame) – Pandas dataframe containing the variables of interest
  • exposures (list, array) – Treatment column labels, one per time point
  • outcomes (list, array) – Outcome column labels, one per time point


Process for the sequential regression g-formula:

  1) Identify individuals who followed the counterfactual treatment plan and had the outcome
  2) Fit a regression model for the outcome Y at time t
  3) Predict outcomes under the observed treatment and the counterfactual treatment
  4) Repeat regression model fitting for t-1 to min(t)
  5) Take the mean predicted Y at the end to obtain the cumulative probability
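The process above can be sketched in plain Python for two time points. This is a minimal illustration under stated assumptions, not zEpid's implementation: the logistic regressions are swapped for stratified means (a saturated model), the cohort is simulated, and every function name is hypothetical. With saturated models the result should match the nonparametric g-formula computed directly, which the last function checks.

```python
import random
from collections import defaultdict

def simulate(n, seed=0):
    """Hypothetical two-visit cohort with binary L1, A1, L2, A2 and outcome Y2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l1 = int(rng.random() < 0.5)
        a1 = int(rng.random() < 0.4 + 0.2 * l1)
        l2 = int(rng.random() < 0.3 + 0.2 * l1 + 0.2 * a1)
        a2 = int(rng.random() < 0.3 + 0.2 * l2 + 0.2 * a1)
        y2 = int(rng.random() < 0.3 + 0.1 * l1 + 0.2 * l2 - 0.1 * a1 - 0.1 * a2)
        rows.append({"L1": l1, "A1": a1, "L2": l2, "A2": a2, "Y2": y2})
    return rows

def stratified_mean(rows, key, value):
    """Saturated 'regression': mean of value(row) within each stratum key(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[key(r)] += value(r)
        counts[key(r)] += 1
    return {k: sums[k] / counts[k] for k in sums}

def iterative_conditional(rows, plan):
    a1, a2 = plan
    # Steps 1-2: model Y2 among those who followed the whole plan
    followed_both = [r for r in rows if r["A1"] == a1 and r["A2"] == a2]
    q2 = stratified_mean(followed_both,
                         key=lambda r: (r["L1"], r["L2"]),
                         value=lambda r: r["Y2"])
    # Steps 3-4: predict under the plan, then regress the prediction one step back
    followed_first = [r for r in rows if r["A1"] == a1]
    q1 = stratified_mean(followed_first,
                         key=lambda r: r["L1"],
                         value=lambda r: q2[(r["L1"], r["L2"])])
    # Step 5: average the earliest prediction over everyone
    return sum(q1[r["L1"]] for r in rows) / len(rows)

def gformula_direct(rows, plan):
    """Nonparametric g-formula by direct standardization, used as a check."""
    a1, a2 = plan
    total = 0.0
    for l1 in (0, 1):
        p_l1 = sum(r["L1"] == l1 for r in rows) / len(rows)
        stratum = [r for r in rows if r["L1"] == l1 and r["A1"] == a1]
        for l2 in (0, 1):
            p_l2 = sum(r["L2"] == l2 for r in stratum) / len(stratum)
            cell = [r for r in stratum if r["L2"] == l2 and r["A2"] == a2]
            total += p_l1 * p_l2 * sum(r["Y2"] for r in cell) / len(cell)
    return total

rows = simulate(5000, seed=1)
risk_treat_all = iterative_conditional(rows, plan=(1, 1))
```

With parametric models such as logistic regression, the two approaches generally no longer agree exactly, because the backward regressions smooth over strata rather than averaging within them.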


Setting up the environment

>>> from zepid import load_longitudinal_data
>>> from zepid.causal.gformula import IterativeCondGFormula
>>> df = load_longitudinal_data()

Estimating the g-formula with the iterative conditional estimator

>>> icgf = IterativeCondGFormula(df, exposures=['A1', 'A2', 'A3'], outcomes=['Y1', 'Y2', 'Y3'])
>>> # Specifying regression models for each treatment-outcome pair
>>> icgf.outcome_model(models=['A1 + L1', 'A2 + A1 + L2', 'A3 + A2 + L3'], print_results=False)
>>> # Estimating marginal 'Y3' under treat-all at every time
>>> icgf.fit(treatments=[1, 1, 1])
>>> print(icgf.marginal_outcome)
>>> # Estimating marginal 'Y3' under treat-none at every time
>>> icgf.fit(treatments=[0, 0, 0])
>>> print(icgf.marginal_outcome)

Custom treatments can be specified. Below is an example of treating everyone at the first and last time points only.

>>> # Estimating marginal 'Y3' under custom treatment plan
>>> icgf.fit(treatments=[1, 0, 1])
>>> print(icgf.marginal_outcome)
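
Since each call to fit handles one treatment plan, comparing plans means fitting repeatedly and contrasting the results. The sketch below loops over every static binary plan and computes a risk difference and risk ratio. It assumes only the interface shown in the examples (a fit(treatments=...) method and a numeric marginal_outcome attribute). StubEstimator is a hypothetical stand-in so the block runs on its own, and its numbers are placeholders, not estimates from any data.

```python
from itertools import product

def marginal_outcomes(estimator, n_times):
    """Fit every static binary plan and collect the marginal outcome for each."""
    results = {}
    for plan in product([0, 1], repeat=n_times):
        estimator.fit(treatments=list(plan))
        results[plan] = estimator.marginal_outcome
    return results

class StubEstimator:
    """Hypothetical stand-in exposing fit / marginal_outcome.

    The returned values are placeholders chosen for illustration only.
    """
    def fit(self, treatments):
        self.marginal_outcome = 0.5 - 0.1 * sum(treatments)

risks = marginal_outcomes(StubEstimator(), n_times=3)

# Contrast treat-all against treat-none
risk_difference = risks[(1, 1, 1)] - risks[(0, 0, 0)]
risk_ratio = risks[(1, 1, 1)] / risks[(0, 0, 0)]
```

In practice the stub would be replaced by a fitted IterativeCondGFormula instance whose outcome models have already been specified.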

To estimate 'Y2', we can use a similar procedure but restrict our lists of exposures and outcomes to the first two time points.

>>> icgf = IterativeCondGFormula(df, exposures=['A1', 'A2'], outcomes=['Y1', 'Y2'])
>>> icgf.outcome_model(models=['A1 + L1', 'A2 + A1 + L2'], print_results=False)
>>> icgf.fit(treatments=[1, 1])
>>> print(icgf.marginal_outcome)


Kreif, N., Tran, L., Grieve, R., De Stavola, B., Tasker, R. C., & Petersen, M. (2017). Estimating the comparative effectiveness of feeding interventions in the pediatric intensive care unit: a demonstration of longitudinal targeted maximum likelihood estimation. American Journal of Epidemiology, 186(12), 1370-1379.



  • __init__(df, exposures, outcomes) – Initialize self.
  • fit(treatments) – Estimate the counterfactual outcomes under the specified treatment plan using the previously specified regression models.
  • outcome_model(models[, print_results]) – Add a specified regression model for the outcome.