zepid.causal.gformula.TimeFixed.TimeFixedGFormula

class zepid.causal.gformula.TimeFixed.TimeFixedGFormula(df, exposure, outcome, exposure_type='binary', outcome_type='binary', standardize='population', weights=None)
G-formula for a time-fixed exposure and a single endpoint, also referred to as the g-computation algorithm formula. Uses the Snowden trick to calculate the marginal outcome under the specified treatment plan.
The g-formula can be expressed as
\[E[Y^a] = \sum_l E[Y \mid A=a, L=l] \times \Pr(L=l)\]
When L is continuous, the summation becomes an integral.
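As a worked illustration of the formula above, the following sketch computes E[Y^a] on a small made-up data set in two ways: by the weighted sum over levels of L, and by setting A=a for every individual, predicting, and averaging (the predict-and-average procedure described by Snowden et al.). The data, the function names, and the use of stratum-specific means as a saturated "model" are assumptions made for the example; this is not zepid's implementation, which fits a regression model instead.

```python
# Hypothetical toy data: (L, A, Y) triples, all binary
data = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
    (0, 1, 0), (0, 1, 1),
    (1, 0, 1), (1, 0, 0),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 1, 1),
]


def conditional_mean(a, l):
    """Estimate E[Y | A=a, L=l] as a stratum-specific mean (a saturated 'model')."""
    ys = [y for (li, ai, y) in data if ai == a and li == l]
    return sum(ys) / len(ys)


def gformula_sum(a):
    """Sum over l of E[Y | A=a, L=l] * Pr(L=l)."""
    n = len(data)
    levels = sorted({l for (l, _, _) in data})
    total = 0.0
    for l in levels:
        pr_l = sum(1 for (li, _, _) in data if li == l) / n
        total += conditional_mean(a, l) * pr_l
    return total


def gformula_predict(a):
    """Same quantity: set A=a for everyone, predict each outcome, then average."""
    predictions = [conditional_mean(a, l) for (l, _, _) in data]
    return sum(predictions) / len(predictions)


# Both routes agree; the difference of the two marginal means is the risk difference
risk_difference = gformula_sum(1) - gformula_sum(0)
```

In this toy data the crude difference in outcome means between exposed and unexposed is 1/3, while the standardized risk difference is 0.25, because L is associated with both A and Y.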
Currently, TimeFixedGFormula only supports binary or continuous outcomes. For binary outcomes, a logistic regression model fit via statsmodels is used to predict outcome probabilities. For continuous outcomes, a linear regression or a Poisson regression model can be used to predict outcomes.
Binary and multivariate exposures are supported. For a binary exposure, provide the column name of the exposure of interest as a string. For a multivariate exposure, provide a list of strings naming the disjoint indicator columns for the exposure. Multivariate exposures require the user to specify custom treatments when fitting the g-formula: a list of custom treatments must be provided, and it must be the same length as the number of disjoint indicator columns. See https://github.com/pzivich/PythonforEpidemiologists/tree/master/3_Epidemiology_Analysis/c_causal_inference/1_timefixedtreatments for examples (highly recommended).
Key options for treatments:
 ‘all’ : all individuals are given treatment
 ‘none’ : no individuals are given treatment
 custom treatments : create a custom treatment. When specifying this, the dataframe must be referred to as ‘g’.
The following is an example that selects females aged 30 or younger:
treatment="((g['age0']<=30) & (g['male']==0))"
Note
Custom treatments use a “magic-g” parameter. Internally, the g-formula implementation names the data set g. Therefore, when using custom treatment specifications, the data set must be referred to as g, following the pandas selection syntax.
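To make the syntax concrete, the sketch below shows one way such a string can be read: as a pandas boolean expression evaluated with the name g bound to the data. The toy values and the art_plan column are hypothetical, and the use of eval here only illustrates what the string selects; it is not a description of zepid's internals.

```python
import pandas as pd

# Toy data; column names mirror the example above, values are made up
g = pd.DataFrame({"age0": [25, 30, 31, 45], "male": [0, 1, 0, 0]})

treatment = "((g['age0']<=30) & (g['male']==0))"

# Evaluate the string with the name `g` bound to the data frame
mask = eval(treatment, {"g": g})

# Individuals selected by the rule are treated (1); everyone else is untreated (0)
g["art_plan"] = mask.astype(int)
```

Only the first row (a 25-year-old female) satisfies both conditions. Because eval executes arbitrary code, a string handled this way should come from a trusted source.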
Parameters:  df (DataFrame) – Pandas dataframe containing the variables of interest
 exposure (str, list) – Column name for exposure variable label or a list of disjoint indicator exposures
 outcome (str) – Column name for outcome variable
 outcome_type (str, optional) – Outcome variable type. Currently only ‘binary’, ‘normal’, and ‘poisson’ variable types are supported
 standardize (str, optional) – Who the estimate corresponds to. Options are the entire population, the exposed, or the unexposed. See Sato & Matsuyama Epidemiology (2003) for details on weighting to exposed/unexposed. Weighting to the exposed or unexposed is also referred to as SMR weighting. Options for standardization are:
   * ‘population’ : weight to entire population
   * ‘exposed’ : weight to exposed individuals
   * ‘unexposed’ : weight to unexposed individuals
 weights (str, optional) – Column name for weights. Default is None, which assumes every observation has the same weight (i.e. 1)
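The standardize options can be illustrated with a small sketch. It assumes that standardizing to a group means averaging the predicted outcomes over the members of that group only, so that the covariate distribution of the exposed (or unexposed) replaces that of the whole population. The toy data, the stratum-mean "model", and the function names are hypothetical and do not reproduce zepid's code.

```python
# Hypothetical toy data: (L, A, Y) triples, all binary
data = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
    (0, 1, 0), (0, 1, 1),
    (1, 0, 1), (1, 0, 0),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 1, 1),
]


def conditional_mean(a, l):
    """Estimate E[Y | A=a, L=l] as a stratum-specific mean."""
    ys = [y for (li, ai, y) in data if ai == a and li == l]
    return sum(ys) / len(ys)


def standardized_mean(a, standardize="population"):
    """Average the predicted outcome under A=a over the chosen target group."""
    if standardize == "population":
        target = data
    elif standardize == "exposed":
        target = [row for row in data if row[1] == 1]
    elif standardize == "unexposed":
        target = [row for row in data if row[1] == 0]
    else:
        raise ValueError("standardize must be 'population', 'exposed', or 'unexposed'")
    predictions = [conditional_mean(a, l) for (l, _, _) in target]
    return sum(predictions) / len(predictions)


# Effect of treatment among the exposed versus in the whole population
rd_exposed = standardized_mean(1, "exposed") - standardized_mean(0, "exposed")
rd_population = standardized_mean(1) - standardized_mean(0)
```

In this toy data the conditional effect is the same in both strata of L, so the risk difference is 0.25 for every target group even though the standardized means themselves differ.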
Examples
Setting up the environment
>>> from zepid import load_sample_data, spline
>>> from zepid.causal.gformula import TimeFixedGFormula
>>> df = load_sample_data(timevary=False)
>>> df[['cd4_rs1', 'cd4_rs2']] = spline(df, 'cd40', n_knots=3, term=2, restricted=True)
>>> df[['age_rs1', 'age_rs2']] = spline(df, 'age0', n_knots=3, term=2, restricted=True)
G-formula with a binary treatment and outcome
>>> g = TimeFixedGFormula(df, exposure='art', outcome='dead')
>>> g.outcome_model(model='art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> # Return the estimated marginal outcome under treat-all
>>> g.fit(treatment='all')
>>> g.marginal_outcome
>>> # Return the estimated marginal outcome under treat-none
>>> g.fit(treatment='none')
>>> g.marginal_outcome
>>> # Return the estimated marginal outcome under custom treatment (treat all females aged 40 or younger)
>>> g.fit(treatment="((g['male']==0) & (g['age0']<=40))")
>>> g.marginal_outcome
G-formula with a categorical treatment and binary outcome
>>> import numpy as np
>>> # Creating categorical variable for CD4 count
>>> df['cd4_1'] = np.where(((df['cd40'] >= 200) & (df['cd40'] < 400)), 1, 0)
>>> df['cd4_2'] = np.where(df['cd40'] >= 400, 1, 0)
>>> g = TimeFixedGFormula(df, exposure=['cd4_1', 'cd4_2'], outcome='dead', exposure_type='categorical')
>>> g.outcome_model(model='cd4_1 + cd4_2 + art + male + age0 + age_rs1 + age_rs2 + dvl0')
>>> # Return marginal outcome under all in reference category (CD4 < 200)
>>> g.fit(treatment=["False", "False"])
>>> # Return marginal outcome under all in category 1 (CD4 >= 200 & CD4 < 400)
>>> g.fit(treatment=["True", "False"])
>>> # Return marginal outcome under all in category 2 (CD4 >= 400)
>>> g.fit(treatment=["False", "True"])
G-formula with binary exposure and continuous (normal-distributed) outcome
>>> g = TimeFixedGFormula(df, exposure='art', outcome='cd4', outcome_type='normal')
>>> g.outcome_model(model='art + male + age0 + age_rs1 + age_rs2 + dvl0 + cd40 + cd4_rs1 + cd4_rs2')
G-formula with binary exposure and continuous (Poisson-distributed) outcome
>>> g = TimeFixedGFormula(df, exposure='art', outcome='cd4', outcome_type='poisson')
>>> g.outcome_model(model='art + male + age0 + age_rs1 + age_rs2 + dvl0 + cd40 + cd4_rs1 + cd4_rs2')
G-formula with binary exposure and continuous (Poisson-distributed) outcome, with a stochastic treatment/intervention
>>> g = TimeFixedGFormula(df, exposure='art', outcome='cd4', outcome_type='poisson')
>>> g.outcome_model(model='art + male + age0 + age_rs1 + age_rs2 + dvl0 + cd40 + cd4_rs1 + cd4_rs2')
>>> g.fit_stochastic(p=0.75)
G-formula with binary outcome and exposure, with a conditional stochastic treatment/intervention
>>> g = TimeFixedGFormula(df, exposure='art', outcome='dead')
>>> g.outcome_model(model='art + male + age0 + age_rs1 + age_rs2 + dvl0 + cd40 + cd4_rs1 + cd4_rs2')
>>> g.fit_stochastic(p=[0.65, 0.85], conditional=["g['male']==1", "g['male']==0"])
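A stochastic intervention assigns treatment to each individual with probability p rather than deterministically. The sketch below shows one plausible way to estimate the marginal outcome under such a plan: repeatedly draw a random treatment assignment, take each individual's predicted outcome under the assigned treatment, and average over individuals and repetitions. The predicted values, the function names, and the sampling scheme are assumptions made for illustration; the samples argument listed for fit_stochastic suggests a resampling approach, but this is not a reproduction of zepid's code.

```python
import random

# Hypothetical model-predicted outcomes for five individuals
pred1 = [0.10, 0.20, 0.15, 0.30, 0.25]  # predicted outcome if treated
pred0 = [0.30, 0.40, 0.35, 0.50, 0.45]  # predicted outcome if untreated


def stochastic_marginal(p, samples=2000, seed=0):
    """Monte Carlo estimate of the marginal outcome when each person is treated with probability p."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(samples):
        total = 0.0
        for y1, y0 in zip(pred1, pred0):
            treated = rng.random() < p  # random treatment assignment
            total += y1 if treated else y0
        estimates.append(total / len(pred1))
    return sum(estimates) / samples


def expected_marginal(p):
    """Closed-form expectation: p * mean(pred1) + (1 - p) * mean(pred0)."""
    n = len(pred1)
    return p * sum(pred1) / n + (1 - p) * sum(pred0) / n
```

With p=1 or p=0 the plan reduces to the treat-all or treat-none plan. A conditional stochastic plan, as in the example with separate probabilities for males and females, would apply a different p within each subgroup.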
References
JM Snowden, S Rose, & KM Mortimer. (2011). Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. American Journal of Epidemiology, 173(7), 731-738.
J Ahern, KE Colson, C Margerson-Zilko, A Hubbard, & S Galea. (2016). Predicting the population health impacts of community interventions: the case of alcohol outlets and binge drinking. American Journal of Public Health, 106(11), 1938-1943.
J Ahern, A Hubbard, & S Galea. (2009). Estimating the effects of potential public health interventions on population disease burden: a step-by-step illustration of causal inference methods. American Journal of Epidemiology, 169(9), 1140-1147.

__init__(df, exposure, outcome, exposure_type='binary', outcome_type='binary', standardize='population', weights=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(df, exposure, outcome[, …])    Initialize self.
fit(treatment[, predict_missing])    Fit the parametric g-formula as specified.
fit_stochastic(p[, conditional, samples, …])    Fits the g-formula for a stochastic intervention.
outcome_model(model[, print_results])    Build the outcome regression model.
plot_kde([bw_method, fill, color])    Generates a Kernel Density plot of the accuracy of the model predicted outcomes.
run_diagnostics([decimal])    Runs diagnostics for the g-formula regression model used.