zepid.causal.gformula.TimeFixed.SurvivalGFormula

class zepid.causal.gformula.TimeFixed.SurvivalGFormula(df, idvar, exposure, outcome, time, weights=None)
G-formula for time-to-event data where the exposure is fixed at baseline. Only binary exposures and outcomes are supported. Outcomes are predicted using a logistic regression model. The input data set should be in long format, where each row corresponds to one individual observed for one unit of time.
Key options for treatments:
 ‘all’ – all individuals are given treatment
 ‘none’ – no individuals are given treatment
 custom treatments – create a custom treatment. When specifying this, the dataframe must be referred to as ‘g’.
The following example selects individuals who are age 30 or younger and female:
treatment="((g['age0']<=30) & (g['male']==0))"
Parameters:
 df (DataFrame) – Pandas dataframe containing the variables of interest
 idvar (str) – Column name for the ID label
 exposure (str, list) – Column name for exposure variable label or a list of disjoint indicator exposures
 outcome (str) – Column name for outcome variable
 time (str) – Column name for time variable
 weights (str, optional) – Column name for weights. Default is None, which assumes every observation has the same weight (i.e. 1)
Note
Custom treatments use a “magic-g” parameter. Internally, the g-formula implementation names the data set g. Therefore, a custom treatment specification must refer to the data set as g and follow the pandas selection syntax.
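The treatment options above can be illustrated with a minimal sketch. It assumes the custom string is evaluated with the data frame bound to the name g and that the resulting boolean mask becomes the assigned exposure. The helper `apply_treatment` and the toy data are hypothetical and are not part of zEpid.

```python
import pandas as pd


def apply_treatment(g, exposure, treatment):
    """Hypothetical helper: return a copy of g with `exposure` set under the plan."""
    g = g.copy()
    if treatment == 'all':
        g[exposure] = 1            # everyone treated
    elif treatment == 'none':
        g[exposure] = 0            # no one treated
    else:
        # Assumption: the string is evaluated with the data bound to the name `g`,
        # which is why custom treatments must refer to the data set as g
        mask = eval(treatment, {}, {'g': g})
        g[exposure] = mask.astype(int)
    return g


# Made-up data for illustration only
data = pd.DataFrame({'age0': [25, 30, 41, 28],
                     'male': [0, 1, 0, 0],
                     'art': [0, 1, 1, 0]})

custom = apply_treatment(data, 'art', "((g['age0']<=30) & (g['male']==0))")
print(custom['art'].tolist())
```

Because the string is passed to `eval`, it should only ever come from trusted code, never from user input.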
Examples
Setting up data in long format
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from zepid import load_sample_data
>>> from zepid.causal.gformula import SurvivalGFormula
>>> df = load_sample_data(False).drop(columns=['cd4_wk45'])
>>> df['t'] = np.round(df['t']).astype(int)
>>> # Repeat each row once per unit of follow-up time
>>> df = pd.DataFrame(np.repeat(df.values, df['t'], axis=0), columns=df.columns)
>>> df['t'] = df.groupby('id')['t'].cumcount() + 1
>>> # The event indicator is 1 only on the last row of individuals who died
>>> df.loc[((df['dead'] == 1) & (df['id'] != df['id'].shift(-1))), 'd'] = 1
>>> df['d'] = df['d'].fillna(0)
>>> # Polynomial terms for time
>>> df['t_sq'] = df['t']**2
>>> df['t_cu'] = df['t']**3
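To make the long-format layout concrete, here is a small self-contained sketch of the same expansion on made-up data (two individuals, not the zEpid sample data). The names `wide` and `long_df` are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up wide data: one row per individual with total follow-up time t
wide = pd.DataFrame({'id': [1, 2], 't': [3, 2], 'dead': [1, 0]})

# Repeat each row once per unit of follow-up time
long_df = pd.DataFrame(np.repeat(wide.values, wide['t'].to_numpy(), axis=0),
                       columns=wide.columns)

# Number the intervals 1..t within each individual
long_df['t'] = long_df.groupby('id').cumcount() + 1

# The event occurs only in an individual's final interval, and only if they died
last_row = long_df['id'] != long_df['id'].shift(-1)
long_df['d'] = ((long_df['dead'] == 1) & last_row).astype(int)

print(long_df[['id', 't', 'd']])
```

Individual 1 contributes three rows with the event on the third; individual 2 contributes two rows and never has the event.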
Estimating the time-to-event mean effect under the treat-all plan
>>> sgf = SurvivalGFormula(df.drop(columns=['dead']), idvar='id', exposure='art', outcome='d', time='t')
>>> sgf.outcome_model(model='art + male + age0 + cd40 + dvl0 + t + t_sq + t_cu')
>>> sgf.fit(treatment='all')
>>> print(sgf.marginal_outcome)
Plotting the cumulative incidence function
>>> sgf.plot(color='r')
>>> plt.show()
Estimating the time-to-event mean effect under the treat-none plan
>>> sgf = SurvivalGFormula(df.drop(columns=['dead']), idvar='id', exposure='art', outcome='d', time='t')
>>> sgf.outcome_model(model='art + male + age0 + cd40 + dvl0 + t + t_sq + t_cu')
>>> sgf.fit(treatment='none')
Estimating the time-to-event mean effect under a custom treatment plan
>>> sgf = SurvivalGFormula(df.drop(columns=['dead']), idvar='id', exposure='art', outcome='d', time='t')
>>> sgf.outcome_model(model='art + male + age0 + cd40 + dvl0 + t + t_sq + t_cu')
>>> sgf.fit(treatment="((g['age0']>=25) & (g['male']==0))")
Notes
The following process is used to estimate the cumulative incidence function.
(1) A pooled logistic regression model is fit to the data. The model should predict the outcome conditional on treatment, baseline confounders, and time. Time should be modeled using flexible functional forms (e.g. splines).
(2) Survival probabilities are estimated by predicting values at each time from the pooled logistic model and taking the cumulative product. The survival probabilities are predicted under the treatment plan of interest.
(3) The cumulative incidence (one minus the survival probability) is averaged across all subjects at each time period.
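Steps (2) and (3) can be sketched with NumPy alone. This is an illustration under stated assumptions, not zEpid's internal code: the `hazard` array stands in for the predicted event probabilities from the pooled logistic model under the treatment plan, and its values are made up.

```python
import numpy as np

# Assumed input: predicted discrete-time hazards under the treatment plan,
# one row per subject and one column per time period (made-up values)
hazard = np.array([[0.10, 0.20, 0.30],
                   [0.05, 0.05, 0.05]])

# Step 2: survival is the cumulative product of (1 - hazard) over time
survival = np.cumprod(1 - hazard, axis=1)

# Cumulative incidence is the complement of survival
cumulative_incidence = 1 - survival

# Step 3: average over subjects at each time period
marginal = cumulative_incidence.mean(axis=0)
print(marginal)
```

Because each hazard lies between 0 and 1, the averaged curve is non-decreasing over time, as a cumulative incidence function should be.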
References
Hernán MA. (2010). The hazards of hazard ratios. Epidemiology, 21(1), 13–15. doi:10.1097/EDE.0b013e3181c1ea43

__init__(df, idvar, exposure, outcome, time, weights=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(df, idvar, exposure, outcome, time[, weights]) – Initialize self.
fit(treatment) – Fit the parametric g-formula for time-to-event data.
outcome_model(model[, print_results]) – Build the pooled logistic model.
plot(**plot_kwargs) – Plot the estimated cumulative incidence function.