zepid.causal.generalize.estimators.GTransportFormula

class zepid.causal.generalize.estimators.GTransportFormula(df, exposure, outcome, selection, outcome_type='binary', generalize=True, weights=None)

Calculate the g-transport-formula using an observed study sample and a sample from the target population. Broadly, the process for fitting the g-transport-formula is similar to the g-formula (as implemented in TimeFixedGFormula). Instead of predicting the potential outcomes of only the sample, the g-transport-formula predicts potential outcomes for the full target population.

For generalizability, we first fit a Q-model predicting the outcome as a function of the treatment and any modifiers (along with confounders if the data are observational). Afterwards, we predict the potential outcomes under each treatment for the entire population (S=1 and S=0). To obtain the marginal effect measure, we take the mean of the predicted potential outcomes over the entire population (S=1 and S=0).

For transportability, we similarly fit a Q-model in the observed sample and generate predictions for all individuals in the data. However, for transportability our sample is not part of the target population. Therefore, we average the predicted potential outcomes over the S=0 group only.
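The two averaging schemes described above can be illustrated with a small sketch. It is not zepid's implementation: the parametric Q-model is swapped for a saturated one (the mean of Y within each stratum of A and W), and the helper names `fit_saturated_q_model` and `g_transport_rd`, the single modifier `W`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_saturated_q_model(sample):
    """Hypothetical helper: mean outcome within each (A, W) stratum of the study sample."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in sample:
        key = (row["A"], row["W"])
        totals[key][0] += row["Y"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def g_transport_rd(rows, generalize=True):
    """Hypothetical helper: marginal risk difference for the chosen target."""
    # The Q-model is always fit in the study sample (S=1)
    q = fit_saturated_q_model([r for r in rows if r["S"] == 1])
    # Generalizability averages over S=1 and S=0; transportability over S=0 only
    target = rows if generalize else [r for r in rows if r["S"] == 0]
    y1 = sum(q[(1, r["W"])] for r in target) / len(target)  # everyone treated
    y0 = sum(q[(0, r["W"])] for r in target) / len(target)  # no one treated
    return y1 - y0


# Toy study sample as (A, W, Y); the treatment effect is 0.5 when W=0 and 0 when W=1
study = [
    (1, 0, 1), (1, 0, 0), (0, 0, 0), (0, 0, 0),
    (1, 1, 1), (1, 1, 1), (0, 1, 1), (0, 1, 1),
]
rows = [{"S": 1, "A": a, "W": w, "Y": y} for a, w, y in study]
# Sample from the target population: exposure and outcome are unobserved
rows += [{"S": 0, "A": None, "W": w, "Y": None} for w in (0, 0, 0, 1)]

rd_generalize = g_transport_rd(rows, generalize=True)
rd_transport = g_transport_rd(rows, generalize=False)
print(rd_generalize, rd_transport)
```

Because W is distributed differently in S=0 than in S=1 and modifies the effect, the two targets give different marginal risk differences from the same fitted Q-model.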

Confidence intervals should be obtained by using a non-parametric bootstrapping procedure.
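A minimal sketch of a percentile bootstrap follows. The function name `bootstrap_percentile_ci`, its arguments, and the toy `risk_difference` estimator are assumptions made for illustration and are not part of zepid; rows are resampled with replacement, the estimator is recomputed on each resample, and the interval is read from the empirical quantiles.

```python
import random


def bootstrap_percentile_ci(rows, estimator, n_boot=500, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap interval for estimator(rows)."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        # Draw n rows with replacement and re-estimate
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def risk_difference(rows):
    """Toy estimator standing in for a full refit of the Q-model."""
    treated = [r["Y"] for r in rows if r["A"] == 1]
    untreated = [r["Y"] for r in rows if r["A"] == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


# Deterministic toy data: risk 0.5 among the treated, 0.25 among the untreated
rows = []
for i in range(200):
    a = i % 2
    y = int(i % 4 == 1) if a == 1 else int(i % 8 == 0)
    rows.append({"A": a, "Y": y})

point = risk_difference(rows)
lower, upper = bootstrap_percentile_ci(rows, risk_difference)
print(point, lower, upper)
```

With GTransportFormula, the estimator would instead refit the outcome model on each resampled data set and return the fitted risk difference; the exact attribute holding that value should be checked against the installed zepid version.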

Parameters:
  • df (DataFrame) – Pandas dataframe containing all variables required for generalization/transportation. Should include all features related to sample selection, indicator for selection into the sample, and treatment/outcome information for the sample (selection == 1)
  • exposure (str) – Column label for exposure/treatment of interest. Can be nan for all those not in sample. Only binary exposures are currently supported
  • outcome (str) – Column label for outcome of interest. Can be nan for all those not in sample
  • selection (str) – Column label for indicator of selection into the sample. Should be 1 if the individual comes from the study sample and 0 if the individual is from the random sample of the target population
  • outcome_type (str, optional) – Outcome variable type. Currently only ‘binary’, ‘normal’, and ‘poisson’ variable types are supported
  • generalize (bool, optional) – Whether the problem is a generalizability (True) problem or a transportability (False) problem. See notes for further details on the difference between the two estimation methods
  • weights (None, str, optional) – Optional column label for weights. Can be used to input inverse probability of missingness weights

Note

There are two related concepts: generalizability and transportability. Generalizability is when the study sample is part of the target population. For example, we want to generalize results from California to the entire United States. Transportability is when the study sample is not part of the target population. For example, we want to apply our results from California to Canada. The marginal risk difference is calculated slightly differently in the two scenarios. GTransportFormula supports both.

Examples

Setting up the environment

>>> from zepid import load_generalize_data
>>> from zepid.causal.generalize import GTransportFormula
>>> df = load_generalize_data(False)

Generalizability

>>> gtf = GTransportFormula(df, exposure='A', outcome='Y', selection='S', generalize=True)
>>> gtf.outcome_model('A + L + L:A + W + W:A + W:A:L')
>>> gtf.fit()
>>> gtf.summary()

Transportability

>>> gtf = GTransportFormula(df, exposure='A', outcome='Y', selection='S', generalize=False)
>>> gtf.outcome_model('A + L + L:A + W + W:A + W:A:L')
>>> gtf.fit()
>>> gtf.summary()

For observational studies, confounders should also be included in the Q-model.

References

Lesko CR, Buchanan AL, Westreich D, Edwards JK, Hudgens MG, & Cole SR. (2017). Generalizing study results: a potential outcomes perspective. Epidemiology (Cambridge, Mass.), 28(4), 553.

Dahabreh IJ, Robertson SE, Stuart EA, Hernan MA (2018). Transporting inferences from a randomized trial to a new target population. arXiv preprint arXiv:1805.00550.

__init__(df, exposure, outcome, selection, outcome_type='binary', generalize=True, weights=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(df, exposure, outcome, selection[, …]) Initialize self.
fit() Uses the g-transport formula to obtain the risk difference and risk ratio from the sample.
outcome_model(model[, print_results]) Build the model for the outcome.
summary([decimal]) Prints a summary of the results for the g-transport estimator.