zepid.causal.generalize.estimators.IPSW

class zepid.causal.generalize.estimators.IPSW(df, exposure, outcome, selection, generalize=True, weights=None)

Calculate inverse probability of sampling weights through logistic regression. Inverse probability of sampling weights are an extension of inverse probability weights to allow for the generalizability or the transportability of results.

For generalizability, inverse probability of sampling weights take the following form

\[IPSW = \frac{1}{\Pr(S=1|W)}\]

where W is the set of all factors related to the sample selection process.

For transportability, the inverse probability of sampling weights are actually inverse odds of sampling weights. They take the following form

\[IPSW = \frac{\Pr(S=0|W)}{\Pr(S=1|W)}\]
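The two weight forms above can be sketched in plain Python. This is a minimal illustration, not the zepid implementation: the helper name `sampling_weight` and the example probabilities are assumptions made for this sketch, and `pr_s1` stands for an already-estimated Pr(S=1|W) for one member of the study sample.

```python
def sampling_weight(pr_s1, generalize=True):
    """Return the sampling weight for one study-sample member.

    pr_s1 is an (assumed, already estimated) probability Pr(S=1|W).
    Hypothetical helper for illustration; not part of zepid.
    """
    if not 0.0 < pr_s1 <= 1.0:
        raise ValueError("pr_s1 must be in (0, 1]")
    if generalize:
        # Generalizability: inverse probability of sampling
        return 1.0 / pr_s1
    # Transportability: inverse odds of sampling, Pr(S=0|W) / Pr(S=1|W)
    return (1.0 - pr_s1) / pr_s1


# Made-up probabilities, purely for demonstration
probs = [0.5, 0.25, 0.8]
gen_w = [sampling_weight(p, generalize=True) for p in probs]
trans_w = [sampling_weight(p, generalize=False) for p in probs]
```

Since Pr(S=0|W) = 1 - Pr(S=1|W), each inverse odds weight equals the corresponding inverse probability weight minus one.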

Confidence intervals should be obtained with a non-parametric bootstrap procedure.
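A percentile version of the non-parametric bootstrap might look like the sketch below. It is deliberately generic and does not call zepid: `bootstrap_ci`, `weighted_risk`, and the toy `rows` data are all assumptions made for illustration. In a real analysis, the `estimator` callable would presumably refit the sampling model and re-estimate the effect measure on each resample rather than reuse fixed weights.

```python
import random


def bootstrap_ci(rows, estimator, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for estimator(rows).

    Hypothetical helper: rows is a list of records and estimator is any
    callable that maps a list of records to a single number.
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_resamples):
        # Resample records with replacement, keeping the original size
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the estimates
    lower = estimates[int((alpha / 2) * (n_resamples - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


def weighted_risk(rows):
    """Weighted mean of a binary outcome; each row is (outcome, weight)."""
    total_weight = sum(w for _, w in rows)
    return sum(y * w for y, w in rows) / total_weight


# Toy data: (outcome, weight) pairs with made-up, fixed weights
rows = [(1 if i % 3 == 0 else 0, 1.0 + (i % 4)) for i in range(60)]
lower, upper = bootstrap_ci(rows, weighted_risk)
```

The fixed seed only makes the sketch reproducible; more resamples give a more stable interval.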

Parameters:
  • df (DataFrame) – Pandas dataframe containing all variables required for generalization/transportation. Should include all features related to sample selection, indicator for selection into the sample, and treatment/outcome information for the sample (selection == 1)
  • exposure (str) – Column label for exposure/treatment of interest. Can be nan for all those not in sample
  • outcome (str) – Column label for outcome of interest. Can be nan for all those not in sample
  • selection (str) – Column label for indicator of selection into the sample. Should be 1 if individual comes from the study sample and 0 if individual is from random sample of source population
  • generalize (bool, optional) – Whether the problem is a generalizability (True) problem or a transportability (False) problem. See notes for further details on the difference between the two estimation methods
  • weights (None, str, optional) – For conditionally randomized trials, or observational research, inverse probability of treatment weights can be used to adjust for confounding. Before estimating the effect measures, this weight vector and the IPSW are multiplied to calculate new weights. When weights is None, the data are assumed to come from a randomized trial and do not need to be adjusted
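The multiplication of the treatment weights and the IPSW described for the weights parameter can be sketched as an element-wise product. The helper `combine_weights` and the numbers below are assumptions for illustration only and do not reflect zepid internals.

```python
def combine_weights(ipsw, iptw=None):
    """Multiply sampling weights by treatment weights, element by element.

    Hypothetical helper for illustration. When iptw is None, the sampling
    weights are returned unchanged, mirroring the randomized-trial case.
    """
    if iptw is None:
        return list(ipsw)
    if len(ipsw) != len(iptw):
        raise ValueError("ipsw and iptw must have the same length")
    return [s * t for s, t in zip(ipsw, iptw)]


# Made-up weights for two individuals in the study sample
combined = combine_weights([2.0, 4.0], [1.5, 2.0])
unchanged = combine_weights([2.0, 4.0])
```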

Note

There are two related concepts: generalizability and transportability. Generalizability applies when your study sample is part of your target population. For example, you want to generalize results from California to the entire United States. Transportability applies when your study sample is not part of your target population. For example, you want to apply results from California to Canada. Depending on the scenario, the IPSW take slightly different forms. IPSW supports both problems.

Examples

Setting up the environment

>>> from zepid import load_generalize_data
>>> from zepid.causal.generalize import IPSW
>>> df = load_generalize_data(False)

Generalizability for RCT results

>>> ipsw = IPSW(df, exposure='A', outcome='Y', selection='S', generalize=True)
>>> ipsw.sampling_model('L + W + L:W', print_results=False)
>>> ipsw.fit()
>>> ipsw.summary()

Transportability for RCT results

>>> ipsw = IPSW(df, exposure='A', outcome='Y', selection='S', generalize=False)
>>> ipsw.sampling_model('L + W + L:W', print_results=False)
>>> ipsw.fit()
>>> ipsw.summary()

For observational studies, IPTW can be used to account for confounders via the treatment_model() method

>>> ipsw = IPSW(df, exposure='A', outcome='Y', selection='S')
>>> ipsw.sampling_model('L + W + L:W', print_results=False)
>>> ipsw.treatment_model('L', print_results=False)
>>> ipsw.fit()
>>> ipsw.summary()

References

Lesko CR, Buchanan AL, Westreich D, Edwards JK, Hudgens MG, & Cole SR. (2017). Generalizing study results: a potential outcomes perspective. Epidemiology (Cambridge, Mass.), 28(4), 553.

Westreich D, Edwards JK, Lesko CR, Stuart E, & Cole SR. (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 186(8), 1010-1014.

Dahabreh IJ, Robertson SE, Stuart EA, Hernan MA (2018). Transporting inferences from a randomized trial to a new target population. arXiv preprint arXiv:1805.00550.

__init__(df, exposure, outcome, selection, generalize=True, weights=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(df, exposure, outcome, selection[, …]) Initialize self.
fit() Uses the calculated IPSW to obtain the risk difference and risk ratio from the sample.
sampling_model(model_denominator[, …]) Logistic regression model(s) for estimating sampling weights.
summary([decimal]) Prints a summary of the results for the IPSW estimator.
treatment_model(model_denominator[, …]) Logistic regression model(s) for estimating inverse probability of treatment weights (IPTW).
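The fit() entry above mentions obtaining the risk difference and risk ratio from the weighted sample. A minimal sketch of that kind of calculation follows, assuming a binary exposure and outcome and precomputed weights. The function `weighted_rd_rr` and the toy values are hypothetical and are not taken from the zepid source.

```python
def weighted_rd_rr(exposure, outcome, weights):
    """Weighted risk difference and risk ratio for a binary exposure.

    Hypothetical helper for illustration. All three arguments are
    equal-length sequences restricted to the study sample.
    """
    def risk(level):
        # Weighted mean of the outcome among those with this exposure level
        num = sum(y * w for a, y, w in zip(exposure, outcome, weights) if a == level)
        den = sum(w for a, w in zip(exposure, weights) if a == level)
        return num / den

    r1, r0 = risk(1), risk(0)
    return r1 - r0, r1 / r0


# Toy sample: two exposed and two unexposed individuals with made-up weights
rd, rr = weighted_rd_rr(
    exposure=[1, 1, 0, 0],
    outcome=[1, 0, 1, 0],
    weights=[3.0, 1.0, 1.0, 1.0],
)
```

Here the weighted risk is 3/4 among the exposed and 1/2 among the unexposed, so the sketch yields a risk difference of 0.25 and a risk ratio of 1.5.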