zepid.causal.generalize.estimators.AIPSW

class zepid.causal.generalize.estimators.AIPSW(df, exposure, outcome, selection, generalize=True, weights=None)

Doubly robust estimator for generalizability and transportability. I haven’t found a name for it in the literature yet, so I am calling it augmented-IPSW (AIPSW), by analogy with other doubly robust estimators such as AIPTW and AIPMW.

Estimating AIPSW follows the same process as other doubly robust estimators: both the IPSW (sampling) model and the g-transport (outcome) model must be specified. Given these two models, AIPSW for generalizability is calculated as follows:

\[\psi = \frac{1}{n} \sum \left(E[Y|A=a,L,S=1] + \frac{I(S=1, A=a)}{\Pr(S=1|L)} (Y - E[Y|A=a,L,S=1])\right)\]
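As a minimal sketch, the generalizability formula can be transcribed into NumPy, assuming the outcome-model predictions E[Y|A=a,L,S=1] and the sampling weights have already been computed for every row (S=1 and S=0). The helper `aipsw_generalize` and its argument names are hypothetical and not part of zEpid; `w` is assumed to hold the IPSW, 1/Pr(S=1|L), optionally multiplied by IPTW.

```python
import numpy as np


def aipsw_generalize(y, a, s, q_a, w, a_value=1):
    """Hypothetical helper: AIPSW estimate of E[Y^a] for generalizability.

    y, a : outcome and treatment (may be nan where s == 0)
    s    : 1 if the row is in the study sample, 0 otherwise
    q_a  : predicted E[Y | A=a, L, S=1] for every row
    w    : IPSW, 1 / Pr(S=1 | L), optionally multiplied by IPTW (assumption)
    """
    y, a, s, q_a, w = (np.asarray(v, dtype=float) for v in (y, a, s, q_a, w))
    # I(S=1, A=a); comparisons against nan are False, so S=0 rows drop out
    indicator = (s == 1) & (a == a_value)
    # Weighted residual, forced to 0 outside the indicator so nan outcomes are ignored
    augmentation = np.where(indicator, w * (y - q_a), 0.0)
    # Average over all n rows (S=1 and S=0)
    return float(np.mean(q_a + augmentation))


# Toy usage: three study-sample rows and one non-sample row
psi_1 = aipsw_generalize(
    y=[1, 0, 1, np.nan], a=[1, 1, 0, np.nan], s=[1, 1, 1, 0],
    q_a=[0.5, 0.5, 0.5, 0.5], w=[2, 2, 2, 2],
)
print(psi_1)  # 0.5
```

When the outcome model fits the treated sample rows exactly, the augmentation term is zero and the estimate reduces to the average prediction over everyone.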

For transportability problems, AIPSW instead averages over the S=0 (target) population and takes the following form:

\[\psi = \frac{1}{n\Pr(S=0)} \sum \left(\text{IPSW}\times I(S=1, A=a)(Y - E[Y|A=a,L,S=1]) + (1-S)E[Y|A=a,L,S=1]\right)\]
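As a minimal sketch, the transportability formula can also be transcribed into NumPy. Here `w` is assumed to hold the inverse odds of sampling weights, Pr(S=0|L)/Pr(S=1|L) (optionally multiplied by IPTW), and the average is taken over the S=0 rows, i.e. the sum is divided by their count, which is n times the estimated Pr(S=0). The helper name `aipsw_transport` is hypothetical and not part of zEpid.

```python
import numpy as np


def aipsw_transport(y, a, s, q_a, w, a_value=1):
    """Hypothetical helper: AIPSW estimate of E[Y^a] in the S=0 target population.

    q_a : predicted E[Y | A=a, L, S=1] for every row
    w   : inverse odds of sampling weights, Pr(S=0|L) / Pr(S=1|L) (assumption)
    """
    y, a, s, q_a, w = (np.asarray(v, dtype=float) for v in (y, a, s, q_a, w))
    indicator = (s == 1) & (a == a_value)
    # IPSW x I(S=1, A=a) x (Y - E[Y|A=a,L,S=1]); 0 elsewhere so nan outcomes drop out
    augmentation = np.where(indicator, w * (y - q_a), 0.0)
    # (1 - S) x E[Y|A=a,L,S=1]: outcome-model predictions for the target rows only
    target_pred = (1 - s) * q_a
    # Sum divided by the number of S=0 rows, i.e. by n x Pr(S=0)
    return float(np.sum(augmentation + target_pred) / np.sum(s == 0))


psi_1 = aipsw_transport(
    y=[1, 0, 1, np.nan, np.nan], a=[1, 1, 0, np.nan, np.nan], s=[1, 1, 1, 0, 0],
    q_a=[0.5, 0.5, 0.5, 0.4, 0.6], w=[1, 1, 1, 1, 1],
)
print(psi_1)  # 0.5
```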

For generalizability, we first fit an outcome (Q-)model predicting the outcome as a function of the treatment and any effect measure modifiers (along with confounders if the data are observational). Next we calculate IPSW (multiplied by IPTW if there are any confounders). Afterwards, we predict the potential outcomes for the entire population (S=1 and S=0). Finally, we use the generalizability formula above to calculate the marginal effect.
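The steps above can be sketched end to end with NumPy alone on simulated trial data. Everything here is an assumption for illustration: the data-generating process, the linear outcome model (a continuous outcome rather than a risk), the Newton-Raphson logistic fit, and the hypothetical helper names `design`, `fit_logistic`, and `psi`. Because the simulated trial randomizes with a known probability of 0.5, the sketch also multiplies IPSW by the inverse of that probability; this is a choice of the sketch, not a description of zEpid's internals.

```python
import numpy as np

rng = np.random.default_rng(2019)
n = 50_000

# Simulated data (assumption): L modifies the effect of A, and people with
# higher L are more likely to be in the study sample (S=1).
L = rng.normal(size=n)
S = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + L))))
in_s = S == 1
A = np.where(in_s, rng.binomial(1, 0.5, size=n), np.nan)  # randomized, sample only
Y = np.where(in_s, 1 + 2 * A + A * L + rng.normal(size=n), np.nan)
# True population effect: E[2 + L] = 2, since E[L] = 0


def design(a, l):
    """Design matrix for the outcome model: intercept, A, L, A:L."""
    return np.column_stack([np.ones_like(l), a, l, a * l])


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (sketch, no convergence checks)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (y - p))
    return beta


# 1. Outcome (Q-)model fit in the study sample only
beta_y = np.linalg.lstsq(design(A[in_s], L[in_s]), Y[in_s], rcond=None)[0]

# 2. Sampling model Pr(S=1|L) fit on everyone, then IPSW = 1 / Pr(S=1|L)
X_s = np.column_stack([np.ones(n), L])
pr_s = 1 / (1 + np.exp(-X_s @ fit_logistic(X_s, S.astype(float))))
ipsw = 1 / pr_s


def psi(a_value, pr_a=0.5):
    # 3. Predict the potential outcome under A=a for everyone (S=1 and S=0)
    q_a = design(np.full(n, float(a_value)), L) @ beta_y
    # 4. Generalizability formula; weights = IPSW x 1/Pr(A=a|S=1)
    indicator = in_s & (A == a_value)
    augmentation = np.where(indicator, ipsw / pr_a * (Y - q_a), 0.0)
    return np.mean(q_a + augmentation)


rd = psi(1) - psi(0)
print(round(rd, 2))  # close to the true value of 2
```

The naive contrast within the study sample would be pulled toward 2 + E[L|S=1], which exceeds 2 here because higher-L individuals are over-represented in the sample.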

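The process for transportability is similar, except that the g-transport predictions are combined with inverse odds of sampling weights, Pr(S=0|L)/Pr(S=1|L), rather than IPSW, and the average is taken over the S=0 population only.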
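The two settings differ mainly in how the estimated sampling probability is turned into a weight. A minimal sketch, assuming Pr(S=1|L) has already been predicted for the study-sample rows; the function name `sampling_weights` is hypothetical and not part of zEpid.

```python
import numpy as np


def sampling_weights(pr_s, generalize=True):
    """Hypothetical helper: turn Pr(S=1|L) into sampling weights.

    generalize=True  -> inverse probability of sampling weights, 1 / Pr(S=1|L)
    generalize=False -> inverse odds of sampling weights, Pr(S=0|L) / Pr(S=1|L)
    """
    pr_s = np.asarray(pr_s, dtype=float)
    if generalize:
        return 1 / pr_s
    return (1 - pr_s) / pr_s


p = [0.25, 0.5]
print(sampling_weights(p, generalize=True).tolist())   # [4.0, 2.0]
print(sampling_weights(p, generalize=False).tolist())  # [3.0, 1.0]
```

The inverse odds weight equals the IPSW minus 1, so study-sample rows that were almost certain to be selected contribute little to the transported estimate.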

Confidence intervals should be obtained with a non-parametric bootstrap, refitting all models in each resample.
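A non-parametric bootstrap resamples whole rows (study-sample and non-sample rows alike) with replacement and recomputes the estimate on each resample. The sketch below returns a percentile interval, one of several bootstrap interval types, for any estimator written as a function of a NumPy array of rows. `bootstrap_ci` is a hypothetical helper; with zEpid, `estimator` would be a function you write that rebuilds the DataFrame from the resampled rows, refits AIPSW, and returns the risk difference.

```python
import numpy as np


def bootstrap_ci(data, estimator, n_boot=500, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval.

    data      : 2-D array, one row per individual (S=1 and S=0 rows together)
    estimator : function mapping a resampled array of rows to a float; it
                should refit all models (sampling, treatment, outcome) each time
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample rows with replacement
        estimates[b] = estimator(data[rows])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Toy usage with the sample mean standing in for the AIPSW estimator
x = np.arange(100, dtype=float).reshape(-1, 1)
lo, hi = bootstrap_ci(x, lambda d: d[:, 0].mean())
print(lo, hi)
```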

Parameters:
  • df (DataFrame) – Pandas dataframe containing all variables required for generalization/transportation. Should include all features related to sample selection, indicator for selection into the sample, and treatment/outcome information for the sample (selection == 1)
  • exposure (str) – Column label for exposure/treatment of interest. Can be nan for all those not in sample. Only binary exposures are currently supported
  • outcome (str) – Column label for outcome of interest. Can be nan for all those not in sample
  • selection (str) – Column label for the indicator of selection into the sample. Should be 1 if the individual comes from the study sample and 0 if the individual is from a random sample of the source population
  • generalize (bool, optional) – Whether the problem is a generalizability (True) problem or a transportability (False) problem. See notes for further details on the difference between the two estimation methods
  • weights (None, str, optional) – Optional column label for weights. Can be used to supply inverse probability of missingness weights
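For illustration, a toy `df` in the expected layout might look like the following (requires pandas). The column names match those used in the examples below, but the values are invented for the sketch, and treating `W_sq` as the square of `W` is an assumption.

```python
import numpy as np
import pandas as pd

# Study-sample (S=1) rows carry treatment and outcome; rows from the random
# sample of the source population (S=0) only need the selection features.
df = pd.DataFrame({
    "S": [1, 1, 1, 0, 0],
    "A": [1, 0, 1, np.nan, np.nan],
    "Y": [1, 0, 0, np.nan, np.nan],
    "L": [0, 1, 1, 0, 1],
    "W": [0.5, -1.2, 0.3, 1.1, -0.4],
})
df["W_sq"] = df["W"] ** 2  # assumed: squared term used in the sampling model

# Sanity checks on the layout
assert df.loc[df["S"] == 0, ["A", "Y"]].isna().all().all()
assert df.loc[df["S"] == 1, ["A", "Y"]].notna().all().all()
print(df)
```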

Note

There are two related concepts: generalizability and transportability. Generalizability is when your study sample is part of your target population. For example, you want to generalize results from California to the entire United States. Transportability is when your study sample is not part of your target population. For example, you want to apply results from California to Canada. The marginal risk difference is calculated slightly differently in the two scenarios, and AIPSW supports both.

Examples

Setting up the environment

>>> from zepid import load_generalize_data
>>> from zepid.causal.generalize import AIPSW
>>> df = load_generalize_data(False)

Generalizability for RCT

>>> aipw = AIPSW(df, exposure='A', outcome='Y', selection='S', generalize=True)
>>> aipw.sampling_model('L + W_sq', print_results=False)
>>> aipw.outcome_model('A + L + L:A + W + W:A + W:A:L', print_results=False)
>>> aipw.fit()
>>> aipw.summary()

Transportability for RCT

>>> aipw = AIPSW(df, exposure='A', outcome='Y', selection='S', generalize=False)
>>> aipw.sampling_model('L + W_sq', print_results=False)
>>> aipw.outcome_model('A + L + L:A + W + W:A + W:A:L', print_results=False)
>>> aipw.fit()
>>> aipw.summary()

Transportability for observational study

>>> df = load_generalize_data(True)
>>> aipw = AIPSW(df, exposure='A', outcome='Y', selection='S', generalize=False)
>>> aipw.sampling_model('L + W_sq', print_results=False)
>>> aipw.treatment_model('L', print_results=False)
>>> aipw.outcome_model('A + L + L:A + W + W:A + W:A:L', print_results=False)
>>> aipw.fit()
>>> aipw.summary()

References

Dahabreh IJ, Robertson SE, Stuart EA, Hernan MA (2018). Transporting inferences from a randomized trial to a new target population. arXiv preprint arXiv:1805.00550.

Dahabreh IJ, Hernan MA, Robertson SE, Buchanan A, Steingrimsson JA (2019). Generalizing trial findings in nested trial designs with sub-sampling of non-randomized individuals. arXiv preprint arXiv:1902.06080.

Methods

__init__(df, exposure, outcome, selection[, …]) Initialize self.
fit() Uses AIPSW formula to obtain the risk difference and risk ratio from the sample.
outcome_model(model[, outcome_type, …]) Build the g-transport model for the outcome.
sampling_model(model_denominator[, …]) Logistic regression model(s) for estimating IPSW.
summary([decimal]) Prints a summary of the results for the AIPSW estimator
treatment_model(model_denominator[, …]) Logistic regression model(s) for estimating inverse probability of treatment weights (IPTW).