zepid.causal.snm.g_estimation.GEstimationSNM

class zepid.causal.snm.g_estimation.GEstimationSNM(df, exposure, outcome, weights=None)

G-estimation for structural nested mean models. G-estimation is distinct from the other g-methods (inverse probability weights and g-formula) in the parameter it estimates. Rather than estimating the average causal effect of treating everyone versus treating no one, g-estimation estimates the average causal effect within strata of V, the subset of the covariates L that modifies the effect of the exposure. It does this by specifying a structural nested model. For additive effects, the structural nested mean model looks like the following

\[E[Y^a | A=a, V] - E[Y^{a=0} | A=a, V] = \psi_1 a + \psi_2 a V\]

There are two items to note in the structural nested model: (1) there is no intercept or term for L, and (2) we need the potential outcomes to solve for psi. The first item means that we are estimating fewer parameters, making g-estimation less susceptible to model misspecification than the g-formula. The second means we cannot solve the above equation directly.

Under the assumption of conditional exchangeability, we can solve for psi using another equation. Specifically, we consider the following model for the exposure

\[\text{logit}(\Pr(A=1 | Y^{a=0}, L)) = \alpha_0 + \alpha_1 Y^{a=0} + \alpha_2 Y^{a=0} V + \alpha_3 L\]

Under the assumption of conditional exchangeability, the alpha terms for the potential outcome Y^{a=0} (alpha_1 and alpha_2) should be equal to zero. Therefore, we need to find the values of psi that result in those alpha terms equaling zero. For the additive model, we can rearrange the first equation to express the potential outcome in terms of the observed data and psi

\[H(\psi) = Y - (\psi_1 A + \psi_2 A V)\]

meaning we solve for when alpha_1 and alpha_2 are approximately zero under

\[\text{logit}(\Pr(A=1 | H(\psi), L)) = \alpha_0 + \alpha_1 H(\psi) + \alpha_2 H(\psi) V + \alpha_3 L\]
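As a small illustration of the H(psi) transformation above, the following sketch removes the modelled treatment effect from the observed outcome for a two-parameter additive model. The function name `h_psi` and the data values are hypothetical and are not part of zEpid.

```python
import numpy as np

def h_psi(y, a, v, psi):
    """Return H(psi) = Y - (psi_1 * A + psi_2 * A * V) for array inputs."""
    psi_1, psi_2 = psi
    return y - (psi_1 * a + psi_2 * a * v)

# Tiny hand-made data set (illustrative values only)
y = np.array([10.0, 12.0, 9.0, 15.0])   # observed outcome
a = np.array([0, 1, 0, 1])              # binary exposure
v = np.array([0, 0, 1, 1])              # binary effect modifier

# Candidate psi values; untreated rows are unchanged, treated rows have
# psi_1 (and psi_2 when V = 1) subtracted from the observed outcome
h = h_psi(y, a, v, psi=(2.0, 1.0))
print(h)
```

If the candidate psi values are correct, `h` stands in for the potential outcome under no exposure, which is why it can be placed in the exposure model.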

To find the values for the psi's where the alpha for those terms is approximately zero, we have two options: (1) grid-search or (2) closed form. The closed form is ultimately faster since it only requires some basic matrix manipulation to solve. The grid search looks across candidate values of psi for those that bring the alphas as close to zero as possible. We use SciPy's Nelder-Mead optimization procedure for the heavy lifting.
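The two solution strategies can be sketched on simulated data for a one-parameter model. This is a minimal illustration under stated assumptions, not zEpid's implementation: the logistic fit is a hand-written Newton-Raphson routine (`fit_logit` is a hypothetical helper), the closed form shown is the standard one-parameter estimating-equation solution, and a plain grid over psi stands in for the Nelder-Mead optimizer.

```python
import numpy as np

def fit_logit(X, a, n_iter=50):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated data: one confounder L, binary exposure A, true psi = 2
rng = np.random.default_rng(0)
n = 5000
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * L)))
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)
X = np.column_stack([np.ones(n), L])     # exposure-model design matrix

# Closed form: solve sum((A - pi) * (Y - psi * A)) = 0 for psi
pi = 1 / (1 + np.exp(-X @ fit_logit(X, A)))
psi_closed = np.sum((A - pi) * Y) / np.sum((A - pi) * A)

def alpha_for(psi):
    """Coefficient on H(psi) when it is added to the exposure model."""
    H = Y - psi * A
    return fit_logit(np.column_stack([X, H]), A)[-1]

# Search: pick the candidate psi whose alpha is closest to zero
grid = np.linspace(0, 4, 81)
alphas = np.array([alpha_for(p) for p in grid])
psi_search = grid[np.argmin(np.abs(alphas))]

print(f"closed form: {psi_closed:.3f}, search: {psi_search:.3f}")
```

Both approaches target the same estimating equation, so in this sketch they should agree up to the resolution of the grid.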

Parameters:
  • df (DataFrame) – Pandas DataFrame object containing all variables of interest
  • exposure (str) – Column name of the exposure variable. Currently, only binary exposures are supported
  • outcome (str) – Column name of the outcome variable. Either continuous or binary outcomes are supported
  • weights (str, optional) – Column name of weights. Weights allow for items like sampling weights, missing weights, and censoring weights to be used when estimating effects

Notes

Similar to marginal structural models, g-estimation cannot inherently account for data that are missing at random. To account for missing outcome data, inverse probability of missing weights should be used.

The grid-search approach does allow for some unique sensitivity analyses that are not incorporated into the closed-form solution. Specifically, we can imagine that there is some unobserved confounding. With unobserved confounding, we know that the alpha value will not exactly equal zero. We can optimize for slightly different alphas to see how sensitive our results are to assumptions regarding unobserved confounding. For further details on translating unobserved confounding to alpha values, see Scharfstein et al. 1999 in the references.
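The sensitivity analysis described above can be sketched by solving for the psi at which the coefficient on H(psi) equals a chosen nonzero alpha rather than zero. This mirrors the idea behind the `alpha_value` argument but is not zEpid's implementation: the data are simulated, `fit_logit` and `solve_psi` are hypothetical helpers, and bisection is used in place of Nelder-Mead.

```python
import numpy as np

def fit_logit(X, a, n_iter=50):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (a - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated data with a true psi of 2 and no unobserved confounding
rng = np.random.default_rng(1)
n = 5000
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * L)))
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)
X = np.column_stack([np.ones(n), L])

def alpha_for(psi):
    """Coefficient on H(psi) when it is added to the exposure model."""
    H = Y - psi * A
    return fit_logit(np.column_stack([X, H]), A)[-1]

def solve_psi(target_alpha, lo=0.0, hi=4.0, n_steps=40):
    """Bisection for the psi whose H(psi) coefficient equals target_alpha.

    Assumes alpha_for is decreasing in psi on [lo, hi], which holds for
    this simulation but is not guaranteed in general.
    """
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if alpha_for(mid) > target_alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Compare psi under no unobserved confounding (alpha = 0) with psi under
# two hypothetical departures from conditional exchangeability
results = {alpha: solve_psi(alpha) for alpha in (-0.05, 0.0, 0.05)}
for alpha, psi in results.items():
    print(f"alpha = {alpha:+.2f} -> psi = {psi:.3f}")
```

How far psi moves across plausible alpha values gives a rough sense of how much the estimate depends on the no-unobserved-confounding assumption.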

If your continuous outcome variable takes on large values, you may see the closed-form and grid-search results start to diverge. This is because of the tolerance value used by the optimizer. If you have large outcome values, I recommend rescaling them to prevent any issues with the grid-search.
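For an additive structural nested mean model, dividing the outcome by a constant divides psi by the same constant, so an estimate obtained on a rescaled outcome can be multiplied back to the original units. The sketch below checks this on simulated data. It assumes an intercept-only exposure model (as in a randomized exposure) purely to keep the example short, and `psi_closed_form` is a hypothetical helper, not a zEpid function.

```python
import numpy as np

# Simulated outcome on a large scale with a true effect of 200 units
rng = np.random.default_rng(2)
n = 1000
A = rng.binomial(1, 0.5, size=n)
Y = 350.0 + 200.0 * A + rng.normal(scale=100.0, size=n)

def psi_closed_form(y, a):
    """One-parameter closed-form g-estimate with an intercept-only exposure model."""
    resid = a - a.mean()          # A minus the estimated Pr(A=1)
    return np.sum(resid * y) / np.sum(resid * a)

scale = 100.0
psi_raw = psi_closed_form(Y, A)               # estimate in original units
psi_rescaled = psi_closed_form(Y / scale, A)  # estimate on the rescaled outcome

# Multiplying by the scale factor recovers the original-unit estimate
print(psi_raw, psi_rescaled * scale)
```

With the search-based solver, the same back-transformation applies: fit on the rescaled outcome, then multiply the psi estimates by the scale factor.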

Examples

Set up the environment and the data set

>>> from zepid import load_sample_data, spline
>>> from zepid.causal.snm import GEstimationSNM
>>> df = load_sample_data(timevary=False).drop(columns=['dead'])
>>> df[['cd4_rs1','cd4_rs2']] = spline(df,'cd40',n_knots=3,term=2,restricted=True)
>>> df[['age_rs1','age_rs2']] = spline(df,'age0',n_knots=3,term=2,restricted=True)

One-parameter structural nested mean model via closed-form solution

>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art')
>>> snm.fit()
>>> snm.summary()

One-parameter structural nested mean model via grid-search

>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art')
>>> snm.fit(solver='search')

One-parameter structural nested mean model via grid-search with different alphas

>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art')
>>> snm.fit(solver='search', alpha_value=0.03)

Two-parameter structural nested mean model via closed-form

>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art + art:male')
>>> snm.fit()

Two-parameter structural nested mean model via grid-search and starting values

>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art + art:male')
>>> snm.fit(solver='search', starting_value=[-0.05, 0.0])

References

Naimi AI, Cole SR, Kennedy EH. (2017). An introduction to g methods. International journal of epidemiology, 46(2), 756-762.

Robins JM. (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology, the environment, and clinical trials (pp. 95-133). Springer, New York, NY.

Vansteelandt S, Joffe M. (2014). Structural nested models and G-estimation: the partially realized promise. Statistical Science, 29(4), 707-731.

Wallace MP, Moodie EE, Stephens DA. (2017). An R package for G-estimation of structural nested mean models. Epidemiology, 28(2), e18-e20.

Scharfstein DO, Rotnitzky A, Robins JM. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096-1120.

Methods

  • __init__(df, exposure, outcome[, weights]) – Initialize self.
  • exposure_model(model[, print_results]) – Specify the treatment model to satisfy conditional exchangeability.
  • fit([solver, starting_value, alpha_value, …]) – Using the treatment model and the format of the structural nested mean model, the solutions for psi are calculated.
  • missing_model(model_denominator[, …]) – Estimation of Pr(M=0|A=a,L), which is the missing data mechanism for the outcome.
  • structural_nested_model(model) – Specify the structural nested mean model to fit.
  • summary([decimal]) – Summary of results.