zepid.causal.snm.g_estimation.GEstimationSNM

class zepid.causal.snm.g_estimation.GEstimationSNM(df, exposure, outcome, weights=None)
G-estimation for structural nested mean models. G-estimation is distinct from the other g-methods (inverse probability weights and the g-formula) in the parameter it estimates. Rather than estimating the average causal effect of treating everyone versus treating no one, g-estimation estimates the average causal effect within strata of L. It does this by specifying a structural nested model. For additive effects, the structural nested mean model looks like the following
\[E[Y^a | A=a, V] - E[Y^{a=0} | A=a, V] = \psi_1 a + \psi_2 a V\]
There are two items to note in the structural nested model: (1) there is no intercept or term for L, and (2) we need the potential outcomes to solve for psi. The first item means that we are estimating fewer parameters, making g-estimation less susceptible to model misspecification than the g-formula. The second means we cannot solve the above equation directly.
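As a worked reading of this model (a sketch, writing the two coefficients as \psi_1 and \psi_2 and assuming V is binary), substituting a=1 gives the stratum-specific effects
\[E[Y^{a=1} - Y^{a=0} | A=1, V=0] = \psi_1, \qquad E[Y^{a=1} - Y^{a=0} | A=1, V=1] = \psi_1 + \psi_2\]
so \psi_2 is the difference in the additive effect between the two strata of V. Setting \psi_2 = 0 recovers the one-parameter model, in which the effect is the same in every stratum.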
Under the assumption of conditional exchangeability, we can solve for psi using another equation. Specifically, we can work to solve the following model
\[\text{logit}(\Pr(A=1 | Y^{a=0}, L)) = \alpha_0 + \alpha_1 Y^{a=0} + \alpha_2 Y^{a=0} V + \alpha_3 L\]
Under the assumption of conditional exchangeability, the alpha terms for the potential outcome (\alpha_1 and \alpha_2) should be equal to zero. Therefore, we need to find the values of psi that result in those alpha terms equaling zero. For the additive model, we can rearrange the first equation to express the potential outcome as a function of psi
\[H(\psi) = Y - (\psi_1 A + \psi_2 A V)\]
meaning we solve for the values of psi at which the alpha terms for the potential outcome are approximately zero under
\[\text{logit}(\Pr(A=1 | Y^{a=0}, L)) = \alpha_0 + \alpha_1 H(\psi) + \alpha_2 H(\psi) V + \alpha_3 L\]
To find the values of psi where the alphas for those terms are approximately zero, we have two options: (1) grid-search or (2) closed form. The closed form is ultimately faster since we are only required to do some basic matrix manipulation to solve. For the grid-search, we need to search across the potential values of psi for those that minimize the values of the alphas. We use SciPy's Nelder-Mead optimization procedure for the heavy lifting.
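The two solvers can be illustrated without zepid. The following is a minimal sketch under stated assumptions, not zepid's implementation: the data are simulated with one binary confounder, Pr(A=1|L) is estimated by stratum proportions rather than a logistic model, and all function names are hypothetical. Instead of refitting the logistic model at each candidate psi, it uses the score condition for the alpha on H(psi) being zero, namely that the sum of (A - Pr(A=1|L)) H(psi) equals zero. A plain grid stands in for the Nelder-Mead search.

```python
import random

def simulate(n=2000, true_psi=2.0, seed=1):
    """Simulate (L, A, Y) with a binary confounder L and additive effect true_psi."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.3 + 0.4 * l else 0
        y = 10 + 5 * l + true_psi * a + rng.gauss(0, 1)
        rows.append((l, a, y))
    return rows

def propensity_by_stratum(rows):
    """Estimate Pr(A=1 | L) as the proportion treated within each level of L."""
    treated, total = {}, {}
    for l, a, _ in rows:
        treated[l] = treated.get(l, 0) + a
        total[l] = total.get(l, 0) + 1
    return {l: treated[l] / total[l] for l in total}

def estimating_function(rows, p, psi):
    """Sum of (A - p(L)) * H(psi), where H(psi) = Y - psi * A."""
    return sum((a - p[l]) * (y - psi * a) for l, a, y in rows)

def closed_form_psi(rows, p):
    """Solve the estimating equation directly; it is linear in psi."""
    num = sum((a - p[l]) * y for l, a, y in rows)
    den = sum((a - p[l]) * a for l, a, y in rows)
    return num / den

def grid_search_psi(rows, p, low=-5.0, high=5.0, step=0.01):
    """Pick the grid value of psi whose estimating function is closest to zero."""
    n_steps = int(round((high - low) / step))
    grid = [low + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda psi: abs(estimating_function(rows, p, psi)))

rows = simulate()
p = propensity_by_stratum(rows)
psi_closed = closed_form_psi(rows, p)
psi_grid = grid_search_psi(rows, p)
print(round(psi_closed, 2), round(psi_grid, 2))
```

Both estimates should land near the simulated effect of 2, and the grid result can differ from the closed form only by the grid resolution. The closed form needs a single pass over the data, while the search evaluates the estimating function once per candidate value.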
Parameters:
df (DataFrame) – Pandas DataFrame object containing all variables of interest
exposure (str) – Column name of the exposure variable. Currently, only binary exposures are supported
outcome (str) – Column name of the outcome variable. Either continuous or binary outcomes are supported
weights (str, optional) – Column name of weights. Weights allow for items like sampling weights, missing weights, and censoring weights to estimate effects
Notes
Similar to marginal structural models, g-estimation cannot inherently account for data that are missing at random. To account for missing outcome data, inverse probability of missing weights should be used.
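To show how such weights enter the estimation, here is a minimal sketch under stated assumptions rather than zepid's implementation. The data are simulated, the missingness mechanism (depending on treatment only) is an assumption of the simulation, both Pr(A=1|L) and Pr(observed|A,L) are estimated by stratum proportions, and the helper names are hypothetical. Each complete case is weighted by the inverse of its probability of being observed in the one-parameter closed-form solution.

```python
import random

def simulate(n=4000, true_psi=2.0, seed=7):
    """Simulate (L, A, Y); Y is None when the outcome is missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.3 + 0.4 * l else 0
        y = 10 + 5 * l + true_psi * a + rng.gauss(0, 1)
        # Assumed missingness mechanism: depends on A only (missing at random)
        observed = rng.random() < 0.9 - 0.4 * a
        rows.append((l, a, y if observed else None))
    return rows

def stratum_proportion(pairs):
    """Proportion of 1s of a binary flag within each stratum key."""
    ones, total = {}, {}
    for key, flag in pairs:
        ones[key] = ones.get(key, 0) + flag
        total[key] = total.get(key, 0) + 1
    return {key: ones[key] / total[key] for key in total}

rows = simulate()
# Pr(A=1 | L), estimated on all rows because A and L are fully observed here
p_treat = stratum_proportion((l, a) for l, a, _ in rows)
# Pr(outcome observed | A, L), estimated by stratum proportions
p_obs = stratum_proportion(((a, l), int(y is not None)) for l, a, y in rows)

# Weighted closed form: each complete case gets weight 1 / Pr(observed | A, L)
num = den = 0.0
for l, a, y in rows:
    if y is None:
        continue
    w = 1.0 / p_obs[(a, l)]
    num += w * (a - p_treat[l]) * y
    den += w * (a - p_treat[l]) * a
psi_weighted = num / den
print(round(psi_weighted, 2))
```

In zepid itself, the missing_model method listed under Methods estimates the missing data mechanism for the outcome, so these weights do not need to be computed by hand.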
The grid-search approach does allow for some unique sensitivity analyses that are not incorporated into the closed-form solution. Specifically, we can imagine that there is some unobserved confounding. With unobserved confounding, we know that the alpha value will not exactly equal zero. We can optimize for slightly different alphas to see how sensitive our results are to some assumptions regarding unobserved confounding. For further details on translating unobserved confounding to alpha values, see Scharfstein et al. 1999 in the references.
If your continuous outcome takes on large values, you may see the closed-form and grid-search results start to diverge. This is because of the tolerance value used by the search. If you have large outcome values, I recommend rescaling them to prevent any issues with the grid-search.
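Rescaling is straightforward to undo for the additive model, because psi is expressed in the units of the outcome. The sketch below is an illustration on simulated data with a hypothetical `closed_form_psi` helper, not zepid code. It divides a large-valued outcome by a constant, estimates psi on the rescaled data, and multiplies the estimate by the same constant to return to the original units.

```python
import math
import random

def closed_form_psi(rows):
    """One-parameter closed-form estimate with Pr(A=1 | L) from stratum proportions."""
    treated, total = {}, {}
    for l, a, _ in rows:
        treated[l] = treated.get(l, 0) + a
        total[l] = total.get(l, 0) + 1
    p = {l: treated[l] / total[l] for l in total}
    num = sum((a - p[l]) * y for l, a, y in rows)
    den = sum((a - p[l]) * a for l, a, y in rows)
    return num / den

rng = random.Random(3)
rows = []
for _ in range(2000):
    l = 1 if rng.random() < 0.5 else 0
    a = 1 if rng.random() < 0.3 + 0.4 * l else 0
    # Outcome on a large scale (hypothetical values in the thousands)
    y = 4000 + 1500 * l + 250 * a + rng.gauss(0, 100)
    rows.append((l, a, y))

scale = 1000.0
rescaled = [(l, a, y / scale) for l, a, y in rows]

psi_raw = closed_form_psi(rows)
psi_back = closed_form_psi(rescaled) * scale  # back-transform to original units
print(math.isclose(psi_raw, psi_back, rel_tol=1e-9))
```

The same back-transformation applies to a psi obtained from a search on the rescaled outcome, which is where the rescaling helps with the tolerance.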
Examples
Set up the environment and the data set
>>> from zepid import load_sample_data, spline
>>> from zepid.causal.snm import GEstimationSNM
>>> df = load_sample_data(timevary=False).drop(columns=['dead'])
>>> df[['cd4_rs1', 'cd4_rs2']] = spline(df, 'cd40', n_knots=3, term=2, restricted=True)
>>> df[['age_rs1', 'age_rs2']] = spline(df, 'age0', n_knots=3, term=2, restricted=True)
One-parameter structural nested mean model via closed-form solution
>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art')
>>> snm.fit()
>>> snm.summary()
One-parameter structural nested mean model via grid-search
>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art')
>>> snm.fit(solver='search')
One-parameter structural nested mean model via grid-search with different alphas
>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art')
>>> snm.fit(solver='search', alpha_value=0.03)
Two-parameter structural nested mean model via closed-form solution
>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art + art:male')
>>> snm.fit()
Two-parameter structural nested mean model via grid-search and starting values
>>> snm = GEstimationSNM(df, exposure='art', outcome='cd4_wk45')
>>> snm.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')
>>> snm.structural_nested_model(model='art + art:male')
>>> snm.fit(solver='search', starting_value=[0.05, 0.0])
References
Naimi AI, Cole SR, Kennedy EH. (2017). An introduction to g methods. International Journal of Epidemiology, 46(2), 756-762.
Robins JM. (2000). Marginal structural models versus structural nested models as tools for causal inference. In Statistical models in epidemiology, the environment, and clinical trials (pp. 95-133). Springer, New York, NY.
Vansteelandt S, Joffe M. (2014). Structural nested models and G-estimation: the partially realized promise. Statistical Science, 29(4), 707-731.
Wallace MP, Moodie EE, Stephens DA. (2017). An R package for G-estimation of structural nested mean models. Epidemiology, 28(2), e18-e20.
Scharfstein DO, Rotnitzky A, Robins JM. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096-1120.

__init__(df, exposure, outcome, weights=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(df, exposure, outcome[, weights]) – Initialize self.
exposure_model(model[, print_results]) – Specify the treatment model to satisfy conditional exchangeability.
fit([solver, starting_value, alpha_value, …]) – Using the treatment model and the format of the structural nested mean model, the solutions for psi are calculated.
missing_model(model_denominator[, …]) – Estimation of Pr(M=0|A=a,L), which is the missing data mechanism for the outcome.
structural_nested_model(model) – Specify the structural nested mean model to fit.
summary([decimal]) – Summary of results