zepid.causal.ipw.IPCW.IPCW

class zepid.causal.ipw.IPCW.IPCW(df, idvar, time, event, flat_df=False, enter=None)
Calculates inverse probability of censoring weights (IPCW). This class accepts either a flat file (one row per individual) or a long format (multiple rows per individual). A flat file must be converted to long format before the weights can be estimated; this is done automatically if flat_df=True. When the conversion is automatic, a warning and some comparison statistics are provided. Please verify that they match. In general, it is recommended to convert the data set yourself.
IPCW are calculated via logistic regression, and the weights are cumulative products per unique ID. IPCW can be used to correct for data that are missing at random, given the generated model, in weighted Kaplan-Meier curves. The formula used to generate the unstabilized IPCW is
\[\pi_i(t) = \prod_{R_k \le t} \frac{1}{\Pr(C_i > R_k \mid \bar{L} = \bar{l}, C_i > R_{k-1})}\]
The stabilized IPCW substitutes predicted probabilities under the specified numerator model into the numerator of the previous equation. In general, it is recommended to stabilize IPCW by time.
\[\pi_i(t) = \prod_{R_k \le t} \frac{\Pr(C_i > R_k)}{\Pr(C_i > R_k \mid \bar{L} = \bar{l}, C_i > R_{k-1})}\]
Note
IPCW no longer supports late entry. The reason is that the pooled logistic regression model approach does not correctly accumulate the weights. As such, either all late entries need to be dropped (called the new-user design) or rows need to be back-propagated (unobserved rows are filled in). The second approach requires filling in the missing observed covariates and, for time-varying variables, will require imputation. The new-user design is the safer bet and is generally what I currently recommend.
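To make the weight formulas above concrete, the following is a minimal sketch of how per-interval probabilities of remaining uncensored accumulate into weights within each ID. The probabilities are made-up numbers standing in for predictions from the logistic models, and the column names (`p_denom`, `p_num`, `w_unstab`, `w_stab`) are hypothetical; none of this is zEpid's internal code.

```python
import pandas as pd

# Hypothetical long-format data: one row per person-interval.
# p_denom: predicted Pr(uncensored in interval | covariates, uncensored so far)
# p_num:   predicted Pr(uncensored in interval) from the numerator model
toy = pd.DataFrame({
    "id":      [1, 1, 1, 2, 2],
    "p_denom": [0.90, 0.80, 0.95, 0.85, 0.90],
    "p_num":   [0.90, 0.90, 0.90, 0.90, 0.90],
})

# Unstabilized weight: cumulative product of 1 / p_denom within each ID
toy["w_unstab"] = (1 / toy["p_denom"]).groupby(toy["id"]).cumprod()

# Stabilized weight: cumulative product of p_num / p_denom within each ID
toy["w_stab"] = (toy["p_num"] / toy["p_denom"]).groupby(toy["id"]).cumprod()

print(toy)
```

Each row's weight depends on every earlier row of the same individual, which is why the rows must be complete and ordered by time within ID.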
Parameters:
 df (DataFrame) – Pandas DataFrame object containing all the variables of interest
 idvar (str) – String that indicates the column name for a unique identifier for each individual
 time (str) – Column name for the ending observation time
 event (str) – Column name for the event of interest
 flat_df (bool, optional) – Whether the input dataframe only contains a single row per participant. If so, the flat dataframe is converted to a long dataframe. Default is False (for multiple rows per person)
 enter (str, optional) – Column name for the time each participant began being observed. Default is None. This option is only needed when flat_df=True. Late entries are no longer supported, and specifying this will lead to a ValueError
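Because late entries are not supported, the data need to be restricted before IPCW is constructed. The sketch below shows one way the new-user restriction described in the note could be applied to a flat data set with pandas. The column names (`id`, `enter`, `t`, `dead`) and the assumption that follow-up begins at time 0 are illustrative only.

```python
import pandas as pd

# Hypothetical flat data: one row per individual
flat = pd.DataFrame({
    "id":    [1, 2, 3, 4],
    "enter": [0, 0, 2, 0],   # individual 3 enters late
    "t":     [5, 3, 6, 4],
    "dead":  [1, 0, 1, 0],
})

# New-user design: keep only individuals observed from the time origin
new_users = flat.loc[flat["enter"] == 0].copy()

n_dropped = len(flat) - len(new_users)
print(f"Dropped {n_dropped} late-entry individual(s)")
```

Dropping individuals changes the target population, so it is worth reporting how many were excluded.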
Example
Setting up the environment
>>> from zepid import load_sample_data
>>> from zepid.causal.ipw import IPCW
>>> df = load_sample_data(timevary=True)
>>> df['enter_q'] = df['enter'] ** 2
>>> df['enter_c'] = df['enter'] ** 3
>>> df['age0_q'] = df['age0'] ** 2
>>> df['age0_c'] = df['age0'] ** 3
Calculating stabilized IPCW with a long data set
>>> ipc = IPCW(df, idvar='id', time='enter', event='dead')
>>> ipc.regression_models(model_denominator='enter + enter_q + enter_c + male + age0 + age0_q + age0_c',
...                       model_numerator='enter + enter_q + enter_c')
>>> ipc.fit()
Extracting calculated stabilized IPCW
>>> ipc.Weight
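As an illustration of the weighted Kaplan-Meier use mentioned in the class description, here is a minimal pandas sketch of a weighted survival curve on long-format data. It uses made-up weights in a hypothetical column `w` rather than the contents of `ipc.Weight`, and the helper `weighted_km` is not part of zEpid. It assumes every individual's intervals lie on the same discrete time grid.

```python
import pandas as pd

# Hypothetical long-format data: one row per person-interval.
# 'out' is the end of the interval, 'dead' marks an event at 'out',
# and 'w' is the censoring weight for that row (made-up values).
long_df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 3, 3],
    "out":  [1, 2, 1, 2, 1, 2],
    "dead": [0, 1, 0, 0, 0, 0],
    "w":    [1.0, 1.2, 1.0, 0.9, 1.0, 0.9],
})

def weighted_km(data, time, event, weight):
    """Weighted Kaplan-Meier on a shared discrete grid.

    Every row counts as at risk during its own interval, so the weighted
    hazard at each time is (sum of weights with an event) / (sum of weights).
    """
    d = data.assign(_weighted_event=data[weight] * data[event])
    grouped = d.groupby(time)
    at_risk = grouped[weight].sum()
    events = grouped["_weighted_event"].sum()
    return (1 - events / at_risk).cumprod()

surv = weighted_km(long_df, time="out", event="dead", weight="w")
print(surv)
```

With all weights set to 1, this reduces to the ordinary Kaplan-Meier estimate on the same grid.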
Calculating stabilized IPCW with a wide data set
>>> df = load_sample_data(False)
>>> ipc = IPCW(df, idvar='id', time='t', event='dead', flat_df=True)
>>> ipc.regression_models(model_denominator='enter + enter_q + enter_c + male + age0 + age0_q + age0_c',
...                       model_numerator='enter + enter_q + enter_c')
>>> ipc.fit()
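Since converting the data set yourself is the recommended route, the following sketch shows one way a flat data set might be expanded to long format with pandas before being passed in with flat_df=False. It is not the conversion zEpid performs internally. It assumes follow-up times are whole numbers of time units starting at 0, and the column names (`id`, `t`, `dead`, `enter`, `out`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical flat data: one row per individual
flat = pd.DataFrame({
    "id":   [1, 2],
    "t":    [3, 2],   # end of follow-up, in whole time units
    "dead": [1, 0],   # event indicator at end of follow-up
})

# Repeat each individual's row once per unit of follow-up
long_df = flat.loc[flat.index.repeat(flat["t"])].reset_index(drop=True)

# Interval boundaries per individual: (0, 1], (1, 2], ...
long_df["enter"] = long_df.groupby("id").cumcount()
long_df["out"] = long_df["enter"] + 1

# The event can only occur in an individual's final interval
long_df["dead"] = np.where(long_df["out"] == long_df["t"], long_df["dead"], 0)
long_df = long_df.drop(columns="t")

print(long_df)
```

Checking that the number of individuals and the number of events are unchanged after the expansion is a quick sanity test.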
References
Howe CJ et al. (2016) Selection bias due to loss to follow up in cohort studies. Epidemiology, 27(1), 91-97.

__init__(df, idvar, time, event, flat_df=False, enter=None)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(df, idvar, time, event[, flat_df, …]) – Initialize self.
fit() – Calculates IPCW for each observation period for each observation.
regression_models(model_denominator, …[, …]) – Regression model to generate predicted probabilities of censoring, conditional on specified variables.