zepid.superlearner.estimators.StepwiseSL

class zepid.superlearner.estimators.StepwiseSL(family, selection='backward', order_interaction=0, verbose=False)

Step-wise selection of a Generalized Linear Model (GLM) for use with SuperLearner. At each step, every model that differs from the current one by a single term is fit and compared by AIC, and the best one is selected. The procedure continues until no candidate model improves the AIC. The optimal model is the final model of the step-wise procedure, which has the lowest AIC among the models it compared.
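The backward variant of this search can be sketched as follows. This is a minimal illustration, not zEpid's implementation: to avoid extra dependencies it swaps in an ordinary least-squares fit (Gaussian family, identity link) with AIC computed up to an additive constant, and the helpers `gaussian_aic` and `backward_stepwise` are hypothetical names.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of an OLS fit with an intercept, up to an additive constant (hypothetical helper)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept plus the selected columns
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k


def backward_stepwise(X, y):
    """Remove one column at a time while a removal lowers AIC (hypothetical helper)."""
    kept = list(range(X.shape[1]))  # start from the full model
    best_aic = gaussian_aic(X[:, kept], y)
    while kept:
        # Score every model that drops exactly one of the remaining columns
        candidates = [
            (gaussian_aic(X[:, [c for c in kept if c != drop]], y), drop)
            for drop in kept
        ]
        aic, drop = min(candidates)
        if aic >= best_aic:
            break  # no single removal improves AIC, so the search stops
        best_aic, kept = aic, [c for c in kept if c != drop]
    return kept, best_aic


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + rng.normal(size=500)  # only column 0 is related to y
kept, aic = backward_stepwise(X, y)
print(kept, aic)
```

Forward selection follows the same loop in reverse: start with no terms and add the single term that lowers AIC the most, stopping when no addition helps.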

Parameters:
  • family (statsmodels.families.family) – Family to use for the model. All families supported by statsmodels are also supported.
  • selection (str, optional) – Method of step-wise selection to use. Options are ‘forward’ and ‘backward’. Default is ‘backward’, which starts from the full model (all terms included) and removes terms one at a time. ‘forward’ starts with no terms and adds them one at a time.
  • order_interaction (int, optional) – Order of interactions to explore. For example, order_interaction=0 explores only the main effects. Default is 0.
  • verbose (bool, optional) – Whether to print the progress of the step-wise selection. Default is False.
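Assuming that order k appends the products of every set of 2 to k + 1 distinct columns (so order_interaction=1 adds the first-order, two-way interactions and order_interaction=2 also adds three-way terms), the expanded design matrix could be built as below. `add_interactions` is a hypothetical helper for illustration, not part of zEpid.

```python
import itertools

import numpy as np


def add_interactions(X, order):
    """Append products of every combination of 2 to order + 1 columns (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    blocks = [X]  # main effects always come first
    for size in range(2, order + 2):
        for combo in itertools.combinations(range(X.shape[1]), size):
            # Element-wise product of the chosen columns, kept as a 2-D column
            blocks.append(np.prod(X[:, list(combo)], axis=1, keepdims=True))
    return np.hstack(blocks)


X = np.array([[1.0, 2.0, 3.0]])
print(add_interactions(X, 0))  # main effects only
print(add_interactions(X, 1))  # main effects plus the three two-way products
```

The number of candidate terms grows combinatorially with the number of columns and the order, which directly increases how many models each selection step must compare.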

Examples

Set up the environment and data set

>>> import numpy as np
>>> import statsmodels.api as sm
>>> from zepid import load_sample_data
>>> from zepid.superlearner import StepwiseSL
>>> df = load_sample_data(False).dropna()
>>> X = np.asarray(df[['art', 'male', 'age0']])
>>> y = np.asarray(df['dead'])

StepwiseSL estimation with no interactions

>>> f = sm.families.family.Binomial()
>>> step_sl = StepwiseSL(family=f, selection="backward", order_interaction=0)
>>> step_sl.fit(X, y)

StepwiseSL prediction

>>> step_sl.predict(X=X)

StepwiseSL with all first-order interactions

>>> step_sl = StepwiseSL(family=f, selection="backward", order_interaction=1)
>>> step_sl.fit(X, y)

StepwiseSL with forward selection and all second-order interactions

>>> step_sl = StepwiseSL(family=f, selection="forward", order_interaction=2)
>>> step_sl.fit(X, y)

Methods

fit(X, y) – Estimate the optimal GLM
predict(X) – Predict using the optimal GLM, i.e. the model with the lowest AIC found by the step-wise selection procedure used.

fit(X, y)

Estimate the optimal GLM

Parameters:
  • X (numpy.array) – Training data
  • y (numpy.array) – Target values
Returns:

None

get_params(deep=True)

For sklearn.base.clone() compatibility

predict(X)

Predict using the optimal GLM, i.e. the model with the lowest AIC found by the step-wise selection procedure used.

Parameters:
  • X (numpy.array) – Samples with the same columns, in the same order, as the X array passed to fit(). All interaction terms up to order_interaction are created in this step from the input X, so the user does not need to create any interaction terms.
Returns:

Predicted values from the optimal GLM
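To illustrate why predict() only needs the raw columns, the sketch below stores the interaction order and the expanded columns kept during selection, rebuilds the same expanded design for new data, and applies the inverse logit link of a Binomial GLM. It assumes predictions are on the response (probability) scale, as statsmodels GLM predictions are by default. The class `SelectedLogitSketch`, its attributes, and the hand-picked coefficients are hypothetical; zEpid's internals may differ.

```python
import itertools

import numpy as np


def add_interactions(X, order):
    """Append products of 2 to order + 1 distinct columns (hypothetical helper)."""
    blocks = [X]
    for size in range(2, order + 2):
        for combo in itertools.combinations(range(X.shape[1]), size):
            blocks.append(np.prod(X[:, list(combo)], axis=1, keepdims=True))
    return np.hstack(blocks)


class SelectedLogitSketch:
    """Hypothetical stand-in for a fitted Binomial GLM chosen by step-wise selection."""

    def __init__(self, order, kept, coef):
        self.order = order  # order_interaction used during fit()
        self.kept = list(kept)  # indices of expanded columns that survived selection
        self.coef = np.asarray(coef, dtype=float)  # intercept first, then one per kept column

    def predict(self, X):
        # Rebuild the interaction terms from the raw columns, then keep the selected ones
        Xu = add_interactions(np.asarray(X, dtype=float), self.order)[:, self.kept]
        eta = self.coef[0] + Xu @ self.coef[1:]  # linear predictor
        return 1.0 / (1.0 + np.exp(-eta))  # inverse logit gives probabilities


# Suppose selection kept column 0 and the column 0 x column 1 interaction (expanded index 2)
model = SelectedLogitSketch(order=1, kept=[0, 2], coef=[-3.0, 1.0, 1.0])
print(model.predict([[1.0, 2.0], [0.0, 0.0]]))
```

Keeping the expansion inside predict() guarantees that training and prediction use identically constructed columns.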
set_params(**parameters)

For sklearn.base.clone() compatibility
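The get_params()/set_params() pair is what allows sklearn.base.clone() to build a fresh, unfitted copy of an estimator, which a cross-validated ensemble typically does before refitting each candidate learner. The sketch below mimics that protocol with a hypothetical `ToyStepwise` class that only stores StepwiseSL-style constructor arguments; `clone_like` is a simplified stand-in for sklearn.base.clone(), which additionally clones each parameter value.

```python
class ToyStepwise:
    """Hypothetical estimator that only records StepwiseSL-style constructor arguments."""

    def __init__(self, family=None, selection="backward", order_interaction=0, verbose=False):
        self.family = family
        self.selection = selection
        self.order_interaction = order_interaction
        self.verbose = verbose

    def get_params(self, deep=True):
        # Return exactly the constructor arguments so the object can be rebuilt from them
        return {
            "family": self.family,
            "selection": self.selection,
            "order_interaction": self.order_interaction,
            "verbose": self.verbose,
        }

    def set_params(self, **parameters):
        # Overwrite the named settings; returning self follows the scikit-learn convention
        for name, value in parameters.items():
            setattr(self, name, value)
        return self


def clone_like(estimator):
    """Build an unfitted copy from get_params(), a simplified version of sklearn.base.clone()."""
    return type(estimator)(**estimator.get_params(deep=False))


original = ToyStepwise(selection="forward", order_interaction=2)
fresh = clone_like(original).set_params(verbose=True)
print(fresh.get_params())
```

Because the copy is rebuilt from constructor arguments alone, any state learned by fit() is deliberately left behind.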