zepid.superlearner.estimators.StepwiseSL

class zepid.superlearner.estimators.StepwiseSL(family, selection='backward', order_interaction=0, verbose=False)
Stepwise model selection for generalized linear models (GLMs), for use with SuperLearner. Briefly, at each step the candidate models are compared by AIC and the best one is selected. The selection procedure continues until no step improves the AIC. The optimal model is the one with the lowest AIC among the models estimated by the stepwise selection procedure.
Parameters:  family (statsmodels.families.family) – Family to use for the model. Any family supported by statsmodels can be used.
 selection (str, optional) – Method of stepwise selection to use. Options are ‘forward’ and ‘backward’. Default is ‘backward’, which starts from the full model and removes terms one at a time.
 order_interaction (int, optional) – Order of interactions to explore. For example, order_interaction=0 explores only the main effects. Default is 0.
 verbose (bool, optional) – Default is False.
Examples
Set up the environment and data set
>>> import numpy as np
>>> import statsmodels.api as sm
>>> from zepid import load_sample_data
>>> from zepid.superlearner import StepwiseSL
>>> df = load_sample_data(False).dropna()
>>> X = np.asarray(df[['art', 'male', 'age0']])
>>> y = np.asarray(df['dead'])
StepwiseSL estimation with no interactions
>>> f = sm.families.family.Binomial()
>>> step_sl = StepwiseSL(family=f, selection="backward", order_interaction=0)
>>> step_sl.fit(X, y)
StepwiseSL prediction
>>> step_sl.predict(X=X)
StepwiseSL with all first-order interactions
>>> step_sl = StepwiseSL(family=f, selection="backward", order_interaction=1)
>>> step_sl.fit(X, y)
StepwiseSL with forward selection and all second-order interactions
>>> step_sl = StepwiseSL(family=f, selection="forward", order_interaction=2)
>>> step_sl.fit(X, y)
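The backward selection loop described in the class summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not zEpid's implementation: `backward_stepwise`, `toy_aic`, and `LOSS_IF_OMITTED` are hypothetical names, and the scoring function is a made-up stand-in for fitting a GLM and reading its AIC.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_stepwise(
    terms: Iterable[str],
    aic_of: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Drop one term at a time for as long as doing so lowers the AIC."""
    current = frozenset(terms)
    current_aic = aic_of(current)
    while current:
        # Score every model that has exactly one term fewer than the current one.
        candidates = [(aic_of(current - {t}), current - {t}) for t in sorted(current)]
        best_aic, best_terms = min(candidates, key=lambda pair: pair[0])
        if best_aic >= current_aic:
            break  # no single removal improves the AIC, so stop
        current, current_aic = best_terms, best_aic
    return current, current_aic


# Hypothetical stand-in for a model fit: 2 per included term (the AIC penalty)
# plus a made-up loss of fit for each omitted term. With statsmodels, this
# callable would instead fit a GLM on the chosen columns and return its `.aic`.
LOSS_IF_OMITTED = {"art": 10.0, "male": 0.5, "age0": 5.0}


def toy_aic(included: FrozenSet[str]) -> float:
    omitted = set(LOSS_IF_OMITTED) - included
    return 2 * len(included) + sum(LOSS_IF_OMITTED[name] for name in omitted)


selected, selected_aic = backward_stepwise(LOSS_IF_OMITTED, toy_aic)
print(sorted(selected), selected_aic)
```

With these toy scores the loop removes `male` and then stops, because removing either remaining term would raise the score. Forward selection would run the same loop in the other direction, starting from few terms and adding one at a time.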
Methods
fit(X, y) – Estimate the optimal GLM
predict(X) – Predict using the optimal GLM, where optimal is defined as the lowest AIC for the stepwise selection procedure used.
fit(X, y)
Estimate the optimal GLM
Parameters:  X (numpy.array) – Training data
 y (numpy.array) – Target values
Return type: None

get_params(deep=True)
For sklearn.base.clone() compatibility

predict(X)
Predict using the optimal GLM, where optimal is defined as the lowest AIC for the stepwise selection procedure used.
Parameters:  X (numpy.array) – Samples following the same column pattern as the X array passed to fit(). All order_interaction terms are created in this step for the input X (i.e., the user does not need to create any of the interaction terms).
Returns: Predicted values from the optimal GLM
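To illustrate what creating the interaction terms from the input X could look like, here is a rough sketch. It assumes that `order_interaction=k` means products of up to k+1 columns (so 0 gives main effects only and 1 adds pairwise products); that mapping, and the helper names `candidate_terms` and `expand_row`, are assumptions for illustration rather than zEpid internals.

```python
from itertools import combinations
from math import prod
from typing import List, Sequence, Tuple


def candidate_terms(n_columns: int, order_interaction: int) -> List[Tuple[int, ...]]:
    """List column-index tuples: singles first, then pairs, then triples, ..."""
    terms: List[Tuple[int, ...]] = []
    for size in range(1, order_interaction + 2):
        terms.extend(combinations(range(n_columns), size))
    return terms


def expand_row(row: Sequence[float], terms: Sequence[Tuple[int, ...]]) -> List[float]:
    """Compute each term's value for one sample as the product of its columns."""
    return [prod(row[i] for i in term) for term in terms]


# Three input columns, e.g. art, male, age0 (values are made up).
terms = candidate_terms(n_columns=3, order_interaction=1)
expanded = expand_row([1, 0, 30], terms)
print(terms)
print(expanded)
```

Because the expansion depends only on the number of columns and the interaction order, the same term list can be applied to both the training data and any new data passed in for prediction.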

set_params(**parameters)
For sklearn.base.clone() compatibility
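As background on why get_params and set_params matter for cloning: sklearn.base.clone() reads an estimator's constructor arguments through get_params and builds a fresh, unfitted instance from them. The sketch below mimics that idea without importing scikit-learn. `TinyEstimator` and `rebuild` are hypothetical names, and `rebuild` is a simplification of what clone() does, not its actual code.

```python
class TinyEstimator:
    """Hypothetical estimator following the get_params/set_params convention."""

    def __init__(self, selection="backward", order_interaction=0, verbose=False):
        # Store constructor arguments unchanged under the same names.
        self.selection = selection
        self.order_interaction = order_interaction
        self.verbose = verbose

    def get_params(self, deep=True):
        # Return the constructor arguments as a dict.
        return {
            "selection": self.selection,
            "order_interaction": self.order_interaction,
            "verbose": self.verbose,
        }

    def set_params(self, **parameters):
        # Overwrite the named constructor arguments and return self for chaining.
        for name, value in parameters.items():
            setattr(self, name, value)
        return self


def rebuild(estimator):
    """Make an unfitted copy from constructor arguments alone (simplified clone)."""
    return type(estimator)(**estimator.get_params(deep=False))


original = TinyEstimator(selection="forward", order_interaction=1)
copy_ = rebuild(original)
copy_.set_params(order_interaction=2)
print(original.get_params())
print(copy_.get_params())
```

Changing the copy leaves the original untouched, which is the property cross-validation code relies on when it refits the same estimator on several folds.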