zepid.base.spline

zepid.base.spline(df, var, n_knots=3, knots=None, term=1, restricted=False)
Creates spline dummy variables for a continuous variable. Knot locations are either specified by the user or determined automatically from percentiles. Options are available to set the number of knots, the location of the knots (as values), the term (linear, quadratic, etc.), and whether the spline is restricted or unrestricted.
Parameters:  df (DataFrame) – Pandas dataframe containing the variables of interest
 var (string) – Continuous variable to generate spline for
 n_knots (integer, optional) – Number of knots requested. Any positive integer is allowed if the knot locations are specified. If knot locations are not specified, n_knots must be an integer between 1 and 7. Default is 3 knots
 knots (list, optional) – Locations of the knots, given as a list of numbers. The length of the list must match n_knots. Default is None, in which case the function determines knot locations automatically
 term (integer, float, optional) – Highest-order term for the spline. Set to 2 for a quadratic spline, 3 for a cubic spline, etc. Default is 1, i.e. a linear spline
 restricted (bool, optional) – Whether to return a restricted spline. A restricted spline returns one fewer column than the number of knots; an unrestricted spline returns the same number of columns as the number of knots. Default is False, providing an unrestricted spline
Returns: A pandas dataframe containing the spline variables (labeled 0 to n_knots)
Return type: pd.DataFrame
Notes
Example of output
       rspline0     rspline1   rspline2
0   9839.409066  1234.154601   2.785600
1    446.391437     0.000000   0.000000
2   7107.550298   409.780251   0.000000
3   4465.272901     7.614501   0.000000
4  10972.041543  1655.208555  52.167821
..          ...          ...        ...
Examples
Calculate unrestricted linear spline with 3 automatic knots
>>> from zepid import spline, load_sample_data
>>> df = load_sample_data(False)
>>> spline(df, var='cd40', n_knots=3)
Calculate unrestricted quadratic spline with 3 automatic knots
>>> spline(df, var='cd40', n_knots=3, term=2)
Calculate restricted linear spline with 3 automatic knots
>>> spline(df, var='cd40', n_knots=3, restricted=True)
Calculate unrestricted linear spline with 3 specified knots
>>> spline(df, var='cd40', n_knots=3, knots=[200, 250, 750])
Calculate restricted cubic spline with 5 automatic knots
>>> spline(df, var='cd40', n_knots=5, term=3, restricted=True)
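For intuition about what the unrestricted spline columns represent, the following is a minimal sketch of a generic truncated power basis, where each column is max(x - knot, 0) raised to the chosen term. It is an illustration only and is not confirmed to match zEpid's internal implementation; the helper name truncated_power_terms and the column labels spline0, spline1, ... are hypothetical, and the restricted case is not covered.

```python
import numpy as np
import pandas as pd


def truncated_power_terms(x, knots, term=1):
    """Hypothetical helper: one column per knot, max(x - knot, 0) ** term."""
    x = np.asarray(x, dtype=float)
    columns = {}
    for i, knot in enumerate(knots):
        # Zero below the knot, (x - knot) ** term at or above it
        columns[f"spline{i}"] = np.clip(x - knot, 0.0, None) ** term
    return pd.DataFrame(columns)


values = [100.0, 300.0, 800.0]
# Linear terms (term=1) and quadratic terms (term=2) at the same three knots
linear = truncated_power_terms(values, knots=[200, 250, 750], term=1)
quadratic = truncated_power_terms(values, knots=[200, 250, 750], term=2)
print(linear)
print(quadratic)
```

In this sketch an observation contributes nothing to a column until its value passes that column's knot, which is why the example output above contains zeros in the later columns for small values.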