zepid.base.spline(df, var, n_knots=3, knots=None, term=1, restricted=False)

Creates spline variables for a continuous variable. Knot locations are either specified by the user or determined automatically from percentiles of the variable. Options are available to set the number of knots, the knot locations (as values of the variable), the term (linear, quadratic, etc.), and whether the spline is restricted or unrestricted.

  • df (DataFrame) – Pandas dataframe containing the variables of interest
  • var (string) – Continuous variable to generate spline for
  • n_knots (integer, optional) – Number of knots requested. Any positive integer is allowed if the knot locations are specified. If knot locations are not specified, n_knots must be an integer between 1 and 7. Default is 3 knots
  • knots (list, optional) – Knot locations, given as a list of numbers. The length of the list must equal the specified number of knots (n_knots). Default is None, in which case the function determines knot locations automatically
  • term (integer, float, optional) – Highest-order term for the spline. Set to 2 for a quadratic spline, 3 for a cubic spline, etc. Default is 1, i.e. a linear spline
  • restricted (bool, optional) – Whether to return a restricted spline. A restricted spline returns one fewer column than the number of knots; an unrestricted spline returns the same number of columns as the number of knots. Default is False, providing an unrestricted spline

Returns a pandas dataframe containing the spline variables (labeled 0 to n_knots)

Return type: DataFrame

Example of output

       rspline0     rspline1   rspline2
0   9839.409066  1234.154601   2.785600
1    446.391437     0.000000   0.000000
2   7107.550298   409.780251   0.000000
3   4465.272901     7.614501   0.000000
4  10972.041543  1655.208555  52.167821
..          ...          ...        ...
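
To make the roles of knots, term, and restricted concrete, the following is a minimal sketch of a truncated-power spline basis written with NumPy and pandas only. It is not zEpid's source code: the helper name spline_sketch, the unrestricted column prefix "spline", and the exact form of the restriction (subtracting the last knot's column from each of the others) are assumptions made for illustration. It does reproduce the column counts described above: one column per knot when unrestricted, one fewer when restricted.

```python
import numpy as np
import pandas as pd


def spline_sketch(df, var, knots, term=1, restricted=False):
    """Illustrative truncated-power spline basis (hypothetical helper, not zEpid's code)."""
    x = df[var].to_numpy(dtype=float)
    knots = sorted(knots)
    # One column per knot: (x - knot)^term above the knot, 0 at or below it
    basis = [np.maximum(x - k, 0.0) ** term for k in knots]
    if restricted:
        # Assumed restriction: subtract the last knot's column from each other
        # column, which leaves one fewer column than the number of knots
        cols = {f"rspline{i}": b - basis[-1] for i, b in enumerate(basis[:-1])}
    else:
        cols = {f"spline{i}": b for i, b in enumerate(basis)}
    return pd.DataFrame(cols, index=df.index)


# Small made-up data set; the values are not from the zEpid sample data
demo = pd.DataFrame({"cd40": [100.0, 225.0, 800.0]})
unrestricted = spline_sketch(demo, "cd40", knots=[200, 250, 750])
restricted = spline_sketch(demo, "cd40", knots=[200, 250, 750], term=2, restricted=True)
print(unrestricted)
print(restricted)
```

Under this construction, an observation below the first knot has zeros in every column, which matches the pattern of zeros visible in the example output.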


Calculate unrestricted linear spline with 3 automatic knots

>>> from zepid import spline, load_sample_data
>>> df = load_sample_data(False)
>>> spline(df, var='cd40', n_knots=3)

Calculate unrestricted quadratic spline with 3 automatic knots

>>> spline(df, var='cd40', n_knots=3, term=2)

Calculate restricted linear spline with 3 automatic knots

>>> spline(df, var='cd40', n_knots=3, restricted=True)

Calculate unrestricted linear spline with 3 specified knots

>>> spline(df, var='cd40', n_knots=3, knots=[200, 250, 750])
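
When knots are passed explicitly, as in the call above, the list length has to agree with n_knots; otherwise the locations come from percentiles. The sketch below shows one way such a check and fallback could look. The helper choose_knots is hypothetical, and the evenly spaced percentiles it uses are an assumption for illustration; the percentiles zEpid actually uses may differ.

```python
import numpy as np


def choose_knots(values, n_knots=3, knots=None):
    """Hypothetical helper: return sorted knot locations, user-specified or percentile-based."""
    if knots is not None:
        # User-specified knots: the list length must match n_knots
        if len(knots) != n_knots:
            raise ValueError("length of knots must equal n_knots")
        return sorted(float(k) for k in knots)
    # Automatic knots: the documented limit is 1 to 7 knots
    if not 1 <= n_knots <= 7:
        raise ValueError("n_knots must be between 1 and 7 when knots is None")
    # Assumption: evenly spaced interior percentiles (e.g. 25, 50, 75 for 3 knots)
    pcts = np.linspace(0, 100, n_knots + 2)[1:-1]
    return [float(p) for p in np.percentile(values, pcts)]


auto_knots = choose_knots(np.arange(101), n_knots=3)
user_knots = choose_knots(np.arange(101), n_knots=3, knots=[750, 200, 250])
print(auto_knots)
print(user_knots)
```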

Calculate restricted cubic spline with 5 automatic knots

>>> spline(df, var='cd40', n_knots=5, term=3, restricted=True)