Graphics

Below is documentation for each of the implemented graphic generators.

Data Diagnostics

functional_form_plot(df, outcome, var[, …]) Creates a functional form plot to aid in functional form assessment for continuous/discrete variables.
spaghetti_plot(df, idvar, variable, time) Create a spaghetti plot by an ID variable.
roc(df, true, threshold[, youden_index]) Generate a receiver operating characteristic (ROC) curve from true values and predicted probabilities.

Displaying Results

EffectMeasurePlot(label, effect_measure, …) Used to generate effect measure (AKA forest) plots.
pvalue_plot(point, sd[, color, fill, null, …]) Creates a plot of the p-value distribution based on a point estimate and standard deviation.
dynamic_risk_plot(risk_exposed, risk_unexposed) Creates a plot of how the risk difference or risk ratio changes over time with survival data.
labbe_plot([r1, r0, scale, additive_tuner, …]) L’Abbe plots are useful for summarizing measure modification on the difference or ratio scale.
zipper_plot(truth, lcl, ucl[, colors]) Zipper plots are a way to present simulation data, particularly confidence intervals and their width.
class zepid.graphics.graphics.EffectMeasurePlot(label, effect_measure, lcl, ucl)

Used to generate effect measure (AKA forest) plots. Estimates and confidence intervals are plotted in a diagram on the left and a table of the corresponding estimates is provided in the same plot. See the Graphics page on ReadTheDocs for examples of the plots.

Parameters:
  • label (list) – List of labels to use for y-axis
  • effect_measure (list) – List of numbers for point estimates to plot. If a point estimate has trailing zeroes, input it as a string rather than a float
  • lcl (list) – List of numbers for lower confidence limits to plot. If a limit has trailing zeroes, input it as a string rather than a float
  • ucl (list) – List of numbers for upper confidence limits to plot. If a limit has trailing zeroes, input it as a string rather than a float
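
The advice about trailing zeroes follows from how Python itself renders floats, independent of zEpid. A minimal illustration (the variable names are only for this sketch):

```python
# A float does not remember how many decimals were typed.
lcl_float = 0.90
lcl_text = '0.90'

as_float = str(lcl_float)          # the trailing zero is dropped
as_text = str(lcl_text)            # the string is kept exactly as written
fixed = format(lcl_float, '.2f')   # explicit formatting restores the zero

print(as_float, as_text, fixed)
```

Passing the value as a string is therefore a simple way to control exactly what appears in the table.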

Examples

Setting up the data to plot

>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import EffectMeasurePlot
>>> lab = ['One','Two']
>>> emm = [1.01,1.31]
>>> lcl = ['0.90',1.01]  # String allows for trailing zeroes in the table
>>> ucl = [1.11,1.53]

Setting up the plot, measure labels, and point colors

>>> x = EffectMeasurePlot(lab, emm, lcl, ucl)
>>> x.labels(effectmeasure='RR')  # Changing label of measure
>>> x.colors(pointcolor='r')  # Changing color of the points

Generating matplotlib axes object of forest plot

>>> x.plot(t_adjuster=0.13)
>>> plt.show()
colors(**kwargs)

Function to change colors and shapes.

Parameters:
  • errorbarcolor (string, optional) – Changes the error bar colors
  • linecolor (string, optional) – Changes the color of the reference line
  • pointcolor (string, optional) – Changes the color of the points
  • pointshape (string, optional) – Changes the shape of points
labels(**kwargs)

Function to change the labels of the outputted table. Additionally, the scale and reference value can be changed.

Parameters:
  • effectmeasure (string, optional) – Changes the effect measure label
  • conf_int (string, optional) – Changes the confidence interval label
  • scale (string, optional) – Changes the scale to either log or linear
  • center (float, integer, optional) – Changes the reference line for the center
plot(figsize=(3, 3), t_adjuster=0.01, decimal=3, size=3, max_value=None, min_value=None, text_size=12)

Generates the matplotlib effect measure plot with the default or specified attributes. The following variables can be used to further fine-tune the effect measure plot

Parameters:
  • figsize (tuple, optional) – Adjust the size of the figure. Syntax is same as matplotlib figsize
  • t_adjuster (float, optional) – Used to refine alignment of the table with the line graphs. When generating plots, some trial and error with this value is usually necessary. I haven’t come up with an algorithm to determine this yet…
  • decimal (integer, optional) – Number of decimal places to display in the table
  • size (integer, optional) – Option to adjust the size of the lines and points in the plot
  • max_value (float, optional) – Maximum value of x-axis scale. Default is None, which automatically determines max value
  • min_value (float, optional) – Minimum value of x-axis scale. Default is None, which automatically determines min value
  • text_size (int, float, optional) – Text size for the table. Default is 12.
Return type:

matplotlib axes

zepid.graphics.graphics.dynamic_risk_plot(risk_exposed, risk_unexposed, measure='RD', loess=True, loess_value=0.25, point_color='darkblue', line_color='b', scale='linear')

Creates a plot of how the risk difference or risk ratio changes over time with survival data. See the references for an example of this plot. Input data should be pandas Series indexed by ‘timeline’ where ‘timeline’ is the time corresponding to the risk estimate

Parameters:
  • risk_exposed (Series) – Pandas Series with the probability of the outcome among the exposed group. Index by ‘timeline’ where ‘timeline’ is the time. If you directly output the 1 - survival_function_ from lifelines.KaplanMeierFitter(), this should create a valid input
  • risk_unexposed (Series) – Pandas Series with the probability of the outcome among the unexposed group. Index by ‘timeline’ where ‘timeline’ is the time
  • measure (str, optional) – Whether to generate the risk difference (RD) or risk ratio (RR). Default is ‘RD’
  • loess (bool, optional) – Whether to generate LOESS curve fit to the calculated points. Default is True
  • loess_value (float, optional) – Fraction of values to fit LOESS curve to. Default is 0.25
  • point_color (str, optional) – Color of the points
  • line_color (str, optional) – Color of the LOESS line generated and plotted
  • scale (str, optional) – Change the y-axis scale. Options are ‘linear’ (default), ‘log’, ‘log-transform’. ‘log’ and ‘log-transform’ are only valid options for Risk Ratio plots
Return type:

matplotlib axes
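
As a rough sketch of the quantities being plotted (not zEpid's implementation), the risk difference and risk ratio at each time can be computed from the two risk curves. The times and risks below are made-up illustrative values, plain dicts stand in for the time-indexed pandas Series, and `risk_measures` is a hypothetical helper.

```python
import math

# Hypothetical cumulative risks keyed by time (stand-ins for pandas Series)
risk_exposed = {1: 0.10, 2: 0.18, 3: 0.30}
risk_unexposed = {1: 0.05, 2: 0.12, 3: 0.20}

def risk_measures(r1, r0):
    """Return {time: (RD, RR, log RR)} for times present in both curves."""
    out = {}
    for t in sorted(set(r1) & set(r0)):
        if r1[t] > 0 and r0[t] > 0:   # RR and log(RR) need positive risks
            rd = r1[t] - r0[t]
            rr = r1[t] / r0[t]
            out[t] = (rd, rr, math.log(rr))
    return out

measures = risk_measures(risk_exposed, risk_unexposed)
for t, (rd, rr, log_rr) in measures.items():
    print(f"t={t}: RD={rd:.3f}, RR={rr:.3f}, log(RR)={log_rr:.3f}")
```

Because a risk difference can be zero or negative, a logarithmic scale only makes sense for the risk ratio, which is consistent with the restriction on the scale option.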

Examples

See the graphics documentation or causal documentation for a detailed example. In the snippet below, a and b stand for the risk_exposed and risk_unexposed Series.

>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import dynamic_risk_plot
>>> dynamic_risk_plot(a, b, loess=True)
>>> plt.show()

References

Cole SR, et al. (2014). Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy. AJE, 181(4), 238-245.

zepid.graphics.graphics.functional_form_plot(df, outcome, var, f_form=None, outcome_type='binary', discrete=False, link_dist=None, loess=True, loess_value=0.25, legend=True, model_results=True, points=False)

Creates a functional form plot to aid in functional form assessment for continuous/discrete variables. Plots can be created for binary and continuous outcomes. Default options are set to create a functional form plot for a binary outcome. To convert to a continuous outcome, outcome_type needs to be changed, in addition to the link_dist

Parameters:
  • df (DataFrame) – Pandas dataframe that contains the variables of interest
  • outcome (string) – Column name of the outcome variable of interest
  • var (string) – Column name of the variable of interest for the functional form assessment
  • f_form (string, optional) – Regression equation of the functional form to assess. Default is None, which will produce a linear functional form. Input the regression equation following the patsy syntax. For example, ‘var + var_sq’
  • outcome_type (string, optional) – Variable type of the outcome variable. Currently, only binary and continuous variables are supported. Default is ‘binary’
  • link_dist (optional) – Link and distribution for the GLM regression equation. Change this to any valid link and distributions supported by statsmodels. Default is None, which defaults to logistic regression
  • loess_value (float, optional) – Fraction of observations to use to fit the LOESS curve. This may need to be changed iteratively to determine which percent works best for the data. Default is 0.25
  • legend (bool, optional) – Turn the legend on or off. Default is True, displaying the legend in the graph
  • model_results (bool, optional) – Whether to produce the model results. Default is True, which provides model results
  • loess (bool, optional) – Whether to plot the LOESS curve along with the functional form. Default is True
  • points (bool, optional) – Whether to plot the data points, where size is relative to the number of observations. Default is False
  • discrete (bool, optional) – If your data is truly continuous, leave this setting as False to automatically bin observations into categories for fitting a model with disjoint indicator terms. If the data is discrete, set this to True to use the actual values for the disjoint indicators. If you get a PerfectSeparationError from statsmodels, you might have to shift your categories.
Returns:

Returns a matplotlib graph with a LOESS line (dashed red line), regression line (solid blue line), and confidence interval (shaded blue)

Return type:

matplotlib axes

Examples

Setting up the environment

>>> from zepid import load_sample_data
>>> from zepid.graphics import functional_form_plot
>>> import matplotlib.pyplot as plt
>>> df = load_sample_data(timevary=False)
>>> df['cd4_sq'] = df['cd4']**2

Creating a functional form plot for a linear functional form

>>> functional_form_plot(df, outcome='dead', var='cd4')
>>> plt.show()

Functional form assessment for a quadratic functional form

>>> functional_form_plot(df, outcome='dead', var='cd4', f_form='cd4 + cd4_sq')
>>> plt.show()

Varying the LOESS value (increased LOESS value to smooth LOESS curve further)

>>> functional_form_plot(df, outcome='dead', var='cd4', loess_value=0.5)
>>> plt.show()

Removing the LOESS curve and the legend from the plot

>>> functional_form_plot(df, outcome='dead', var='cd4', loess=False, legend=False)
>>> plt.show()

Adding summary points to the plot. Points are grouped together and their size reflects their relative n

>>> functional_form_plot(df, outcome='dead', var='cd4', loess=False, legend=False, points=True)
>>> plt.show()

Functional form assessment for a discrete variable (age)

>>> functional_form_plot(df, outcome='dead', var='age0', discrete=True)
>>> plt.show()
zepid.graphics.graphics.labbe_plot(r1=None, r0=None, scale='both', additive_tuner=12, multiplicative_tuner=12, figsize=(7, 4), **plot_kwargs)

L’Abbe plots are useful for summarizing measure modification on the difference or ratio scale. Primarily invented for meta-analysis usage, these plots display risk differences (or ratios) by the individual risks under each exposure level. I find them most useful for visualizing why, if there is an association and there is no modification on one scale (additive or multiplicative), then there must be modification on the other scale.

Parameters:
  • r1 (float, list, optional) – Single probability or a list of probabilities when exposure is 1. Default is None, which does not display points
  • r0 (float, list, optional) – Single probability or a list of probabilities when exposure is 0. Default is None, which does not display points
  • scale (str, optional) – Which scale to plot. The default is ‘both’, which generates side-by-side plots of the additive scale and the multiplicative scale. Other options are ‘additive’ to display only the additive plot, and ‘multiplicative’ to display only the multiplicative plot
  • additive_tuner (int, optional) – Optional parameter to change the number of lines displayed in the additive L’Abbe plot. Higher integer produces more reference lines
  • multiplicative_tuner (int, optional) – Optional parameter to change the number of lines displayed in the multiplicative L’Abbe plot. Higher integer produces more reference lines
  • figsize (tuple, optional) – Optional parameter to change the L’Abbe plot size. Only changes the plot size when scale=’both’
  • **plot_kwargs (optional) – Optional keyword arguments passed to matplotlib. Any keyword arguments accepted by matplotlib.pyplot.plot are accepted. See the matplotlib ‘plot()’ function documentation for further details
Return type:

matplotlib axes
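
The claim that no modification on one scale implies modification on the other can be checked with simple arithmetic, provided there is an association and the baseline risks differ between strata. The sketch below uses made-up baseline risks for two strata of a hypothetical modifier; it is not part of zEpid.

```python
# Hypothetical baseline risks (exposure = 0) in two strata of a modifier
r0 = [0.1, 0.3]

# Case 1: constant risk difference (no additive modification)
rd = 0.1
r1_additive = [p + rd for p in r0]
rr_by_stratum = [a / b for a, b in zip(r1_additive, r0)]

# Case 2: constant risk ratio (no multiplicative modification)
rr = 2.0
r1_multiplicative = [p * rr for p in r0]
rd_by_stratum = [a - b for a, b in zip(r1_multiplicative, r0)]

print([round(v, 3) for v in rr_by_stratum])   # ratios differ across strata
print([round(v, 3) for v in rd_by_stratum])   # differences differ across strata
```

With a constant risk difference the risk ratios are unequal across strata, and with a constant risk ratio the risk differences are unequal.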

Examples

Setting up environment

>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import labbe_plot

Creating a blank plot

>>> labbe_plot()
>>> plt.show()

Adding customized points to the plot

>>> labbe_plot(r1=[0.3, 0.5], r0=[0.2, 0.7], scale='additive', color='r', marker='D', markersize=10, linestyle='')
>>> plt.show()

Only returning the additive plot

>>> labbe_plot(r1=[0.3, 0.5], r0=[0.2, 0.7], scale='additive', markersize=10)
>>> plt.show()

Only returning the multiplicative plot

>>> labbe_plot(r1=[0.3, 0.5], r0=[0.2, 0.7], scale='multiplicative', markersize=10)
>>> plt.show()
zepid.graphics.graphics.pvalue_plot(point, sd, color='b', fill=True, null=0, alpha=None)

Creates a plot of the p-value distribution based on a point estimate and standard deviation. I find this plot useful for explaining what exactly a p-value tells you and how much weight the evidence gives to a specific value. It also helps in understanding what confidence intervals are telling you. Note that this plot only works for measures on a linear scale (i.e. it will plot exp(log(RR)) incorrectly). These plots are based on Rothman’s Epidemiology 2nd Edition pg 152-153 and are explained more fully there.

Parameters:
  • point (float) – Point estimate. Must be on a linear scale (RD / log(RR))
  • sd (float) – Standard error of the estimate. Must be on a linear scale (SE(RD) / SE(log(RR)))
  • color (str, optional) – Change color of p-value plot
  • fill (bool, optional) – Whether to fill the area under the p-value distribution curve. Setting to False prevents fill
  • null (float, integer, optional) – The main value to compare to. The default is zero
  • alpha (float, optional) – Whether to draw a line designating significance level area. Default is None, which does not draw this line. Generally, would be set to 0.05 to correspond to the widely used alpha of 0.05
Return type:

matplotlib axes
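
To make the idea concrete, the sketch below computes two-sided p-values for several candidate null values under a normal approximation. This illustrates the general concept of a p-value function and is not necessarily how zEpid draws the curve; `two_sided_p` is a hypothetical helper.

```python
from statistics import NormalDist

def two_sided_p(point, se, null):
    """Two-sided p-value for testing `null`, assuming a normal estimator."""
    z = abs(point - null) / se
    return 2 * (1 - NormalDist().cdf(z))

point, se = -0.1, 0.061   # same values as used in the examples below
candidates = [-0.25, -0.1, 0.0, 0.1]
p_values = {null: two_sided_p(point, se, null) for null in candidates}

for null, p in p_values.items():
    print(f"null={null:+.2f}: p={p:.3f}")
```

The p-value is 1 when the null equals the point estimate and shrinks as the null moves away from it. Under this approximation, the null values with p greater than alpha are those inside the corresponding (1 - alpha) confidence interval.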

Examples

Setting up the environment

>>> from zepid.graphics import pvalue_plot
>>> import matplotlib.pyplot as plt

Basic P-value plot

>>> pvalue_plot(point=-0.1, sd=0.061, color='r')
>>> plt.show()

P-value plot with significance line drawn at ‘alpha’

>>> pvalue_plot(point=-0.1, sd=0.061, color='r', alpha=0.025)
>>> plt.show()

P-value plot with different comparison value

>>> pvalue_plot(point=-0.1, sd=0.061, color='r', null=0.1)
>>> plt.show()

References

Rothman KJ. (2012). Epidemiology: An Introduction. Oxford University Press.

zepid.graphics.graphics.roc(df, true, threshold, youden_index=True)

Generate a receiver operating characteristic (ROC) curve from true values and predicted probabilities. Youden’s index can also be calculated. Youden’s index is calculated over the candidate thresholds i as

\[P_{Yi} = \max_i (Se_i + Sp_i - 1)\]
Parameters:
  • df (DataFrame) – Pandas dataframe containing variables of interest
  • true (str) – Column name of the true outcome designation (1, 0)
  • threshold (str) – Column name of the predicted probabilities for the outcome
  • youden_index (bool, optional) – Whether to calculate Youden’s index. Youden’s index identifies the threshold that maximizes the sum of sensitivity and specificity. The formula finds the maximum of (sensitivity + specificity - 1)
Return type:

matplotlib axes

Examples

Creating a dataframe with true disease status (‘d’) and predicted probability of the outcome (‘p’)

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import roc
>>> df = pd.DataFrame()
>>> df['d'] = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
>>> df['p'] = [0.1, 0.15, 0.1, 0.7, 0.5, 0.9, 0.95, 0.5, 0.4, 0.8, 0.99, 0.99, 0.89, 0.95]

Creating ROC curve

>>> roc(df, true='d', threshold='p', youden_index=False)
>>> plt.show()
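
For intuition about what Youden's index measures, the following sketch evaluates sensitivity and specificity at each observed probability for the same data. It uses plain Python and assumes that a probability at or above the cutoff is classified as positive; zEpid's own conventions may differ, and `sens_spec` is a hypothetical helper.

```python
# Same data as in the example above
d = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
p = [0.1, 0.15, 0.1, 0.7, 0.5, 0.9, 0.95, 0.5, 0.4, 0.8, 0.99, 0.99, 0.89, 0.95]

def sens_spec(truth, prob, cutoff):
    """Sensitivity and specificity when prob >= cutoff is called positive."""
    tp = sum(1 for t, q in zip(truth, prob) if t == 1 and q >= cutoff)
    tn = sum(1 for t, q in zip(truth, prob) if t == 0 and q < cutoff)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    return tp / n_pos, tn / n_neg

# Evaluate every observed probability as a candidate cutoff
results = []
for c in sorted(set(p)):
    se, sp = sens_spec(d, p, c)
    results.append((se + sp - 1, c, se, sp))

# The largest value of (Se + Sp - 1) is Youden's index
youden, best_cutoff, best_se, best_sp = max(results)
print(f"Youden's index = {youden:.3f} at cutoff {best_cutoff}")
```
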
zepid.graphics.graphics.spaghetti_plot(df, idvar, variable, time)

Create a spaghetti plot by an ID variable. A spaghetti plot can be useful for visualizing trends or looking at longitudinal data patterns for individuals all at once.

Parameters:
  • df (DataFrame) – Pandas dataframe containing variables of interest
  • idvar (str) – ID variable for observations. This should indicate the group or individual followed over the time variable
  • variable (str) – Variable of interest to see how it varies over time
  • time (str) – Time, or another variable, over which the variable of interest varies
Return type:

matplotlib axes
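
Conceptually, a spaghetti plot draws one line per ID from long-format data. The grouping step can be sketched in plain Python as follows; the records are made up for illustration and this is not zEpid's implementation.

```python
from collections import defaultdict

# Hypothetical long-format records: (id, time, value)
rows = [(1, 0, 350), (2, 0, 500), (1, 1, 340), (2, 1, 520), (1, 2, 300)]

# Collect the (time, value) pairs belonging to each ID
trajectories = defaultdict(list)
for pid, time, value in rows:
    trajectories[pid].append((time, value))

# Sort each trajectory by time so its line would be drawn left to right
for pid in trajectories:
    trajectories[pid].sort()

for pid, path in trajectories.items():
    print(pid, path)
```

Each entry in `trajectories` corresponds to one line in the plot.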

Examples

Setting up the environment

>>> import matplotlib.pyplot as plt
>>> from zepid import load_sample_data
>>> from zepid.graphics import spaghetti_plot
>>> df = load_sample_data(timevary=True)

Generating spaghetti plot for changing CD4 count

>>> spaghetti_plot(df, idvar='id', variable='cd4', time='enter')
>>> plt.show()
zepid.graphics.graphics.zipper_plot(truth, lcl, ucl, colors=('blue', 'red'))

Zipper plots are a way to present simulation data, particularly confidence intervals and their width. They are also useful for showing the confidence interval coverage of the true parameter.

Parameters:
  • truth (float) – The true value against which confidence interval coverage is compared
  • lcl (list, array, Series, container) – Container of lower confidence limits
  • ucl (list, array, Series, container) – Container of upper confidence limits
  • colors (set, list, container) – List of colors for confidence intervals. The first color is used to designate confidence intervals that cover the true value, and the second indicates confidence intervals that do not cover the true value
Return type:

matplotlib axes
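
The coverage that a zipper plot visualizes can be computed directly. The sketch below uses made-up confidence limits from five hypothetical simulated data sets and treats an interval as covering the truth when the truth lies on or between its limits (an assumption about the boundary convention).

```python
truth = 0.0

# Hypothetical confidence limits from five simulated data sets
lcl = [-0.20, 0.05, -0.35, -0.10, -0.50]
ucl = [0.30, 0.45, -0.02, 0.25, 0.15]

# Flag which intervals contain the true value
covers = [lo <= truth <= hi for lo, hi in zip(lcl, ucl)]
coverage = sum(covers) / len(covers)

# Interval widths, the other feature a zipper plot displays
widths = [hi - lo for lo, hi in zip(lcl, ucl)]

print(covers)
print(f"coverage = {coverage:.2f}")
```
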

Examples

Setting up environment

>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import zipper_plot

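Creating a zipper plot from a few illustrative confidence intervals (in practice these would come from simulation results)

>>> lower = [-0.20, 0.05, -0.35, -0.10]  # illustrative lower limits
>>> upper = [0.30, 0.45, -0.02, 0.25]  # illustrative upper limits
>>> zipper_plot(truth=0, lcl=lower, ucl=upper)
>>> plt.show()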