Graphics
Below is documentation for each of the implemented graphic generators.
Data Diagnostics
functional_form_plot (df, outcome, var[, …]) 
Creates a functional form plot to aid in functional form assessment for continuous/discrete variables. 
spaghetti_plot (df, idvar, variable, time) 
Create a spaghetti plot by an ID variable. 
roc (df, true, threshold[, youden_index]) 
Generate a Receiver Operating Characteristic (ROC) curve from true values and predicted probabilities. 
Displaying Results
EffectMeasurePlot (label, effect_measure, …) 
Used to generate effect measure (AKA forest) plots. 
pvalue_plot (point, sd[, color, fill, null, …]) 
Creates a plot of the p-value distribution based on a point estimate and standard deviation. 
dynamic_risk_plot (risk_exposed, risk_unexposed) 
Creates a plot of how the risk difference or risk ratio changes over time with survival data. 
labbe_plot ([r1, r0, scale, additive_tuner, …]) 
L’Abbe plots are useful for summarizing measure modification on the difference or ratio scale. 
zipper_plot (truth, lcl, ucl[, colors]) 
Zipper plots are a way to present simulation data, particularly confidence intervals and their width. 

class zepid.graphics.graphics.EffectMeasurePlot(label, effect_measure, lcl, ucl)
Used to generate effect measure (AKA forest) plots. Estimates and confidence intervals are plotted in a diagram on the left, and a table of the corresponding estimates is provided in the same plot. See the Graphics page on ReadTheDocs for examples of the plots.
Parameters:  label (list) – List of labels to use for the y-axis
 effect_measure (list) – List of numbers for point estimates to plot. If a point estimate has trailing zeroes, input it as a character object rather than a float
 lcl (list) – List of numbers for lower confidence limits to plot. If a lower confidence limit has trailing zeroes, input it as a character object rather than a float
 ucl (list) – List of numbers for upper confidence limits to plot. If an upper confidence limit has trailing zeroes, input it as a character object rather than a float
Examples
Setting up the data to plot
>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import EffectMeasurePlot
>>> lab = ['One','Two']
>>> emm = [1.01,1.31]
>>> lcl = ['0.90',1.01]  # String allows for trailing zeroes in the table
>>> ucl = [1.11,1.53]
Setting up the plot, measure labels, and point colors
>>> x = EffectMeasurePlot(lab, emm, lcl, ucl)
>>> x.labels(effectmeasure='RR')  # Changing label of measure
>>> x.colors(pointcolor='r')  # Changing color of the points
Generating matplotlib axes object of forest plot
>>> x.plot(t_adjuster=0.13)
>>> plt.show()
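The trailing-zero note in the parameter descriptions comes down to how Python prints floats: `0.90` is displayed as `0.9`. A minimal sketch of one way to build the string form is shown here; `fixed` is a hypothetical helper for illustration and is not part of zEpid.

```python
# Hypothetical helper (not part of zEpid): render a number as a
# fixed-decimal string so that trailing zeroes are kept for the table.
def fixed(value, decimal=2):
    return format(value, f".{decimal}f")

# The float itself loses the trailing zero when converted to text.
print(str(0.90))

# Mirror the example above: only the value that needs it becomes a string.
lcl = [fixed(0.90), 1.01]
print(lcl)
```

The resulting list has the same shape as the `lcl` list in the example above and can be passed in the same position.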

colors(**kwargs)
Function to change colors and shapes.
Parameters:  errorbarcolor (string, optional) – Changes the error bar colors
 linecolor (string, optional) – Changes the color of the reference line
 pointcolor (string, optional) – Changes the color of the points
 pointshape (string, optional) – Changes the shape of points

labels(**kwargs)
Function to change the labels of the outputted table. Additionally, the scale and reference value can be changed.
Parameters:  effectmeasure (string, optional) – Changes the effect measure label
 conf_int (string, optional) – Changes the confidence interval label
 scale (string, optional) – Changes the scale to either log or linear
 center (float, integer, optional) – Changes the reference line for the center

plot(figsize=(3, 3), t_adjuster=0.01, decimal=3, size=3, max_value=None, min_value=None, text_size=12)
Generates the matplotlib effect measure plot with the default or specified attributes. The following variables can be used to further fine-tune the effect measure plot
Parameters:  figsize (tuple, optional) – Adjust the size of the figure. Syntax is same as matplotlib figsize
 t_adjuster (float, optional) – Used to refine alignment of the table with the line graphs. When generating plots, trial and error is usually necessary to find a good value. I haven’t come up with an algorithm to determine this yet…
 decimal (integer, optional) – Number of decimal places to display in the table
 size (integer, optional) – Option to adjust the size of the lines and points in the plot
 max_value (float, optional) – Maximum value of the x-axis scale. Default is None, which automatically determines the maximum value
 min_value (float, optional) – Minimum value of the x-axis scale. Default is None, which automatically determines the minimum value
 text_size (int, float, optional) – Text size for the table. Default is 12.
Returns: Return type: matplotlib axes

zepid.graphics.graphics.dynamic_risk_plot(risk_exposed, risk_unexposed, measure='RD', loess=True, loess_value=0.25, point_color='darkblue', line_color='b', scale='linear')
Creates a plot of how the risk difference or risk ratio changes over time with survival data. See the references for an example of this plot. Input data should be pandas Series indexed by ‘timeline’, where ‘timeline’ is the time corresponding to the risk estimate.
Parameters:  risk_exposed (Series) – Pandas Series with the probability of the outcome among the exposed group. Index by ‘timeline’ where ‘timeline’ is the time. If you directly output 1 - survival_function_ from lifelines.KaplanMeierFitter(), this should create a valid input
 risk_unexposed (Series) – Pandas Series with the probability of the outcome among the unexposed group. Index by ‘timeline’ where ‘timeline’ is the time
 measure (str, optional) – Whether to generate the risk difference (RD) or risk ratio (RR). Default is ‘RD’
 loess (bool, optional) – Whether to generate LOESS curve fit to the calculated points. Default is True
 loess_value (float, optional) – Fraction of values to fit LOESS curve to. Default is 0.25
 point_color (str, optional) – Color of the points
 line_color (str, optional) – Color of the LOESS line generated and plotted
 scale (str, optional) – Change the y-axis scale. Options are ‘linear’ (default), ‘log’, and ‘logtransform’. ‘log’ and ‘logtransform’ are only valid options for risk ratio plots
Returns: Return type: matplotlib axes
Examples
See graphics documentation or causal documentation for a detailed example.
>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import dynamic_risk_plot
>>> dynamic_risk_plot(a, b, loess=True)
>>> plt.show()
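The inputs `a` and `b` above stand for the exposed and unexposed risk series. As a rough sketch of the quantities being plotted, the snippet below derives risks from survival probabilities and computes the risk difference and risk ratio at each time. Plain dictionaries keyed by time stand in for pandas Series, and all numbers are made up for illustration.

```python
# Illustrative survival probabilities by time (made-up values); in practice
# these would come from a fitted survival curve for each exposure group.
surv_exposed = {0: 1.00, 1: 0.90, 2: 0.80, 3: 0.65}
surv_unexposed = {0: 1.00, 1: 0.95, 2: 0.90, 3: 0.85}

# Risk is the complement of survival at each time point.
risk_exposed = {t: 1 - s for t, s in surv_exposed.items()}
risk_unexposed = {t: 1 - s for t, s in surv_unexposed.items()}

# Risk difference at every time; risk ratio only where the unexposed
# risk is nonzero (the ratio is undefined at time 0 here).
rd = {t: risk_exposed[t] - risk_unexposed[t] for t in risk_exposed}
rr = {t: risk_exposed[t] / risk_unexposed[t]
      for t in risk_exposed if risk_unexposed[t] > 0}

for t in sorted(rr):
    print(t, round(rd[t], 3), round(rr[t], 3))
```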
References
Cole SR, et al. (2014). Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy. AJE, 181(4), 238-245.

zepid.graphics.graphics.functional_form_plot(df, outcome, var, f_form=None, outcome_type='binary', discrete=False, link_dist=None, loess=True, loess_value=0.25, legend=True, model_results=True, points=False)
Creates a functional form plot to aid in functional form assessment for continuous/discrete variables. Plots can be created for binary and continuous outcomes. Default options are set to create a functional form plot for a binary outcome. To switch to a continuous outcome, both outcome_type and link_dist need to be changed.
Parameters:  df (DataFrame) – Pandas dataframe that contains the variables of interest
 outcome (string) – Column name of the outcome variable of interest
 var (string) – Column name of the variable of interest for the functional form assessment
 f_form (string, optional) – Regression equation of the functional form to assess. Default is None, which will produce a linear functional form. Input the regression equation following the patsy syntax. For example, ‘var + var_sq’
 outcome_type (string, optional) – Variable type of the outcome variable. Currently, only binary and continuous variables are supported. Default is ‘binary’
 link_dist (optional) – Link and distribution for the GLM regression equation. Change this to any valid link and distributions supported by statsmodels. Default is None, which defaults to logistic regression
 loess_value (float, optional) – Fraction of observations to use to fit the LOESS curve. This may need to be changed iteratively to determine which percent works best for the data. Default is 0.25
 legend (bool, optional) – Turn the legend on or off. Default is True, displaying the legend in the graph
 model_results (bool, optional) – Whether to produce the model results. Default is True, which provides model results
 loess (bool, optional) – Whether to plot the LOESS curve along with the functional form. Default is True
 points (bool, optional) – Whether to plot the data points, where size is relative to the number of observations. Default is False
 discrete (bool, optional) – If your data is truly continuous, leave this set to False (the default), which automatically bins observations into categories for fitting a model with disjoint indicators. If the data is discrete, set this to True to use the actual values for the disjoint indicators. If you get a PerfectSeparationError from statsmodels, you may need to regroup your categories.
Returns: Returns a matplotlib graph with a LOESS line (dashed red line), regression line (solid blue line), and confidence interval (shaded blue)
Return type: matplotlib axes
Examples
Setting up the environment
>>> from zepid import load_sample_data
>>> from zepid.graphics import functional_form_plot
>>> import matplotlib.pyplot as plt
>>> df = load_sample_data(timevary=False)
>>> df['cd4_sq'] = df['cd4']**2
Creating a functional form plot for a linear functional form
>>> functional_form_plot(df, outcome='dead', var='cd4')
>>> plt.show()
Functional form assessment for a quadratic functional form
>>> functional_form_plot(df, outcome='dead', var='cd4', f_form='cd4 + cd4_sq')
>>> plt.show()
Varying the LOESS value (increased LOESS value to smooth LOESS curve further)
>>> functional_form_plot(df, outcome='dead', var='cd4', loess_value=0.5)
>>> plt.show()
Removing the LOESS curve and the legend from the plot
>>> functional_form_plot(df, outcome='dead', var='cd4', loess=False, legend=False)
>>> plt.show()
Adding summary points to the plot. Points are grouped together and their size reflects their relative n
>>> functional_form_plot(df, outcome='dead', var='cd4', loess=False, legend=False, points=True)
>>> plt.show()
Functional form assessment for a discrete variable (age)
>>> functional_form_plot(df, outcome='dead', var='age0', discrete=True)
>>> plt.show()

zepid.graphics.graphics.labbe_plot(r1=None, r0=None, scale='both', additive_tuner=12, multiplicative_tuner=12, figsize=(7, 4), **plot_kwargs)
L’Abbe plots are useful for summarizing measure modification on the difference or ratio scale. Primarily invented for meta-analysis usage, these plots display risk differences (or ratios) by their individual risks by an exposure. I find them most useful for visualizing why, if there is an association and there is no modification on one scale (additive or multiplicative), there must be modification on the other scale.
Parameters:  r1 (float, list, optional) – Single probability or a list of probabilities when exposure is 1. Default is None, which does not display points
 r0 (float, list, optional) – Single probability or a list of probabilities when exposure is 0. Default is None, which does not display points
 scale (str, optional) – Which scale to plot. The default is ‘both’, which generates side-by-side plots of the additive scale and the multiplicative scale. Other options are ‘additive’ to display only the additive plot and ‘multiplicative’ to display only the multiplicative plot
 additive_tuner (int, optional) – Optional parameter to change the number of lines displayed in the additive L’Abbe plot. Higher integer produces more reference lines
 multiplicative_tuner (int, optional) – Optional parameter to change the number of lines displayed in the multiplicative L’Abbe plot. Higher integer produces more reference lines
 figsize (tuple, optional) – Optional parameter to change the L’Abbe plot size. Only changes the plot size when scale=’both’
 **plot_kwargs (optional) – Optional keyword arguments passed to matplotlib.pyplot.plot. See the matplotlib ‘plot()’ function documentation for further details
Returns: Return type: matplotlib axes
Examples
Setting up environment
>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import labbe_plot
Creating a blank plot
>>> labbe_plot()
>>> plt.show()
Adding customized points to the plot
>>> labbe_plot(r1=[0.3, 0.5], r0=[0.2, 0.7], scale='additive', color='r', marker='D', markersize=10, linestyle='')
>>> plt.show()
Only returning the additive plot
>>> labbe_plot(r1=[0.3, 0.5], r0=[0.2, 0.7], scale='additive', markersize=10)
>>> plt.show()
Only returning the multiplicative plot
>>> labbe_plot(r1=[0.3, 0.5], r0=[0.2, 0.7], scale='multiplicative', markersize=10)
>>> plt.show()
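The claim that an association without modification on one scale implies modification on the other can be checked with simple arithmetic. The sketch below uses two strata with made-up baseline risks and holds the risk difference constant; the risk ratios then necessarily differ.

```python
# Two strata with different baseline risks (illustrative values).
r0 = [0.10, 0.20]

# Assume the same risk difference of 0.10 in both strata, i.e. no
# modification on the additive scale.
r1 = [r + 0.10 for r in r0]

rd = [a - b for a, b in zip(r1, r0)]  # constant across strata
rr = [a / b for a, b in zip(r1, r0)]  # differs across strata

print([round(v, 3) for v in rd])
print([round(v, 3) for v in rr])
```

Lists like `r1` and `r0` here have the same form as the `r1` and `r0` arguments used in the examples above.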

zepid.graphics.graphics.pvalue_plot(point, sd, color='b', fill=True, null=0, alpha=None)
Creates a plot of the p-value distribution based on a point estimate and standard deviation. I find this plot useful for explaining what exactly a p-value tells you and how much weight of evidence you have for a specific value. It also helps in understanding what exactly confidence intervals are telling you. Note that this plot only works for measures on a linear scale (i.e. it will plot exp(log(RR)) incorrectly). These plots are based on Rothman’s Epidemiology 2nd Edition pp. 152-153 and are explained more fully there.
Parameters:  point (float) – Point estimate. Must be on a linear scale (RD / log(RR))
 sd (float) – Standard error of the estimate. Must be on a linear scale (SE(RD) / SE(log(RR)))
 color (str, optional) – Change color of the p-value plot
 fill (bool, optional) – Whether to fill the area under the p-value distribution curve. Setting to False prevents fill
 null (float, integer, optional) – The main value to compare to. The default is zero
 alpha (float, optional) – Whether to draw a line designating significance level area. Default is None, which does not draw this line. Generally, would be set to 0.05 to correspond to the widely used alpha of 0.05
Returns: Return type: matplotlib axes
Examples
Setting up the environment
>>> from zepid.graphics import pvalue_plot
>>> import matplotlib.pyplot as plt
Basic p-value plot
>>> pvalue_plot(point=0.1, sd=0.061, color='r')
>>> plt.show()
P-value plot with significance line drawn at ‘alpha’
>>> pvalue_plot(point=0.1, sd=0.061, color='r', alpha=0.025)
>>> plt.show()
P-value plot with different comparison value
>>> pvalue_plot(point=0.1, sd=0.061, color='r', null=0.1)
>>> plt.show()
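To make the idea of a p-value distribution concrete, the sketch below computes a two-sided p-value for several candidate null values under a normal approximation. This is an assumption about the general technique, not a description of how zEpid draws the curve; `two_sided_p` is a hypothetical helper.

```python
from statistics import NormalDist

def two_sided_p(point, sd, null=0.0):
    """Two-sided p-value for a candidate null value, assuming the
    estimate is normally distributed with standard error `sd`."""
    z = abs(point - null) / sd
    return 2 * (1 - NormalDist().cdf(z))

# Same point estimate and standard error as the examples above.
point, sd = 0.1, 0.061
for null in (-0.05, 0.0, 0.05, 0.1, 0.15):
    print(null, round(two_sided_p(point, sd, null), 3))
```

The p-value peaks at 1 when the candidate null equals the point estimate and falls off symmetrically on either side, which is the shape the plot conveys.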
References
Rothman KJ. (2012). Epidemiology: An Introduction. Oxford University Press.

zepid.graphics.graphics.roc(df, true, threshold, youden_index=True)
Generate a Receiver Operating Characteristic (ROC) curve from true values and predicted probabilities. Youden’s index can also be calculated. Youden’s index is calculated as
\[P_{Yi} = \max(Se_i + Sp_i - 1)\]
Parameters:  df (DataFrame) – Pandas dataframe containing variables of interest
 true (str) – Column name of the true outcome designation (1, 0)
 threshold (str) – Column name of the predicted probabilities for the outcome
 youden_index (bool, optional) – Whether to calculate Youden’s index. Youden’s index identifies the threshold that maximizes the sum of sensitivity and specificity. The formula finds the maximum of (sensitivity + specificity - 1)
Returns: Return type: matplotlib axes
Examples
Creating a dataframe with true disease status (‘d’) and predicted probability of the outcome (‘p’)
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import roc
>>> df = pd.DataFrame()
>>> df['d'] = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
>>> df['p'] = [0.1, 0.15, 0.1, 0.7, 0.5, 0.9, 0.95, 0.5, 0.4, 0.8, 0.99, 0.99, 0.89, 0.95]
Creating ROC curve
>>> roc(df, true='d', threshold='p', youden_index=False)
>>> plt.show()
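Youden’s index can also be worked out by hand for the same data. The sketch below is a plain-Python illustration and not zEpid’s implementation; it assumes an observation is classified as positive when its predicted probability is greater than or equal to the cut point, which may differ from the convention used by `roc`.

```python
# Same true status and predicted probabilities as the example above.
d = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
p = [0.1, 0.15, 0.1, 0.7, 0.5, 0.9, 0.95, 0.5, 0.4, 0.8, 0.99, 0.99, 0.89, 0.95]

def youden(true, prob):
    """Return (max of Se + Sp - 1, cut point achieving it).

    Assumes prob >= cut is classified as positive."""
    n_pos = sum(true)
    n_neg = len(true) - n_pos
    best_j, best_cut = -1.0, None
    for cut in sorted(set(prob)):
        tp = sum(1 for t, q in zip(true, prob) if t == 1 and q >= cut)
        tn = sum(1 for t, q in zip(true, prob) if t == 0 and q < cut)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut

j, cut = youden(d, p)
print(round(j, 3), cut)
```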

zepid.graphics.graphics.spaghetti_plot(df, idvar, variable, time)
Create a spaghetti plot by an ID variable. A spaghetti plot can be useful for visualizing trends or looking at longitudinal data patterns for individuals all at once.
Parameters:  df (DataFrame) – Pandas dataframe containing variables of interest
 idvar (str) – ID variable for observations. This should indicate the group or individual followed over the time variable
 variable (str) – Variable of interest to see how it varies over time
 time (str) – Time or other variable in which the variable variation occurs
Returns: Return type: matplotlib axes
Examples
Setting up the environment
>>> import matplotlib.pyplot as plt
>>> from zepid import load_sample_data
>>> from zepid.graphics import spaghetti_plot
>>> df = load_sample_data(timevary=True)
Generating spaghetti plot for changing CD4 count
>>> spaghetti_plot(df, idvar='id', variable='cd4', time='enter')
>>> plt.show()

zepid.graphics.graphics.zipper_plot(truth, lcl, ucl, colors=('blue', 'red'))
Zipper plots are a way to present simulation data, particularly confidence intervals and their width. They are also useful for showing the confidence interval coverage of the true parameter.
Parameters:  truth (float) – The true value against which confidence interval coverage is assessed
 lcl (list, array, Series, container) – Container of lower confidence limits
 ucl (list, array, Series, container) – Container of upper confidence limits
 colors (set, list, container) – List of colors for confidence intervals. The first color is used to designate confidence intervals that cover the true value, and the second indicates confidence intervals that do not cover the true value
Returns: Return type: matplotlib axes
Examples
Setting up environment
>>> import matplotlib.pyplot as plt
>>> from zepid.graphics import zipper_plot
Creating a zipper plot from illustrative (made-up) confidence limits
>>> lcl = [-0.30, -0.10, 0.05, -0.25]
>>> ucl = [0.20, 0.35, 0.45, -0.02]
>>> zipper_plot(truth=0, lcl=lcl, ucl=ucl)
>>> plt.show()
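A zipper plot distinguishes intervals by whether they cover the true value. The sketch below shows that classification and the resulting coverage proportion in plain Python; the confidence limits are made up, and the color assignment only mirrors the meaning of the default `colors` argument rather than reproducing zEpid’s internals.

```python
truth = 0.0

# Illustrative confidence limits from four hypothetical simulation runs.
lcl = [-0.30, -0.10, 0.05, -0.25]
ucl = [0.20, 0.35, 0.45, -0.02]

# An interval covers the truth when the truth lies between its limits.
covers = [lo <= truth <= hi for lo, hi in zip(lcl, ucl)]
coverage = sum(covers) / len(covers)

# First color for covering intervals, second for non-covering ones.
colors = ['blue' if c else 'red' for c in covers]

print(covers, coverage, colors)
```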