class zepid.base.RiskDifference(reference=0, alpha=0.05)

Estimates the risk difference with a (1-alpha)*100% confidence interval from a pandas DataFrame. Missing data is ignored. Exposure categories should be mutually exclusive.

Risk difference is calculated as

\[RD = \Pr(Y|A=1) - \Pr(Y|A=0)\]

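The standard error of the risk difference is calculated as

\[SE = \left(\frac{R_1 \times (1 - R_1)}{a+b} + \frac{R_0 \times (1-R_0)}{c+d}\right)^{\frac{1}{2}}\]

where \(R_1 = \Pr(Y|A=1)\) and \(R_0 = \Pr(Y|A=0)\) are the risks among the exposed and unexposed, and \(a+b\) and \(c+d\) are the numbers of exposed and unexposed observations, respectively.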
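As a worked illustration of the two formulas above, the following minimal sketch computes the risk difference and its standard error from the counts of a 2x2 table. The function name `risk_difference` and the cell layout (a, b = exposed with and without the outcome; c, d = unexposed with and without the outcome) are assumptions made for this example and are not part of zepid's API.

```python
import math


def risk_difference(a, b, c, d):
    """Return (RD, SE) from 2x2 cell counts.

    Assumed layout (hypothetical, for illustration only):
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    r1 = a / (a + b)  # risk among the exposed, R_1
    r0 = c / (c + d)  # risk among the unexposed, R_0
    rd = r1 - r0
    # Square root of the sum of the two binomial variance terms
    se = math.sqrt(r1 * (1 - r1) / (a + b) + r0 * (1 - r0) / (c + d))
    return rd, se


rd, se = risk_difference(a=30, b=70, c=20, d=80)
print(round(rd, 4), round(se, 4))  # 0.1 0.0608
```

With 30 of 100 exposed and 20 of 100 unexposed experiencing the outcome, the risks are 0.3 and 0.2, so the risk difference is 0.1.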

In addition to confidence intervals, the Frechet probability bounds are calculated, which are useful for comparison. The true causal risk difference in the sample must lie within these bounds. The only assumptions the bounds require are no measurement error, causal consistency, no selection bias, and that any missing data are missing completely at random (MCAR). The bounds always have unit width (a width of one), but they do not require any assumptions regarding confounding or conditional exchangeability. They are calculated via the following formulas

\[\begin{split}Lower = \Pr(Y|A=a)\Pr(A=a) - \Pr(Y|A \ne a)\Pr(A \ne a) - \Pr(A=a)\\ Upper = \Pr(Y|A=a)\Pr(A=a) + \Pr(A \ne a) - \Pr(Y|A \ne a)\Pr(A \ne a)\end{split}\]

For further details on these bounds, see the references.
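The bound formulas can be checked numerically. The sketch below is a direct transcription of the two expressions above into plain Python; the function name `frechet_bounds` and its argument names are hypothetical and chosen only for this example.

```python
def frechet_bounds(p_y_a, p_y_not_a, p_a):
    """Return (lower, upper) bounds for the risk difference.

    p_y_a     : Pr(Y | A = a)
    p_y_not_a : Pr(Y | A != a)
    p_a       : Pr(A = a)
    """
    p_not_a = 1 - p_a  # Pr(A != a)
    lower = p_y_a * p_a - p_y_not_a * p_not_a - p_a
    upper = p_y_a * p_a + p_not_a - p_y_not_a * p_not_a
    return lower, upper


lower, upper = frechet_bounds(p_y_a=0.3, p_y_not_a=0.2, p_a=0.5)
print(round(lower, 4), round(upper, 4), round(upper - lower, 4))  # -0.45 0.55 1.0
```

Subtracting the two expressions leaves Pr(A = a) + Pr(A != a) = 1, which is why the width is one regardless of the inputs.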


The outcome must be coded as 1 (yes) or 0 (no). Only binary outcomes are supported.

  • reference (integer, optional) – Reference category for comparisons. Default reference category is 0
  • alpha (float, optional) – Alpha value used to calculate two-sided Wald confidence intervals. Default is 0.05, which gives 95% confidence intervals
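To illustrate how alpha relates to the interval, here is a minimal sketch of a two-sided Wald interval, assuming the usual form estimate ± z(1 - alpha/2) × SE. The helper `wald_interval` is hypothetical and is not a zepid function; it uses only the Python standard library.

```python
from statistics import NormalDist


def wald_interval(estimate, se, alpha=0.05):
    """Two-sided Wald interval: estimate +/- z * se (assumed form)."""
    # Critical value of the standard normal, about 1.96 for alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se


# Example values: a risk difference of 0.10 with standard error 0.0608
lcl, ucl = wald_interval(0.10, 0.0608, alpha=0.05)
print(round(lcl, 3), round(ucl, 3))  # -0.019 0.219
```

A smaller alpha gives a larger critical value and therefore a wider interval.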


Cole SR et al. (2019) Nonparametric Bounds for the Risk Function. American Journal of Epidemiology. 188(4), 632-636


Calculate the risk difference in a data set

>>> from zepid import RiskDifference, load_sample_data
>>> df = load_sample_data(False)
>>> rd = RiskDifference()
>>> rd.fit(df, exposure='art', outcome='dead')
>>> rd.summary()

Calculate the risk difference with exposure of ‘1’ as the reference category

>>> rd = RiskDifference(reference=1)
>>> rd.fit(df, exposure='art', outcome='dead')
>>> rd.summary()

Generate a plot of the calculated risk difference(s)

>>> import matplotlib.pyplot as plt
>>> rd = RiskDifference()
>>> rd.fit(df, exposure='art', outcome='dead')
>>> rd.plot()
>>> plt.show()


  • __init__([reference, alpha]) – Initialize self.
  • fit(df, exposure, outcome) – Calculates the risk difference
  • plot([measure, center]) – Plots the risk differences or the risks along with their corresponding confidence intervals
  • summary([decimal]) – Prints the summary results