zepid.datasets.load_gvhd_data

zepid.datasets.load_gvhd_data()

Loads bone marrow transplant recipient data from Keil AP, Edwards JK, Richardson DB, Naimi AI, Cole SR. The parametric g-formula for time-to-event data: intuition and a worked example. Epidemiology. 2014;25(6):889-97. Patients were followed until death or administrative censoring at 5 years.

Notes

Variables are formatted exactly as described in Keil et al. 2014:
  • id: unique ID for each participant
  • age: participant baseline age
  • agesq: squared baseline age
  • agecurs1: restricted cubic spline term 1 for baseline age
  • agecurs2: restricted cubic spline term 2 for baseline age
  • male: participant gender (1 is male, 0 is female)
  • cmv: cytomegalovirus baseline immune status (1 is yes, 0 is no)
  • all: binary indicator (1, 0) whose meaning is currently undocumented
  • wait: wait time from diagnosis to transplantation (months)
  • day: days since transplantation
  • daysq: squared days since transplantation
  • daycu: cubed days since transplantation
  • daycurs1: restricted cubic spline term 1 for days since transplantation
  • daycurs2: restricted cubic spline term 2 for days since transplantation
  • yesterday: the previous day
  • tomorrow: the following day
  • gvhd: indicator for Graft-versus-Host Disease (GvHD) (1 is yes, 0 is no)
  • d: indicator of death (1 is yes, 0 is no)
  • relapse: indicator for relapse (1 is yes, 0 is no)
  • platnorm: indicator for normal platelet count (1 is yes, 0 is no)
  • censlost: indicator for censoring due to loss-to-follow-up (1 is yes, 0 is no)
  • gvhdm1: indicator for previous day diagnosis of GvHD (1 is yes, 0 is no)
  • relapsem1: indicator for previous day relapse (1 is yes, 0 is no)
  • platnormm1: indicator for previous day normal platelet count (1 is yes, 0 is no)
  • daysnogvhd: number of consecutive days without a GvHD diagnosis
  • daysnorelapse: number of consecutive days without relapse
  • daysnoplatnorm: number of consecutive days without normal platelet count
  • daysgvhd: number of consecutive days with GvHD
  • daysrelapse: number of consecutive days after relapse
  • daysplatnorm: number of consecutive days with normal platelet count
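The lagged (`*m1`) and consecutive-day (`days*`, `daysno*`) variables above follow a common person-day pattern. Below is a minimal pandas sketch of how such columns could be built from a single 0/1 indicator. It assumes the rows can be sorted by `id` and `day`, that the lag on a participant's first day is 0, and that the counters include the current day. `add_history_columns` is a hypothetical helper, not part of zepid, and the exact construction used by Keil et al. may differ.

```python
import pandas as pd


def add_history_columns(df: pd.DataFrame, var: str) -> pd.DataFrame:
    """Add a one-day lag of `var` and consecutive-day counters per participant.

    Hypothetical helper (not part of zepid). Assumes long person-day data with
    columns `id`, `day`, and a 0/1 indicator `var`; the output names mirror the
    dataset's naming (e.g. gvhdm1, daysgvhd, daysnogvhd).
    """
    out = df.sort_values(["id", "day"]).copy()
    prev = out.groupby("id")[var].shift(1)

    # One-day lag within participant; assumption: no prior value -> 0
    out[var + "m1"] = prev.fillna(0).astype(int)

    # A new run starts at each participant's first row (prev is NaN) or when
    # the indicator changes value; cumsum gives each run a unique label
    run_id = (out[var] != prev).cumsum()

    # Position within the run (1, 2, 3, ...); assumption: counts include today
    run_len = out.groupby(run_id).cumcount() + 1
    out["days" + var] = run_len.where(out[var] == 1, 0)
    out["daysno" + var] = run_len.where(out[var] == 0, 0)
    return out


toy = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2],
    "day":  [1, 2, 3, 4, 1, 2],
    "gvhd": [0, 0, 1, 1, 1, 0],
})
print(add_history_columns(toy, "gvhd"))
```

Calling the same helper with `"relapse"` or `"platnorm"` would produce `relapsem1`/`daysrelapse`/`daysnorelapse` or `platnormm1`/`daysplatnorm`/`daysnoplatnorm`. Comparing those against the columns shipped with the dataset is a quick way to check whether these assumptions match the original construction.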
Returns: pandas DataFrame
Return type: DataFrame
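Because the returned DataFrame is in long (person-day) format, a common first step is collapsing it to one row per participant. The sketch below assumes `d` is 1 on the day a participant dies and 0 otherwise, so its per-participant maximum marks whether death occurred during follow-up. `one_row_per_participant`, `last_day`, and `died` are hypothetical names, and the toy rows are illustrative only.

```python
import pandas as pd


def one_row_per_participant(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse person-day rows to one row per participant (hypothetical helper).

    Assumes columns `id`, `day`, and a 0/1 death indicator `d`.
    """
    return (
        df.groupby("id")
        # last observed day and whether death was ever recorded (assumption:
        # d is 1 only on the day of death)
        .agg(last_day=("day", "max"), died=("d", "max"))
        .reset_index()
    )


toy = pd.DataFrame({
    "id":  [1, 1, 1, 2, 2],
    "day": [1, 2, 3, 1, 2],
    "d":   [0, 0, 1, 0, 0],
})
print(one_row_per_participant(toy))
```

With the real data this would be called as `one_row_per_participant(zepid.datasets.load_gvhd_data())`.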