zepid.base.table1_generator

zepid.base.table1_generator(df, cols, variable_type, continuous_measure='median', strat_by=None, decimal=3)
Automatically generates a descriptive table of your study population (often referred to as a Table 1). Personally, I hate copying SAS/R/Python output from the interpreter into Excel or other spreadsheet software. This function returns a pandas DataFrame holding a formatted table, which can be exported as a CSV and opened in Excel, where final formatting changes and renaming can be done. np.nan values are counted as missing.
Categorical variables are split into their unique values, with a count and a percent calculated for each. Missing data are counted but are not included in the percent. A single categorical variable can also be used to stratify the results (see strat_by).
Continuous variables have either median/IQR or mean/SE calculated, depending on what is requested. Missing values are counted as a separate category.
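The summaries described above can be sketched in plain Python. This is only an illustration of the arithmetic, not zEpid's implementation: the helper names (`is_missing`, `summarize_category`, `summarize_continuous`) are hypothetical, SE is assumed to mean the standard error of the mean (sample standard deviation divided by the square root of n), and the quartile method is an assumption as well.

```python
import math
import statistics
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing (mirrors the np.nan convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_category(values, decimal=3):
    """Count each unique value; percents are computed over non-missing values only."""
    observed = [v for v in values if not is_missing(v)]
    total = len(observed)
    summary = {
        level: (n, round(100 * n / total, decimal))
        for level, n in Counter(observed).items()
    }
    # Missing values are counted but carry no percent.
    summary["Missing"] = (len(values) - total, None)
    return summary


def summarize_continuous(values, measure="median", decimal=3):
    """Return median/IQR or mean/SE, plus a count of missing values."""
    observed = [v for v in values if not is_missing(v)]
    if measure == "median":
        # Assumption: linear-interpolation quartiles ("inclusive" method).
        q1, q2, q3 = statistics.quantiles(observed, n=4, method="inclusive")
        center, spread = round(q2, decimal), (round(q1, decimal), round(q3, decimal))
    elif measure == "mean":
        # Assumption: SE = sample standard deviation / sqrt(n).
        center = round(statistics.mean(observed), decimal)
        spread = round(statistics.stdev(observed) / math.sqrt(len(observed)), decimal)
    else:
        raise ValueError("measure must be 'median' or 'mean'")
    return {"center": center, "spread": spread, "missing": len(values) - len(observed)}
```

For example, `summarize_category(["a", "b", "a", None])` reports percents out of the three observed values, with the one missing value counted separately.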
Parameters:  df (DataFrame) – Pandas dataframe object containing all variables of interest
 cols (list) – List of column names of the variables to include in the table. Ex) ['X', 'var1', 'var2']
 variable_type (list) – List of strings indicating the variable types. For example, ['category', 'continuous', 'continuous']. Accepted variable types are 'category' (variable with categories only) and 'continuous' (continuous variable)
 continuous_measure (string, optional) – Whether to use the medians or the means. Default is 'median'. Options are 'median' (returns medians and IQR for continuous variables) and 'mean' (returns means and SE for continuous variables)
 strat_by (string, optional) – Categorical variable to stratify by. Default is None (no stratification)
 decimal (integer, optional) – Decimal places to display in the table. Default is 3
Returns: A pandas DataFrame containing a formatted Table 1. Using this table in any later analysis is not recommended, since it is difficult to parse. This function is only meant to reduce the amount of copying from output.
Return type: pd.DataFrame
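A small sketch of preparing the cols and variable_type arguments follows. It assumes the two lists are parallel (one type per column, in the same order), as the parameter examples suggest. The `spec` mapping and the `split_spec` helper are hypothetical conveniences and not part of zEpid; the column names are the placeholders from the parameter description.

```python
ALLOWED_TYPES = {"category", "continuous"}

# Hypothetical mapping of column name -> variable type.
spec = {"X": "category", "var1": "continuous", "var2": "continuous"}


def split_spec(spec):
    """Split a name -> type mapping into parallel cols / variable_type lists."""
    unknown = sorted(set(spec.values()) - ALLOWED_TYPES)
    if unknown:
        raise ValueError(f"unsupported variable types: {unknown}")
    return list(spec), list(spec.values())


cols, variable_type = split_spec(spec)

# With zEpid installed and a DataFrame `df` containing these columns,
# the call would follow the documented signature:
#   table = zepid.base.table1_generator(df, cols, variable_type,
#                                       continuous_measure='median',
#                                       strat_by=None, decimal=3)
#   table.to_csv('table1.csv')  # export for final formatting in a spreadsheet
```

Keeping the names and types in one mapping avoids the two lists drifting out of order when variables are added or removed.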