Reference
Data Sources
The worked examples in the StatsPlease blog are computed from real, publicly available clinical datasets. Most come from the Vanderbilt University Department of Biostatistics (hbiostat.org/data); the Pima Indians set comes from the NIDDK via Plotly. Because the data are real, the numbers in our output cards are genuine statistical results, not invented figures. Each post notes which dataset it uses.
1. Mayo Clinic Primary Biliary Cirrhosis (PBC)
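Data on 418 patients with primary biliary cirrhosis from a Mayo Clinic trial conducted between 1974 and 1984: 312 were randomised in the trial, and a further 106 had baseline measurements recorded and were followed for survival. Includes liver function markers, clinical signs, and survival outcomes.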
Variables used in our examples: serum bilirubin, urine copper, prothrombin time, hepatomegaly, sex, treatment arm.
Used in: Posts 1, 3, 4 (serum bilirubin, urine copper, hepatomegaly).
Source: https://hbiostat.org/data
Data obtained from http://hbiostat.org/data courtesy of the Vanderbilt University Department of Biostatistics.
2. Vanderbilt Diabetes Study
A cross-sectional screening study of 403 adults from Buckingham and Louisa counties, Virginia. Includes blood glucose measures, HbA1c, lipid panels, and demographics.
Variables used in our examples: stabilised blood glucose, glycosylated haemoglobin (HbA1c), cholesterol, HDL.
Used in: Posts 2, 6 (stabilised glucose vs HbA1c correlation).
Source: https://hbiostat.org/data
Data obtained from http://hbiostat.org/data courtesy of the Vanderbilt University Department of Biostatistics.
3. Heart Failure Clinical Records
Clinical records of 299 patients admitted with heart failure, including cardiac function markers, blood chemistry, and mortality outcomes.
Variables used in our examples: ejection fraction, serum creatinine, serum sodium, anaemia, smoking, death event.
Used in: Post 5 (serum creatinine in EF ≤ 20% subgroup) and Post 7 (anaemia vs smoking in severe subgroup, Fisher's exact example).
Source: https://hbiostat.org/data
Data obtained from http://hbiostat.org/data courtesy of the Vanderbilt University Department of Biostatistics.
4. Baystate Medical Centre Birthweight Study
A retrospective cohort of 189 mothers from Baystate Medical Centre, Springfield, Massachusetts. Examines risk factors for low birthweight (<2500 g).
Variables used in our examples: infant birthweight, low birthweight, maternal smoking, maternal age.
Used in: Post 7 (maternal smoking vs low birthweight, chi-square example).
Source: https://hbiostat.org/data
Data obtained from http://hbiostat.org/data courtesy of the Vanderbilt University Department of Biostatistics.
5. Pima Indians Diabetes Dataset
A diabetes screening study of 768 women of Pima Indian heritage, examining glucose, BMI, and other predictors of diabetes diagnosis.
Variables used in our examples: plasma glucose, diabetes diagnosis.
Used in: Post 6 (plasma glucose by diabetes diagnosis, effect-size contrast).
Original data: National Institute of Diabetes and Digestive and Kidney Diseases; accessed via the Plotly public datasets collection.
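If you download your own copy of one of these datasets to reproduce a post, a quick sanity check is to compare its row count with the participant count given above. The sketch below is one way to do that using only the Python standard library. It assumes a CSV file with a single header row and one row per participant; the dictionary keys and the file name in the usage note are hypothetical labels chosen for illustration, not names used by hbiostat.org or Plotly.

```python
import csv

# Participant counts as stated in the dataset descriptions on this page.
# The keys are hypothetical short labels, not official dataset names.
EXPECTED_ROWS = {
    "pbc": 418,
    "diabetes": 403,
    "heart_failure": 299,
    "birthweight": 189,
    "pima": 768,
}


def count_rows(csv_path):
    """Count data rows in a CSV file, excluding the header line."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        return sum(1 for row in reader if row)  # ignore blank lines


def check_dataset(name, csv_path):
    """Return (matches, found_rows) for a local copy of a named dataset."""
    found = count_rows(csv_path)
    return found == EXPECTED_ROWS[name], found
```

For example, `check_dataset("pbc", "pbc.csv")` returns a pair whose first element is `True` only when the hypothetical local file `pbc.csv` has 418 data rows. A mismatch does not necessarily mean the file is wrong: some distributions of these datasets drop incomplete records or store the data in a different layout.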