
Workflow

Statistical Analysis for REDCap Data: A Practical Guide

Statistics for clinical researchers and surgical trainees

In short

Export your data from REDCap as a raw-data CSV file, check whether your numbers are normally distributed, then pick a test based on the type of each variable. StatsPlease reads REDCap column headers directly and chooses the correct test for you.

REDCap is the most widely used tool for collecting data in clinical research. It is excellent at building forms, validating entries, and tracking patients over time. What it does not do is analyse your data — that step is left entirely to you. The path from a REDCap export to a finished results section has more decisions in it than most trainees expect. This guide walks through all of them. As a running example, imagine a researcher studying primary biliary cirrhosis who collects urine copper levels and hepatomegaly (enlarged liver) status in REDCap.

Exporting your data from REDCap

In REDCap, use the Data Export Tool and choose the CSV / Microsoft Excel option in "raw data" format. Raw data keeps the underlying numbers, which is what statistical software needs; the "labels" option exports the text descriptions instead, which then have to be converted back before analysis. Keep the field names as your column headers. If your study follows patients over several visits, export with the event names included so you can tell the time points apart. Finally, use REDCap's built-in de-identification option to remove any identifying details before the file leaves the system.
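As a rough sketch of what the next step can look like outside any particular tool, the snippet below reads a raw-data export with pandas and checks for missing values. The inline CSV stands in for a real export file, and the column names (`record_id`, `redcap_event_name`, `urine_copper`, `hepatomegaly`) and the 0/1 coding are assumptions for illustration; your project's field names and codes will differ.

```python
import io
import pandas as pd

# Stand-in for a REDCap raw-data export. All column names and codes here
# are hypothetical; replace them with the field names from your project.
raw_csv = io.StringIO(
    "record_id,redcap_event_name,urine_copper,hepatomegaly\n"
    "1,baseline_arm_1,88,1\n"
    "2,baseline_arm_1,58,0\n"
    "3,baseline_arm_1,,1\n"
)

# With a real file this would be: pd.read_csv("your_export.csv")
df = pd.read_csv(raw_csv)

# Raw exports keep coded choices as numbers. Map them to text only for
# display, and keep the numeric column for analysis.
df["hepatomegaly_label"] = df["hepatomegaly"].map({0: "absent", 1: "present"})

# Check the inferred column types and count missing outcome values.
print(df.dtypes)
print(df["urine_copper"].isna().sum(), "missing urine copper value(s)")
```

Keeping the numeric codes and adding labels in a separate column means the export never has to be converted back before testing.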

Matching REDCap field types to statistics

Each REDCap field type lines up with a type of statistical variable, and that type determines how you summarise and test it.

| REDCap field type | Statistical type | How you summarise it |
| --- | --- | --- |
| Text / Number | Continuous | Mean ± SD, or Median [IQR] |
| Radio button / Dropdown | Categorical | Count and percentage |
| Checkbox | Binary / Categorical | Count and percentage |
| Slider / Visual analogue scale | Continuous or Ordinal | Median [IQR] |
| Calculated field | Continuous | Mean ± SD, or Median [IQR] |

("Continuous" means a true measurement like a serum level or a time. "Categorical" means a label like yes/no or which group. "Ordinal" means an ordered score like a symptom rating.)
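The summaries in the table can be sketched in a few lines of pandas. The helper name `summarise` and the sample values below are made up for illustration; this is one possible way to produce the summaries, not how any particular tool does it.

```python
import pandas as pd

def summarise(series: pd.Series, kind: str) -> str:
    """Return a one-line summary for a 'continuous' or 'categorical' column."""
    s = series.dropna()
    if kind == "continuous":
        q1, med, q3 = s.quantile([0.25, 0.5, 0.75])
        # pandas uses the sample standard deviation (ddof=1) by default.
        return (f"mean {s.mean():.1f} ± SD {s.std():.1f}; "
                f"median {med:.1f} [IQR {q1:.1f}-{q3:.1f}]")
    if kind == "categorical":
        counts = s.value_counts().sort_index()
        return "; ".join(
            f"{level}: {n} ({100 * n / len(s):.0f}%)" for level, n in counts.items()
        )
    raise ValueError("kind must be 'continuous' or 'categorical'")

# Made-up values, for illustration only.
copper = pd.Series([10, 20, 30, 40, 50])
hepatomegaly = pd.Series([1, 0, 1, 1])

print(summarise(copper, "continuous"))
print(summarise(hepatomegaly, "categorical"))
```

Reporting both the mean ± SD and the median [IQR] for a continuous field lets you pick the appropriate one after the normality check.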

Choosing the right test

| Your data | What you are comparing | Test to use |
| --- | --- | --- |
| Continuous, normal, n above 30 | Two groups | Independent t-test |
| Continuous, not normal or n below 30 | Two groups | Mann-Whitney U |
| Continuous, normal | Three or more groups | One-way ANOVA |
| Continuous, not normal or n below 30 | Three or more groups | Kruskal-Wallis |
| Categorical | Any groups | Chi-square or Fisher's exact |
| Ordered score (e.g. symptom rating) | Two groups | Mann-Whitney U |
| Continuous | Relationship between two | Spearman or Pearson correlation |

The choice between chi-square and Fisher's exact for categorical data follows a simple rule based on the expected counts in your table [1].
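The text does not spell the rule out, so the sketch below assumes one commonly quoted version: for a 2×2 table, switch to Fisher's exact test if any expected count is below 5. The function name `categorical_test` and the example tables are hypothetical, and the threshold is an assumption rather than the guide's stated rule.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def categorical_test(table, min_expected=5):
    """Pick chi-square or Fisher's exact for a contingency table.

    Assumed rule (not taken from the guide): use Fisher's exact on a 2x2
    table when any expected count falls below `min_expected`.
    """
    table = np.asarray(table)
    # chi2_contingency also returns the expected counts under independence.
    # For 2x2 tables it applies Yates' continuity correction by default.
    chi2, p_chi2, dof, expected = chi2_contingency(table)
    if table.shape == (2, 2) and (expected < min_expected).any():
        _, p_fisher = fisher_exact(table)
        return "Fisher's exact", p_fisher
    return "Chi-square", p_chi2

# Made-up tables: rows are groups, columns are outcome categories.
small_name, small_p = categorical_test([[3, 1], [1, 3]])
large_name, large_p = categorical_test([[30, 20], [20, 30]])
print(small_name, round(small_p, 3))
print(large_name, round(large_p, 3))
```

The sketch only falls back to Fisher's exact for 2×2 tables; larger tables with sparse cells need a separate decision.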

The normality problem in REDCap data

Most clinical REDCap datasets have fewer than 50 patients per group. At that size you cannot assume your data are normally distributed; you have to test for it. Run a Shapiro-Wilk test on every continuous outcome before choosing a test [2]. If its p-value is above 0.05, a parametric test is acceptable; if it is 0.05 or below, use the non-parametric version instead [3]. When in doubt, choose the non-parametric test: it may give up a little statistical power, but it does not depend on the normality assumption.
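A minimal sketch of that decision for two groups, using scipy, might look like the following. It applies only the p-value rule described above (both groups must pass Shapiro-Wilk for the t-test to be used); the function name `compare_two_groups` and the simulated, skewed data are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Run Shapiro-Wilk on each group, then a t-test or Mann-Whitney U."""
    p_norm_a = stats.shapiro(a).pvalue
    p_norm_b = stats.shapiro(b).pvalue
    if p_norm_a > alpha and p_norm_b > alpha:
        # Normality not rejected in either group: parametric test.
        res = stats.ttest_ind(a, b)
        return "Independent t-test", res.statistic, res.pvalue
    # Normality rejected in at least one group: non-parametric test.
    res = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", res.statistic, res.pvalue

# Simulated right-skewed data, loosely imitating a lab measurement.
rng = np.random.default_rng(0)
group_a = rng.lognormal(mean=4.5, sigma=0.8, size=150)
group_b = rng.lognormal(mean=4.0, sigma=0.8, size=150)

test_name, statistic, p_value = compare_two_groups(group_a, group_b)
print(test_name, statistic, p_value)
```

Requiring both groups to pass the normality check is the conservative reading of the rule; a single skewed group is enough to send the comparison to Mann-Whitney U.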

Uploading REDCap data to StatsPlease

Export the raw-data CSV from REDCap and upload it straight to StatsPlease. There is no reformatting, no renaming of columns, and no manual recoding — StatsPlease reads REDCap column headers directly. Choose your outcome variable (here, urine copper) and your grouping variable (hepatomegaly) and run. In auto-select mode, StatsPlease runs the Shapiro-Wilk normality check and assigns the correct test before computing the result.

StatsPlease output — auto-selected test

Detected: continuous outcome, 2 independent groups
→ Shapiro-Wilk (hepatomegaly present): p < .001 → non-normal
→ Shapiro-Wilk (hepatomegaly absent): p < .001 → non-normal
Test selected: Mann-Whitney U

| Group | n | Median | IQR |
| --- | --- | --- | --- |
| Hepatomegaly present | 159 | 88.00 µg/day | 52.00–159.00 |
| Hepatomegaly absent | 151 | 58.00 µg/day | 33.00–94.50 |

U = 15901.5 · p < .001 · r = 0.28 (medium)

Urine copper was significantly higher in patients with hepatomegaly (median 88.00 µg/day, IQR 52.00–159.00) than in those without (median 58.00, IQR 33.00–94.50), U = 15901.5, p < .001, r = .28.

Example output. Figures are illustrative.

Example data: Vanderbilt University Department of Biostatistics public teaching datasets (hbiostat.org/data). Figures computed with scipy from real data.
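The output does not say how the effect size r is derived. One common convention, assumed here, is r = |Z| / √N, where Z comes from the normal approximation to U (without a tie correction) and N is the total sample size. Plugging in the reported U and group sizes gives a value that rounds to the r shown above.

```python
import math

def rank_effect_size(u, n1, n2):
    """Effect size r = |Z| / sqrt(N) from a Mann-Whitney U statistic.

    Assumes the normal approximation without tie correction; this is one
    common convention, not necessarily the one the tool uses.
    """
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)

# Values taken from the example output above.
r = rank_effect_size(15901.5, n1=159, n2=151)
print(f"r = {r:.2f}")
```

Because the formula uses only U and the two group sizes, a reader can check a reported r without access to the raw data.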

Why not SPSS?

Many researchers reach for SPSS out of habit. Here is how the two compare.

| Feature | SPSS | StatsPlease |
| --- | --- | --- |
| Cost | approx. $1,290/year | Free during open access |
| Underlying engine | SPSS proprietary | scipy and statsmodels (open source) |
| REDCap import | Manual setup | Native CSV upload |
| Output formatting | Build tables by hand | Auto-formatted for journals |
| Reproducibility | Requires a saved syntax file | Automatic |
| Independent checking | SPSS only | Any platform |

Both implement the same standard statistical tests, so for the same test and the same options the results should agree.

SPSS pricing is approximate and varies by region, licence type, and institutional agreement. Shown for comparison only.

The methods statement

When you publish, name your software. For example: "Statistical analysis was performed using StatsPlease (statsplease.com), powered by scipy and statsmodels. Normality was assessed with the Shapiro-Wilk test, and non-parametric tests were used where normality was rejected (p < 0.05)." StatsPlease generates this statement automatically, with the correct version numbers, in your downloadable report.

Try it yourself

Reproduce this result — in StatsPlease or SPSS

The auto-selected result above comes from a public dataset, so you can run the same comparison yourself and confirm the numbers agree.

In StatsPlease

  1. Download the PBC dataset (see Data Sources) and save it as CSV.
  2. Upload it and choose urine copper as the outcome and hepatomegaly as the group.
  3. Run in auto-select mode. StatsPlease runs Shapiro-Wilk, selects Mann-Whitney U, and reports U, p, and r.

In SPSS

  1. Open the same CSV. Check normality first with Analyze ▸ Descriptive Statistics ▸ Explore (read the Shapiro-Wilk p-value).
  2. Because it is non-normal, run Analyze ▸ Nonparametric Tests ▸ Independent Samples (Mann-Whitney U).
  3. Read U and the p-value from the output.

Compare: both should return U = 15901.5 and p < .001. The difference is the work: in SPSS you ran the normality check and chose the test by hand; StatsPlease did both for you and added the effect size.
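The same comparison can also be checked directly in scipy. The sketch below uses a tiny made-up dataset rather than the PBC file, and the column names `copper` and `hepato` are hypothetical. It also shows one thing to watch for when comparing packages: the U statistic depends on which group is treated as the first sample, and the two possible values always sum to n1 × n2, so a package may report the complementary U for the same data and the same p-value.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Tiny made-up dataset; 'copper' and 'hepato' are hypothetical column names.
df = pd.DataFrame({
    "copper": [88, 52, 159, 120, 95, 58, 33, 94, 40, 61],
    "hepato": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

present = df.loc[df["hepato"] == 1, "copper"]
absent = df.loc[df["hepato"] == 0, "copper"]

# Per-group n, quartiles and median, as in the results table.
summary = df.groupby("hepato")["copper"].describe()[["count", "25%", "50%", "75%"]]
print(summary)

# U is reported for whichever sample is passed first.
res_present = mannwhitneyu(present, absent, alternative="two-sided")
res_absent = mannwhitneyu(absent, present, alternative="two-sided")
print("U (present first):", res_present.statistic)
print("U (absent first): ", res_absent.statistic)
print("p-value:", res_present.pvalue)
```

If another package reports a different U, subtracting it from n1 × n2 is a quick way to see whether it is simply the other group's statistic.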

References

  1. Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher's exact test. Restorative Dentistry & Endodontics. 2017;42(2):152–155. https://doi.org/10.5395/rde.2017.42.2.152
  2. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive Statistics and Normality Tests for Statistical Data. Annals of Cardiac Anaesthesia. 2019;22(1):67–72. https://doi.org/10.4103/aca.ACA_157_18
  3. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean Journal of Anesthesiology. 2016;69(1):8–14. https://doi.org/10.4097/kjae.2016.69.1.8

Upload your REDCap CSV and get the right test, the effect size, and formatted output automatically.
