Statistical Analysis for REDCap Data — Export, Analyse, Report

REDCap is the most widely used tool for collecting data in clinical research. It is excellent at building forms, validating entries, and tracking patients over time. What it does not do is analyse your data — that step is left entirely to you. The path from a REDCap export to a finished results section has more decisions in it than most trainees expect. This guide walks through all of them. As a running example, imagine a researcher studying primary biliary cirrhosis who collects urine copper levels and hepatomegaly (enlarged liver) status in REDCap.

Exporting your data from REDCap

In REDCap, use the Data Export Tool and choose the CSV / Microsoft Excel option in "raw data" format. Raw data keeps the underlying numbers, which is what statistical software needs; the "labels" option exports the text descriptions instead, which then have to be converted back before analysis. Keep the field names as your column headers. If your study follows patients over several visits, export with the event names included so you can tell the time points apart. Finally, use REDCap's built-in de-identification option to remove any identifying details before the file leaves the system.
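As a rough sketch of what a raw-format export looks like once it leaves REDCap, the snippet below reads a small CSV and keeps one time point. The column `redcap_event_name` is the one REDCap adds for longitudinal projects; the field names `record_id`, `urine_copper` and `hepatomegaly`, the event labels, and the values are invented for illustration.

```python
import csv
import io

# Hypothetical raw-format export: field names as headers, coded values in cells.
# The field names and values below are made up for this example.
RAW_EXPORT = """record_id,redcap_event_name,urine_copper,hepatomegaly
1,baseline_arm_1,88,1
1,month_6_arm_1,92,1
2,baseline_arm_1,58,0
3,baseline_arm_1,,0
"""

def load_event(csv_text, event):
    """Return the rows for one event, converting numbers and keeping blanks as None."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["redcap_event_name"] != event:
            continue  # skip other visits
        rows.append({
            "record_id": row["record_id"],
            "urine_copper": float(row["urine_copper"]) if row["urine_copper"] else None,
            "hepatomegaly": int(row["hepatomegaly"]) if row["hepatomegaly"] else None,
        })
    return rows

baseline = load_event(RAW_EXPORT, "baseline_arm_1")
for row in baseline:
    print(row)
```

Blank cells are kept as None rather than zero, so a missing measurement is never mistaken for a real value.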

Matching REDCap field types to statistics

Each REDCap field type lines up with a type of statistical variable, and that type determines how you summarise and test it.

REDCap field type	Statistical type	How you summarise it
Text / Number	Continuous	Mean ± SD, or Median [IQR]
Radio button / Dropdown	Categorical	Count and percentage
Checkbox	Binary / Categorical	Count and percentage
Slider / Visual analogue scale	Continuous or Ordinal	Median [IQR]
Calculated field	Continuous	Mean ± SD, or Median [IQR]

("Continuous" means a true measurement like a serum level or a time. "Categorical" means a label like yes/no or which group. "Ordinal" means an ordered score like a symptom rating.)
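The summaries in the table can be computed with the Python standard library alone. This is a minimal sketch with made-up values; quartile conventions differ between packages, and the "inclusive" method used here is only one reasonable choice.

```python
import statistics
from collections import Counter

def summarise_continuous(values):
    """Mean, SD, median and IQR for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "iqr": (q1, q3),
    }

def summarise_categorical(codes):
    """Count and percentage for each category code."""
    counts = Counter(codes)
    total = len(codes)
    return {code: (n, 100 * n / total) for code, n in counts.items()}

# Illustrative values only, not taken from any real dataset.
copper = [10, 20, 30, 40, 50]
hepatomegaly = [1, 1, 0, 1]

print(summarise_continuous(copper))
print(summarise_categorical(hepatomegaly))
```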

Choosing the right test

Your data	What you are comparing	Test to use
Continuous, normal, n above 30	Two groups	Independent t-test
Continuous, not normal or n below 30	Two groups	Mann-Whitney U
Continuous, normal	Three or more groups	One-way ANOVA
Continuous, not normal or n below 30	Three or more groups	Kruskal-Wallis
Categorical	Any groups	Chi-square or Fisher's exact
Ordered score (e.g. symptom rating)	Two groups	Mann-Whitney U
Continuous	Relationship between two	Spearman or Pearson correlation
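The group-comparison rows of this table can be written as a small lookup function. This is only a sketch of the table's logic: the function name and parameters are hypothetical, and because the table does not say what to do when n is exactly 30, the code treats 30 or more as large.

```python
def choose_test(outcome, n_groups, normal=False, smallest_n=0):
    """Return the test named in the table for a group comparison.

    outcome    -- "continuous", "ordinal" or "categorical"
    n_groups   -- number of independent groups being compared
    normal     -- True if every group passed a normality check
    smallest_n -- size of the smallest group
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if outcome == "categorical":
        return "Chi-square or Fisher's exact"
    if outcome == "ordinal":
        if n_groups == 2:
            return "Mann-Whitney U"
        raise ValueError("the table lists ordered scores for two groups only")
    if outcome != "continuous":
        raise ValueError(f"unknown outcome type: {outcome!r}")

    # Assumption: n of exactly 30 counts as large enough for a parametric test.
    parametric = normal and smallest_n >= 30
    if n_groups == 2:
        return "Independent t-test" if parametric else "Mann-Whitney U"
    return "One-way ANOVA" if parametric else "Kruskal-Wallis"

print(choose_test("continuous", 2, normal=False, smallest_n=151))
print(choose_test("continuous", 3, normal=True, smallest_n=40))
```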

The choice between chi-square and Fisher's exact for categorical data depends on the expected counts in your table: when expected counts are small (a common threshold is below 5 in any cell of a 2×2 table), use Fisher's exact.¹
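The expected-count check is short enough to sketch by hand. The threshold of 5 used below is a common convention rather than something this guide fixes, so it is exposed as a parameter; the function names and example tables are invented. In scipy the tests themselves are available as `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact`.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_or_fisher(table, threshold=5):
    """Pick the test from the smallest expected count (assumed rule, see lead-in)."""
    smallest = min(min(row) for row in expected_counts(table))
    return "Fisher's exact" if smallest < threshold else "Chi-square"

# Rows: hepatomegaly present / absent. Columns: outcome yes / no. Made-up counts.
small_table = [[8, 2], [1, 5]]
large_table = [[30, 20], [25, 25]]

print(expected_counts(small_table), chi_square_or_fisher(small_table))
print(expected_counts(large_table), chi_square_or_fisher(large_table))
```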

The normality problem in REDCap data

Most clinical REDCap datasets have fewer than 50 patients per group. At that size you cannot assume your data are normally distributed; you have to test for it. Run a Shapiro-Wilk test on every continuous outcome before choosing a test.² If its p-value is above 0.05, a parametric test is acceptable; if it is 0.05 or below, use the non-parametric version instead.³ When in doubt, choose the non-parametric test: it gives up a little statistical power when the data really are normal, but it does not depend on the normality assumption.
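The decision rule above can be sketched with scipy for the two-group case. The function name and the example values are hypothetical, and requiring both groups to pass Shapiro-Wilk before using the t-test is an assumption about how the rule is applied.

```python
import math
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Run Shapiro-Wilk on each group, then a t-test or Mann-Whitney U."""
    p_a = float(stats.shapiro(a).pvalue)
    p_b = float(stats.shapiro(b).pvalue)
    if p_a > alpha and p_b > alpha:
        name = "Independent t-test"
        result = stats.ttest_ind(a, b)
    else:
        # At least one group rejected normality: use the rank-based test.
        name = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "shapiro_p": (p_a, p_b),
        "test": name,
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
    }

# Strongly right-skewed, made-up values standing in for a lab measurement.
group_a = [math.exp(i / 4) for i in range(30)]
group_b = [math.exp(i / 5) for i in range(30)]

outcome = compare_two_groups(group_a, group_b)
print(outcome["test"], outcome["shapiro_p"], outcome["p_value"])
```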

Uploading REDCap data to StatsPlease

Export the raw-data CSV from REDCap and upload it straight to StatsPlease. There is no reformatting, no renaming of columns, and no manual recoding — StatsPlease reads REDCap column headers directly. Choose your outcome variable (here, urine copper) and your grouping variable (hepatomegaly) and run. In auto-select mode, StatsPlease runs the Shapiro-Wilk normality check and assigns the correct test before computing the result.

StatsPlease output — auto-selected test

Detected: continuous outcome, 2 independent groups
→ Shapiro-Wilk (hepatomegaly present): p < .001 → non-normal
→ Shapiro-Wilk (hepatomegaly absent): p < .001 → non-normal
→ Test selected: Mann-Whitney U

Group	n	Median	IQR
Hepatomegaly present	159	88.00 µg/day	52.00–159.00
Hepatomegaly absent	151	58.00 µg/day	33.00–94.50

U = 15901.5 · p < .001 · r = 0.28 (medium)

Urine copper was significantly higher in patients with hepatomegaly (median 88.00 µg/day, IQR 52.00–159.00) than in those without (median 58.00 µg/day, IQR 33.00–94.50), U = 15901.5, p < .001, r = .28.

Example output. Figures are illustrative.

Example data: Vanderbilt University Department of Biostatistics public teaching datasets (hbiostat.org/data). Figures computed with scipy from real data.
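The effect size in the output can be checked from U and the two group sizes. The guide does not state which formula StatsPlease uses; the sketch below assumes the common r = |z| / √N, with z from the normal approximation to U and no tie or continuity correction.

```python
import math

def mann_whitney_r(u, n1, n2):
    """Effect size r = |z| / sqrt(N) from the normal approximation to U."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)

# U and group sizes taken from the example output above.
r = mann_whitney_r(15901.5, 159, 151)
print(round(r, 2))  # 0.28
```

Under these assumptions the result agrees with the reported r = 0.28.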

Why not SPSS?

Many researchers reach for SPSS out of habit. Here is how the two compare.

Feature	SPSS	StatsPlease
Cost	approx. $1,290/year	Free during open access
Underlying engine	SPSS proprietary	scipy and statsmodels (open source)
REDCap import	Manual setup	Native CSV upload
Output formatting	Build tables by hand	Auto-formatted for journals
Reproducibility	Requires a saved syntax file	Automatic
Independent checking	SPSS only	Any platform

Both use the same underlying statistical methods, so the results are the same.

SPSS pricing is approximate and varies by region, licence type, and institutional agreement. Shown for comparison only.

The methods statement

When you publish, name your software. For example: "Statistical analysis was performed using StatsPlease (statsplease.com), powered by scipy and statsmodels. Normality was assessed with the Shapiro-Wilk test, and non-parametric tests were used where normality was rejected (p < 0.05)." StatsPlease generates this statement automatically, with the correct version numbers, in your downloadable report.
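If you run the same analysis yourself in Python, a similar statement with version numbers can be assembled from the installed packages. This is a sketch only, not how StatsPlease builds its report; the function names are hypothetical and the wording is adapted from the example above.

```python
from importlib import metadata

def package_version(name):
    """Installed version of a package, or a placeholder if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

def methods_statement(packages=("scipy", "statsmodels")):
    """Build a methods sentence that names each package and its version."""
    versions = ", ".join(f"{name} {package_version(name)}" for name in packages)
    return (
        f"Statistical analysis was performed in Python ({versions}). "
        "Normality was assessed with the Shapiro-Wilk test, and non-parametric "
        "tests were used where normality was rejected (p < 0.05)."
    )

statement = methods_statement()
print(statement)
```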

Try it yourself

Reproduce this result — in StatsPlease or SPSS

The auto-selected result above comes from a public dataset, so you can run the same comparison yourself and confirm the numbers agree.

In StatsPlease

1. Download the PBC dataset (see Data Sources) and save it as CSV.
2. Upload it and choose urine copper as the outcome and hepatomegaly as the group.
3. Run in auto-select mode. StatsPlease runs Shapiro-Wilk, selects Mann-Whitney U, and reports U, p, and r.

In SPSS

1. Open the same CSV. Check normality first with Analyze ▸ Descriptive Statistics ▸ Explore (read the Shapiro-Wilk p-value).
2. Because it is non-normal, run Analyze ▸ Nonparametric Tests ▸ Independent Samples (Mann-Whitney U).
3. Read U and the p-value from the output.

Compare: both should return U = 15901.5 and p < .001. The difference is the work: in SPSS you ran the normality check and chose the test by hand; StatsPlease did both for you and added the effect size.

References

1. Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher's exact test. Restorative Dentistry & Endodontics. 2017;42(2):152–155. https://doi.org/10.5395/rde.2017.42.2.152
2. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive Statistics and Normality Tests for Statistical Data. Annals of Cardiac Anaesthesia. 2019;22(1):67–72. https://doi.org/10.4103/aca.ACA_157_18
3. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean Journal of Anesthesiology. 2016;69(1):8–14. https://doi.org/10.4097/kjae.2016.69.1.8

Upload your REDCap CSV and get the right test, the effect size, and formatted output automatically.