Statistical Analysis for REDCap Data: A Practical Guide
Statistics for clinical researchers and surgical trainees
In short
Export your data from REDCap as a raw-data CSV file, check whether your numbers are normally distributed, then pick a test based on the type of each variable. StatsPlease reads REDCap column headers directly and chooses the correct test for you.
REDCap is the most widely used tool for collecting data in clinical research. It is excellent at building forms, validating entries, and tracking patients over time. What it does not do is analyse your data — that step is left entirely to you. The path from a REDCap export to a finished results section has more decisions in it than most trainees expect. This guide walks through all of them. As a running example, imagine a researcher studying primary biliary cirrhosis who collects urine copper levels and hepatomegaly (enlarged liver) status in REDCap.
Exporting your data from REDCap
In REDCap, use the Data Export Tool and choose the CSV / Microsoft Excel option in "raw data" format. Raw data keeps the underlying numbers, which is what statistical software needs; the "labels" option exports the text descriptions instead, which then have to be converted back before analysis. Keep the field names as your column headers. If your study follows patients over several visits, export with the event names included so you can tell the time points apart. Finally, use REDCap's built-in de-identification option to remove any identifying details before the file leaves the system.
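As a rough sketch of what a raw-data export looks like once it reaches your own code, the snippet below reads a small inline CSV with Python's standard library and keeps one visit. The field names (`urine_copper`, `hepatomegaly`, `symptoms___1`), the event labels, and the helper `load_event` are all hypothetical, chosen to match the running example; check them against the headers in your own export.

```python
import csv
import io

# Hypothetical raw-data export: field names as headers, coded values in cells.
# The event column and the checkbox-style columns are assumptions for illustration.
RAW_EXPORT = """record_id,redcap_event_name,urine_copper,hepatomegaly,symptoms___1,symptoms___2
1,baseline_arm_1,88,1,1,0
1,month_6_arm_1,79,1,0,0
2,baseline_arm_1,58,0,0,1
3,baseline_arm_1,,0,0,0
"""


def load_event(csv_text, event):
    """Return the rows for one event, with urine_copper as float or None."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["redcap_event_name"] != event:
            continue  # keep a single time point
        value = row["urine_copper"].strip()
        row["urine_copper"] = float(value) if value else None  # blank = missing
        rows.append(row)
    return rows


baseline = load_event(RAW_EXPORT, "baseline_arm_1")
print(len(baseline), [r["urine_copper"] for r in baseline])
```

Blank cells are kept as missing values rather than zeros, so they can be excluded explicitly before any test is run.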
Matching REDCap field types to statistics
Each REDCap field type lines up with a type of statistical variable, and that type determines how you summarise and test it.
| REDCap field type | Statistical type | How you summarise it |
|---|---|---|
| Text / Number | Continuous | Mean ± SD, or Median [IQR] |
| Radio button / Dropdown | Categorical | Count and percentage |
| Checkbox | Binary / Categorical | Count and percentage |
| Slider / Visual analogue scale | Continuous or Ordinal | Median [IQR] |
| Calculated field | Continuous | Mean ± SD, or Median [IQR] |
("Continuous" means a true measurement like a serum level or a time. "Categorical" means a label like yes/no or which group. "Ordinal" means an ordered score like a symptom rating.)
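The summaries in the table can be computed with a few lines of standard-library Python. This is a minimal sketch, not StatsPlease's implementation; the helper names are hypothetical, and the quartile method is an assumption (statistical packages differ slightly in how they define quartiles, so an IQR may not match another tool to the last decimal).

```python
import statistics
from collections import Counter


def summarise_continuous(values):
    """Mean, SD, median, and IQR for a continuous field (missing values removed first)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "iqr": (q1, q3),
    }


def summarise_categorical(codes):
    """Count and percentage for each code of a categorical field."""
    total = len(codes)
    return {code: (n, 100 * n / total) for code, n in Counter(codes).items()}


print(summarise_continuous([1, 2, 3, 4, 5]))
print(summarise_categorical([1, 1, 0, 1]))
```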
Choosing the right test
| Your data | What you are comparing | Test to use |
|---|---|---|
| Continuous, normal, n above 30 | Two groups | Independent t-test |
| Continuous, not normal or n below 30 | Two groups | Mann-Whitney U |
| Continuous, normal | Three or more groups | One-way ANOVA |
| Continuous, not normal or n below 30 | Three or more groups | Kruskal-Wallis |
| Categorical | Any groups | Chi-square or Fisher's exact |
| Ordered score (e.g. symptom rating) | Two groups | Mann-Whitney U |
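| Continuous | Relationship between two variables | Pearson correlation (both normal) or Spearman correlation (otherwise) |
The choice between chi-square and Fisher's exact for categorical data depends on the expected counts in your table: chi-square is appropriate when they are large enough, and Fisher's exact is preferred when more than 20% of cells have an expected count below 5 [1].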
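The expected-count check can be sketched in plain Python. The thresholds used here (an expected count below 5 in more than 20% of cells) are the commonly cited rule of thumb and are assumed to be the rule the text refers to; the function names are hypothetical. The sketch only picks the test. To run it, `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` are the usual scipy entry points.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def choose_categorical_test(table, min_expected=5, max_fraction=0.20):
    """Pick Fisher's exact when too many expected counts are small, else chi-square."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(e < min_expected for e in cells)
    return "Fisher's exact" if small / len(cells) > max_fraction else "Chi-square"


# Rows = groups, columns = outcome categories (observed counts).
print(choose_categorical_test([[30, 20], [25, 25]]))  # all expected counts are large
print(choose_categorical_test([[8, 2], [1, 5]]))      # three of four expected counts < 5
```

In a 2×2 table a single small cell is already 25% of the table, so any expected count below 5 triggers Fisher's exact under this rule.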
The normality problem in REDCap data
Many clinical REDCap datasets have fewer than 50 patients per group. At that size you cannot assume your data are normally distributed; you have to check. Run a Shapiro-Wilk test on every continuous outcome, within each group, before choosing a test [2]. If its p-value is above 0.05, a parametric test is acceptable; if it is 0.05 or below, use the non-parametric version instead [3]. When in doubt, choose the non-parametric test: it gives up a little statistical power if the data are in fact normal, but it remains valid either way.
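For two independent groups, the decision can be sketched with scipy as follows. This is one reading of the rules in this guide (parametric only when both groups pass Shapiro-Wilk and both have more than 30 observations), not StatsPlease's actual code; `choose_two_group_test` and the sample data are hypothetical.

```python
from scipy import stats

ALPHA = 0.05  # Shapiro-Wilk threshold used in this guide
MIN_N = 30    # the "n above 30" condition from the test-selection table


def choose_two_group_test(group_a, group_b):
    """Return (test name, p-value) for a continuous outcome in two independent groups."""
    groups = (group_a, group_b)
    normal = all(stats.shapiro(g).pvalue > ALPHA for g in groups)
    large = all(len(g) > MIN_N for g in groups)
    if normal and large:
        return "Independent t-test", stats.ttest_ind(group_a, group_b).pvalue
    result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


# Deterministic illustrative data: idealised normal scores and a heavily skewed series.
normal_a = [float(stats.norm.ppf((i + 0.5) / 40)) for i in range(40)]
normal_b = [x + 1.0 for x in normal_a]
skewed_a = [1.5 ** i for i in range(35)]
skewed_b = [2.0 * x for x in skewed_a]

print(choose_two_group_test(normal_a, normal_b)[0])
print(choose_two_group_test(skewed_a, skewed_b)[0])
```

Requiring both conditions makes the sketch err towards the non-parametric test, in line with the advice above.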
Uploading REDCap data to StatsPlease
Export the raw-data CSV from REDCap and upload it straight to StatsPlease. There is no reformatting, no renaming of columns, and no manual recoding — StatsPlease reads REDCap column headers directly. Choose your outcome variable (here, urine copper) and your grouping variable (hepatomegaly) and run. In auto-select mode, StatsPlease runs the Shapiro-Wilk normality check and assigns the correct test before computing the result.
Detected: continuous outcome, 2 independent groups
→ Shapiro-Wilk (hepatomegaly present): p < .001 → non-normal
→ Shapiro-Wilk (hepatomegaly absent): p < .001 → non-normal
→ Test selected: Mann-Whitney U
| Group | n | Median (µg/day) | IQR (µg/day) |
|---|---|---|---|
| Hepatomegaly present | 159 | 88.00 | 52.00–159.00 |
| Hepatomegaly absent | 151 | 58.00 | 33.00–94.50 |
U = 15901.5 · p < .001 · r = 0.28 (medium)
Urine copper was significantly higher in patients with hepatomegaly (median 88.00 µg/day, IQR 52.00–159.00) than in those without (median 58.00 µg/day, IQR 33.00–94.50), U = 15901.5, p < .001, r = .28.
Example output. Figures are illustrative.
Example data: Vanderbilt University Department of Biostatistics public teaching datasets (hbiostat.org/data). Figures computed with scipy from real data.
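The effect size can be checked by hand from the reported U and group sizes. The sketch below uses one common definition, r = |Z| / √N, with the normal approximation for U and no tie correction. Whether StatsPlease computes r exactly this way is an assumption; the function name is hypothetical.

```python
import math


def mann_whitney_r(u, n1, n2):
    """Effect size r = |Z| / sqrt(N), using the normal approximation without tie correction."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)


r = mann_whitney_r(15901.5, 159, 151)
print(round(r, 2))  # 0.28
```

With U = 15901.5, n₁ = 159, and n₂ = 151, this gives Z ≈ 4.94 and r ≈ 0.28, in agreement with the reported value.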
Why not SPSS?
Many researchers reach for SPSS out of habit. Here is how the two compare.
| Feature | SPSS | StatsPlease |
|---|---|---|
| Cost | approx. $1,290/year | Free during open access |
| Underlying engine | SPSS proprietary | scipy and statsmodels (open source) |
| REDCap import | Manual setup | Native CSV upload |
| Output formatting | Build tables by hand | Auto-formatted for journals |
| Reproducibility | Requires a saved syntax file | Automatic |
| Independent checking | SPSS only | Any platform |
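Both use the same underlying statistical methods, so the results should agree. Small differences can still arise from default settings, such as exact versus asymptotic p-values or continuity corrections.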
SPSS pricing is approximate and varies by region, licence type, and institutional agreement. Shown for comparison only.
The methods statement
When you publish, name your software. For example: "Statistical analysis was performed using StatsPlease (statsplease.com), powered by scipy and statsmodels. Normality was assessed with the Shapiro-Wilk test, and non-parametric tests were used where normality was rejected (p < 0.05)." StatsPlease generates this statement automatically, with the correct version numbers, in your downloadable report.
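If you run the analysis in your own Python environment instead, a similar statement with version numbers can be assembled programmatically. This is a hypothetical sketch, not how StatsPlease builds its report; the function names and the exact wording are assumptions, and a package that is not installed is flagged rather than guessed.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Look up an installed package's version, or flag it as unknown."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "version unknown"


def methods_statement(packages=("scipy", "statsmodels"), alpha=0.05):
    """Build a methods sentence naming each package with its installed version."""
    tools = " and ".join(f"{name} {package_version(name)}" for name in packages)
    return (
        f"Statistical analysis was performed in Python using {tools}. "
        "Normality was assessed with the Shapiro-Wilk test, and non-parametric "
        f"tests were used where normality was rejected (p < {alpha})."
    )


print(methods_statement())
```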
Try it yourself
Reproduce this result — in StatsPlease or SPSS
The auto-selected result above comes from a public dataset, so you can run the same comparison yourself and confirm the numbers agree.
In StatsPlease
- Download the PBC dataset (see Data Sources) and save it as CSV.
- Upload it and choose urine copper as the outcome and hepatomegaly as the group.
- Run in auto-select mode. StatsPlease runs Shapiro-Wilk, selects Mann-Whitney U, and reports U, p, and r.
In SPSS
- Open the same CSV. Check normality first with Analyze ▸ Descriptive Statistics ▸ Explore (read the Shapiro-Wilk p-value).
- Because the data are non-normal, run Analyze ▸ Nonparametric Tests ▸ Independent Samples (Mann-Whitney U).
- Read U and the p-value from the output.
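Compare: both should return U = 15901.5 and p < .001. Some software reports the Mann-Whitney statistic for the other group instead, which here is 159 × 151 − 15901.5 = 8107.5; the p-value is the same either way. The difference is the work: in SPSS you ran the normality check and chose the test by hand; StatsPlease did both for you and added the effect size.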
References
1. Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher's exact test. Restorative Dentistry & Endodontics. 2017;42(2):152–155. https://doi.org/10.5395/rde.2017.42.2.152
2. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive Statistics and Normality Tests for Statistical Data. Annals of Cardiac Anaesthesia. 2019;22(1):67–72. https://doi.org/10.4103/aca.ACA_157_18
3. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean Journal of Anesthesiology. 2016;69(1):8–14. https://doi.org/10.4097/kjae.2016.69.1.8
Upload your REDCap CSV and get the right test, the effect size, and formatted output automatically.