Mann-Whitney U vs t-test: Which to Use for Clinical Data

Choosing between these two tests is one of the first decisions you make in any clinical analysis. It is also one of the most common things to get wrong. Clinical data are often collected from small groups of patients, the numbers are often skewed, and many outcomes are scores rather than true measurements. In all of these situations, picking the wrong test can lead a reviewer to reject your methods. This guide explains the difference in plain terms and tells you exactly when to use each test.

What the two tests actually do

A t-test is a "parametric" test. Parametric simply means the test assumes your data follow a particular shape — in this case, the normal distribution, the familiar symmetric bell curve. The t-test works with averages (means) and asks whether the average of one group is meaningfully different from the average of another.

The Mann-Whitney U test is "non-parametric." Non-parametric means it does not assume your data follow a normal distribution, or any other specific distribution.¹ Instead of using the actual values, it ranks every observation from smallest to largest and compares the ranks of the two groups. Both tests answer the same basic question (are these two independent groups different?), but they are valid under different conditions.
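To make the contrast concrete, here is a minimal sketch that runs both tests on the same two small groups with scipy. The values are invented for illustration and do not come from any dataset mentioned in this guide. The last few lines recompute U by hand from the pooled ranks to show what "comparing ranks" means in practice.

```python
from scipy import stats

# Hypothetical bilirubin-like values in mg/dL, invented for illustration only.
group_a = [2.1, 3.4, 1.9, 5.6, 12.3, 2.8]
group_b = [0.7, 1.1, 0.9, 1.6, 2.0, 0.8]

# t-test: compares the two group means (Student's version, equal variances assumed).
t_res = stats.ttest_ind(group_a, group_b)

# Mann-Whitney U: compares ranks rather than the raw values.
u_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# The same U by hand: rank the pooled data, sum the ranks of group_a,
# then subtract the smallest rank sum a group of that size could have.
ranks = stats.rankdata(group_a + group_b)
n_a = len(group_a)
rank_sum_a = ranks[:n_a].sum()
u_by_hand = rank_sum_a - n_a * (n_a + 1) / 2

print(f"t = {t_res.statistic:.2f}, p = {t_res.pvalue:.3f}")
print(f"U = {u_res.statistic}, p = {u_res.pvalue:.3f}")
print(f"U from ranks = {u_by_hand}")
```

The single large value (12.3) pulls the mean of group_a upward, but in the Mann-Whitney calculation it counts only as the highest rank, which is why the rank-based test is less sensitive to outliers and skew.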

When to use the t-test

The t-test is appropriate when all of the following are true. First, you have more than about 30 people in each group. With groups this size, a statistical rule called the central limit theorem means mild departures from a normal shape stop mattering very much. Second, your data are actually normally distributed, which you confirm with a normality test (described below). Third, your outcome is a continuous measurement, such as serum bilirubin in mg/dL or a patient's age in years. When these conditions hold, the t-test is valid and is slightly better at detecting a real difference than the Mann-Whitney U test.

When to use Mann-Whitney U

Use the Mann-Whitney U test when any of the following is true. You have fewer than about 30 people per group, which is the usual situation in single-centre and subgroup research. Your outcome is an ordered score rather than a measurement, such as a pain score or a patient satisfaction scale. Your data are skewed, meaning most values cluster at one end with a few stretched out at the other — lab values like serum bilirubin and length of hospital stay often behave this way. Or a normality test tells you the data are not normal. In clinical research, these situations come up more often than not, so the Mann-Whitney U test is used very frequently.

Always check normality first

Before you choose, run a normality test on your outcome. The most common one for small groups is the Shapiro-Wilk test.² It produces a p-value. If that p-value is above 0.05, you cannot say the data are non-normal, so a parametric test like the t-test is acceptable. If the p-value is 0.05 or below, the data are not normal, so you should use the Mann-Whitney U test. (For groups larger than about 50, a different normality test called Kolmogorov-Smirnov is often used instead.) StatsPlease runs this check automatically and routes your data to the correct test before you see any result.
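The decision rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not StatsPlease's actual implementation: the function name `choose_test` is hypothetical, and running Shapiro-Wilk separately on each group (rather than on the pooled outcome) is one reasonable reading of the rule.

```python
from scipy import stats

def choose_test(group_a, group_b, alpha=0.05):
    """Pick a test from Shapiro-Wilk p-values (hypothetical helper).

    Returns "t-test" only when normality is not rejected in either group;
    otherwise returns "mann-whitney".
    """
    # stats.shapiro returns (W statistic, p-value); only the p-value is used here.
    _, p_a = stats.shapiro(group_a)
    _, p_b = stats.shapiro(group_b)
    if min(p_a, p_b) > alpha:
        return "t-test"
    return "mann-whitney"

# Invented example values: one roughly symmetric group, one with a long right tail.
symmetric = [-1.6, -1.0, -0.7, -0.4, -0.1, 0.1, 0.4, 0.7, 1.0, 1.6]
right_skewed = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 50.0]
print(choose_test(symmetric, right_skewed))
```

Requiring both groups to pass is the conservative choice here: a single clearly non-normal group is enough to route the comparison to Mann-Whitney U.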

Consider a study comparing serum bilirubin between patients with and without hepatomegaly (an enlarged liver) in a primary biliary cirrhosis cohort (a public liver-disease teaching dataset). Bilirubin is strongly right-skewed, so a Shapiro-Wilk test rejects normality in both groups and StatsPlease routes the comparison to the Mann-Whitney U test. The output looks like this:

StatsPlease output — Mann-Whitney U test

Group	n	Median	IQR
Hepatomegaly present	160	2.55 mg/dL	1.10–5.80
Hepatomegaly absent	152	1.00 mg/dL	0.60–1.90

U = 17635.5 · p < .001 · r = .39 (medium)

Serum bilirubin was significantly higher in patients with hepatomegaly (median 2.55 mg/dL, IQR 1.10–5.80) than in those without (median 1.00 mg/dL, IQR 0.60–1.90), U = 17635.5, p < .001, r = .39.

Example output.

Example data: Vanderbilt University Department of Biostatistics public teaching datasets (hbiostat.org/data). Figures computed with scipy from real data.

How to report each test

For a t-test, report the means and how spread out the data are (the standard deviation), then the test result and an effect size: "Mean ± SD; t(df) = X.XX, p = .0XX, Cohen's d = X.XX". For a Mann-Whitney U test, report the median and the interquartile range (the IQR, the middle 50% of values), then the test result and an effect size: "Median [IQR]; U = XXX, p = .0XX, r = .XX". Always include the effect size — a number that describes how large the difference is, separate from whether it is significant. Cohen's d is the effect size for the t-test and r is the effect size for Mann-Whitney U; the standard small/medium/large thresholds come from Cohen.³ Most clinical journals now require an effect size, so leaving it out invites a revision request.
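Both effect sizes can be computed from numbers you already have. The sketch below uses only the Python standard library and rests on two assumptions that this guide does not spell out: r is taken as |z| / √N, with z from the normal approximation to U (no tie or continuity correction), and Cohen's d uses the pooled standard deviation. The function names are hypothetical.

```python
import math
import statistics

def mann_whitney_r(u, n1, n2):
    """Effect size r = |z| / sqrt(N), z from the normal approximation to U.

    Assumption: no tie correction and no continuity correction.
    """
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

# Plugging in the U and group sizes from the example output above:
print(round(mann_whitney_r(17635.5, 160, 152), 2))  # 0.39
```

With U = 17635.5 and group sizes of 160 and 152, this convention gives the r = .39 shown in the example output. Software that applies a tie correction may differ slightly in the third decimal place.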

The safe default

When you are unsure and your groups are small, use the Mann-Whitney U test. Using a non-parametric test where a parametric test would also have worked is still valid; the only cost is a small loss of power to detect a real difference. The opposite mistake, using a t-test when its assumptions are broken, is a genuine error. Default to the safer test and state your reasoning in the methods section.

Try it yourself

Reproduce this result — in StatsPlease or SPSS

Don't take our word for it. The example above comes from a public dataset, so you can compute the same numbers yourself in either tool and confirm they agree.

In StatsPlease

1. Download the PBC dataset (see Data Sources) and save it as CSV.
2. Upload it and choose serum bilirubin as the outcome and hepatomegaly as the grouping variable.
3. Run. StatsPlease checks normality, selects Mann-Whitney U, and reports U, p, and the effect size r.

In SPSS

1. Open the same CSV in SPSS.
2. Go to Analyze ▸ Nonparametric Tests ▸ Independent Samples, with bilirubin as the test field and hepatomegaly as the group.
3. Read the Mann-Whitney U and the p-value from the output.

Compare: both should return U = 17635.5 and p < .001 — identical numbers, because both compute the same test. SPSS reports U and p; StatsPlease adds the effect size r and the ready-to-paste sentence.
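If you prefer a script, the same check can be sketched in Python with scipy. Treat this as a hedged sketch rather than the exact procedure behind the figures above: the function name `compare_from_csv` is hypothetical, and you must supply the column names as they appear in your own copy of the CSV.

```python
import csv
import statistics
from scipy import stats

def compare_from_csv(file_obj, outcome_col, group_col):
    """Split outcome_col by the two values of group_col and run Mann-Whitney U."""
    groups = {}
    for row in csv.DictReader(file_obj):
        label = row[group_col].strip()
        try:
            value = float(row[outcome_col])
        except ValueError:
            continue  # skip rows where the outcome is blank or non-numeric (e.g. "NA")
        if not label:
            continue  # skip rows with no group recorded
        groups.setdefault(label, []).append(value)

    if len(groups) != 2:
        raise ValueError(f"expected exactly 2 groups, found {len(groups)}")

    # Groups are taken in the order they first appear in the file.
    (name_a, a), (name_b, b) = groups.items()
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "n": {name_a: len(a), name_b: len(b)},
        "median": {name_a: statistics.median(a), name_b: statistics.median(b)},
        "U": float(result.statistic),  # U for the first group encountered
        "p": float(result.pvalue),
    }
```

To use it, open your CSV file and pass the file object together with the two column names. The U returned belongs to whichever group appears first in the file; the other group's U is n1 × n2 minus that value, so if your number differs from another tool's, check whether the two add up to n1 × n2. The p-value is the same either way.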

References

1. Nahm FS. Nonparametric statistical tests for the continuous data: the basic concept and the practical use. Korean Journal of Anesthesiology. 2016;69(1):8–14. https://doi.org/10.4097/kjae.2016.69.1.8
2. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive Statistics and Normality Tests for Statistical Data. Annals of Cardiac Anaesthesia. 2019;22(1):67–72. https://doi.org/10.4103/aca.ACA_157_18
3. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
