Interpretation

P-value vs Effect Size: Why Clinical Researchers Need Both

Statistics for clinical researchers and surgical trainees

In short

A p-value only tells you how surprising your data would be if there were no real difference. The effect size tells you how big the difference is. In a large study, a tiny and unimportant difference can still be "statistically significant", so always report both numbers together.

A p-value of 0.003 does not mean your finding is clinically important. An effect size of 0.8 does not mean it is statistically significant. On its own, each number tells an incomplete and sometimes misleading story. Together they answer the real question: does a difference exist, and is it big enough to matter?

What a p-value actually tells you

The p-value is the probability of seeing your data, or something more extreme, if there were truly no effect. That is all it is. It does not measure how big an effect is, it does not tell you whether a finding is clinically important, and it does not give the probability that your hypothesis is correct. Crucially, the p-value depends heavily on sample size: in a very large study, even a tiny, meaningless difference will produce a small p-value, while in a small study a genuinely important difference can produce a large one. This point was made formally by the American Statistical Association, whose official statement on p-values is the standard reference [1].
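To make the sample-size dependence concrete, here is a minimal Python sketch. It assumes two equal-sized groups and uses the large-sample normal approximation to a two-sample comparison, so the p-values for small groups are only approximate. The function name and the standardized difference of 0.05 are illustrative choices, not figures from any study.

```python
import math

def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d
    between two equal-sized groups (large-sample normal approximation)."""
    # Test statistic: the difference divided by its standard error, sqrt(2 / n).
    z = abs(d) * math.sqrt(n_per_group / 2)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))

# The same tiny difference (d = 0.05) in increasingly large studies.
for n in (20, 200, 2_000, 20_000):
    print(f"n per group = {n:>6}: p = {approx_two_sided_p(0.05, n):.4f}")
```

The effect size is 0.05 on every line; only the sample size changes. Under these assumptions the p-value goes from well above 0.05 at 20 per group to far below 0.001 at 20,000 per group.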

What an effect size tells you

An effect size measures how big a difference or relationship is. Unlike the p-value, it does not get systematically larger or smaller just because the sample grows (a larger sample only estimates it more precisely). That is what makes it so useful: it lets you compare results across studies, and it is the quantity that a meta-analysis combines. Different tests use different effect-size measures; the small/medium/large thresholds below come from Cohen [2].

Test           | Effect size | Small | Medium | Large
t-test         | Cohen's d   | 0.2   | 0.5    | 0.8
Mann-Whitney U | r           | 0.1   | 0.3    | 0.5
ANOVA          | eta-squared | 0.01  | 0.06   | 0.14
Chi-square     | Cramér's V  | 0.1   | 0.3    | 0.5
Correlation    | r or ρ      | 0.1   | 0.3    | 0.5

(For correlation specifically, the standard interpretation bands are set out by Schober and colleagues [3].)
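The thresholds in the table translate directly into a lookup. The sketch below is one way to do it, not the StatsPlease implementation: the dictionary keys, the function name, and the "negligible" label for values under the small threshold are assumptions made for illustration. It also assumes that a value at or above a threshold takes that threshold's label, which matches r = 0.48 being called medium later in this post.

```python
# (small, medium, large) thresholds copied from the table above.
THRESHOLDS = {
    "cohens_d": (0.2, 0.5, 0.8),
    "r": (0.1, 0.3, 0.5),
    "eta_squared": (0.01, 0.06, 0.14),
    "cramers_v": (0.1, 0.3, 0.5),
}

def label_effect(measure, value):
    """Return 'negligible', 'small', 'medium', or 'large' for an effect size."""
    small, medium, large = THRESHOLDS[measure]
    size = abs(value)  # the sign gives direction, not magnitude
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"  # hypothetical label; the table does not name this band

print(label_effect("r", 0.48))         # medium
print(label_effect("r", 0.61))         # large
print(label_effect("cohens_d", -0.8))  # large
```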

The four situations you can end up in

             | p < 0.05                                          | p > 0.05
Large effect | Significant and clinically meaningful: act on it | Possibly a real effect, but too little power: consider a larger study
Small effect | Statistically real but clinically trivial         | No meaningful effect

The bottom-left box is the dangerous one in clinical research: a very large study can make a trivial difference "significant". Reporting that as an important finding is a known problem.
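The grid can be written as a small decision function. This is only a sketch: whether an effect is large enough to matter is a clinical judgement, so the function takes it as a yes/no input instead of inventing a cutoff, and the function name and returned wording are illustrative.

```python
def quadrant(p_value, effect_is_meaningful, alpha=0.05):
    """Map a p-value and a clinical judgement about the effect to one of four cells."""
    significant = p_value < alpha
    if significant and effect_is_meaningful:
        return "Significant and clinically meaningful"
    if effect_is_meaningful:
        return "Possibly real but underpowered: consider a larger study"
    if significant:
        return "Statistically real but clinically trivial"
    return "No meaningful effect"

print(quadrant(0.0004, True))   # top-left cell
print(quadrant(0.12, True))     # top-right cell
print(quadrant(0.0004, False))  # bottom-left cell, the dangerous one
print(quadrant(0.40, False))    # bottom-right cell
```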

Why journals now require effect sizes

Following the ASA statement, many major clinical journals moved away from treating p < 0.05 as a simple pass/fail line [1]. Reporting guidelines such as CONSORT, STROBE, and PRISMA all ask for effect sizes. A result reported without one is hard to include in a systematic review or meta-analysis unless the effect size can be recovered from the other statistics reported, which sharply reduces its long-term value.

StatsPlease output — significance with effect size

Result A — real data (n = 268 + 500)

U = 105610 · p < .001 · r = 0.48 (medium)

Significant and clinically meaningful — the difference is real and large enough to matter.

Result B — illustrative scenario

p = .12 · r = .61 (large)

Not significant, but a large effect — likely underpowered, and worth a larger follow-up rather than dismissal.

Example output. Result A is computed with scipy from the Pima Indians Diabetes dataset (National Institute of Diabetes and Digestive and Kidney Diseases; see Data Sources). Result B is an illustrative scenario, not from real data.

A worked example

Consider two studies.

Scenario A is a large diabetes screening study comparing blood glucose between people with and without a diabetes diagnosis, with 768 people in total. The result is highly significant (p < .001) and the effect size is r = 0.48, a medium effect. Here significance and importance agree: the difference is real and large enough to matter.

Scenario B is an illustrative smaller study in which a meaningful clinical difference exists but the sample is too small to detect it reliably. Here p = 0.12 and the effect size is r = 0.61, a large effect. That is not significant at the 0.05 line, but an r of 0.61 is substantial. The study was probably too small to detect it, so the finding deserves a larger follow-up study rather than dismissal.
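The r reported for Scenario A can be checked by hand from U and the two group sizes. The sketch below assumes the common conversion r = Z / sqrt(N), with Z taken from the normal approximation to U and no correction for tied values. The post does not say which formula StatsPlease uses, so treat this as a plausibility check; the function name is illustrative.

```python
import math

def r_from_u(u, n1, n2):
    """Effect size r = |Z| / sqrt(N) for a Mann-Whitney U statistic,
    using the normal approximation without a tie correction."""
    mean_u = n1 * n2 / 2                             # expected U if there is no difference
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard deviation of U
    z = (u - mean_u) / sd_u
    return abs(z) / math.sqrt(n1 + n2)

# Scenario A: U = 105610 with groups of 268 and 500 people.
print(f"r = {r_from_u(105610, 268, 500):.2f}")  # prints "r = 0.48"
```

With these inputs the formula gives r ≈ 0.475, which rounds to the 0.48 quoted above.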

What StatsPlease gives you

Every StatsPlease result shows the p-value and the effect size together, with the effect size automatically labelled small, medium, or large. The plain-language output reads, for example: "There was a statistically significant difference between groups (U = 105610, p < .001, r = .48, medium effect)." This meets the CONSORT, STROBE, and PRISMA reporting requirements without any manual formatting.

Try it yourself

Reproduce this result — in StatsPlease or SPSS

Result A comes from a public dataset, so you can compute it yourself in either tool. (Result B is an illustrative scenario and is not from real data, so only Result A can be reproduced.)

In StatsPlease

  1. Download the Pima Indians Diabetes dataset (see Data Sources) and save it as CSV.
  2. Choose plasma glucose as the outcome and diabetes diagnosis as the group.
  3. Run. StatsPlease reports U, the exact p-value, and the effect size r with its label.

In SPSS

  1. Open the same CSV in SPSS.
  2. Go to Analyze ▸ Nonparametric Tests ▸ Independent Samples (Mann-Whitney U).
  3. Read U and the p-value; SPSS does not label the effect size for you.

Compare: both should return U = 105610 and p < .001. The point of this post is the missing piece: StatsPlease also reports r = .48 and labels it a medium effect, so you can judge whether the significant difference actually matters.
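If you would rather check U without either tool, the statistic is simple enough to compute from ranks. The sketch below is a dependency-free illustration of the rank-sum definition; in practice a library routine such as `scipy.stats.mannwhitneyu` does the same job and also returns the p-value. The file name and the column names `Glucose` and `Outcome` are assumptions about how your copy of the CSV is laid out, so adjust them to match.

```python
import csv

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for x, U for y) from the rank sum of the pooled sample."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x

def load_groups(path, outcome_col="Glucose", group_col="Outcome"):
    """Read one numeric outcome column, split by the values of a group column."""
    groups = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            groups.setdefault(row[group_col], []).append(float(row[outcome_col]))
    return groups

# Hypothetical usage; the path and the group codes "1" and "0" are assumptions:
# groups = load_groups("diabetes.csv")
# print(mann_whitney_u(groups["1"], groups["0"]))
```

Tools differ in which of the two U values they print, so compare the reported figure against both numbers in the returned pair.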

References

  1. Wasserstein RL, Lazar NA. The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician. 2016;70(2):129–133. https://doi.org/10.1080/00031305.2016.1154108
  2. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
  3. Schober P, Boer C, Schwarte LA. Correlation Coefficients: Appropriate Use and Interpretation. Anesthesia & Analgesia. 2018;126(5):1763–1768. https://doi.org/10.1213/ANE.0000000000002864
