Medicine Produces Evidence It Cannot Read

I — The Reformation

Medicine built its entire epistemological identity on evidence. Then it forgot to teach what evidence means.

In 1992, a group of clinicians at McMaster published a paper in JAMA introducing the term that would restructure clinical training: evidence-based medicine.¹ The argument was elegant. Medicine had been operating on hierarchy. You trusted the senior clinician, the consultant, the textbook author. EBM proposed a different authority — the data. Not who says it. What the numbers show.

It was a reformation, and it was right. Eminence-based medicine had protected too many wrong practices for too long. The shift to systematic evidence, to randomised trials, to meta-analyses — this was medicine correcting a structural flaw in how it evaluated itself. By the end of the 1990s, EBM was taught in every medical school. By the 2000s, it was the default epistemology of clinical practice. The transformation was, by any reasonable measure, complete.

II — The Paradox

In 2007, a research team at Yale tested 277 internal medicine residents on the biostatistics appearing in the literature they cited every day.² Basic questions. Confidence intervals. P-values. What a statistically significant result actually means. The kind of statistics that appear in the Methods and Results sections of every paper they read, every clinical decision they justified, every guideline they followed.

Average score — 277 internal medicine residents tested on the biostatistics in the literature they cite.²

The residents who expressed the highest confidence in their answers were correct zero per cent of the time. Every resident who was certain — absolutely certain — that they understood what a p-value meant was wrong.

Of residents who reported high confidence in their interpretation of statistical significance, every single one answered incorrectly.²

III — The Sutherland Flip

Here is what happened.

EBM solved the authority problem. It removed the professor from the centre and placed the data there instead. What it did not solve — what it never attempted to solve — was the comprehension problem.

Doctors learned to cite statistics. They learned that a p-value below 0.05 marks significance, that confidence intervals should not cross the null, that systematic reviews sit at the top of the evidence hierarchy. They learned the vocabulary of evidence without the mathematics underneath it. The citation replaced the professor. The deference moved from the authority in the room to the p-value on the page.

The doctor in the middle, whether the resident writing up their audit or the specialist reviewing a trial before changing practice, is still dependent on someone else’s claim that the numbers are valid. Only now, that someone else is a number, not a person.

Rory Sutherland has a concept he calls the frame problem: the most common form of failure is not the wrong answer, but solving the wrong question. EBM solved the question of whose authority to trust. It did not solve the question of whether the person in the middle could evaluate what they were trusting. Medicine changed the object of deference. It did not change the dependence.

IV — The Structure Explains It

ECG interpretation was manual until it wasn’t. The algorithm now flags ST elevation, identifies arrhythmias, calculates QTc intervals. No cardiologist is expected to derive a QTc by hand before treatment. The calculation is complex, precise, and consequential — so medicine built a machine to do it correctly.

Drug dosing calculators handle pharmacokinetics. Anaesthesia management systems track a dozen parameters simultaneously. Risk tools — GRACE, EuroSCORE, APACHE — estimate event probabilities from multivariate inputs. The logic of systematisation has reached almost every corner of clinical practice. When a calculation is complex, high-stakes, and repeatable, medicine builds a system.

Statistics is the last manual process left.

Medical students and residents who cite statistics as the single biggest barrier to publishing their research — ahead of writing, ethics, supervision, or funding.³

Published surgical randomised controlled trials that report a sample size calculation. The majority are simply missing the number that justifies whether the study was large enough to find what it was looking for.⁴

Clinical papers that properly report effect size — the number that tells you whether a statistically significant finding is clinically meaningful, or merely a large-sample artefact.⁵

These are not failures of effort. They are failures of infrastructure. Every researcher who misreports a p-value, omits an effect size, or skips a sample size calculation is doing exactly what their training equipped them to do: cite statistics without reading them.
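As a rough illustration of the two numbers most often left out, the sketch below computes a per-group sample size for comparing two means, and a standardised effect size (Cohen's d) alongside its p-value. It relies on the normal approximation and only the Python standard library. The function names and every input value are hypothetical, chosen for illustration and not taken from any study cited here.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def sample_size_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Participants needed per group for a two-sided comparison of two means.

    Normal-approximation formula: n = 2 * (z_(1-alpha/2) + z_(power))^2 / d^2,
    where d is the smallest standardised difference worth detecting.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


def cohens_d(mean_a, mean_b, pooled_sd):
    """Difference between two means expressed in standard deviations."""
    return (mean_a - mean_b) / pooled_sd


def two_sided_p(mean_a, mean_b, pooled_sd, n_per_group):
    """Two-sided p-value from a z-test on two equal-sized groups."""
    standard_error = pooled_sd * sqrt(2 / n_per_group)
    z = abs(mean_a - mean_b) / standard_error
    return 2 * (1 - _STD_NORMAL.cdf(z))


# Hypothetical trial: 20,000 patients per arm, systolic pressure
# 120.5 vs 120.0 mmHg, pooled SD 10 mmHg (all values invented).
d = cohens_d(120.5, 120.0, 10.0)
p = two_sided_p(120.5, 120.0, 10.0, 20_000)
print(f"Cohen's d = {d:.2f}, p = {p:.1e}")

# Sample size to detect a medium standardised difference (d = 0.5)
n_medium = sample_size_per_group(0.5)
print(f"n per group for d = 0.5: {n_medium}")
```

In the hypothetical trial the p-value falls far below 0.05 while d is only 0.05, a difference of half a millimetre of mercury: statistically significant, clinically negligible. The same approximation gives 63 participants per group to detect d = 0.5 at 80 per cent power; an exact t-based calculation would give a slightly larger figure.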

V — StatsPlease

We built StatsPlease for the same reason medicine built the ECG machine.

The calculation is complex. The stakes are real. The number in your Results section will be cited by other researchers, used to support clinical decisions, presented to ethics committees. It should be correct. And you should understand why.

StatsPlease does not guess. It checks normality with Shapiro-Wilk before selecting any test. It calculates the appropriate effect size for every analysis. It formats the output in AMA style, ready to paste into your Results section, with the test justification written out so you can explain it to a reviewer.
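The general check-then-choose pattern can be sketched for the simplest case, two independent groups. This is an illustration under stated assumptions, not StatsPlease's actual code. It assumes SciPy is installed; the function name `compare_two_groups`, the 0.05 normality threshold, the sample data, and the pairing of Welch's t-test with Cohen's d and Mann-Whitney U with the rank-biserial correlation are all choices made for this example.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Choose a two-group test based on a Shapiro-Wilk normality check.

    Returns (test_name, p_value, effect_size_name, effect_size).
    Hypothetical helper for illustration only.
    """
    # Shapiro-Wilk on each group: p > alpha means no evidence against normality
    normality_p = [stats.shapiro(group)[1] for group in (a, b)]
    both_normal = all(p > alpha for p in normality_p)

    if both_normal:
        # Welch's t-test does not assume equal variances
        result = stats.ttest_ind(a, b, equal_var=False)
        pooled_sd = sqrt(
            ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b))
            / (len(a) + len(b) - 2)
        )
        effect = (mean(a) - mean(b)) / pooled_sd
        return "Welch t-test", float(result.pvalue), "Cohen's d", effect

    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    # Rank-biserial correlation, range -1 to 1. Assumes the statistic is U
    # for the first sample, which is the behaviour of recent SciPy releases.
    effect = 2 * float(result.statistic) / (len(a) * len(b)) - 1
    return "Mann-Whitney U", float(result.pvalue), "rank-biserial r", effect


# Invented data: two roughly symmetric groups and one heavily skewed group
group_a = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 6.8, 7.3]
group_b = [5.0, 5.6, 6.0, 6.3, 6.6, 7.0, 7.2, 7.7, 8.1, 8.6]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 40]

for first, second in ((group_a, group_b), (skewed, group_b)):
    name, p_value, effect_name, effect = compare_two_groups(first, second)
    print(f"{name}: p = {p_value:.3f}, {effect_name} = {effect:.2f}")
```

A real analysis would need more than this sketch covers: paired designs, more than two groups, and the fact that Shapiro-Wilk has little power in small samples while flagging trivial departures in very large ones.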

The guides in this library are the other half. They explain when each test applies, what the assumptions mean, and how to report results correctly — so that the tool serves the understanding rather than replacing it.

This is where you start.


References

Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420–2425. https://doi.org/10.1001/jama.1992.03490170092032
Windish DM, Huot SJ, Green ML. Medicine residents’ understanding of the biostatistics and results in the medical literature. JAMA. 2007;298(9):1010–1022. https://doi.org/10.1001/jama.298.9.1010
Burgoyne LN, O’Flynn S, Boylan GB. Undergraduate medical research: the student perspective. Med Educ Online. 2010;15(1):5212. https://doi.org/10.3402/meo.v15i0.5212
Thiessen Philbrook H, Barrowman N, Garg AX. Improperly specified outcomes contribute to the majority of surgical randomized trials with high rates of events. J Clin Epidemiol. 2007;60(9):891–898. https://doi.org/10.1016/j.jclinepi.2006.12.001
Sullivan GM, Feinn R. Using effect size — or why the p value is not enough. J Grad Med Educ. 2012;4(3):279–282. https://doi.org/10.4300/JGME-D-12-00156.1

Two ways forward from here. Work through the guides to understand each test and how to report it. Or upload your dataset and get the AMA-formatted result in 60 seconds, with normality checked and the test chosen automatically.


The most evidence-based profession on earth produces evidence it cannot read.
