Numbers don’t lie … unless they’re statistics.
A colleague of mine was nice enough to point me to this news item from The Economist on the reliability of medical research papers. The article in question is “premium content”, but the article it discusses, “Why Most Published Research Findings Are False”, is freely available. (Rock on, Public Library of Science!)
The paper is by John Ioannidis, an epidemiologist. It goes into loving detail, looking at factors like prior probability of your hypothesis being true, statistical power of the study, and the level of statistical significance, to show why certain common practices in medical research increase the probability that a “statistically significant” result is fairly meaningless. (If you’re not a math fan, it’s tough going. Even if you are a math fan but it’s been a while since you’ve mucked around with prob/stat, you may want to chew carefully.)
It’s an interesting read (and, I imagine, an important one for researchers who want to avoid some of the pitfalls Ioannidis indicates). Rather than recapping his argument here, which would necessarily involve either going into way more mathematical detail than you want me to or dumbing it down, I’m just going to give you his corollaries:
Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
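For the code-inclined, the arithmetic behind the first few corollaries can be sketched in a few lines of Python. This is a minimal sketch, assuming the positive predictive value (PPV) formula from Ioannidis’ paper, PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true, α is the significance threshold, and 1 − β is the statistical power. The function name and the scenario numbers are mine, made up purely for illustration.

```python
def ppv(prior_odds, alpha, beta):
    """Chance that a 'statistically significant' finding is actually true.

    prior_odds: pre-study odds R that a tested relationship is real
    alpha:      significance threshold (false-positive rate)
    beta:       false-negative rate, so power is 1 - beta
    """
    true_positives = (1 - beta) * prior_odds  # real effects that get detected
    false_positives = alpha                   # null effects that look significant
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios: (prior_odds, alpha, beta). Not data from any study.
scenarios = {
    "well-powered test of a plausible hypothesis": (1.0, 0.05, 0.2),
    "underpowered test of the same hypothesis": (1.0, 0.05, 0.8),
    "underpowered test of a long-shot hypothesis": (0.1, 0.05, 0.8),
}

for label, (r, a, b) in scenarios.items():
    print(f"{label}: PPV = {ppv(r, a, b):.2f}")
```

Lower power (smaller studies or smaller effects, as in Corollaries 1 and 2) and lower pre-study odds (lots of loosely selected relationships, as in Corollary 3) both drag the PPV down, even though the significance threshold never changes.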
A number of these didn’t surprise me. (“Financial interest might bias my results? Really?”) But I wasn’t expecting Corollary 4 at all. It does seem reasonable that if people in a research area don’t agree on precisely what they are, or should be, measuring, it’s harder to find out what’s really going on in that area. As Ioannidis puts it,
Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes). Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported.
As Ioannidis notes, the stage of research at which you’re working out what pieces of the system are important, what kinds of experiments might tell you something useful, etc., is very important for hypothesis generation. But, it would seem, to test these hypotheses in reliable ways you have to move from flexibility to rigidity — in other words, you need to separate the process of generating hypotheses from the process of testing those hypotheses. (Sir Karl Popper, in his grave, does whatever one would do in situations where rolling is not appropriate.)
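To put a rough number on what flexibility can cost, here is a minimal sketch assuming the bias-adjusted formula from Ioannidis’ paper, in which u is the proportion of analyses that would not have been “findings” but end up reported as findings anyway because of flexible designs, definitions, or selective reporting. The function name and the sample values are hypothetical, picked only to illustrate the trend.

```python
def ppv_with_bias(prior_odds, alpha, beta, bias):
    """PPV when a fraction `bias` (u) of would-be negatives get reported as positives.

    prior_odds: pre-study odds R that a tested relationship is real
    alpha:      significance threshold
    beta:       false-negative rate (power is 1 - beta)
    bias:       u, between 0 and 1
    """
    # Real effects reported as findings: detected honestly, or missed but "rescued".
    true_positives = prior_odds * ((1 - beta) + bias * beta)
    # Null effects reported as findings: chance hits, plus negatives flipped by bias.
    false_positives = alpha + bias * (1 - alpha)
    return true_positives / (true_positives + false_positives)


# Hypothetical values: even odds, alpha = 0.05, power = 0.8, increasing bias.
for u in (0.0, 0.1, 0.3, 0.5):
    print(f"u = {u:.1f}: PPV = {ppv_with_bias(1.0, 0.05, 0.2, u):.2f}")
```

With u set to zero this reduces to the plain formula PPV = (1 − β)R / (R − βR + α). As u grows, the false positives pile up faster than the true ones, which is one way to read the claim in Corollary 4.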
Similarly, Corollary 6 is not as obvious. It would seem that more people working on a question would lead to better results than fewer people working on it. But hot fields are ones that are often relatively new (with appropriate standards still being worked out — see Corollary 4 again) and, perhaps more importantly, they are fields in which research teams are in fierce competition:
With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal.
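The hot-field effect can also be sketched numerically. This is a minimal sketch, assuming the simplified model in Ioannidis’ paper: n independent teams of equal power test the same relationship, and what gets noticed is that at least one of them reports a “positive” result. The function name and the numbers are hypothetical illustrations, not anything measured.

```python
def ppv_many_teams(prior_odds, alpha, beta, n_teams):
    """PPV of 'at least one team found it' when n_teams test the same question.

    prior_odds: pre-study odds R that the relationship is real
    alpha:      significance threshold used by every team
    beta:       false-negative rate of every team (power is 1 - beta)
    n_teams:    number of independent teams (n >= 1)
    """
    # A real effect is found by at least one team unless all n teams miss it.
    true_positives = prior_odds * (1 - beta ** n_teams)
    # A null effect yields at least one chance hit unless all n teams avoid one.
    false_positives = 1 - (1 - alpha) ** n_teams
    return true_positives / (true_positives + false_positives)


# Hypothetical values: even odds, alpha = 0.05, power = 0.8.
for n in (1, 2, 5, 10):
    print(f"{n:2d} team(s): PPV = {ppv_many_teams(1.0, 0.05, 0.2, n):.2f}")
```

With these inputs the PPV falls as the number of teams rises: more teams means more chances for somebody to hit a chance “positive” and rush it into print, while the pool of real effects to be found stays the same.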
In other words, the stakes of the competition can influence the flow of information from the labs to the literature. Moreover, it wouldn’t be surprising if the desire to get good results into print first makes mildly promising experimental results look better. (“Holy cow, we found it! Write it up, stat!”)
Of course, scientists are human. Scientists can’t help but be biased by what they expect to see and by what they want to see. But that’s where the community of science is supposed to come in. All the individual biases are supposed to somehow cancel out in the scientific knowledge the community produces and endorses. Even if you see something in the data, if other scientists can’t see it, the community won’t accept it. And, the way the story is usually told, competition between scientists for discoveries and recognition and use-through-citation and groupies is supposed to make each of them scrutinize their competitors’ research findings and try to find fault with them.
Ioannidis seems to be saying, though, that it doesn’t always work out that way. Whether because scientists are trying to observe phenomena that are hard to pin down, or because negative findings only turn into publications when they show someone else’s positive findings were mistaken, or because scientists don’t always understand statistical methods well enough to set up good experiments of their own or critique the statistical methods used by papers in the literature, the community of science is ending up with less clarity about what it knows and with what kind of certainty. But, perhaps the insights of Ioannidis and others will help improve the situation.
Also worth noting, of course, is the fact that Ioannidis is concerned with the medical literature, rather than all scientific literature across all fields. There may be special difficulties that come from studying wicked-hard complex systems (like sick humans) that you don’t encounter, say, dealing with a chemical reaction in a beaker. Beyond dealing with different kinds of phenomena and agreed-upon (or disputed) ways to characterize and probe them, scientists in different fields deal with different expectations as to what “scholarly productivity” looks like; I’m told the pressure to publish multiple papers a year is especially high in biomedical fields. Finally, while most of us couldn’t find a useful real-life application for string theory if we had spinach between our teeth, biomedical research seems pretty relevant to everyday concerns. Not only does the public eagerly anticipate scientific solutions to a panoply of problems with human health, but there’s a big old pharmaceutical industry that wants to be a part of it. But hey, no pressure!
My favorite section title in Ioannidis’ paper is “Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias.” This is where the prescription lies: good science will depend on vigilant attention to what prevailing biases there are — in a research group, a department, a scientific field, the community of science, and in the larger societal structures in which scientists are embedded — and serious attempts to ensure that those biases are stripped out of the knowledge that scientists produce.
(And, for an extreme example of how research can give us more reliable information about researcher bias than about the system under study, check out this survey about motherhood and career choices given to freshman and senior women at Yale. If scientific and social-scientific journals had fashion spreads, this would be the “don’t” picture.)
ETA: For some reason, Blogger doesn't want you to see that last link. Here's the URL:
Another URL or two: There were Blogspot server problems the night this went up, so apparently the links are haunted. Here's where to find Dr. Ioannidis' article:
Here's the URL for the Public Library of Science - Medicine: