14 Apr

The problems of low statistical power and publication bias

Lately, I have been in a mood of scientific scepticism: I have the feeling that the worldwide academic system is failing more and more to produce useful research. Christophe Lalanne’s Twitter feed led me to an interesting article in a non-mainstream journal: A farewell to Bonferroni: the problems of low statistical power and publication bias, by Shinichi Nakagawa.

Each study performed has a probability of reaching a wrong conclusion. Thus performing many studies will lead to some wrong conclusions by chance alone. This is known in statistics as the multiple comparisons problem. When a working hypothesis is not verified empirically in a study, this null finding is seldom reported, leading to what is called publication bias: discoveries are studied further; negative results are usually ignored (Y. Benjamini). Because only discoveries, called detections in statistical terms, get reported, the published literature contains more false detections than the individual experiments, and very few false negatives. Arguably, the original investigators could correct for this, using the understanding gained from the experiments performed, and account in a post-hoc analysis for the fact that some of their working hypotheses could not have been correct. Such a correction can only work in a field with a good mechanistic understanding, or models, such as physics, but in my opinion not in the life and social sciences.
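To make the multiple-comparisons arithmetic concrete, here is a small back-of-the-envelope sketch (my own illustration, not taken from the article): with m independent tests each run at level alpha, the chance of at least one false detection is 1 - (1 - alpha)^m.

    # Chance of at least one false detection among m independent
    # tests, each run at significance level alpha.
    alpha = 0.05
    for m in (1, 10, 20, 100):
        fwer = 1 - (1 - alpha) ** m  # family-wise error rate
        print(f"{m:4d} tests -> P(at least one false detection) = {fwer:.2f}")

Already at 20 tests the family-wise error rate is about 64%: run enough analyses and something will come out “significant” by chance alone.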

Let me quote some relevant extracts of the article, as you may never have access to it thanks to the way scientific publishing works:

Recently, Jennions and Møller (2003) carried out a meta-analysis on statistical power in the field of behavioral ecology and animal behavior, reviewing 10 leading journals including Behavioral Ecology. Their results showed dismayingly low average statistical power (note that a meta-analytic review of statistical power is different from post hoc power analysis as criticized in Hoenig and Heisey, 2001). The statistical power of a null hypothesis (Ho) significance test is the probability that the test will reject Ho when a research hypothesis (Ha) is true.

The meta-analysis on statistical power by Jennions and Møller (2003) revealed that, in the field of behavioral ecology and animal behavior, the average statistical power is less than 20% to detect a small effect and less than 50% to detect a medium effect. This means, for example, that the average behavioral scientist performing a statistical test has a greater probability of making a Type II error (or beta) (i.e., not rejecting Ho when Ho is false; note that statistical power is equal to 1 − beta) than if they had flipped a coin, when an experimental effect is of medium size.
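To put a number on that claim (my own sketch, not Nakagawa’s), the usual normal approximation to the power of a two-sided, two-sample test gives, for a “medium” effect of Cohen’s d = 0.5 and a hypothetical 20 subjects per group:

    # Approximate power of a two-sided two-sample test at alpha = 0.05,
    # via the normal approximation (a sketch; exact t-test power differs slightly).
    from scipy.stats import norm

    alpha, d, n = 0.05, 0.5, 20          # level, Cohen's d (medium), per-group size
    z_crit = norm.ppf(1 - alpha / 2)     # two-sided critical value
    ncp = d * (n / 2) ** 0.5             # noncentrality of the test statistic
    power = 1 - norm.cdf(z_crit - ncp)   # neglecting the tiny opposite tail
    print(f"power = {power:.2f}, beta = {1 - power:.2f}")  # ~0.35 and ~0.65

Beta indeed comes out above 0.5: worse odds than a coin flip, exactly the situation described above.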

Imagine that we conduct a study where we measure as many relevant variables as possible, 10 variables, for example. We find only two variables statistically significant. Then, what should we do? We could decide to write a paper highlighting these two variables (and not reporting the other eight at all) as if we had hypotheses about the two significant variables in the first place. Subsequently, our paper would be published. Alternatively, we could write a paper including all 10 variables. When the paper is reviewed, referees might tell us that there were no significant results if we had “appropriately” employed Bonferroni corrections, so that our study would not be advisable for publication. However, the latter paper is scientifically more important than the former paper. For example, if one wants to conduct a meta-analysis to investigate an overall effect in a specific area of study, the latter paper is five times more informative than the former paper. In the long term, statistical significance of particular tests may be of trivial importance (if not always), although, in the short term, it makes papers publishable. Bonferroni procedures may, in part, be preventing the accumulation of knowledge in the field of behavioral ecology and animal behavior, thus hindering the progress of the field as science.

Some of the concerns raised here are partly a criticism of Bonferroni corrections, i.e., in technical terms, of controlling the family-wise error rate (FWER). This is in fact the message that the author wants to convey in his paper. Proponents of controlling the false discovery rate (FDR) instead argue that an investigator shouldn’t be penalized for asking more questions: what should be controlled is the fraction of errors among the answers, rather than their absolute number. That said, FDR, while useful, does not solve the problem of publication bias.
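As a toy sketch of the difference (my own example, with made-up p-values): Bonferroni tests each of the m p-values against alpha/m, whereas the Benjamini-Hochberg FDR procedure compares the sorted p-values p_(k) to k·alpha/m and rejects all hypotheses up to the largest k that passes.

    # Toy comparison of Bonferroni (FWER) and Benjamini-Hochberg (FDR)
    # on a 10-variable scenario as in the quote; the p-values are invented.
    import numpy as np

    pvals = np.array([0.001, 0.008, 0.012, 0.035, 0.06,
                      0.12, 0.35, 0.51, 0.70, 0.88])
    alpha, m = 0.05, len(pvals)

    bonferroni = pvals <= alpha / m      # each test at level alpha/m

    order = np.argsort(pvals)
    passing = np.nonzero(pvals[order] <= np.arange(1, m + 1) * alpha / m)[0]
    bh = np.zeros(m, dtype=bool)
    if passing.size:                     # reject the k smallest p-values
        bh[order[:passing[-1] + 1]] = True

    print("Bonferroni rejects:", bonferroni.sum())    # 1 of 10
    print("Benjamini-Hochberg rejects:", bh.sum())    # 3 of 10

For real use, statsmodels.stats.multitest.multipletests implements both corrections; the point here is only that FDR control lets more true effects through while still bounding the expected fraction of false discoveries.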

5 Responses to “The problems of low statistical power and publication bias”

  1. Walter Reade Says:

    This cartoon nails much of your point:

    http://xkcd.com/882/

  2. gael Says:

Yes, indeed. It makes the point without the technicalities of overall error-rate control and FDR.

  3. Peter Hanley Says:

    Whew! For a minute there I thought you were questioning the validity of my research on Dungeons & Dragons as statistical play that evolved as a grassroots response to what Foucault described as ‘biopolitics’ — but as there were no studies performed for my research, nor statistics gathered, I’m sure we can all agree my conclusions were sound.

  4. Alex Says:

    A couple more articles you might find interesting:

    Why Most Published Research Findings Are False

    The Heart of Research is Sick. (This gets at the core problem, in my opinion. Statistics is being used as a Shibboleth to justify findings, rather than as a tool for genuine critical inquiry.)

  5. Josef Says:

    I think many or most papers are just drops of knowledge and the interpretation is up to the reader. (Ok there is a tiny bit of evidence given what I assume you did.)

Papers or research that report more important results usually trigger follow-up research that either confirms the effect or shows that it doesn’t exist or is not important. One I recently read:

    http://www.genomesunzipped.org/2012/03/questioning-the-evidence-for-non-canonical-rna-editing-in-humans.php

In economics there are many controversial issues. Some get resolved; for others the community loses interest because it is not possible to get clear evidence either way or, if the effect exists, it is small; and for some the debate provides work for many generations of economists and “schools” of thought.

    So scientific progress occurs even if 80% of the papers are only read by the author, the editor and the reviewers. (no hard number)
