Lately, I have been in a mood of scientific scepticism: I have the feeling that the worldwide academic system is more and more failing to produce useful research. Christophe Lalanne’s Twitter feed led me to an interesting article in a non-mainstream journal: A farewell to Bonferroni: the problems of low statistical power and publication bias, by Shinichi Nakagawa.

Each study performed has a probability of being wrong. Thus performing
many studies will lead to some wrong conclusions by chance. This is
known in statistics as the multiple comparisons problem. When a
working hypothesis is not verified empirically in a study, this null
finding is seldom reported, leading to what is called *publication
bias*: **discoveries are further studied; negative results are usually
ignored** (Y. Benjamini). Because only *discoveries*, called
*detections* in statistical terms, are reported, **published results
contain more false detections than the individual experiments, and very
few false negatives**. Arguably, the original investigators can correct
for this using the understanding that they gained from the experiments
performed, accounting in a *post-hoc analysis* for the fact that some of
their working hypotheses could not have been correct. Such a correction
can work only in a field with a good mechanistic understanding, or
models, such as physics, but in my opinion not in the life and social
sciences.
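The multiple comparisons problem is easy to see in a small simulation (my own sketch, not from the article, using only the standard library): run many studies of a *true null* effect and count how many reach “significance” at the usual 0.05 threshold.

```python
# Simulate many studies where the true effect is exactly zero, and count
# how many nonetheless produce a "significant" p-value at alpha = 0.05.
import math
import random

random.seed(0)

def p_value_one_sample(sample):
    """Two-sided p-value of a z-test for mean 0, known unit variance."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n_studies, n_subjects, alpha = 1000, 30, 0.05
p_values = [
    p_value_one_sample([random.gauss(0, 1) for _ in range(n_subjects)])
    for _ in range(n_studies)
]
false_detections = sum(p < alpha for p in p_values)
print(false_detections)  # roughly 5% of the 1000 null studies
```

If only the “discoveries” are written up and the rest drawered, the literature ends up built on exactly these chance detections.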

Let me quote some relevant extracts of the article, as you may never have access to it thanks to the way scientific publishing works:

Recently, Jennions and Moller (2003) carried out a meta-analysis on statistical power in the field of behavioral ecology and animal behavior, reviewing 10 leading journals including Behavioral Ecology. Their results showed dismayingly low average statistical power (note that a meta-analytic review of statistical power is different from post hoc power analysis as criticized in Hoenig and Heisey, 2001). The statistical power of a null hypothesis (Ho) significance test is the probability that the test will reject Ho when a research hypothesis (Ha) is true.

…

The meta-analysis on statistical power by Jennions and Moller (2003) revealed that, in the field of behavioral ecology and animal behavior, statistical power of less than 20% to detect a small effect and power of less than 50% to detect a medium effect existed. This means, for example, that the average behavioral scientist performing a statistical test has a greater probability of making a Type II error (or beta) (i.e., not rejecting Ho when Ho is false; note that statistical power is equal to 1 - beta) than if they had flipped a coin, when an experiment effect is of medium size.…

Imagine that we conduct a study where we measure as many relevant variables as possible, 10 variables, for example. We find only two variables statistically significant. Then, what should we do? We could decide to write a paper highlighting these two variables (and not reporting the other eight at all) as if we had hypotheses about the two significant variables in the first place. Subsequently, our paper would be published. Alternatively, we could write a paper including all 10 variables. When the paper is reviewed, referees might tell us that there were no significant results if we had “appropriately” employed Bonferroni corrections, so that our study would not be advisable for publication. However, the latter paper is scientifically more important than the former paper. For example, if one wants to conduct a meta-analysis to investigate an overall effect in a specific area of study, the latter paper is five times more informative than the former paper. In the long term, statistical significance of particular tests may be of trivial importance (if not always), although, in the short term, it makes papers publishable. Bonferroni procedures may, in part, be preventing the accumulation of knowledge in the field of behavioral ecology and animal behavior, thus hindering the progress of the field as science.
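To make these power figures concrete, here is a quick sketch of my own (not from the paper), assuming a two-sided one-sample z-test with known variance — the sample size n = 15 and the “medium” standardized effect d = 0.5 (Cohen’s convention) are illustrative choices:

```python
# Analytic power of a two-sided one-sample z-test at alpha = 0.05,
# for a standardized effect size d and sample size n.
import math

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_one_sample_z(d, n, alpha_z_crit=1.96):
    # Probability of rejecting H0 when the true standardized effect is d:
    # the test statistic is then normal with mean d * sqrt(n).
    shift = d * math.sqrt(n)
    return (1 - normal_cdf(alpha_z_crit - shift)
            + normal_cdf(-alpha_z_crit - shift))

print(round(power_one_sample_z(0.5, 15), 2))  # below 0.5: a coin flip does better
```

With 15 subjects and a medium effect, power sits just under one half — precisely the “worse than flipping a coin” situation the article describes.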

Some of the concerns raised here are partly a criticism of Bonferroni
corrections, *i.e.*, in technical terms, of controlling the family-wise
error rate (FWER). This is actually the message that the author wants to
convey in his paper. Proponents of controlling the false discovery rate
(FDR) argue that an investigator shouldn’t be penalized for asking more
questions: what should be controlled is the fraction of errors among the
reported discoveries, rather than their absolute number. That said, FDR,
while useful, does not solve the problem of publication bias.
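The contrast between the two corrections can be sketched on ten hypothetical p-values (the numbers are made up for the illustration): Bonferroni tests each p-value against alpha / m, while the Benjamini-Hochberg step-up procedure controls the FDR and typically keeps more discoveries.

```python
# Bonferroni (FWER control) versus Benjamini-Hochberg (FDR control)
# on the same set of m = 10 hypothetical p-values.
def bonferroni(p_values, alpha=0.05):
    """Indices of discoveries when controlling the FWER at alpha."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p < alpha / m]

def benjamini_hochberg(p_values, q=0.05):
    """Indices of discoveries when controlling the FDR at q (step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest 1-based rank k with p_(k) <= k / m * q
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])

p_values = [0.001, 0.008, 0.012, 0.041, 0.2, 0.35, 0.5, 0.62, 0.78, 0.9]
print(bonferroni(p_values))          # [0]        -- only the smallest survives
print(benjamini_hochberg(p_values))  # [0, 1, 2]  -- FDR keeps three discoveries
```

The asymmetry is the point made above: asking ten questions costs each individual test a factor of ten under Bonferroni, whereas FDR only bounds the expected fraction of false discoveries among those reported.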