In response to our Analysis article (Power failure: why small sample size undermines the reliability of neuroscience. Nature Rev. Neurosci. 14, 365–376 (2013))1, Hoppe (A test is not a test. Nature Rev. Neurosci. http://dx.doi.org/10.1038/nrn3475-c5 (2013))2 correctly points out that the figures in our article cover only pre-study odds of an effect (R) up to 1 (that is, up to 50% prior prevalence of a non-null effect). In principle, of course, R can be higher — the prior prevalence of a non-null effect can vary from 0 to 100%. However, in the large majority of research studies, it will be 50% or lower.
When the prior probability of an effect is very high, such as that required to justify a large, confirmatory clinical trial, it will still only approach 50% (R = 1). Empirically, Djulbegovic et al.3 have recently shown this to be the case: only slightly more than 50% of Phase 3 clinical trials show superiority of the intervention arm over the comparator arm. As large-scale clinical trials are arguably the end stage of a research pipeline that begins in the basic sciences, such trials should represent the case when R (on average) can be expected to achieve the highest values. As R increases above 1 (that is, prior prevalence >50%), the incremental value of further research decreases; when it is very high (for example, prior prevalence >90%), further research is probably not necessary, because there is already high confidence in the outcome. Most neuroscience research is far removed from this situation, as most neuroscience involves testing exploratory hypotheses and addressing measurements with high complexity and extreme multiplicity. In other words, there are many variables that can be explored for associations and effects, often with little prior insight regarding which of them may be important.
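To make the arithmetic behind this argument concrete, the sketch below (ours, not part of the original correspondence; the parameter values are illustrative assumptions) computes positive predictive value from the pre-study odds R using the standard expression PPV = [(1 − β)R]/[(1 − β)R + α]:

```python
# Illustrative sketch: how positive predictive value (PPV) depends on
# pre-study odds R and statistical power. PPV = [(1 - beta) * R] / [(1 - beta) * R + alpha].

def odds_from_prevalence(prevalence):
    """Convert the prior prevalence of non-null effects (0..1) to pre-study odds R."""
    return prevalence / (1.0 - prevalence)

def ppv(R, power, alpha=0.05):
    """Positive predictive value of a claimed effect at significance level alpha."""
    return (power * R) / (power * R + alpha)

# R = 1 corresponds to a 50% prior prevalence of a non-null effect.
R = odds_from_prevalence(0.5)
print(ppv(R, power=0.8))                          # well-powered study at R = 1
print(ppv(R, power=0.2))                          # low power at R = 1
print(ppv(odds_from_prevalence(0.1), power=0.2))  # low power in an exploratory setting
```

Even at R = 1, the most favourable value the Phase 3 trial evidence suggests is typical, low power erodes the credibility of a positive claim; in exploratory settings with prior prevalence near 10%, PPV at 20% power falls below one-third.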
Although the true value of R will vary somewhat from one field to another, empirically we know that it is likely to be low in many (or even most) fields. Recent attempts to replicate key findings from the biomedical science literature have indicated that the proportion of studies that replicate effects is usually less than 50%, and even this may be optimistic4,5,6. If one assumes that the true prevalence of effects approaches 100%, then one must also assume that these ubiquitous effects are very small, otherwise they would replicate routinely. However, when effect sizes are tiny, even very large studies will generate substantial type S errors7 (that is, many statistically significant effects will be in the opposite direction of the true effect). In this situation, the credibility of individual, meticulously performed, extremely large studies would still be close to 50% at best.
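The type S error argument can be illustrated with a toy simulation (an assumption-laden sketch of ours, not the procedure of Gelman & Tuerlinckx7): draw effect estimates for a one-sample z-test with a tiny true effect, and count how often the statistically significant results carry the wrong sign.

```python
import math
import random

def type_s_rate(true_effect, n, sims=20000, alpha_z=1.96, seed=1):
    """Toy model: fraction of statistically significant effect estimates whose
    sign is opposite to the true effect (a type S error), for a one-sample
    z-test on unit-variance observations."""
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)                # standard error of the mean
    significant = wrong_sign = 0
    for _ in range(sims):
        est = rng.gauss(true_effect, se)   # sampling distribution of the estimate
        if abs(est) / se > alpha_z:        # two-sided test at roughly the 5% level
            significant += 1
            if est * true_effect < 0:      # significant, but in the wrong direction
                wrong_sign += 1
    return wrong_sign / significant if significant else float("nan")

# A tiny true effect (0.01 SD) with n = 100: a sizeable share of the
# significant results point the wrong way.
print(type_s_rate(true_effect=0.01, n=100))
# A large true effect (0.5 SD): type S errors essentially vanish.
print(type_s_rate(true_effect=0.5, n=100))
```

With the true effect at a tenth of the standard error, roughly a third of the significant results in this toy model have the wrong sign, which is why near-ubiquitous tiny effects would leave even very large studies with credibility near 50% at best.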
Finally, although the extent of the impact of statistical power on positive predictive value will vary for different values of R, ultimately statistical power is important for the whole range of R. There is growing evidence for the poor reproducibility of reported findings, and there is therefore a need for greater focus on possible reasons for this problem and on the identification of solutions. In our opinion, low statistical power is an important part of this equation.
References
Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nature Rev. Neurosci. 14, 365–376 (2013).
Hoppe, C. A test is not a test. Nature Rev. Neurosci. http://dx.doi.org/10.1038/nrn3475-c5 (2013).
Djulbegovic, B., Kumar, A., Glasziou, P., Miladinovic, B. & Chalmers, I. Medical research: trial unpredictability yields predictable therapy gains. Nature 500, 395–396 (2013).
Begley, C. G. & Ellis, L. M. Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
Mobley, A., Linder, S. K., Braeuer, R., Ellis, L. M. & Zwelling, L. A survey on data reproducibility in cancer research provides insights into our limited ability to translate findings from the laboratory to the clinic. PLoS ONE 8, e63221 (2013).
Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nature Rev. Drug Discov. 10, 712 (2011).
Gelman, A. & Tuerlinckx, F. Type S error rates for classical and Bayesian single and multiple comparison procedures. Comput. Stat. 15, 373–390 (2000).
The authors declare no competing financial interests.
Button, K., Ioannidis, J., Mokrysz, C. et al. Empirical evidence for low reproducibility indicates low pre-study odds. Nat Rev Neurosci 14, 877 (2013). https://doi.org/10.1038/nrn3475-c6