References

Abelson, P. (2003). The Value of Life and Health for Public Policy. Economic Record, 79, S2–S13. https://doi.org/10.1111/1475-4932.00087
Aberson, C. L. (2019). Applied Power Analysis for the Behavioral Sciences (2nd ed.). Routledge.
Aert, R. C. M. van, & Assen, M. A. L. M. van. (2018). Correcting for Publication Bias in a Meta-Analysis with the P-uniform* Method. MetaArXiv. https://doi.org/10.31222/osf.io/zqjr9
Agnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., & Cubelli, R. (2017). Questionable research practices among Italian research psychologists. PLOS ONE, 12(3), e0172792. https://doi.org/10.1371/journal.pone.0172792
Akker, O. van den, Bakker, M., Assen, M. A. L. M. van, Pennington, C. R., Verweij, L., Elsherif, M., Claesen, A., Gaillard, S. D. M., Yeung, S. K., Frankenberger, J.-L., Krautter, K., Cockcroft, J. P., Kreuer, K. S., Evans, T. R., Heppel, F., Schoch, S. F., Korbmacher, M., Yamada, Y., Albayrak-Aydemir, N., … Wicherts, J. (2023). The effectiveness of preregistration in psychology: Assessing preregistration strictness and preregistration-study consistency. MetaArXiv. https://doi.org/10.31222/osf.io/h8xjw
Albers, C. J., Kiers, H. A. L., & Ravenzwaaij, D. van. (2018). Credible Confidence: A Pragmatic View on the Frequentist vs Bayesian Debate. Collabra: Psychology, 4(1), 31. https://doi.org/10.1525/collabra.149
Albers, C. J., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187–195. https://doi.org/10.1016/j.jesp.2017.09.004
Aldrich, J. (1997). R.A. Fisher and the making of maximum likelihood 1912–1922. Statistical Science, 12(3), 162–176. https://doi.org/10.1214/ss/1030037906
Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F., & Pi-Sunyer, F. X. (1997). Power and money: Designing statistically powerful studies while minimizing financial costs. Psychological Methods, 2(1), 20–33. https://doi.org/10.1037/1082-989X.2.1.20
Altman, D. G., & Bland, J. M. (1995). Statistics notes: Absence of evidence is not evidence of absence. BMJ, 311(7003), 485. https://doi.org/10.1136/bmj.311.7003.485
Altoè, G., Bertoldo, G., Zandonella Callegher, C., Toffalini, E., Calcagnì, A., Finos, L., & Pastore, M. (2020). Enhancing Statistical Inference in Psychological Research via Prospective and Retrospective Design Analysis. Frontiers in Psychology, 10.
Anderson, M. S., Martinson, B. C., & De Vries, R. (2007). Normative dissonance in science: Results from a national survey of US scientists. Journal of Empirical Research on Human Research Ethics, 2(4), 3–14.
Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C. (2007). The perverse effects of competition on scientists’ work and relationships. Science and Engineering Ethics, 13(4), 437–461.
Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size planning for more accurate statistical power: A method adjusting sample effect sizes for publication bias and uncertainty. Psychological Science, 28(11), 1547–1562. https://doi.org/10.1177/0956797617723724
Anderson, S. F., & Maxwell, S. E. (2016). There’s more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21(1), 1–12. https://doi.org/10.1037/met0000051
Anvari, F., Kievit, R., Lakens, D., Pennington, C. R., Przybylski, A. K., Tiokhin, L., Wiernik, B. M., & Orben, A. (2021). Not all effects are indispensable: Psychological science requires verifiable lines of reasoning for whether an effect matters. Perspectives on Psychological Science. https://doi.org/10.31234/osf.io/g3vtr
Anvari, F., & Lakens, D. (2018). The replicability crisis and public trust in psychological science. Comprehensive Results in Social Psychology, 3(3), 266–286. https://doi.org/10.1080/23743603.2019.1684822
Anvari, F., & Lakens, D. (2021). Using anchor-based methods to determine the smallest effect size of interest. Journal of Experimental Social Psychology, 96, 104159. https://doi.org/10.1016/j.jesp.2021.104159
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3. https://doi.org/10.1037/amp0000191
Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A (General), 132(2), 235–244.
Arslan, R. C. (2019). How to Automatically Document Data With the codebook Package to Facilitate Data Reuse. Advances in Methods and Practices in Psychological Science, 2515245919838783. https://doi.org/10.1177/2515245919838783
Azrin, N. H., Holz, W., Ulrich, R., & Goldiamond, I. (1961). The control of the content of conversation through reinforcement. Journal of the Experimental Analysis of Behavior, 4, 25–30. https://doi.org/10.1901/jeab.1961.4-25
Babbage, C. (1830). Reflections on the Decline of Science in England: And on Some of Its Causes. B. Fellowes.
Bacchetti, P. (2010). Current sample size conventions: Flaws, harms, and alternatives. BMC Medicine, 8(1), 17. https://doi.org/10.1186/1741-7015-8-17
Baguley, T. (2004). Understanding statistical power in the context of applied research. Applied Ergonomics, 35(2), 73–80. https://doi.org/10.1016/j.apergo.2004.01.002
Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617. https://doi.org/10.1348/000712608X377117
Baguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66(6), 423–437. https://doi.org/10.1037/h0020412
Bakan, D. (1967). On method: Toward a reconstruction of psychological investigation. Jossey-Bass.
Bakker, B. N., Kokil, J., Dörr, T., Fasching, N., & Lelkes, Y. (2021). Questionable and Open Research Practices: Attitudes and Perceptions among Quantitative Communication Researchers. Journal of Communication, 71(5), 715–738. https://doi.org/10.1093/joc/jqab031
Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., Morris, J. N., Rebok, G. W., Smith, D. M., & Tennstedt, S. L. (2002). Effects of cognitive training interventions with older adults: A randomized controlled trial. JAMA, 288(18), 2271–2281.
Barber, T. X. (1976). Pitfalls in Human Research: Ten Pivotal Points. Pergamon Press.
Bartoš, F., & Schimmack, U. (2020). Z-Curve.2.0: Estimating Replication Rates and Discovery Rates. https://doi.org/10.31234/osf.io/urgtn
Bauer, P., & Kieser, M. (1996). A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika, 83(4), 934–937.
Bausell, R. B., & Li, Y.-F. (2002). Power Analysis for Experimental Research: A Practical Guide for the Biological, Medical and Social Sciences (1st edition). Cambridge University Press.
Beck, W. S. (1957). Modern Science and the nature of life (First Edition). Harcourt, Brace.
Becker, B. J. (2005). Failsafe N or File-Drawer Number. In Publication Bias in Meta-Analysis (pp. 111–125). John Wiley & Sons, Ltd. https://doi.org/10.1002/0470870168.ch7
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425. https://doi.org/10.1037/a0021524
Bem, D. J., Utts, J., & Johnson, W. O. (2011). Must psychologists change the way they analyze their data? Journal of Personality and Social Psychology, 101(4), 716–719. https://doi.org/10.1037/a0024777
Bender, R., & Lange, S. (2001). Adjusting for multiple testing - when and how? Journal of Clinical Epidemiology, 54(4), 343–349.
Benjamini, Y. (2016). It’s Not the p-values’ Fault. The American Statistician: Supplemental Material to the ASA Statement on P-Values and Statistical Significance, 70, 1–2.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300. https://www.jstor.org/stable/2346101
Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). Effectsize: Estimation of Effect Size Indices and Standardized Parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815
Berger, J. O., & Bayarri, M. J. (2004). The Interplay of Bayesian and Frequentist Analysis. Statistical Science, 19(1), 58–80. https://doi.org/10.1214/088342304000000116
Berkeley, G. (1735). A defence of free-thinking in mathematics, in answer to a pamphlet of Philalethes Cantabrigiensis entitled Geometry No Friend to Infidelity. Also an appendix concerning mr. Walton’s Vindication of the principles of fluxions against the objections contained in The analyst. By the author of The minute philosopher (Vol. 3).
Bird, S. B., & Sivilotti, M. L. A. (2008). Self-plagiarism, recycling fraud, and the intent to mislead. Journal of Medical Toxicology, 4(2), 69–70. https://doi.org/10.1007/BF03160957
Bishop, D. V. M. (2018). Fallibility in Science: Responding to Errors in the Work of Oneself and Others. Advances in Methods and Practices in Psychological Science, 2515245918776632. https://doi.org/10.1177/2515245918776632
Bland, M. (2015). An introduction to medical statistics (Fourth edition). Oxford University Press.
Borenstein, M. (Ed.). (2009). Introduction to meta-analysis. John Wiley & Sons.
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. The Journal of Applied Psychology, 100(2), 431–449. https://doi.org/10.1037/a0038047
Bozarth, J. D., & Roberts, R. R. (1972). Signifying significant significance. American Psychologist, 27(8), 774.
Bretz, F., Hothorn, T., & Westfall, P. H. (2011). Multiple comparisons using R. CRC Press.
Bross, I. D. (1971). Critical levels, statistical language and scientific inference. In Foundations of statistical inference (pp. 500–513). Holt, Rinehart and Winston.
Brown, G. W. (1983). Errors, Types I and II. American Journal of Diseases of Children, 137(6), 586–591. https://doi.org/10.1001/archpedi.1983.02140320062014
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876
Brunner, J., & Schimmack, U. (2020). Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance. Meta-Psychology, 4. https://doi.org/10.15626/MP.2018.874
Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature Human Behaviour, 1–10. https://doi.org/10.1038/s41562-021-01143-3
Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 16. https://doi.org/10.5334/joc.72
Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10
Buchanan, E. M., Scofield, J., & Valentine, K. D. (2017). MOTE: Effect Size and Confidence Interval Calculator.
Bulus, M., & Dong, N. (2021). Bound Constrained Optimization of Sample Sizes Subject to Monetary Restrictions in Planning Multilevel Randomized Trials and Regression Discontinuity Studies. The Journal of Experimental Education, 89(2), 379–401. https://doi.org/10.1080/00220973.2019.1636197
Burriss, R. P., Troscianko, J., Lovell, P. G., Fulford, A. J. C., Stevens, M., Quigley, R., Payne, J., Saxton, T. K., & Rowland, H. M. (2015). Changes in women’s facial skin color over the ovulatory cycle are not detectable by the human visual system. PLOS ONE, 10(7), e0130093. https://doi.org/10.1371/journal.pone.0130093
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
Button, K. S., Kounali, D., Thomas, L., Wiles, N. J., Peters, T. J., Welton, N. J., Ades, A. E., & Lewis, G. (2015). Minimal clinically important difference on the Beck Depression Inventory - II according to the patient’s perspective. Psychological Medicine, 45(15), 3269–3279. https://doi.org/10.1017/S0033291715001270
Caplan, A. L. (2021). How Should We Regard Information Gathered in Nazi Experiments? AMA Journal of Ethics, 23(1), 55–58. https://doi.org/10.1001/amajethics.2021.55
Carter, E. C., & McCullough, M. E. (2014). Publication bias and the limited strength model of self-control: Has the evidence for ego depletion been overestimated? Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00823
Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144. https://doi.org/10.1177/2515245919847196
Cascio, W. F., & Zedeck, S. (1983). Open a New Window in Rational Research Planning: Adjust Alpha to Maximize Statistical Power. Personnel Psychology, 36(3), 517–526. https://doi.org/10.1111/j.1744-6570.1983.tb02233.x
Ceci, S. J., & Bjork, R. A. (2000). Psychological Science in the Public Interest: The Case for Juried Analyses. Psychological Science, 11(3), 177–178. https://doi.org/10.1111/1467-9280.00237
Cevolani, G., Crupi, V., & Festa, R. (2011). Verisimilitude and belief change for conjunctive theories. Erkenntnis, 75(2), 183.
Chalmers, I., & Glasziou, P. (2009). Avoidable waste in the production and reporting of research evidence. The Lancet, 374(9683), 86–89.
Chamberlin, T. C. (1890). The Method of Multiple Working Hypotheses. Science, ns-15(366), 92–96. https://doi.org/10.1126/science.ns-15.366.92
Chambers, C. D., & Tzavella, L. (2022). The past, present and future of Registered Reports. Nature Human Behaviour, 6(1), 29–42. https://doi.org/10.1038/s41562-021-01193-7
Chang, H. (2022). Realism for Realistic People: A New Pragmatist Philosophy of Science. Cambridge University Press. https://doi.org/10.1017/9781108635738
Chang, M. (2016). Adaptive Design Theory and Implementation Using SAS and R (2nd edition). Chapman and Hall/CRC.
Chatziathanasiou, K. (2022). Beware the Lure of Narratives: Hungry Judges Should not Motivate the Use of Artificial Intelligence in Law (SSRN Scholarly Paper ID 4011603). Social Science Research Network. https://doi.org/10.2139/ssrn.4011603
Chin, J. M., Pickett, J. T., Vazire, S., & Holcombe, A. O. (2021). Questionable Research Practices and Open Science in Quantitative Criminology. Journal of Quantitative Criminology. https://doi.org/10.1007/s10940-021-09525-6
Cho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional research hypotheses tests legitimate? Journal of Business Research, 66(9), 1261–1266. https://doi.org/10.1016/j.jbusres.2012.02.023
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997
Coles, N. A., March, D. S., Marmolejo-Ramos, F., Larsen, J. T., Arinze, N. C., Ndukaihe, I. L. G., Willis, M. L., Foroni, F., Reggev, N., Mokady, A., Forscher, P. S., Hunter, J. F., Kaminski, G., Yüvrük, E., Kapucu, A., Nagy, T., Hajdu, N., Tejada, J., Freitag, R. M. K., … Liuzza, M. T. (2022). A multi-lab test of the facial feedback hypothesis by the Many Smiles Collaboration. Nature Human Behaviour, 6(12), 1731–1742. https://doi.org/10.1038/s41562-022-01458-9
Colling, L. J., Szűcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., … McShane, B. B. (2020). Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079
Colquhoun, D. (2019). The False Positive Risk: A Proposal Concerning What to Do About p-Values. The American Statistician, 73(sup1), 192–201. https://doi.org/10.1080/00031305.2018.1529622
Cook, J., Hislop, J., Adewuyi, T., Harrild, K., Altman, D., Ramsay, C., Fraser, C., Buckley, B., Fayers, P., Harvey, I., Briggs, A., Norrie, J., Fergusson, D., Ford, I., & Vale, L. (2014). Assessing methods to specify the target difference for a randomised controlled trial: DELTA (Difference ELicitation in TriAls) review. Health Technology Assessment, 18(28). https://doi.org/10.3310/hta18280
Cook, T. D. (2002). P-Value Adjustment in Sequential Clinical Trials. Biometrics, 58(4), 1005–1011.
Cooper, H. (2020). Reporting quantitative research in psychology: How to meet APA Style Journal Article Reporting Standards (2nd ed.). American Psychological Association. https://doi.org/10.1037/0000178-000
Cooper, H. M., Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis (2nd ed.). Russell Sage Foundation.
Copay, A. G., Subach, B. R., Glassman, S. D., Polly, D. W., & Schuler, T. C. (2007). Understanding the minimum clinically important difference: A review of concepts and methods. The Spine Journal, 7(5), 541–546. https://doi.org/10.1016/j.spinee.2007.01.008
Corneille, O., Havemann, J., Henderson, E. L., IJzerman, H., Hussey, I., Orban de Xivry, J.-J., Jussim, L., Holmes, N. P., Pilacinski, A., Beffara, B., Carroll, H., Outa, N. O., Lush, P., & Lotter, L. D. (2023). Beware “persuasive communication devices” when writing and reading scientific articles. eLife, 12, e88654. https://doi.org/10.7554/eLife.88654
Correll, J., Mellinger, C., McClelland, G. H., & Judd, C. M. (2020). Avoid Cohen’s “Small,” “Medium,” and “Large” for Power Analysis. Trends in Cognitive Sciences, 24(3), 200–207. https://doi.org/10.1016/j.tics.2019.12.009
Cousineau, D., & Chiasson, F. (2019). Superb: Computes standard error and confidence interval of means under various designs and sampling schemes [Manual].
Cowles, M., & Davis, C. (1982). On the origins of the .05 level of statistical significance. American Psychologist, 37(5), 553.
Cox, D. R. (1958). Some Problems Connected with Statistical Inference. Annals of Mathematical Statistics, 29(2), 357–372. https://doi.org/10.1214/aoms/1177706618
Cribbie, R. A., Gruman, J. A., & Arpin-Cribbie, C. A. (2004). Recommendations for applying tests of equivalence. Journal of Clinical Psychology, 60(1), 1–10.
Crusius, J., Gonzalez, M. F., Lange, J., & Cohen-Charash, Y. (2020). Envy: An Adversarial Review and Comparison of Two Competing Views. Emotion Review, 12(1), 3–21. https://doi.org/10.1177/1754073919873131
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger, S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S., Zaneva, M., & Brown, N. J. L. (2023). What’s in a Badge? A Computational Reproducibility Investigation of the Open Data Badge Policy in One Issue of Psychological Science. Psychological Science, 09567976221140828. https://doi.org/10.1177/09567976221140828
Cumming, G. (2008). Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better. Perspectives on Psychological Science, 3(4), 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x
Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966
Cumming, G., & Calin-Jageman, R. (2016). Introduction to the New Statistics: Estimation, Open Science, and Beyond. Routledge.
Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217–227. https://doi.org/10.1037/1082-989X.11.3.217
Danziger, S., Levav, J., & Avnaim-Pesso, L. (2011). Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences, 108(17), 6889–6892. https://doi.org/10.1073/pnas.1018033108
de Groot, A. D. (1969). Methodology (Vol. 6). Mouton & Co.
de Heide, R., & Grünwald, P. D. (2017). Why optional stopping is a problem for Bayesians. arXiv:1708.08278 [Math, Stat]. https://arxiv.org/abs/1708.08278
DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920965119. https://doi.org/10.1177/2515245920965119
Delacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021). Why Hedges’ g*s based on the non-pooled standard deviation should be reported with Welch’s t-test. PsyArXiv. https://doi.org/10.31234/osf.io/tu6mp
Delacre, M., Lakens, D., & Leys, C. (2017). Why Psychologists Should by Default Use Welch’s t-test Instead of Student’s t-test. International Review of Social Psychology, 30(1). https://doi.org/10.5334/irsp.82
Detsky, A. S. (1990). Using cost-effectiveness analysis to improve the efficiency of allocating funds to clinical trials. Statistics in Medicine, 9(1-2), 173–184. https://doi.org/10.1002/sim.4780090124
Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and statistical inference. Palgrave Macmillan.
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00781
Dmitrienko, A., & D’Agostino Sr, R. (2013). Traditional multiplicity adjustment methods in clinical trials. Statistics in Medicine, 32(29), 5172–5218. https://doi.org/10.1002/sim.5990
Dodge, H. F., & Romig, H. G. (1929). A Method of Sampling Inspection. Bell System Technical Journal, 8(4), 613–631. https://doi.org/10.1002/j.1538-7305.1929.tb01240.x
Dongen, N. N. N. van, Doorn, J. B. van, Gronau, Q. F., Ravenzwaaij, D. van, Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D., Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019). Multiple Perspectives on Inference for Two Simple Statistical Scenarios. The American Statistician, 73(sup1), 328–339. https://doi.org/10.1080/00031305.2019.1565553
Douglas, H. E. (2009). Science, policy, and the value-free ideal. University of Pittsburgh Press.
Dubin, R. (1969). Theory building. Free Press.
Duhem, P. (1954). The aim and structure of physical theory. Princeton University Press.
Dupont, W. D. (1983). Sequential stopping rules and sequentially adjusted P values: Does one require the other? Controlled Clinical Trials, 4(1), 3–10. https://doi.org/10.1016/S0197-2456(83)80003-8
Duyx, B., Urlings, M. J. E., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. (2017). Scientific citations favor positive results: A systematic review and meta-analysis. Journal of Clinical Epidemiology, 88, 92–101. https://doi.org/10.1016/j.jclinepi.2017.06.002
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio, D. B. V., Boucher, L., Brown, E. R., Budiman, N. I., Cairo, A. H., Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman, J. A., Conway, J. G., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012
Eckermann, S., Karnon, J., & Willan, A. R. (2010). The Value of Value of Information. PharmacoEconomics, 28(9), 699–709. https://doi.org/10.2165/11537370-000000000-00000
Edwards, M. A., & Roy, S. (2017). Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition. Environmental Engineering Science, 34(1), 51–61. https://doi.org/10.1089/ees.2016.0223
Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T. (2014). Press CRTT to measure aggressive behavior: The unstandardized use of the competitive reaction time task in aggression research. Psychological Assessment, 26(2), 419–432. https://doi.org/10.1037/a0035569
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28(1), 1–11. https://doi.org/10.3758/BF03203630
Eysenck, H. J. (1978). An exercise in mega-silliness. American Psychologist, 33(5), 517. https://doi.org/10.1037/0003-066X.33.5.517.a
Fanelli, D. (2010). Positive Results Increase Down the Hierarchy of the Sciences. PLOS ONE, 5(4). https://doi.org/10.1371/journal.pone.0010068
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Ferguson, C. J. (2014). Comment: Why meta-analyses rarely resolve ideological debates. Emotion Review, 6(3), 251–252.
Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead theories: Publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science, 7(6), 555–561.
Ferguson, C. J., & Heene, M. (2021). Providing a lower-bound estimate for psychology’s “crud factor”: The case of aggression. Professional Psychology: Research and Practice, 52(6), 620–626. https://doi.org/10.1037/pro0000386
Ferguson, C., Marcus, A., & Oransky, I. (2014). Publishing: The peer-review scam. Nature, 515(7528), 480–482. https://doi.org/10.1038/515480a
Ferron, J., & Onghena, P. (1996). The Power of Randomization Tests for Single-Case Phase Designs. The Journal of Experimental Education, 64(3), 231–239. https://doi.org/10.1080/00220973.1996.9943805
Feyerabend, P. (1993). Against method (3rd ed.). Verso.
Feynman, R. P. (1974). Cargo cult science. Engineering and Science, 37(7), 10–13.
Fiedler, K. (2004). Tools, toys, truisms, and theories: Some thoughts on the creative cycle of theory formation. Personality and Social Psychology Review, 8(2), 123–131. https://doi.org/10.1207/s15327957pspr0802_5
Fiedler, K., & Schwarz, N. (2016). Questionable Research Practices Revisited. Social Psychological and Personality Science, 7(1), 45–52. https://doi.org/10.1177/1948550615612150
Field, S. A., Tyre, A. J., Jonzén, N., Rhodes, J. R., & Possingham, H. P. (2004). Minimizing the cost of environmental management decisions by optimizing statistical thresholds. Ecology Letters, 7(8), 669–675. https://doi.org/10.1111/j.1461-0248.2004.00625.x
Fisher, R. A. (1935). The design of experiments. Oliver and Boyd.
Fisher, R. A. (1936). Has Mendel’s work been rediscovered? Annals of Science, 1(2), 115–137.
Fisher, R. A. (1956). Statistical methods and scientific inference. Hafner Publishing Co.
Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power. PLOS ONE, 9(10), e109019. https://doi.org/10.1371/journal.pone.0109019
Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin & Review, 21(5), 1180–1187. https://doi.org/10.3758/s13423-014-0601-x
Francis, G. (2016). Equivalent statistics and data interpretation. Behavior Research Methods, 1–15. https://doi.org/10.3758/s13428-016-0812-3
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502–1505. https://doi.org/10.1126/science.1255484
Frankenhuis, W. E., Panchanathan, K., & Smaldino, P. E. (2022). Strategic ambiguity in the social sciences. Social Psychological Bulletin.
Fraser, H., Parker, T., Nakagawa, S., Barnett, A., & Fidler, F. (2018). Questionable research practices in ecology and evolution. PLOS ONE, 13(7), e0200303. https://doi.org/10.1371/journal.pone.0200303
Freiman, J. A., Chalmers, T. C., Smith, H., & Kuebler, R. R. (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 "negative" trials. The New England Journal of Medicine, 299(13), 690–694. https://doi.org/10.1056/NEJM197809282991304
Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1(4), 379–390. https://doi.org/10.1037/1082-989X.1.4.379
Fricker, R. D., Burke, K., Han, X., & Woodall, W. H. (2019). Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban. The American Statistician, 73(sup1), 374–384. https://doi.org/10.1080/00031305.2018.1537892
Fried, B. J., Boers, M., & Baker, P. R. (1993). A method for achieving consensus on rheumatoid arthritis outcome measures: The OMERACT conference process. The Journal of Rheumatology, 20(3), 548–551.
Friede, T., & Kieser, M. (2006). Sample size recalculation in internal pilot study designs: A review. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 537–555. https://doi.org/10.1002/bimj.200510238
Friedlander, F. (1964). Type I and Type II Bias. American Psychologist, 19(3), 198–199. https://doi.org/10.1037/h0038977
Fugard, A. J. B., & Potts, H. W. W. (2015). Supporting thinking on sample sizes for thematic analyses: A quantitative tool. International Journal of Social Research Methodology, 18(6), 669–684. https://doi.org/10.1080/13645579.2015.1005453
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168. https://doi.org/10.1177/2515245919847202
Gannon, M. A., de Bragança Pereira, C. A., & Polpo, A. (2019). Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels. The American Statistician, 73(sup1), 213–222. https://doi.org/10.1080/00031305.2018.1518268
Gelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.
Gerring, J. (2012). Mere Description. British Journal of Political Science, 42(4), 721–746. https://doi.org/10.1017/S0007123412000130
Gillon, R. (1994). Medical ethics: Four principles plus attention to scope. BMJ, 309(6948), 184. https://doi.org/10.1136/bmj.309.6948.184
Glöckner, A. (2016). The irrational hungry judge effect revisited: Simulations reveal that the magnitude of the effect is overestimated. Judgment and Decision Making, 11(6), 601–610.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11(5), 791–806.
Goldacre, B., DeVito, N. J., Heneghan, C., Irving, F., Bacon, S., Fleminger, J., & Curtis, H. (2018). Compliance with requirement to report results on the EU Clinical Trials Register: Cohort study and web resource. BMJ, 362, k3218. https://doi.org/10.1136/bmj.k3218
Good, I. J. (1992). The Bayes/Non-Bayes compromise: A brief review. Journal of the American Statistical Association, 87(419), 597–606. https://doi.org/10.2307/2290192
Goodyear-Smith, F. A., van Driel, M. L., Arroll, B., & Del Mar, C. (2012). Analysis of decisions made in meta-analyses of depression screening and the risk of confirmation bias: A case study. BMC Medical Research Methodology, 12, 76. https://doi.org/10.1186/1471-2288-12-76
Gopalakrishna, G., Riet, G. ter, Vink, G., Stoop, I., Wicherts, J. M., & Bouter, L. M. (2022). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands. PLOS ONE, 17(2), e0263023. https://doi.org/10.1371/journal.pone.0263023
Gosset, W. S. (1904). The Application of the "Law of Error" to the Work of the Brewery (1 vol 8; pp. 3–16). Arthur Guinness & Son, Ltd.
Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498. https://doi.org/10.1111/2041-210X.12504
Green, S. B. (1991). How Many Subjects Does It Take To Do A Regression Analysis. Multivariate Behavioral Research, 26(3), 499–510. https://doi.org/10.1207/s15327906mbr2603_7
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1–20.
Grünwald, P., de Heide, R., & Koolen, W. (2019). Safe Testing. arXiv:1906.07801 [Cs, Math, Stat]. https://arxiv.org/abs/1906.07801
Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112. https://doi.org/10.4103/2229-3485.83221
Hacking, I. (1965). Logic of Statistical Inference. Cambridge University Press.
Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G., Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci, M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D., Dewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered Replication of the Ego-Depletion Effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873
Hallahan, M., & Rosenthal, R. (1996). Statistical power: Concepts, procedures, and applications. Behaviour Research and Therapy, 34(5), 489–499. https://doi.org/10.1016/0005-7967(95)00082-8
Hallinan, D., Boehm, F., Külpmann, A., & Elson, M. (2023). Information Provision for Informed Consent Procedures in Psychological Research Under the General Data Protection Regulation: A Practical Guide. Advances in Methods and Practices in Psychological Science, 6(1), 25152459231151944. https://doi.org/10.1177/25152459231151944
Halpern, J., Brown Jr, B. W., & Hornberger, J. (2001). The sample size for a clinical trial: A Bayesian decision theoretic approach. Statistics in Medicine, 20(6), 841–858. https://doi.org/10.1002/sim.703
Halpern, S. D., Karlawish, J. H., & Berlin, J. A. (2002). The continuing unethical conduct of underpowered clinical trials. JAMA, 288(3), 358–362. https://doi.org/10.1001/jama.288.3.358
Hand, D. J. (1994). Deconstructing Statistical Questions. Journal of the Royal Statistical Society. Series A (Statistics in Society), 157(3), 317–356. https://doi.org/10.2307/2983526
Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., Mohr, A. H., Clayton, E., Yoon, E. J., Tessler, M. H., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448
Harms, C., & Lakens, D. (2018). Making ‘null effects’ informative: Statistical techniques and inferential frameworks. Journal of Clinical and Translational Research, 3, 382–393. https://doi.org/10.18053/jctres.03.2017S2.007
Harrer, M., Cuijpers, P., Furukawa, T. A., & Ebert, D. D. (2021). Doing Meta-Analysis with R: A Hands-On Guide. Chapman and Hall/CRC. https://doi.org/10.1201/9781003107347
Hauck, D. W. W., & Anderson, S. (1984). A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. Journal of Pharmacokinetics and Biopharmaceutics, 12(1), 83–91. https://doi.org/10.1007/BF01063612
Hedges, L. V., & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6(3), 203–217. https://doi.org/10.1037/1082-989X.6.3.203
Hempel, C. G. (1966). Philosophy of natural science (Reprint). Prentice-Hall.
Hilgard, J. (2021). Maximal positive controls: A method for estimating the largest plausible effect size. Journal of Experimental Social Psychology, 93. https://doi.org/10.1016/j.jesp.2020.104082
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical Benchmarks for Interpreting Effect Sizes in Research. Child Development Perspectives, 2(3), 172–177. https://doi.org/10.1111/j.1750-8606.2008.00061.x
Hodges, J. L., & Lehmann, E. L. (1954). Testing the Approximate Validity of Statistical Hypotheses. Journal of the Royal Statistical Society. Series B (Methodological), 16(2), 261–268. https://doi.org/10.1111/j.2517-6161.1954.tb00169.x
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55(1), 19–24. https://doi.org/10.1198/000313001300339897
Huedo-Medina, T. B., Sánchez-Meca, J., Marín-Martínez, F., & Botella, J. (2006). Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychological Methods, 11(2), 193.
Hung, H. M. J., O’Neill, R. T., Bauer, P., & Kohne, K. (1997). The Behavior of the P-Value When the Alternative Hypothesis is True. Biometrics, 53(1), 11–22. https://doi.org/10.2307/2533093
Hunt, K. (1975). Do we really need more replications? Psychological Reports, 36(2), 587–593.
Hyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A. B., & Williams, C. C. (2008). Gender Similarities Characterize Math Performance. Science, 321(5888), 494–495. https://doi.org/10.1126/science.1160364
Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4(3), 245–253. https://doi.org/10.1177/1740774507079441
Iyengar, S., & Greenhouse, J. B. (1988). Selection Models and the File Drawer Problem. Statistical Science, 3(1), 109–117. https://www.jstor.org/stable/2245925
Jaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of health status: Ascertaining the minimal clinically important difference. Controlled Clinical Trials, 10(4), 407–415. https://doi.org/10.1016/0197-2456(89)90005-6
Jeffreys, H. (1939). Theory of probability (1st ed.). Oxford University Press.
Jennison, C., & Turnbull, B. W. (2000). Group sequential methods with applications to clinical trials. Chapman & Hall/CRC.
Johansson, T. (2011). Hail the impossible: P-values, evidence, and likelihood. Scandinavian Journal of Psychology, 52(2), 113–125. https://doi.org/10.1111/j.1467-9450.2010.00852.x
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317. https://doi.org/10.1073/pnas.1313476110
Jones, L. V. (1952). Test of hypotheses: One-sided vs. Two-sided alternatives. Psychological Bulletin, 49(1), 43–46. https://doi.org/10.1037/h0056832
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2009). Weight as an Embodiment of Importance. Psychological Science, 20(9), 1169–1174. https://doi.org/10.1111/j.1467-9280.2009.02426.x
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2016). A short history of the weight-importance effect and a recommendation for pre-testing: Commentary on Ebersole et al. (2016). Journal of Experimental Social Psychology, 67, 93–94. https://doi.org/10.1016/j.jesp.2015.12.001
Julious, S. A. (2004). Sample sizes for clinical trials with normal data. Statistics in Medicine, 23(12), 1921–1986. https://doi.org/10.1002/sim.1783
Junk, T., & Lyons, L. (2020). Reproducibility and Replication of Experimental Particle Physics Results. Harvard Data Science Review, 2(4). https://doi.org/10.1162/99608f92.250f995b
Kaiser, H. F. (1960). Directional statistical decisions. Psychological Review, 67(3), 160–167. https://doi.org/10.1037/h0047595
Kaplan, R. M., & Irvin, V. L. (2015). Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time. PLOS ONE, 10(8), e0132382. https://doi.org/10.1371/journal.pone.0132382
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572
Keefe, R. S. E., Kraemer, H. C., Epstein, R. S., Frank, E., Haynes, G., Laughren, T. P., Mcnulty, J., Reed, S. D., Sanchez, J., & Leon, A. C. (2013). Defining a Clinically Meaningful Effect for the Design and Interpretation of Randomized Controlled Trials. Innovations in Clinical Neuroscience, 10(5-6 Suppl A), 4S–19S.
Kelley, K. (2007). Confidence Intervals for Standardized Effect Sizes: Theory, Application, and Implementation. Journal of Statistical Software, 20(8). https://doi.org/10.18637/JSS.V020.I08
Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152. https://doi.org/10.1037/a0028086
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the standardized mean difference: Accuracy in parameter estimation via narrow confidence intervals. Psychological Methods, 11(4), 363–385. https://doi.org/10.1037
Kelter, R. (2021). Analysis of type I and II error rates of Bayesian and frequentist parametric and nonparametric two-sample hypothesis tests under preliminary assessment of normality. Computational Statistics, 36(2), 1263–1288. https://doi.org/10.1007/s00180-020-01034-7
Kenett, R. S., Shmueli, G., & Kenett, R. (2016). Information Quality: The Potential of Data and Analytics to Generate Knowledge (1st ed.). Wiley.
Kennedy-Shaffer, L. (2019). Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing. The American Statistician, 73(sup1), 82–90. https://doi.org/10.1080/00031305.2018.1537891
Kenny, D. A., & Judd, C. M. (2019). The unappreciated heterogeneity of effect sizes: Implications for power, precision, planning of research, and replication. Psychological Methods, 24(5), 578–589. https://doi.org/10.1037/met0000209
Keppel, G. (1991). Design and analysis: A researcher’s handbook (3rd ed., pp. xiii, 594). Prentice-Hall, Inc.
Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
King, M. T. (2011). A point of minimal important difference (MID): A critique of terminology and methods. Expert Review of Pharmacoeconomics & Outcomes Research, 11(2), 171–184. https://doi.org/10.1586/erp.11.9
Kish, L. (1959). Some Statistical Problems in Research Design. American Sociological Review, 24(3), 328–338. https://doi.org/10.2307/2089381
Kish, L. (1965). Survey Sampling. Wiley.
Komić, D., Marušić, S. L., & Marušić, A. (2015). Research Integrity and Research Ethics in Professional Codes of Ethics: Survey of Terminology Used by Professional Organizations across Research Disciplines. PLOS ONE, 10(7), e0133662. https://doi.org/10.1371/journal.pone.0133662
Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational Researcher, 49(4), 241–253. https://doi.org/10.3102/0013189X20912798
Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299–312.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573–603. https://doi.org/10.1037/a0029146
Kruschke, J. K. (2014). Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.
Kruschke, J. K. (2018). Rejecting or Accepting Parameter Values in Bayesian Estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270–280. https://doi.org/10.1177/2515245918771304
Kruschke, J. K., & Liddell, T. M. (2017). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-016-1221-4
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
Kuipers, T. A. F. (2016). Models, postulates, and generalized nomic truth approximation. Synthese, 193(10), 3057–3077. https://doi.org/10.1007/s11229-015-0916-9
Lakatos, I. (1978). The methodology of scientific research programmes: Volume 1: Philosophical papers. Cambridge University Press.
Lakens, Daniël. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00863
Lakens, Daniël. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701–710. https://doi.org/10.1002/ejsp.2023
Lakens, Daniël. (2017). Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science, 8(4), 355–362. https://doi.org/10.1177/1948550617697177
Lakens, Daniël. (2019). The value of preregistration for psychological science: A conceptual analysis. Japanese Psychological Review, 62(3), 221–230. https://doi.org/10.24602/sjpr.62.3_221
Lakens, Daniël. (2020). Pandemic researchers recruit your own best critics. Nature, 581(7807), 121–121. https://doi.org/10.1038/d41586-020-01392-8
Lakens, Daniël. (2021). The practical alternative to the p value is the correctly used p value. Perspectives on Psychological Science, 16(3), 639–648. https://doi.org/10.1177/1745691620958012
Lakens, Daniël. (2022a). Sample Size Justification. Collabra: Psychology. https://doi.org/10.31234/osf.io/9d3yf
Lakens, Daniël. (2022b). Why P values are not measures of evidence. Trends in Ecology & Evolution, 37(4), 289–290. https://doi.org/10.1016/j.tree.2021.12.006
Lakens, Daniël. (2023). Is my study useless? Why researchers need methodological review boards. Nature, 613(7942), 9–9. https://doi.org/10.1038/d41586-022-04504-8
Lakens, Daniël. (2023). When and How to Deviate from a Preregistration. PsyArXiv. https://doi.org/10.31234/osf.io/ha29k
Lakens, Daniël, Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2, 168–171. https://doi.org/10.1038/s41562-018-0311-x
Lakens, Daniël, & Caldwell, A. R. (2021). Simulation-Based Power Analysis for Factorial Analysis of Variance Designs. Advances in Methods and Practices in Psychological Science, 4(1). https://doi.org/10.1177/2515245920951503
Lakens, Daniël, & DeBruine, L. (2020). Improving Transparency, Falsifiability, and Rigour by Making Hypothesis Tests Machine Readable. https://doi.org/10.31234/osf.io/5xcda
Lakens, Daniël, & Etz, A. J. (2017). Too True to be Bad: When Sets of Studies With Significant and Nonsignificant Findings Are Probably True. Social Psychological and Personality Science, 8(8), 875–881. https://doi.org/10.1177/1948550617693058
Lakens, Daniël, Hilgard, J., & Staaks, J. (2016). On the reproducibility of meta-analyses: Six practical recommendations. BMC Psychology, 4, 24. https://doi.org/10.1186/s40359-016-0126-3
Lakens, Daniël, McLatchie, N., Isager, P. M., Scheel, A. M., & Dienes, Z. (2020). Improving Inferences About Null Effects With Bayes Factors and Equivalence Tests. The Journals of Gerontology: Series B, 75(1), 45–57. https://doi.org/10.1093/geronb/gby065
Lakens, Daniël, Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269. https://doi.org/10.1177/2515245918770963
Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials. Biometrika, 70(3), 659. https://doi.org/10.2307/2336502
Langmuir, I., & Hall, R. N. (1989). Pathological Science. Physics Today, 42(10), 36–48. https://doi.org/10.1063/1.881205
Latan, H., Chiappetta Jabbour, C. J., Lopes de Sousa Jabbour, A. B., & Ali, M. (2021). Crossing the Red Line? Empirical Evidence and Useful Recommendations on Questionable Research Practices among Business Scholars. Journal of Business Ethics, 1–21. https://doi.org/10.1007/s10551-021-04961-7
Laudan, L. (1981). Science and Hypothesis. Springer Netherlands. https://doi.org/10.1007/978-94-015-7288-0
Laudan, L. (1986). Science and Values: The Aims of Science and Their Role in Scientific Debate.
Lawrence, J. M., Meyerowitz-Katz, G., Heathers, J. A. J., Brown, N. J. L., & Sheldrick, K. A. (2021). The lesson of ivermectin: Meta-analyses based on summary data alone are inherently unreliable. Nature Medicine, 27(11), 1853–1854. https://doi.org/10.1038/s41591-021-01535-y
Leamer, E. E. (1978). Specification Searches: Ad Hoc Inference with Nonexperimental Data (1st ed.). Wiley.
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed.). Springer.
Lenth, R. V. (2001). Some practical guidelines for effective sample size determination. The American Statistician, 55(3), 187–193. https://doi.org/10.1198/000313001317098149
Lenth, R. V. (2007). Post hoc power: Tables and commentary. Iowa City: Department of Statistics and Actuarial Science, University of Iowa.
Leon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The Role and Interpretation of Pilot Studies in Clinical Research. Journal of Psychiatric Research, 45(5), 626–629. https://doi.org/10.1016/j.jpsychires.2010.10.008
Letrud, K., & Hernes, S. (2019). Affirmative citation bias in scientific myth debunking: A three-in-one case study. PLOS ONE, 14(9), e0222213. https://doi.org/10.1371/journal.pone.0222213
Leung, P. T. M., Macdonald, E. M., Stanbrook, M. B., Dhalla, I. A., & Juurlink, D. N. (2017). A 1980 Letter on the Risk of Opioid Addiction. New England Journal of Medicine, 376(22), 2194–2195. https://doi.org/10.1056/NEJMc1700150
Levine, T. R., Weber, R., Park, H. S., & Hullett, C. R. (2008). A communication researchers’ guide to null hypothesis significance testing and alternatives. Human Communication Research, 34(2), 188–209.
Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 32(1), 5. https://doi.org/10.5334/irsp.289
Linden, A. H., & Hönekopp, J. (2021). Heterogeneity of Research Results: A New Perspective From Which to Assess and Promote Progress in Psychological Science. Perspectives on Psychological Science, 16(2), 358–376. https://doi.org/10.1177/1745691620964193
Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1/2), 187–192.
Lindsay, D. S. (2015). Replication in Psychological Science. Psychological Science, 26(12), 1827–1832. https://doi.org/10.1177/0956797615616374
Longino, H. E. (1990). Science as Social Knowledge: Values and Objectivity in Scientific Inquiry. Princeton University Press.
Louis, T. A., & Zeger, S. L. (2009). Effective communication of standard errors and confidence intervals. Biostatistics, 10(1), 1–2. https://doi.org/10.1093/biostatistics/kxn014
Lovakov, A., & Agadullina, E. R. (2021). Empirically derived guidelines for effect size interpretation in social psychology. European Journal of Social Psychology, 51(3), 485–504. https://doi.org/10.1002/ejsp.2752
Lubin, A. (1957). Replicability as a publication criterion. American Psychologist, 12, 519–520. https://doi.org/10.1037/h0039746
Luttrell, A., Petty, R. E., & Xu, M. (2017). Replicating and fixing failed replications: The case of need for cognition and argument quality. Journal of Experimental Social Psychology, 69, 178–183. https://doi.org/10.1016/j.jesp.2016.09.006
Lyons, I. M., Nuerk, H.-C., & Ansari, D. (2015). Rethinking the implications of numerical ratio effects for understanding the development of representational precision and numerical processing across formats. Journal of Experimental Psychology: General, 144(5), 1021–1035. https://doi.org/10.1037/xge0000094
MacCoun, R., & Perlmutter, S. (2015). Blind analysis: Hide results to seek the truth. Nature, 526(7572), 187–189. https://doi.org/10.1038/526187a
Mack, R. W. (1951). The Need for Replication Research in Sociology. American Sociological Review, 16(1), 93–94. https://doi.org/10.2307/2087978
Mahoney, M. J. (1979). Psychology of the scientist: An evaluative review. Social Studies of Science, 9(3), 349–375. https://doi.org/10.1177/030631277900900304
Maier, M., & Lakens, D. (2022). Justify your alpha: A primer on two practical approaches. Advances in Methods and Practices in Psychological Science. https://doi.org/10.31234/osf.io/ts4r6
Makel, M. C., Hodges, J., Cook, B. G., & Plucker, J. A. (2021). Both Questionable and Open Research Practices Are Prevalent in Education Research. Educational Researcher, 50(8), 493–504. https://doi.org/10.3102/0013189X211001356
Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does Sample Size Matter in Qualitative Research?: A Review of Qualitative Interviews in IS Research. Journal of Computer Information Systems, 54(1), 11–22. https://doi.org/10.1080/08874417.2013.11645667
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Lawrence Erlbaum Associates.
Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing Experiments and Analyzing Data: A Model Comparison Perspective, Third Edition (3rd ed.). Routledge.
Maxwell, S. E., & Kelley, K. (2011). Ethics and sample size planning. In Handbook of ethics in quantitative methodology (pp. 179–204). Routledge.
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation. Annual Review of Psychology, 59(1), 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735
Mayo, D. G. (1996). Error and the growth of experimental knowledge. University of Chicago Press.
Mayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge University Press.
Mayo, D. G., & Spanos, A. (2011). Error statistics. Philosophy of Statistics, 7, 152–198.
Mazzolari, R., Porcelli, S., Bishop, D. J., & Lakens, D. (2022). Myths and methodologies: The use of equivalence and non-inferiority tests for interventional studies in exercise physiology and sport science. Experimental Physiology, 107(3), 201–212. https://doi.org/10.1113/EP090171
McCarthy, R. J., Skowronski, J. J., Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Srull and Wyer (1979). Advances in Methods and Practices in Psychological Science, 1(3), 321–336. https://doi.org/10.1177/2515245918777487
McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (Vol. 122). CRC Press.
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11(4), 386–401. https://doi.org/10.1037/1082-989X.11.4.386
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365. https://doi.org/10.1037/0033-2909.111.2.361
McGuire, W. J. (2004). A Perspectivist Approach to Theory Construction. Personality and Social Psychology Review, 8(2), 173–182. https://doi.org/10.1207/s15327957pspr0802_11
McIntosh, R. D., & Rittmo, J. Ö. (2021). Power calculations in single-case neuropsychology: A practical primer. Cortex, 135, 146–158. https://doi.org/10.1016/j.cortex.2020.11.005
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 103–115. https://www.jstor.org/stable/186099
Meehl, P. E. (1978). Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806
Meehl, P. E. (1990a). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1
Meehl, P. E. (1990b). Why Summaries of Research on Psychological Theories are Often Uninterpretable. Psychological Reports, 66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195
Meehl, P. E. (2004). Cliometric metatheory III: Peircean consensus, verisimilitude and asymptotic method. The British Journal for the Philosophy of Science, 55(4), 615–643.
Melara, R. D., & Algom, D. (2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110(3), 422–471. https://doi.org/10.1037/0033-295X.110.3.422
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? An exercise in adversarial collaboration. Psychological Science, 12(4), 269–275. https://doi.org/10.1111/1467-9280.00350
Merton, R. K. (1942). A Note on Science and Democracy. Journal of Legal and Political Sociology, 1, 115–126.
Meyners, M. (2012). Equivalence tests – A review. Food Quality and Preference, 26(2), 231–245. https://doi.org/10.1016/j.foodqual.2012.05.003
Meyvis, T., & Van Osselaer, S. M. J. (2018). Increasing the Power of Your Study by Increasing the Effect Size. Journal of Consumer Research, 44(5), 1157–1173. https://doi.org/10.1093/jcr/ucx110
Millar, R. B. (2011). Maximum likelihood estimation and inference: With examples in R, SAS, and ADMB. Wiley.
Miller, J. (2009). What is the probability of replicating a statistically significant effect? Psychonomic Bulletin & Review, 16(4), 617–640. https://doi.org/10.3758/PBR.16.4.617
Miller, J., & Ulrich, R. (2019). The quest for an optimal alpha. PLOS ONE, 14(1), e0208631. https://doi.org/10.1371/journal.pone.0208631
Mitroff, I. I. (1974). Norms and Counter-Norms in a Select Group of the Apollo Moon Scientists: A Case Study of the Ambivalence of Scientists. American Sociological Review, 39(4), 579–595. https://doi.org/10.2307/2094423
Moe, K. (1984). Should the Nazi Research Data Be Cited? The Hastings Center Report, 14(6), 5–7. https://doi.org/10.2307/3561733
Moran, C., Richard, A., Wilson, K., Twomey, R., & Coroiu, A. (2022). I know it’s bad, but I have been pressured into it: Questionable research practices among psychology students in Canada. Canadian Psychology/Psychologie Canadienne. https://doi.org/10.1037/cap0000326
Morey, R. D. (2020). Power and precision [Blog]. https://medium.com/@richarddmorey/power-and-precision-47f644ddea5e
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103–123.
Morey, R. D., Kaschak, M. P., Díez-Álamo, A. M., Glenberg, A. M., Zwaan, R. A., Lakens, D., Ibáñez, A., García, A., Gianelli, C., Jones, J. L., Madden, J., Alifano, F., Bergen, B., Bloxsom, N. G., Bub, D. N., Cai, Z. G., Chartier, C. R., Chatterjee, A., Conwell, E., … Ziv-Crispel, N. (2021). A pre-registered, multi-lab non-replication of the action-sentence compatibility effect (ACE). Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-021-01927-8
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086
Morse, J. M. (1995). The Significance of Saturation. Qualitative Health Research, 5(2), 147–149. https://doi.org/10.1177/104973239500500201
Moscovici, S. (1972). Society and theory in social psychology. In Context of social psychology (pp. 17–81).
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., & Antfolk, J. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607
Motyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J., Mueller, A. B., Prims, J. P., Sun, J., Washburn, A. N., Wong, K. M., Yantis, C., & Skitka, L. J. (2017). The state of social and personality science: Rotten to the core, not so bad, getting better, or getting worse? Journal of Personality and Social Psychology, 113, 34–58. https://doi.org/10.1037/pspa0000084
Mrozek, J. R., & Taylor, L. O. (2002). What determines the value of life? A meta-analysis. Journal of Policy Analysis and Management, 21(2), 253–270. https://doi.org/10.1002/pam.10026
Mudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E. (2012). Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests. PLOS ONE, 7(2), e32734. https://doi.org/10.1371/journal.pone.0032734
Mullan, F., & Jacoby, I. (1985). The town meeting for technology: The maturation of consensus conferences. JAMA, 254(8), 1068–1072. https://doi.org/10.1001/jama.1985.03360080080035
Mulligan, A., Hall, L., & Raphael, E. (2013). Peer review in a changing world: An international study measuring the attitudes of researchers. Journal of the American Society for Information Science and Technology, 64(1), 132–161. https://doi.org/10.1002/asi.22798
Murphy, K. R., & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84(2), 234–248. https://doi.org/10.1037/0021-9010.84.2.234
Murphy, K. R., Myors, B., & Wolach, A. H. (2014). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (4th ed.). Routledge, Taylor & Francis Group.
National Academy of Sciences, National Academy of Engineering, & Institute of Medicine. (2009). On being a scientist: A guide to responsible conduct in research: Third edition. The National Academies Press. https://doi.org/10.17226/12192
Neher, A. (1967). Probability Pyramiding, Research Error and the Need for Independent Replication. The Psychological Record, 17(2), 257–262. https://doi.org/10.1007/BF03393713
Nemeth, C., Brown, K., & Rogers, J. (2001). Devil’s advocate versus authentic dissent: Stimulating quantity and quality. European Journal of Social Psychology, 31(6), 707–720. https://doi.org/10.1002/ejsp.58
Neyman, J. (1957). "Inductive Behavior" as a Basic Concept of Philosophy of Science. Revue de l’Institut International de Statistique / Review of the International Statistical Institute, 25(1/3), 7. https://doi.org/10.2307/1401671
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 231(694-706), 289–337. https://doi.org/10.1098/rsta.1933.0009
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301. https://doi.org/10.1037//1082-989X.5.2.241
Niiniluoto, I. (1998). Verisimilitude: The Third Period. The British Journal for the Philosophy of Science, 49, 1–29.
Niiniluoto, I. (1999). Critical Scientific Realism. Oxford University Press.
Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2004). The truly remarkable universality of half a standard deviation: Confirmation through another look. Expert Review of Pharmacoeconomics & Outcomes Research, 4(5), 581–585.
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods. https://doi.org/10.3758/s13428-015-0664-2
Nuijten, M. B., & Wicherts, J. (2023). The effectiveness of implementing statcheck in the peer review process to avoid statistical reporting errors. PsyArXiv. https://doi.org/10.31234/osf.io/bxau9
Nunnally, J. (1960). The place of statistics in psychology. Educational and Psychological Measurement, 20(4), 641–650. https://doi.org/10.1177/001316446002000401
O’Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A., Aldrovandi, S., Alshaif, N., Andringa, R., Aveyard, M., Babincak, P., Balatekin, N., Baldwin, S. A., Banik, G., Baskin, E., Bell, R., Białobrzeska, O., Birt, A. R., Boot, W. R., Braithwaite, S. R., … Zrubka, M. (2018). Registered Replication Report: Dijksterhuis and van Knippenberg (1998). Perspectives on Psychological Science, 13(2), 268–294. https://doi.org/10.1177/1745691618755704
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of Open Data and Computational Reproducibility in Registered Reports in Psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872
Oddie, G. (2013). The content, consequence and likeness approaches to verisimilitude: Compatibility, trivialization, and underdetermination. Synthese, 190(9), 1647–1687. https://doi.org/10.1007/s11229-011-9930-8
Okada, K. (2013). Is Omega Squared Less Biased? A Comparison of Three Major Effect Size Indices in One-Way Anova. Behaviormetrika, 40(2), 129–147. https://doi.org/10.2333/bhmk.40.129
Olejnik, S., & Algina, J. (2003). Generalized Eta and Omega Squared Statistics: Measures of Effect Size for Some Common Research Designs. Psychological Methods, 8(4), 434–447. https://doi.org/10.1037/1082-989X.8.4.434
Olsson-Collentine, A., Wicherts, J. M., & van Assen, M. A. L. M. (2020). Heterogeneity in direct replications in psychology and its association with effect size. Psychological Bulletin, 146(10), 922–940. https://doi.org/10.1037/bul0000294
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Orben, A., & Lakens, D. (2020). Crud (Re)Defined. Advances in Methods and Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961
Parker, R. A., & Berman, N. G. (2003). Sample Size. The American Statistician, 57(3), 166–170. https://doi.org/10.1198/0003130031919
Parkhurst, D. F. (2001). Statistical significance tests: Equivalence and reverse tests should reduce misinterpretation. Bioscience, 51(12), 1051–1057. https://doi.org/10.1641/0006-3568(2001)051[1051:SSTEAR]2.0.CO;2
Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological Science Needs a Standard Practice of Reporting the Reliability of Cognitive-Behavioral Measurements. Advances in Methods and Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695
Pawitan, Y. (2001). In all likelihood: Statistical modelling and inference using likelihood. Clarendon Press ; Oxford University Press.
Pemberton, M., Hall, S., Moskovitz, C., & Anson, C. M. (2019). Text recycling: Views of North American journal editors from an interview-based study. Learned Publishing, 32(4), 355–366. https://doi.org/10.1002/leap.1259
Perneger, T. V. (1998). What’s wrong with Bonferroni adjustments. BMJ, 316(7139), 1236–1238.
Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
Perugini, M., Gallucci, M., & Costantini, G. (2018). A Practical Primer To Power Analysis for Simple Experimental Designs. International Review of Social Psychology, 31(1), 20. https://doi.org/10.5334/irsp.181
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2007). Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Statistics in Medicine, 26(25), 4544–4562. https://doi.org/10.1002/sim.2889
Phillips, B. M., Hunt, J. W., Anderson, B. S., Puckett, H. M., Fairey, R., Wilson, C. J., & Tjeerdema, R. (2001). Statistical significance of sediment toxicity test results: Threshold values derived by the detectable significance approach. Environmental Toxicology and Chemistry, 20(2), 371–373. https://doi.org/10.1002/etc.5620200218
Pickett, J. T., & Roche, S. P. (2017). Questionable, Objectionable or Criminal? Public Opinion on Data Fraud and Selective Reporting in Science. Science and Engineering Ethics, 1–21. https://doi.org/10.1007/s11948-017-9886-2
Platt, J. R. (1964). Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347
Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64(2), 191–199. https://doi.org/10.1093/biomet/64.2.191
Polanin, J. R., Hennessy, E. A., & Tsuji, S. (2020). Transparency and Reproducibility of Meta-Analyses in Psychology: A Meta-Review. Perspectives on Psychological Science, 15(4), 1026–1041. https://doi.org/10.1177/1745691620906416
Popper, K. R. (2002). The logic of scientific discovery. Routledge.
Primbs, M., Pennington, C. R., Lakens, D., Silan, M. A., Lieck, D. S. N., Forscher, P., Buchanan, E. M., & Westwood, S. J. (2022). Are Small Effects the Indispensable Foundation for a Cumulative Psychological Science? A Reply to Götz et al. (2022). Perspectives on Psychological Science. https://doi.org/10.31234/osf.io/6s8bj
Proschan, M. A. (2005). Two-Stage Sample Size Re-Estimation Based on a Nuisance Parameter: A Review. Journal of Biopharmaceutical Statistics, 15(4), 559–574. https://doi.org/10.1081/BIP-200062852
Proschan, M. A., Lan, K. K. G., & Wittes, J. T. (2006). Statistical monitoring of clinical trials: A unified approach. Springer.
Psillos, S. (1999). Scientific realism: How science tracks truth. Routledge.
Quertemont, E. (2011). How to Statistically Show the Absence of an Effect. Psychologica Belgica, 51(2), 109–127. https://doi.org/10.5334/pb-51-2-109
Rabelo, A. L. A., Farias, J. E. M., Sarmet, M. M., Joaquim, T. C. R., Hoersting, R. C., Victorino, L., Modesto, J. G. N., & Pilati, R. (2020). Questionable research practices among Brazilian psychological researchers: Results from a replication study and an international comparison. International Journal of Psychology, 55(4), 674–683. https://doi.org/10.1002/ijop.12632
Radick, G. (2022). Mendel the fraud? A social history of truth in genetics. Studies in History and Philosophy of Science, 93, 39–46. https://doi.org/10.1016/j.shpsa.2021.12.012
Reif, F. (1961). The Competitive World of the Pure Scientist. Science, 134(3494), 1957–1962. https://doi.org/10.1126/science.134.3494.1957
Rice, W. R., & Gaines, S. D. (1994). ‘Heads I win, tails you lose’: Testing directional alternative hypotheses in ecological and evolutionary research. Trends in Ecology & Evolution, 9(6), 235–237. https://doi.org/10.1016/0169-5347(94)90258-5
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One Hundred Years of Social Psychology Quantitatively Described. Review of General Psychology, 7(4), 331–363. https://doi.org/10.1037/1089-2680.7.4.331
Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135–147. https://doi.org/10.1016/j.edurev.2010.12.001
Rijnsoever, F. J. van. (2017). (I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research. PLOS ONE, 12(7), e0181689. https://doi.org/10.1371/journal.pone.0181689
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553–565. https://doi.org/10.1037/0033-2909.113.3.553
Rogers, S. (1992). How a publicity blitz created the myth of subliminal advertising. Public Relations Quarterly, 37(4), 12.
Ropovik, I., Adamkovic, M., & Greger, D. (2021). Neglect of publication bias compromises meta-analyses of educational research. PLOS ONE, 16(6), e0252415. https://doi.org/10.1371/journal.pone.0252415
Rosenthal, R. (1966). Experimenter effects in behavioral research. Appleton-Century-Crofts.
Ross-Hellauer, T., Deppe, A., & Schmidt, B. (2017). Survey on open peer review: Attitudes and experience amongst editors, authors and reviewers. PLOS ONE, 12(12), e0189311. https://doi.org/10.1371/journal.pone.0189311
Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301–308.
Rouder, J. N., Haaf, J. M., & Snyder, H. K. (2019). Minimizing Mistakes in Psychological Science. Advances in Methods and Practices in Psychological Science, 2(1), 3–11. https://doi.org/10.1177/2515245918801915
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237. https://doi.org/10.3758/PBR.16.2.225
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall/CRC.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57(5), 416–428. https://doi.org/10.1037/h0042040
Rücker, G., Schwarzer, G., Carpenter, J. R., & Schumacher, M. (2008). Undue reliance on I² in assessing heterogeneity may mislead. BMC Medical Research Methodology, 8, 79. https://doi.org/10.1186/1471-2288-8-79
Sarafoglou, A., Kovacs, M., Bakos, B., Wagenmakers, E.-J., & Aczel, B. (2022). A survey on how preregistration affects the research workflow: Better science but more work. Royal Society Open Science, 9(7), 211997. https://doi.org/10.1098/rsos.211997
Scheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An Excess of Positive Results: Comparing the Standard Psychology Literature With Registered Reports. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211007467. https://doi.org/10.1177/25152459211007467
Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. Perspectives on Psychological Science, 16(4), 744–755. https://doi.org/10.1177/1745691620966795
Schimmack, U. (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17(4), 551–566. https://doi.org/10.1037/a0029487
Schnuerch, M., & Erdfelder, E. (2020). Controlling decision errors with minimal costs: The sequential probability ratio t test. Psychological Methods, 25(2), 206–226. https://doi.org/10.1037/met0000234
Schoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining Power and Sample Size for Simple and Complex Mediation Models. Social Psychological and Personality Science, 8(4), 379–386. https://doi.org/10.1177/1948550617715068
Schoenegger, P., & Pils, R. (2023). Social sciences in crisis: On the proposed elimination of the discussion section. Synthese, 202(2), 54. https://doi.org/10.1007/s11229-023-04267-3
Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini, M. (2017). Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological Methods, 22(2), 322–339. https://doi.org/10.1037/MET0000061
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15(6), 657–680.
Schulz, K. F., & Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. The Lancet, 365(9467), 1348–1353. https://doi.org/10.1016/S0140-6736(05)61034-3
Schumi, J., & Wittes, J. T. (2011). Through the looking glass: Understanding non-inferiority. Trials, 12(1), 106. https://doi.org/10.1186/1745-6215-12-106
Schweder, T., & Hjort, N. L. (2016). Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions. Cambridge University Press. https://doi.org/10.1017/CBO9781139046671
Scull, A. (2023). Rosenhan revisited: Successful scientific fraud. History of Psychiatry, 0957154X221150878. https://doi.org/10.1177/0957154X221150878
Seaman, M. A., & Serlin, R. C. (1998). Equivalence confidence intervals for two-group comparisons of means. Psychological Methods, 3(4), 403–411. https://doi.org/10.1037/1082-989X.3.4.403
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316. https://doi.org/10.1037/0033-2909.105.2.309
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life after P-Hacking.
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569. https://doi.org/10.1177/0956797614567341
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534.
Smart, R. G. (1964). The importance of negative results in psychological research. Canadian Psychologist / Psychologie Canadienne, 5a(4), 225–232. https://doi.org/10.1037/h0083036
Smithson, M. (2003). Confidence intervals. Sage Publications.
Sotola, L. K. (2022). Garbage In, Garbage Out? Evaluating the Evidentiary Value of Published Meta-analyses Using Z-Curve Analysis. Collabra: Psychology, 8(1), 32571. https://doi.org/10.1525/collabra.32571
Spanos, A. (1999). Probability theory and statistical inference: Econometric modeling with observational data. Cambridge University Press.
Spanos, A. (2013). Who should be afraid of the Jeffreys-Lindley paradox? Philosophy of Science, 80(1), 73–93. https://doi.org/10.1086/668875
Spellman, B. A. (2015). A Short (Personal) Future History of Revolution 2.0. Perspectives on Psychological Science, 10(6), 886–899. https://doi.org/10.1177/1745691615609918
Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data (Illustrated edition). Basic Books.
Spiegelhalter, D. J., Freedman, L. S., & Blackburn, P. R. (1986). Monitoring clinical trials: Conditional or predictive power? Controlled Clinical Trials, 7(1), 8–17. https://doi.org/10.1016/0197-2456(86)90003-6
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods, 5(1), 60–78. https://doi.org/10.1002/jrsm.1095
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. A. (2017). Finding the power to reduce publication bias. Statistics in Medicine. https://doi.org/10.1002/sim.7228
Steiger, J. H. (2004). Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis. Psychological Methods, 9(2), 164–182. https://doi.org/10.1037/1082-989X.9.2.164
Sterling, T. D. (1959). Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance–Or Vice Versa. Journal of the American Statistical Association, 54(285), 30–34. https://doi.org/10.2307/2282137
Stewart, L. A., & Tierney, J. F. (2002). To IPD or not to IPD?: Advantages and Disadvantages of Systematic Reviews Using Individual Patient Data. Evaluation & the Health Professions, 25(1), 76–97. https://doi.org/10.1177/0163278702025001006
Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11), 2584–2589. https://doi.org/10.1073/pnas.1708290115
Strand, J. F. (2023). Error tight: Exercises for lab groups to prevent research mistakes. Psychological Methods. https://doi.org/10.1037/met0000547
Stroebe, W., & Strack, F. (2014). The Alleged Crisis and the Illusion of Exact Replication. Perspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643–662.
Swift, J. K., Christopherson, C. D., Bird, M. O., Zöld, A., & Goode, J. (2022). Questionable research practices among faculty and students in APA-accredited clinical and counseling psychology doctoral programs. Training and Education in Professional Psychology, 16(3), 299–305. https://doi.org/10.1037/tep0000322
Taper, M. L., & Lele, S. R. (2011). Evidence, evidence functions, and error probabilities. In P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of Statistics (pp. 513–531). Elsevier, USA.
Taylor, D. J., & Muller, K. E. (1996). Bias in linear model power and sample size calculation due to estimating noncentrality. Communications in Statistics-Theory and Methods, 25(7), 1595–1610. https://doi.org/10.1080/03610929608831787
Teare, M. D., Dimairo, M., Shephard, N., Hayman, A., Whitehead, A., & Walters, S. J. (2014). Sample size requirements to estimate key design parameters from external pilot randomised controlled trials: A simulation study. Trials, 15(1), 264. https://doi.org/10.1186/1745-6215-15-264
Tendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about null hypothesis Bayesian testing. Psychological Methods. https://doi.org/10.1037/met0000221
Tendeiro, J. N., Kiers, H. A. L., Hoekstra, R., Wong, T. K., & Morey, R. D. (2024). Diagnosing the Misuse of the Bayes Factor in Applied Research. Advances in Methods and Practices in Psychological Science, 7(1), 25152459231213371. https://doi.org/10.1177/25152459231213371
ter Schure, J., & Grünwald, P. D. (2019). Accumulation Bias in Meta-Analysis: The Need to Consider Time in Error Control. arXiv:1905.13494 [Math, Stat]. https://arxiv.org/abs/1905.13494
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22(13), 2113–2126. https://doi.org/10.1002/sim.1461
Thompson, B. (2007). Effect sizes, confidence intervals, and confidence intervals for effect sizes. Psychology in the Schools, 44(5), 423–432. https://doi.org/10.1002/pits.20234
Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352. https://doi.org/10.1037/0033-295X.84.4.327
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322
Ulrich, R., & Miller, J. (2018). Some properties of p-curves, with an application to gradual publication bias. Psychological Methods, 23(3), 546–560. https://doi.org/10.1037/met0000125
Uygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist Treatment of Auxiliary Hypotheses in Social and Behavioral Sciences: Systematic Replications Framework. Meta-Psychology. https://doi.org/10.31234/osf.io/pdm7y
Uygun Tunç, D., Tunç, M. N., & Lakens, D. (2023). The epistemic and pragmatic function of dichotomous claims based on statistical hypothesis tests. Theory & Psychology, 09593543231160112. https://doi.org/10.1177/09593543231160112
Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How Many Studies Do You Need?: A Primer on Statistical Power for Meta-Analysis. Journal of Educational and Behavioral Statistics, 35(2), 215–247. https://doi.org/10.3102/1076998609346961
van de Schoot, R., Winter, S. D., Griffioen, E., Grimmelikhuijsen, S., Arts, I., Veen, D., Grandfield, E. M., & Tummers, L. G. (2021). The Use of Questionable Research Practices to Survive in Academia Examined With Expert Elicitation, Prior-Data Conflicts, Bayes Factors for Replication Effects, and the Bayes Truth Serum. Frontiers in Psychology, 12.
van de Schoot, R., Winter, S. D., Ryan, O., Zondervan-Zwijnenburg, M., & Depaoli, S. (2017). A systematic review of Bayesian articles in psychology: The last 25 years. Psychological Methods, 22(2), 217–239. https://doi.org/10.1037/met0000100
Van Fraassen, B. C. (1980). The scientific image. Clarendon Press ; Oxford University Press.
van ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology: A discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12. https://doi.org/10.1016/j.jesp.2016.03.004
Varkey, B. (2021). Principles of Clinical Ethics and Their Application to Practice. Medical Principles and Practice: International Journal of the Kuwait University, Health Science Centre, 30(1), 17–28. https://doi.org/10.1159/000509119
Vazire, S. (2017). Quality Uncertainty Erodes Trust in Science. Collabra: Psychology, 3(1), 1. https://doi.org/10.1525/collabra.74
Vazire, S., & Holcombe, A. O. (2022). Where Are the Self-Correcting Mechanisms in Science? Review of General Psychology, 26(2), 212–223. https://doi.org/10.1177/10892680211033912
Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., McCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Mazar, Amir, and Ariely (2008). Advances in Methods and Practices in Psychological Science, 1(3), 299–317. https://doi.org/10.1177/2515245918781032
Viamonte, S. M., Ball, K. K., & Kilgore, M. (2006). A Cost-Benefit Analysis of Risk-Reduction Strategies Targeted at Older Drivers. Traffic Injury Prevention, 7(4), 352–359. https://doi.org/10.1080/15389580600791362
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
Vohs, K. D., Schmeichel, B. J., Lohmann, S., Gronau, Q. F., Finley, A. J., Ainsworth, S. E., Alquist, J. L., Baker, M. D., Brizi, A., Bunyi, A., Butschek, G. J., Campbell, C., Capaldi, J., Cau, C., Chambers, H., Chatzisarantis, N. L. D., Christensen, W. J., Clay, S. L., Curtis, J., … Albarracín, D. (2021). A Multisite Preregistered Paradigmatic Test of the Ego-Depletion Effect. Psychological Science, 32(10), 1566–1581. https://doi.org/10.1177/0956797621989733
Vosgerau, J., Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2019). 99% impossible: A valid, or falsifiable, internal meta-analysis. Journal of Experimental Psychology: General, 148(9), 1628–1639. https://doi.org/10.1037/xge0000663
Vuorre, M., & Curley, J. P. (2018). Curating Research Assets: A Tutorial on the Git Version Control System. Advances in Methods and Practices in Psychological Science, 1(2), 219–236. https://doi.org/10.1177/2515245918754826
Wacholder, S., Chanock, S., Garcia-Closas, M., El ghormli, L., & Rothman, N. (2004). Assessing the Probability That a Positive Report is False: An Approach for Molecular Epidemiology Studies. JNCI Journal of the National Cancer Institute, 96(6), 434–442. https://doi.org/10.1093/jnci/djh075
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804. https://doi.org/10.3758/BF03194105
Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D., Blouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R. J., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A., Connell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426–432. https://doi.org/10.1037/a0022790
Wald, A. (1945). Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2), 117–186. https://www.jstor.org/stable/2240273
Waldron, S., & Allen, C. (2022). Not all pre-registrations are equal. Neuropsychopharmacology, 47(13), 2181–2183. https://doi.org/10.1038/s41386-022-01418-x
Wang, B., Zhou, Z., Wang, H., Tu, X. M., & Feng, C. (2019). The p-value and model specification in statistics. General Psychiatry, 32(3), e100081. https://doi.org/10.1136/gpsych-2019-100081
Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology, 12(3), 129–140. https://doi.org/10.1080/17470216008416717
Wassmer, G., & Brannath, W. (2016). Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer International Publishing. https://doi.org/10.1007/978-3-319-32562-0
Weinshall-Margel, K., & Shapard, J. (2011). Overlooked factors in the analysis of parole decisions. Proceedings of the National Academy of Sciences, 108(42), E833. https://doi.org/10.1073/pnas.1110910108
Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd ed.). CRC Press.
Westberg, M. (1985). Combining Independent Statistical Tests. Journal of the Royal Statistical Society. Series D (The Statistician), 34(3), 287–296. https://doi.org/10.2307/2987655
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014
Westlake, W. J. (1972). Use of Confidence Intervals in Analysis of Comparative Bioavailability Trials. Journal of Pharmaceutical Sciences, 61(8), 1340–1341. https://doi.org/10.1002/JPS.2600610845
Whitney, S. N. (2016). Balanced Ethics Review. Springer International Publishing. https://doi.org/10.1007/978-3-319-20705-6
Wicherts, J. M. (2011). Psychology must learn a lesson from fraud case. Nature, 480(7375), 7. https://doi.org/10.1038/480007a
Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01832
Wiebels, K., & Moreau, D. (2021). Leveraging Containers for Reproducible Psychological Research. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211017853. https://doi.org/10.1177/25152459211017853
Wigboldus, D. H. J., & Dotsch, R. (2016). Encourage Playing with Data and Discourage Questionable Reporting Practices. Psychometrika, 81(1), 27–32. https://doi.org/10.1007/s11336-015-9445-1
Williams, R. H., Zimmerman, D. W., & Zumbo, B. D. (1995). Impact of Measurement Error on Statistical Power: Review of an Old Paradox. The Journal of Experimental Education, 63(4), 363–370. https://doi.org/10.1080/00220973.1995.9943470
Wilson, E. C. F. (2015). A Practical Guide to Value of Information Analysis. PharmacoEconomics, 33(2), 105–121. https://doi.org/10.1007/s40273-014-0219-x
Wilson VanVoorhis, C. R., & Morgan, B. L. (2007). Understanding power and rules of thumb for determining sample sizes. Tutorials in Quantitative Methods for Psychology, 3(2), 43–50. https://doi.org/10.20982/tqmp.03.2.p043
Winer, B. J. (1962). Statistical principles in experimental design. New York : McGraw-Hill.
Wingen, T., Berkessel, J. B., & Englich, B. (2020). No Replication, No Trust? How Low Replicability Influences Trust in Psychology. Social Psychological and Personality Science, 11(4), 454–463. https://doi.org/10.1177/1948550619877412
Wiseman, R., Watt, C., & Kornbrot, D. (2019). Registered reports: An early example and analysis. PeerJ, 7, e6232. https://doi.org/10.7717/peerj.6232
Wittes, J., & Brittain, E. (1990). The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine, 9(1-2), 65–72. https://doi.org/10.1002/sim.4780090113
Wong, T. K., Kiers, H., & Tendeiro, J. (2022). On the Potential Mismatch Between the Function of the Bayes Factor and Researchers’ Expectations. Collabra: Psychology, 8(1), 36357. https://doi.org/10.1525/collabra.36357
Wynants, L., Calster, B. V., Collins, G. S., Riley, R. D., Heinze, G., Schuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P. A., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M. O., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M. van. (2020). Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. BMJ, 369, m1328. https://doi.org/10.1136/bmj.m1328
Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
Yuan, K.-H., & Maxwell, S. (2005). On the Post Hoc Power in Testing Mean Differences. Journal of Educational and Behavioral Statistics, 30(2), 141–167. https://doi.org/10.3102/10769986030002141
Zabell, S. L. (1992). R. A. Fisher and Fiducial Argument. Statistical Science, 7(3), 369–387. https://doi.org/10.1214/ss/1177011233
Zenko, M. (2015). Red Team: How to Succeed By Thinking Like the Enemy (1st edition). Basic Books.
Zumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions concerning prospective and retrospective power. Journal of the Royal Statistical Society: Series D (The Statistician), 47(2), 385–388. https://doi.org/10.1111/1467-9884.00139