Abelson, P. (2003). The Value of Life and Health for Public Policy. Economic Record, 79, S2–S13.
Aberson, C. L. (2019). Applied Power Analysis for the Behavioral Sciences (Second). Routledge.
Aert, R. C. M. van, & Assen, M. A. L. M. van. (2018). Correcting for Publication Bias in a Meta-Analysis with the P-uniform* Method. MetaArXiv.
Agnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., & Cubelli, R. (2017). Questionable research practices among italian research psychologists. PLOS ONE, 12(3), e0172792.
Albers, C. J., Kiers, H. A. L., & Ravenzwaaij, D. van. (2018). Credible Confidence: A Pragmatic View on the Frequentist vs Bayesian Debate. Collabra: Psychology, 4(1), 31.
Albers, C. J., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187–195.
Aldrich, J. (1997). R.A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12(3), 162–176.
Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F., & Pi-Sunyer, F. X. (1997). Power and money: Designing statistically powerful studies while minimizing financial costs. Psychological Methods, 2(1), 20–33.
Altman, D. G., & Bland, J. M. (1995). Statistics notes: Absence of evidence is not evidence of absence. BMJ, 311(7003), 485.
Altoè, G., Bertoldo, G., Zandonella Callegher, C., Toffalini, E., Calcagnì, A., Finos, L., & Pastore, M. (2020). Enhancing Statistical Inference in Psychological Research via Prospective and Retrospective Design Analysis. Frontiers in Psychology, 10.
Anderson, K. M. (2014). Group Sequential Design in R. In Clinical Trial Biostatistics and Biopharmaceutical Applications (pp. 179–209). CRC Press.
Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C. (2007). The perverse effects of competition on scientists’ work and relationships. Science and Engineering Ethics, 13(4), 437–461.
Anderson, S. F., & Kelley, K. (2020). BUCSS: Bias and uncertainty corrected sample size.
Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size planning for more accurate statistical power: A method adjusting sample effect sizes for publication bias and uncertainty. Psychological Science, 28(11), 1547–1562.
Anderson, S. F., & Maxwell, S. E. (2017). Addressing the Replication Crisis: Using Original Studies to Design Replication Studies with Appropriate Statistical Power. Multivariate Behavioral Research, 1–20.
Anderson, S. F., & Maxwell, S. E. (2016). There’s more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21(1), 1–12.
Anvari, F., Kievit, R., Lakens, D., Pennington, C. R., Przybylski, A. K., Tiokhin, L., Wiernik, B. M., & Orben, A. (2021). Not all effects are indispensable: Psychological science requires verifiable lines of reasoning for whether an effect matters. Perspectives on Psychological Science.
Anvari, F., & Lakens, D. (2018). The replicability crisis and public trust in psychological science. Comprehensive Results in Social Psychology, 3(3), 266–286.
Anvari, F., & Lakens, D. (2021). Using anchor-based methods to determine the smallest effect size of interest. Journal of Experimental Social Psychology, 96, 104159.
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3.
Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A (General), 132(2), 235–244.
Arslan, R. C. (2019). How to Automatically Document Data With the codebook Package to Facilitate Data Reuse. Advances in Methods and Practices in Psychological Science, 2515245919838783.
Auguie, B. (2017). gridExtra: Miscellaneous functions for "grid" graphics.
Babbage, C. (1830). Reflections on the Decline of Science in England: And on Some of Its Causes. B. Fellowes.
Bacchetti, P. (2010). Current sample size conventions: Flaws, harms, and alternatives. BMC Medicine, 8(1), 17.
Baguley, T. (2004). Understanding statistical power in the context of applied research. Applied Ergonomics, 35(2), 73–80.
Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617.
Baguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66(6), 423–437.
Bakker, B. N., Kokil, J., Dörr, T., Fasching, N., & Lelkes, Y. (2021). Questionable and Open Research Practices: Attitudes and Perceptions among Quantitative Communication Researchers. Journal of Communication, 71(5), 715–738.
Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., Morris, J. N., Rebok, G. W., Smith, D. M., & Tennstedt, S. L. (2002). Effects of cognitive training interventions with older adults: A randomized controlled trial. Jama, 288(18), 2271–2281.
Barber, T. X. (1976). Pitfalls in Human Research: Ten Pivotal Points. Pergamon Press.
Bartoš, F., & Schimmack, U. (2020). Z-Curve.2.0: Estimating Replication Rates and Discovery Rates.
Bartoš, F., & Schimmack, U. (2022). Zcurve: An implementation of z-curves.
Bates, D., & Maechler, M. (2021). Matrix: Sparse and dense matrix classes and methods.
Bauer, P., & Kieser, M. (1996). A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika, 83(4), 934–937.
Bausell, R. B., & Li, Y.-F. (2002). Power Analysis for Experimental Research: A Practical Guide for the Biological, Medical and Social Sciences (1st edition). Cambridge University Press.
Becker, B. J. (2005). Failsafe N or File-Drawer Number. In Publication Bias in Meta-Analysis (pp. 111–125). John Wiley & Sons, Ltd.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.
Bender, R., & Lange, S. (2001). Adjusting for multiple testingwhen and how? Journal of Clinical Epidemiology, 54(4), 343–349.
Benjamini, Y. (2016). It’s Not the p-values’ Fault. The American Statistician: Supplemental Material to the ASA Statement on P-Values and Statistical Significance, 70, 1–2.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300.
Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020a). Effectsize: Estimation of Effect Size Indices and Standardized Parameters. Journal of Open Source Software, 5(56), 2815.
Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020b). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815.
Berger, J. O., & Bayarri, M. J. (2004). The Interplay of Bayesian and Frequentist Analysis. Statistical Science, 19(1), 58–80.
Berkeley, G. (1735). A defence of free-thinking in mathematics, in answer to a pamphlet of Philalethes Cantabrigiensis entitled Geometry No Friend to Infidelity. Also an appendix concerning mr. Walton’s Vindication of the principles of fluxions against the objections contained in The analyst. By the author of The minute philosopher (Vol. 3).
Bird, S. B., & Sivilotti, M. L. A. (2008). Self-plagiarism, recycling fraud, and the intent to mislead. Journal of Medical Toxicology, 4(2), 69–70.
Bishop, D. V. M. (2018). Fallibility in Science: Responding to Errors in the Work of Oneself and Others. Advances in Methods and Practices in Psychological Science, 2515245918776632.
Bland, M. (2015). An introduction to medical statistics (Fourth edition). Oxford University Press.
Borenstein, M. (Ed.). (2009). Introduction to meta-analysis. John Wiley & Sons.
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. The Journal of Applied Psychology, 100(2), 431–449.
Bretz, F., Hothorn, T., & Westfall, P. H. (2011). Multiple comparisons using R. CRC Press.
Bross, I. D. (1971). Critical levels, statistical language and scientific inference. In Foundations of statistical inference (pp. 500–513). Holt, Rinehart and Winston.
Brown, G. W. (1983). Errors, Types I and II. American Journal of Diseases of Children, 137(6), 586–591.
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science, 8(4), 363–369.
Brunner, J., & Schimmack, U. (2020). Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance. Meta-Psychology, 4.
Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural science is unlikely to change the world without a heterogeneity revolution. Nature Human Behaviour, 1–10.
Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 16.
Brysbaert, M., & Stevens, M. (2018). Power Analysis and Effect Size in Mixed Effects Models: A Tutorial. Journal of Cognition, 1(1).
Buchanan, E. M., Gillenwaters, A. M., Scofield, J. E., & Valentine, K. D. (2019). MOTE: Effect size and confidence interval calculator.
Buchanan, E. M., Scofield, J., & Valentine, K. D. (2017). MOTE: Effect Size and Confidence Interval Calculator.
Bulus, M., & Dong, N. (2021). Bound Constrained Optimization of Sample Sizes Subject to Monetary Restrictions in Planning Multilevel Randomized Trials and Regression Discontinuity Studies. The Journal of Experimental Education, 89(2), 379–401.
Burriss, R. P., Troscianko, J., Lovell, P. G., Fulford, A. J. C., Stevens, M., Quigley, R., Payne, J., Saxton, T. K., & Rowland, H. M. (2015). Changes in women’s facial skin color over the ovulatory cycle are not detectable by the human visual system. PLOS ONE, 10(7), e0130093.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
Button, K. S., Kounali, D., Thomas, L., Wiles, N. J., Peters, T. J., Welton, N. J., Ades, A. E., & Lewis, G. (2015). Minimal clinically important difference on the Beck Depression Inventory - II according to the patient’s perspective. Psychological Medicine, 45(15), 3269–3279.
Caldwell, A., & Lakens, D. (2021). Superpower: Simulation-based power analysis for factorial designs.
Caplan, A. L. (2021). How Should We Regard Information Gathered in Nazi Experiments? AMA Journal of Ethics, 23(1), 55–58.
Carter, E. C., & McCullough, M. E. (2014). Publication bias and the limited strength model of self-control: Has the evidence for ego depletion been overestimated? Frontiers in Psychology, 5.
Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144.
Cascio, W. F., & Zedeck, S. (1983). Open a New Window in Rational Research Planning: Adjust Alpha to Maximize Statistical Power. Personnel Psychology, 36(3), 517–526.
Cevolani, G., Crupi, V., & Festa, R. (2011). Verisimilitude and belief change for conjunctive theories. Erkenntnis, 75(2), 183.
Chalmers, I., & Glasziou, P. (2009). Avoidable waste in the production and reporting of research evidence. The Lancet, 374(9683), 86–89.
Chambers, C. D., & Tzavella, L. (2022). The past, present and future of Registered Reports. Nature Human Behaviour, 6(1), 29–42.
Champely, S. (2020). Pwr: Basic functions for power analysis.
Chang, M. (2016). Adaptive Design Theory and Implementation Using SAS and R (2nd edition). Chapman and Hall/CRC.
Chatziathanasiou, K. (2022). Beware the Lure of Narratives: Hungry Judges Should not Motivate the Use of Artificial Intelligence in Law ({{SSRN Scholarly Paper}} ID 4011603). Social Science Research Network.
Chin, J. M., Pickett, J. T., Vazire, S., & Holcombe, A. O. (2021). Questionable Research Practices and Open Science in Quantitative Criminology. Journal of Quantitative Criminology.
Cho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional research hypotheses tests legitimate? Journal of Business Research, 66(9), 1261–1266.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed). L. Erlbaum Associates.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312.
Cohen, J. (1994). The earth is round (p \(<\) .05). American Psychologist, 49(12), 997–1003.
Colling, L. J., Szűcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk, H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare, D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C., Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H., … McShane, B. B. (2020). Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science, 3(2), 143–162.
Colquhoun, D. (2019). The False Positive Risk: A Proposal Concerning What to Do About p-Values. The American Statistician, 73(sup1), 192–201.
Cook, J., Hislop, J., Adewuyi, T., Harrild, K., Altman, D., Ramsay, C., Fraser, C., Buckley, B., Fayers, P., Harvey, I., Briggs, A., Norrie, J., Fergusson, D., Ford, I., & Vale, L. (2014). Assessing methods to specify the target difference for a randomised controlled trial: DELTA (Difference ELicitation in TriAls) review. Health Technology Assessment, 18(28).
Cook, T. D. (2002). P-Value Adjustment in Sequential Clinical Trials. Biometrics, 58(4), 1005–1011.
Cooper, H. (2020). Reporting quantitative research in psychology: How to meet APA Style Journal Article Reporting Standards (2nd ed.). American Psychological Association.
Cooper, H. M., Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis (2nd ed). Russell Sage Foundation.
Copay, A. G., Subach, B. R., Glassman, S. D., Polly, D. W., & Schuler, T. C. (2007). Understanding the minimum clinically important difference: A review of concepts and methods. The Spine Journal, 7(5), 541–546.
Correll, J., Mellinger, C., McClelland, G. H., & Judd, C. M. (2020). Avoid Cohen’s Small,” Medium,” and Large for Power Analysis. Trends in Cognitive Sciences, 24(3), 200–207.
Cousineau, D., & Chiasson, F. (2019). Superb: Computes standard error and confidence interval of means under various designs and sampling schemes [Manual].
Cowles, M., & Davis, C. (1982). On the origins of the. 05 level of statistical significance. American Psychologist, 37(5), 553.
Cox, D. R. (1958). Some Problems Connected with Statistical Inference. Annals of Mathematical Statistics, 29(2), 357–372.
Cribbie, R. A., Gruman, J. A., & Arpin-Cribbie, C. A. (2004). Recommendations for applying tests of equivalence. Journal of Clinical Psychology, 60(1), 1–10.
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger, S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S., Zaneva, M., & Brown, N. J. L. (2023). What’s in a Badge? A Computational Reproducibility Investigation of the Open Data Badge Policy in One Issue of Psychological Science. Psychological Science, 09567976221140828.
Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25(1), 7–29.
Cumming, G. (2008). Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better. Perspectives on Psychological Science, 3(4), 286–300.
Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Cumming, G., & Calin-Jageman, R. (2016). Introduction to the New Statistics: Estimation, Open Science, and Beyond. Routledge.
Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217–227.
Danziger, S., Levav, J., & Avnaim-Pesso, L. (2011). Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences, 108(17), 6889–6892.
de Groot, A. D. (1969). Methodology (Vol. 6). Mouton & Co.
DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920965119.
Delacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021). Why Hedges’ g*s based on the non-pooled standard deviation should be reported with Welch’s t-test. PsyArXiv.
Delacre, M., Lakens, D., & Leys, C. (2017). Why Psychologists Should by Default Use Welch’s t-test Instead of Student’s t-test. International Review of Social Psychology, 30(1).
Detsky, A. S. (1990). Using cost-effectiveness analysis to improve the efficiency of allocating funds to clinical trials. Statistics in Medicine, 9(1-2), 173–184.
Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and statistical inference. Palgrave Macmillan.
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5.
Dmitrienko, A., & D’Agostino Sr, R. (2013). Traditional multiplicity adjustment methods in clinical trials. Statistics in Medicine, 32(29), 5172–5218.
Dodge, H. F., & Romig, H. G. (1929). A Method of Sampling Inspection. Bell System Technical Journal, 8(4), 613–631.
Dongen, N. N. N. van, Doorn, J. B. van, Gronau, Q. F., Ravenzwaaij, D. van, Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D., Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019). Multiple Perspectives on Inference for Two Simple Statistical Scenarios. The American Statistician, 73(sup1), 328–339.
Dorai-Raj, S. (2014). Binom: Binomial confidence intervals for several parameterizations.
Dubin, R. (1969). Theory building. Free Press.
Duhem, P. (1954). The aim and structure of physical theory. Princeton University Press.
Dunn, O. J. (1961). Multiple Comparisons among Means. Journal of the American Statistical Association, 56(293), 52–64.
Dupont, W. D. (1983). Sequential stopping rules and sequentially adjusted P values: Does one require the other? Controlled Clinical Trials, 4(1), 3–10.
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio, D. B. V., Boucher, L., Brown, E. R., Budiman, N. I., Cairo, A. H., Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman, J. A., Conway, J. G., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82.
Eckermann, S., Karnon, J., & Willan, A. R. (2010). The Value of Value of Information. PharmacoEconomics, 28(9), 699–709.
Edwards, M. A., & Roy, S. (2017). Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition. Environmental Engineering Science, 34(1), 51–61.
Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T. (2014). Press CRTT to measure aggressive behavior: The unstandardized use of the competitive reaction time task in aggression research. Psychological Assessment, 26(2), 419–432.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28(1), 1–11.
Eysenck, H. J. (1978). An exercise in mega-silliness. American Psychologist, 33(5), 517–517.
Fanelli, D. (2010). Positive Results Increase Down the Hierarchy of the Sciences. PLoS ONE, 5(4).
Fanelli, D. (2009). How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLOS ONE, 4(5), e5738.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). GPower 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191.
Ferguson, C. J. (2014). Comment: Why meta-analyses rarely resolve ideological debates. Emotion Review, 6(3), 251–252.
Ferguson, C. J., & Heene, M. (2021). Providing a lower-bound estimate for psychology’s “crud factor”: The case of aggression. Professional Psychology: Research and Practice, 52(6), 620–626.
Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead theories publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science, 7(6), 555–561.
Ferron, J., & Onghena, P. (1996). The Power of Randomization Tests for Single-Case Phase Designs. The Journal of Experimental Education, 64(3), 231–239.
Feyerabend, P. (1993). Against method (3rd ed). Verso.
Feynman, R. P. (1974). Cargo cult science. Engineering and Science, 37(7), 10–13.
Fiedler, K. (2004). Tools, toys, truisms, and theories: Some thoughts on the creative cycle of theory formation. Personality and Social Psychology Review, 8(2), 123–131.
Fiedler, K., & Schwarz, N. (2016). Questionable Research Practices Revisited. Social Psychological and Personality Science, 7(1), 45–52.
Fiedler, K., & Schwarz, N. (2015). Questionable Research Practices Revisited. Social Psychological and Personality Science, 1948550615612150.
Field, S. A., Tyre, A. J., Jonzén, N., Rhodes, J. R., & Possingham, H. P. (2004). Minimizing the cost of environmental management decisions by optimizing statistical thresholds. Ecology Letters, 7(8), 669–675.
Fisher, Ronald Aylmer. (1935). The design of experiments. Oliver And Boyd; Edinburgh; London.
Fisher, Ronald A. (1956). Statistical methods and scientific inference: Vol. viii. Hafner Publishing Co.
Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power. PLOS ONE, 9(10), e109019.
Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin & Review, 21(5), 1180–1187.
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502–1505.
Fraser, H., Parker, T., Nakagawa, S., Barnett, A., & Fidler, F. (2018). Questionable research practices in ecology and evolution. PLOS ONE, 13(7), e0200303.
Freiman, J. A., Chalmers, T. C., Smith, H., & Kuebler, R. R. (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 "negative" trials. The New England Journal of Medicine, 299(13), 690–694.
Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1(4), 379–390.
Fricker, R. D., Burke, K., Han, X., & Woodall, W. H. (2019). Assessing the Statistical Analyses Used in Basic and Applied Social Psychology After Their p-Value Ban. The American Statistician, 73(sup1), 374–384.
Fried, B. J., Boers, M., & Baker, P. R. (1993). A method for achieving consensus on rheumatoid arthritis outcome measures: The OMERACT conference process. The Journal of Rheumatology, 20(3), 548–551.
Friede, T., & Kieser, M. (2006). Sample size recalculation in internal pilot study designs: A review. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 537–555.
Fugard, A. J. B., & Potts, H. W. W. (2015). Supporting thinking on sample sizes for thematic analyses: A quantitative tool. International Journal of Social Research Methodology, 18(6), 669–684.
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168.
Gannon, M. A., de Bragança Pereira, C. A., & Polpo, A. (2019). Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels. The American Statistician, 73(sup1), 213–222.
Gelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651.
Gerring, J. (2012). Mere Description. British Journal of Political Science, 42(4), 721–746.
Gillon, R. (1994). Medical ethics: Four principles plus attention to scope. BMJ, 309(6948), 184.
Glöckner, A. (2016). The irrational hungry judge effect revisited: Simulations reveal that the magnitude of the effect is overestimated. Judgment and Decision Making, 11(6), 601–610.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11(5), 791–806.
Goldacre, B., DeVito, N. J., Heneghan, C., Irving, F., Bacon, S., Fleminger, J., & Curtis, H. (2018). Compliance with requirement to report results on the EU Clinical Trials Register: Cohort study and web resource. BMJ, 362, k3218.
Good, I. J. (1992). The Bayes/Non-Bayes compromise: A brief review. Journal of the American Statistical Association, 87(419), 597–606.
Goodyear-Smith, F. A., van Driel, M. L., Arroll, B., & Del Mar, C. (2012). Analysis of decisions made in meta-analyses of depression screening and the risk of confirmation bias: A case study. BMC Medical Research Methodology, 12, 76.
Gopalakrishna, G., Riet, G. ter, Vink, G., Stoop, I., Wicherts, J. M., & Bouter, L. M. (2022). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: A survey among academic researchers in The Netherlands. PLOS ONE, 17(2), e0263023.
Gosset, W. S. (1904). The Application of the "Law of Error" to the Work of the Brewery (1 vol 8; pp. 3–16). Arthur Guinness & Son, Ltd.
Götz, F. M., Gosling, S. D., & Rentfrow, P. J. (2022). Small Effects: The Indispensable Foundation for a Cumulative Psychological Science. Perspectives on Psychological Science, 17(1), 205–215.
Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.
Green, S. B. (1991). How Many Subjects Does It Take To Do A Regression Analysis. Multivariate Behavioral Research, 26(3), 499–510.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1–20.
Grünwald, P., de Heide, R., & Koolen, W. (2019). Safe Testing. arXiv:1906.07801 [Cs, Math, Stat].
Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112.
Hacking, I. (1965). Logic of Statistical Inference. Cambridge University Press.
Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G., Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci, M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D., Dewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered Replication of the Ego-Depletion Effect. Perspectives on Psychological Science, 11(4), 546–573.
Hallahan, M., & Rosenthal, R. (1996). Statistical power: Concepts, procedures, and applications. Behaviour Research and Therapy, 34(5), 489–499.
Hallinan, D., Boehm, F., Külpmann, A., & Elson, M. (2023). Information Provision for Informed Consent Procedures in Psychological Research Under the General Data Protection Regulation: A Practical Guide. Advances in Methods and Practices in Psychological Science, 6(1), 25152459231151944.
Halpern, J., Brown Jr, B. W., & Hornberger, J. (2001). The sample size for a clinical trial: A Bayesian decision theoretic approach. Statistics in Medicine, 20(6), 841–858.
Halpern, S. D., Karlawish, J. H., & Berlin, J. A. (2002). The continuing unethical conduct of underpowered clinical trials. Jama, 288(3), 358–362.
Hand, D. J. (1994). Deconstructing Statistical Questions. Journal of the Royal Statistical Society. Series A (Statistics in Society), 157(3), 317–356.
Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., Mohr, A. H., Clayton, E., Yoon, E. J., Tessler, M. H., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Open Science, 5(8), 180448.
Harms, C., & Lakens, D. (2018). Making ’null effects’ informative: Statistical techniques and inferential frameworks. Journal of Clinical and Translational Research, 3, 382–393.
Harrer, M., Cuijpers, P., Furukawa, T. A., & Ebert, D. D. (2021). Doing Meta-Analysis with R: A Hands-On Guide. Chapman and Hall/CRC.
Hauck, D. W. W., & Anderson, S. (1984). A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. Journal of Pharmacokinetics and Biopharmaceutics, 12(1), 83–91.
Hedges, L. V., & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6(3), 203–217.
Hempel, C. G. (1966). Philosophy of natural science (Nachdr.). Prentice-Hall.
Hilgard, J. (2021). Maximal positive controls: A method for estimating the largest plausible effect size. Journal of Experimental Social Psychology, 93.
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical Benchmarks for Interpreting Effect Sizes in Research. Child Development Perspectives, 2(3), 172–177.
Hodges, J. L., & Lehmann, E. L. (1954). Testing the Approximate Validity of Statistical Hypotheses. Journal of the Royal Statistical Society. Series B (Methodological), 16(2), 261–268.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55(1), 19–24.
Huedo-Medina, T. B., Sánchez-Meca, J., Marín-Martínez, F., & Botella, J. (2006). Assessing heterogeneity in meta-analysis: Q statistic or I$2̂$ index? Psychological Methods, 11(2), 193.
Hung, H. M. J., O’Neill, R. T., Bauer, P., & Kohne, K. (1997). The Behavior of the P-Value When the Alternative Hypothesis is True. Biometrics, 53(1), 11–22.
Hyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A. B., & Williams, C. C. (2008). Gender Similarities Characterize Math Performance. Science, 321(5888), 494–495.
Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124.
Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4(3), 245–253.
Iyengar, S., & Greenhouse, J. B. (1988). Selection Models and the File Drawer Problem. Statistical Science, 3(1), 109–117.
Jaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of health status: Ascertaining the minimal clinically important difference. Controlled Clinical Trials, 10(4), 407–415.
Jeffreys, H. (1939). Theory of probability (1st ed). Oxford University Press.
Jennison, C., & Turnbull, B. W. (2000). Group sequential methods with applications to clinical trials. Chapman & Hall/CRC.
Johansson, T. (2011). Hail the impossible: P-values, evidence, and likelihood. Scandinavian Journal of Psychology, 52(2), 113–125.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317.
Jones, L. V. (1952). Test of hypotheses: One-sided vs. Two-sided alternatives. Psychological Bulletin, 49(1), 43–46.
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2009). Weight as an Embodiment of Importance. Psychological Science, 20(9), 1169–1174.
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2016). A short history of the weight-importance effect and a recommendation for pre-testing: Commentary on Ebersole et al. (2016). Journal of Experimental Social Psychology, 67, 93–94.
Julious, S. A. (2004). Sample sizes for clinical trials with normal data. Statistics in Medicine, 23(12), 1921–1986.
Kaiser, H. F. (1960). Directional statistical decisions. Psychological Review, 67(3), 160–167.
Kaplan, R. M., & Irvin, V. L. (2015). Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time. PLOS ONE, 10(8), e0132382.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
Keefe, R. S. E., Kraemer, H. C., Epstein, R. S., Frank, E., Haynes, G., Laughren, T. P., Mcnulty, J., Reed, S. D., Sanchez, J., & Leon, A. C. (2013). Defining a Clinically Meaningful Effect for the Design and Interpretation of Randomized Controlled Trials. Innovations in Clinical Neuroscience, 10(5-6 Suppl A), 4S–19S.
Kelley, K. (2007). Confidence Intervals for Standardized Effect Sizes: Theory, Application, and Implementation. Journal of Statistical Software, 20(8).
Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152.
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the standardized mean difference: Accuracy in parameter estimation via narrow confidence intervals. Psychological Methods, 11(4), 363–385.
Kenett, R. S., Shmueli, G., & Kenett, R. (2016). Information Quality: The Potential of Data and Analytics to Generate Knowledge (1st edition). Wiley.
Kennedy-Shaffer, L. (2019). Before p \(<\) 0.05 to Beyond p \(<\) 0.05: Using History to Contextualize p-Values and Significance Testing. The American Statistician, 73(sup1), 82–90.
Kenny, D. A., & Judd, C. M. (2019). The unappreciated heterogeneity of effect sizes: Implications for power, precision, planning of research, and replication. Psychological Methods, 24(5), 578–589.
Keppel, G. (1991). Design and analysis: A researcher’s handbook, 3rd ed (pp. xiii, 594). Prentice-Hall, Inc.
Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217.
King, M. T. (2011). A point of minimal important difference (MID): A critique of terminology and methods. Expert Review of Pharmacoeconomics & Outcomes Research, 11(2), 171–184.
Kish, L. (1965). Survey Sampling. Wiley.
Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Mohr, A. H., Ijzerman, H., Nilsonne, G., Vanpaemel, W., & Frank, M. C. (2018). A Practical Guide for Transparency in Psychological Science. Collabra: Psychology, 4(1), 20.
Komić, D., Marušić, S. L., & Marušić, A. (2015). Research Integrity and Research Ethics in Professional Codes of Ethics: Survey of Terminology Used by Professional Organizations across Research Disciplines. PLOS ONE, 10(7), e0133662.
Krafczyk, M. S., Shi, A., Bhaskar, A., Marinov, D., & Stodden, V. (2021). Learning from reproducing computational results: Introducing three principles and the Reproduction Package. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2197), rsta.2020.0069, 20200069.
Kraft, M. A. (2020). Interpreting effect sizes of education interventions. Educational Researcher, 49(4), 241–253.
Kruschke, J. K. (2018). Rejecting or Accepting Parameter Values in Bayesian Estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270–280.
Kruschke, J. K. (2014). Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan (2 edition). Academic Press.
Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299–312.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573–603.
Kruschke, J. K., & Liddell, T. M. (2017). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review.
Kruschke, J. K., & Meredith, M. (2021). BEST: Bayesian estimation supersedes the t-test.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
Kuipers, T. A. F. (2016). Models, postulates, and generalized nomic truth approximation. Synthese, 193(10), 3057–3077.
Lakatos, I. (1978). The methodology of scientific research programmes: Volume 1: Philosophical papers. Cambridge University Press.
Lakens, D. (2022a). Why P values are not measures of evidence. Trends in Ecology & Evolution.
Lakens, D. (2023). Is my study useless? Why researchers need methodological review boards. Nature, 613(7942), 9–9.
Lakens, D. (2020). Pandemic researchers recruit your own best critics. Nature, 581(7807), 121–121.
Lakens, D. (2021). The practical alternative to the p value is the correctly used p value. Perspectives on Psychological Science, 16(3), 639–648.
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4.
Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses: Sequential analyses. European Journal of Social Psychology, 44(7), 701–710.
Lakens, D. (2015). On the challenges of drawing conclusions from p -values just below 0.05. PeerJ, 3, e1142.
Lakens, D. (2017). Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science, 8(4), 355–362.
Lakens, D. (2019). The value of preregistration for psychological science: A conceptual analysis. Japanese Psychological Review, 62(3), 221–230.
Lakens, D. (2022b). Sample Size Justification. Collabra: Psychology.
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2, 168–171.
Lakens, D., & Caldwell, A. (2022). TOSTER: Two one-sided tests (TOST) equivalence testing.
Lakens, D., & Caldwell, A. R. (2021). Simulation-Based Power Analysis for Factorial Analysis of Variance Designs. Advances in Methods and Practices in Psychological Science, 4(1).
Lakens, D., & DeBruine, L. (2020). Improving Transparency, Falsifiability, and Rigour by Making Hypothesis Tests Machine Readable.
Lakens, D., & Etz, A. J. (2017). Too True to be Bad: When Sets of Studies With Significant and Nonsignificant Findings Are Probably True. Social Psychological and Personality Science, 8(8), 875–881.
Lakens, D., Hilgard, J., & Staaks, J. (2016). On the reproducibility of meta-analyses: Six practical recommendations. BMC Psychology, 4, 24.
Lakens, D., McLatchie, N., Isager, P. M., Scheel, A. M., & Dienes, Z. (2020). Improving Inferences About Null Effects With Bayes Factors and Equivalence Tests. The Journals of Gerontology: Series B, 75(1), 45–57.
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269.
Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials. Biometrika, 70(3), 659.
Latan, H., Chiappetta Jabbour, C. J., Lopes de Sousa Jabbour, A. B., & Ali, M. (2021). Crossing the Red Line? Empirical Evidence and Useful Recommendations on Questionable Research Practices among Business Scholars. Journal of Business Ethics, 1–21.
Laudan, L. (1986). Science and Values: The Aims of Science and Their Role in Scientific Debate.
Laudan, L. (1981). Science and Hypothesis. Springer Netherlands.
Lawrence, J. M., Meyerowitz-Katz, G., Heathers, J. A. J., Brown, N. J. L., & Sheldrick, K. A. (2021). The lesson of ivermectin: Meta-analyses based on summary data alone are inherently unreliable. Nature Medicine, 27(11), 1853–1854.
Leamer, E. E. (1978). Specification Searches: Ad Hoc Inference with Nonexperimental Data (1 edition). Wiley.
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed). Springer.
Lenth, R. V. (2001). Some practical guidelines for effective sample size determination. The American Statistician, 55(3), 187–193.
Lenth, R. V. (2007). Post hoc power: Tables and commentary. Iowa City: Department of Statistics and Actuarial Science, University of Iowa.
Leon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The Role and Interpretation of Pilot Studies in Clinical Research. Journal of Psychiatric Research, 45(5), 626–629.
Levine, T. R., Weber, R., Park, H. S., & Hullett, C. R. (2008). A communication researchers’ guide to null hypothesis significance testing and alternatives. Human Communication Research, 34(2), 188–209.
Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 32(1), 5.
Linden, A. H., & Hönekopp, J. (2021). Heterogeneity of Research Results: A New Perspective From Which to Assess and Promote Progress in Psychological Science. Perspectives on Psychological Science, 16(2), 358–376.
Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1/2), 187–192.
Lindsay, D. S. (2015). Replication in Psychological Science. Psychological Science, 26(12), 1827–1832.
Louis, T. A., & Zeger, S. L. (2009). Effective communication of standard errors and confidence intervals. Biostatistics, 10(1), 1–2.
Lovakov, A., & Agadullina, E. R. (2021). Empirically derived guidelines for effect size interpretation in social psychology. European Journal of Social Psychology, 51(3), 485–504.
Luttrell, A., Petty, R. E., & Xu, M. (2017). Replicating and fixing failed replications: The case of need for cognition and argument quality. Journal of Experimental Social Psychology, 69, 178–183.
Maier, M., & Lakens, D. (2021). JustifyAlpha: Justifying alpha levels for hypothesis tests.
Maier, M., & Lakens, D. (2022). Justify your alpha: A primer on two practical approaches. Advances in Methods and Practices in Psychological Science.
Makel, M. C., Hodges, J., Cook, B. G., & Plucker, J. A. (2021). Both Questionable and Open Research Practices Are Prevalent in Education Research. Educational Researcher, 50(8), 493–504.
Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does Sample Size Matter in Qualitative Research?: A Review of Qualitative Interviews in is Research. Journal of Computer Information Systems, 54(1), 11–22.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed). Lawrence Erlbaum Associates.
Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing Experiments and Analyzing Data: A Model Comparison Perspective, Third Edition (3 edition). Routledge.
Maxwell, S. E., & Kelley, K. (2011). Ethics and sample size planning. In Handbook of ethics in quantitative methodology (pp. 179–204). Routledge.
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation. Annual Review of Psychology, 59(1), 537–563.
Mayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. Cambridge University Press.
Mazzolari, R., Porcelli, S., Bishop, D. J., & Lakens, D. (2022). Myths and methodologies: The use of equivalence and non-inferiority tests for interventional studies in exercise physiology and sport science. Experimental Physiology, 107(3), 201–212.
McCarthy, R. J., Skowronski, J. J., Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Srull and Wyer (1979). Advances in Methods and Practices in Psychological Science, 1(3), 321–336.
McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (Vol. 122). CRC Press.
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11(4), 386–401.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365.
McGuire, W. J. (2004). A Perspectivist Approach to Theory Construction. Personality and Social Psychology Review, 8(2), 173–182.
McIntosh, R. D., & Rittmo, J. Ö. (2021). Power calculations in single-case neuropsychology: A practical primer. Cortex, 135, 146–158.
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 103–115.
Meehl, P. E. (1978). Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834.
Meehl, P. E. (1990a). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
Meehl, P. E. (1990b). Why Summaries of Research on Psychological Theories are Often Uninterpretable: Psychological Reports, 66(1), 195–244.
Meehl, P. E. (2004). Cliometric metatheory III: Peircean consensus, verisimilitude and asymptotic method. The British Journal for the Philosophy of Science, 55(4), 615–643.
Melara, R. D., & Algom, D. (2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110(3), 422–471.
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? An exercise in adversarial collaboration. Psychological Science, 12(4), 269–275.
Mersmann, O., Trautmann, H., Steuer, D., & Bornkamp, B. (2018). Truncnorm: Truncated normal distribution.
Meyners, M. (2012). Equivalence tests A review. Food Quality and Preference, 26(2), 231–245.
Meyvis, T., & Van Osselaer, S. M. J. (2018). Increasing the Power of Your Study by Increasing the Effect Size. Journal of Consumer Research, 44(5), 1157–1173.
Millar, R. B. (2011). Maximum likelihood estimation and inference: With examples in R, SAS, and ADMB. Wiley.
Miller, J. (2009). What is the probability of replicating a statistically significant effect? Psychonomic Bulletin & Review, 16(4), 617–640.
Miller, J., & Ulrich, R. (2019). The quest for an optimal alpha. PLOS ONE, 14(1), e0208631.
Moe, K. (1984). Should the Nazi Research Data Be Cited? The Hastings Center Report, 14(6), 5–7.
Moran, C., Link to external site, this link will open in a new window, Richard, A., Link to external site, this link will open in a new window, Wilson, K., Twomey, R., Link to external site, this link will open in a new window, Coroiu, A., & Link to external site, this link will open in a new window. (2022). I know it’s bad, but I have been pressured into it: Questionable research practices among psychology students in Canada. Canadian Psychology/Psychologie Canadienne.
Morey, R. D. (2020). Power and precision [Blog].
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103–123.
Morey, R. D., Kaschak, M. P., Díez-Álamo, A. M., Glenberg, A. M., Zwaan, R. A., Lakens, D., Ibáñez, A., García, A., Gianelli, C., Jones, J. L., Madden, J., Alifano, F., Bergen, B., Bloxsom, N. G., Bub, D. N., Cai, Z. G., Chartier, C. R., Chatterjee, A., Conwell, E., … Ziv-Crispel, N. (2021). A pre-registered, multi-lab non-replication of the action-sentence compatibility effect (ACE). Psychonomic Bulletin & Review.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074–2102.
Morse, J. M. (1995). The Significance of Saturation. Qualitative Health Research, 5(2), 147–149.
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., & Antfolk, J. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515.
Motyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J., Mueller, A. B., Prims, J. P., Sun, J., Washburn, A. N., Wong, K. M., Yantis, C., & Skitka, L. J. (2017). The state of social and personality science: Rotten to the core, not so bad, getting better, or getting worse? Journal of Personality and Social Psychology, 113, 34–58.
Mrozek, J. R., & Taylor, L. O. (2002). What determines the value of life? A meta-analysis. Journal of Policy Analysis and Management, 21(2), 253–270.
Mudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E. (2012). Setting an Optimal \(\alpha\) That Minimizes Errors in Null Hypothesis Significance Tests. PLOS ONE, 7(2), e32734.
Mullan, F., & Jacoby, I. (1985). The town meeting for technology: The maturation of consensus conferences. JAMA, 254(8), 1068–1072.
Murphy, K. R., & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84(2), 234–248.
Murphy, K. R., Myors, B., & Wolach, A. H. (2014). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (Fourth edition). Routledge, Taylor & Francis Group.
Neyman, J. (1957). "Inductive Behavior" as a Basic Concept of Philosophy of Science. Revue de l’Institut International de Statistique / Review of the International Statistical Institute, 25(1/3), 7.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 231(694-706), 289–337.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.
Niiniluoto, I. (1998). Verisimilitude: The Third Period. The British Journal for the Philosophy of Science, 49, 1–29.
Niiniluoto, I. (1999). Critical Scientific Realism. Oxford University Press.
Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2004). The truly remarkable universality of half a standard deviation: Confirmation through another look. Expert Review of Pharmacoeconomics & Outcomes Research, 4(5), 581–585.
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141.
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985). Behavior Research Methods.
Nunnally, J. (1960). The place of statistics in psychology. Educational and Psychological Measurement, 20(4), 641–650.
O’Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A., Aldrovandi, S., Alshaif, N., Andringa, R., Aveyard, M., Babincak, P., Balatekin, N., Baldwin, S. A., Banik, G., Baskin, E., Bell, R., Białobrzeska, O., Birt, A. R., Boot, W. R., Braithwaite, S. R., … Zrubka, M. (2018). Registered Replication Report: Dijksterhuis and van Knippenberg (1998). Perspectives on Psychological Science, 13(2), 268–294.
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of Open Data and Computational Reproducibility in Registered Reports in Psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237.
Oddie, G. (2013). The content, consequence and likeness approaches to verisimilitude: Compatibility, trivialization, and underdetermination. Synthese, 190(9), 1647–1687.
Okada, K. (2013). Is Omega Squared Less Biased? A Comparison of Three Major Effect Size Indices in One-Way Anova. Behaviormetrika, 40(2), 129–147.
Olejnik, S., & Algina, J. (2003). Generalized Eta and Omega Squared Statistics: Measures of Effect Size for Some Common Research Designs. Psychological Methods, 8(4), 434–447.
Olsson-Collentine, A., Wicherts, J. M., & van Assen, M. A. L. M. (2020a). Heterogeneity in direct replications in psychology and its association with effect size. Psychological Bulletin, 146(10), 922–940.
Olsson-Collentine, A., Wicherts, J. M., & van Assen, M. A. L. M. (2020b). Heterogeneity in direct replications in psychology and its association with effect size. Psychological Bulletin, 146(10), 922–940.
Ooms, J. (2021). Gifski: Highest quality GIF encoder.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716–aac4716.
Orben, A., & Lakens, D. (2020). Crud (Re)Defined. Advances in Methods and Practices in Psychological Science, 3(2), 238–247.
Parker, R. A., & Berman, N. G. (2003). Sample Size. The American Statistician, 57(3), 166–170.
Parkhurst, D. F. (2001). Statistical significance tests: Equivalence and reverse tests should reduce misinterpretation. Bioscience, 51(12), 1051–1057.[1051:SSTEAR]2.0.CO;2
Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological Science Needs a Standard Practice of Reporting the Reliability of Cognitive-Behavioral Measurements. Advances in Methods and Practices in Psychological Science, 2(4), 378–395.
Pawitan, Y. (2001). In all likelihood: Statistical modelling and inference using likelihood. Clarendon Press ; Oxford University Press.
Pedersen, T. L. (2020). Patchwork: The composer of plots.
Pemberton, M., Hall, S., Moskovitz, C., & Anson, C. M. (2019). Text recycling: Views of North American journal editors from an interview-based study. Learned Publishing, 32(4), 355–366.
Perneger, T. V. (1998). What’s wrong with Bonferroni adjustments. Bmj, 316(7139), 1236–1238.
Perugini, M., Gallucci, M., & Costantini, G. (2018). A Practical Primer To Power Analysis for Simple Experimental Designs. International Review of Social Psychology, 31(1), 20.
Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9(3), 319–332.
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2007). Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Statistics in Medicine, 26(25), 4544–4562.
Phillips, B. M., Hunt, J. W., Anderson, B. S., Puckett, H. M., Fairey, R., Wilson, C. J., & Tjeerdema, R. (2001). Statistical significance of sediment toxicity test results: Threshold values derived by the detectable significance approach. Environmental Toxicology and Chemistry, 20(2), 371–373.
Pickett, J. T., & Roche, S. P. (2017). Questionable, Objectionable or Criminal? Public Opinion on Data Fraud and Selective Reporting in Science. Science and Engineering Ethics, 1–21.
Platt, J. R. (1964). Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science, 146(3642), 347–353.
Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64(2), 191–199.
Polanin, J. R., Hennessy, E. A., & Tsuji, S. (2020). Transparency and Reproducibility of Meta-Analyses in Psychology: A Meta-Review. Perspectives on Psychological Science, 15(4), 1026–1041.
Popper, K. R. (2002). The logic of scientific discovery. Routledge.
Primbs, M., Pennington, C. R., Lakens, D., Silan, M. A., Lieck, D. S. N., Forscher, P., Buchanan, E. M., & Westwood, S. J. (2022). Are Small Effects the Indispensable Foundation for a Cumulative Psychological Science? A Reply to Götz et al. (2022). Perspectives on Psychological Science.
Proschan, M. A. (2005). Two-Stage Sample Size Re-Estimation Based on a Nuisance Parameter: A Review. Journal of Biopharmaceutical Statistics, 15(4), 559–574.
Proschan, M. A., Lan, K. K. G., & Wittes, J. T. (2006). Statistical monitoring of clinical trials: A unified approach. Springer.
Psillos, S. (1999). Scientific realism: How science tracks truth. Routledge.
Quertemont, E. (2011). How to Statistically Show the Absence of an Effect. Psychologica Belgica, 51(2), 109–127.
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Rabelo, A. L. A., Farias, J. E. M., Sarmet, M. M., Joaquim, T. C. R., Hoersting, R. C., Victorino, L., Modesto, J. G. N., & Pilati, R. (2020). Questionable research practices among Brazilian psychological researchers: Results from a replication study and an international comparison. International Journal of Psychology, 55(4), 674–683.
Rice, W. R., & Gaines, S. D. (1994). ’Heads I win, tails you lose’: Testing directional alternative hypotheses in ecological and evolutionary research. Trends in Ecology & Evolution, 9(6), 235–237.
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One Hundred Years of Social Psychology Quantitatively Described. Review of General Psychology, 7(4), 331–363.
Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135–147.
Rijnsoever, F. J. van. (2017). (I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research. PLOS ONE, 12(7), e0181689.
Ripley, B. (2021). MASS: Support functions and datasets for venables and ripley’s MASS.
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553–565.
Rogers, S. (1992). How a publicity blitz created the myth of subliminal advertising. Public Relations Quarterly, 37(4), 12.
Roig, M. (2009). Plagiarism: Consider the context. Science (New York, N.Y.), 325(5942), 813–814.
Ropovik, I., Adamkovic, M., & Greger, D. (2021). Neglect of publication bias compromises meta-analyses of educational research. PLOS ONE, 16(6), e0252415.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge University Press.
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall/CRC.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57(5), 416–428.
Rücker, G., Schwarzer, G., Carpenter, J. R., & Schumacher, M. (2008). Undue reliance on I(2) in assessing heterogeneity may mislead. BMC Medical Research Methodology, 8, 79.
Sarafoglou, A., Kovacs, M., Bakos, B., Wagenmakers, E.-J., & Aczel, B. (2022). A survey on how preregistration affects the research workflow: Better science but more work. Royal Society Open Science, 9(7), 211997.
Scheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An Excess of Positive Results: Comparing the Standard Psychology Literature With Registered Reports. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211007467.
Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. Perspectives on Psychological Science, 16(4), 744–755.
Schimmack, U. (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17(4), 551–566.
Schnuerch, M., & Erdfelder, E. (2020). Controlling decision errors with minimal costs: The sequential probability ratio t test. Psychological Methods, 25(2), 206–226.
Schoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining Power and Sample Size for Simple and Complex Mediation Models. Social Psychological and Personality Science, 8(4), 379–386.
Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini, M. (2017). Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological Methods, 22(2), 322–339.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15(6), 657–680.
Schulz, K. F., & Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. The Lancet, 365(9467), 1348–1353.
Schumi, J., & Wittes, J. T. (2011). Through the looking glass: Understanding non-inferiority. Trials, 12(1), 106.
Schweder, T., & Hjort, N. L. (2016). Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions. Cambridge University Press.
Scull, A. (2023). Rosenhan revisited: Successful scientific fraud. History of Psychiatry, 0957154X221150878.
Seaman, M. A., & Serlin, R. C. (1998). Equivalence confidence intervals for two-group comparisons of means. Psychological Methods, 3(4), 403–411.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life after P-Hacking.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366.
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534.
Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384.
Smithson, M. (2003). Confidence intervals. Sage Publications.
Sotola, L. K. (2022). Garbage In, Garbage Out? Evaluating the Evidentiary Value of Published Meta-analyses Using Z-Curve Analysis. Collabra: Psychology, 8(1), 32571.
Spanos, A. (2013). Who should be afraid of the Jeffreys-Lindley paradox? Philosophy of Science, 80(1), 73–93.
Spellman, B. A. (2015). A Short (Personal) Future History of Revolution 2.0. Perspectives on Psychological Science, 10(6), 886–899.
Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data (Illustrated edition). Basic Books.
Spiegelhalter, D. J., Freedman, L. S., & Blackburn, P. R. (1986). Monitoring clinical trials: Conditional or predictive power? Controlled Clinical Trials, 7(1), 8–17.
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods, 5(1), 60–78.
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. A. (2017). Finding the power to reduce publication bias: Finding the power to reduce publication bias. Statistics in Medicine.
Steiger, J. H. (2004). Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis. Psychological Methods, 9(2), 164–182.
Sterling, T. D. (1959). Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance–Or Vice Versa. Journal of the American Statistical Association, 54(285), 30–34.
Stewart, L. A., & Tierney, J. F. (2002). To IPD or not to IPD?: Advantages and Disadvantages of Systematic Reviews Using Individual Patient Data. Evaluation & the Health Professions, 25(1), 76–97.
Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proceedings of the National Academy of Sciences, 115(11), 2584–2589.
Stroebe, W., & Strack, F. (2014). The Alleged Crisis and the Illusion of Exact Replication. Perspectives on Psychological Science, 9(1), 59–71.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643–662.
Swift, J. K., Link to external site, this link will open in a new window, Christopherson, C. D., Link to external site, this link will open in a new window, Bird, M. O., Link to external site, this link will open in a new window, Zöld, A., Link to external site, this link will open in a new window, Goode, J., & Link to external site, this link will open in a new window. (2022). Questionable research practices among faculty and students in APA-accredited clinical and counseling psychology doctoral programs. Training and Education in Professional Psychology, 16(3), 299–305.
Taper, M. L., & Lele, S. R. (2011). Philosophy of Statistics. In P. S. Bandyophadhyay & M. R. Forster (Eds.), Evidence, evidence functions, and error probabilities (pp. 513–531). Elsevier, USA.
Taylor, D. J., & Muller, K. E. (1996). Bias in linear model power and sample size calculation due to estimating noncentrality. Communications in Statistics-Theory and Methods, 25(7), 1595–1610.
Teare, M. D., Dimairo, M., Shephard, N., Hayman, A., Whitehead, A., & Walters, S. J. (2014). Sample size requirements to estimate key design parameters from external pilot randomised controlled trials: A simulation study. Trials, 15(1), 264.
Tendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about null hypothesis Bayesian testing. Psychological Methods.
ter Schure, J., & Grünwald, P. D. (2019). Accumulation Bias in Meta-Analysis: The Need to Consider Time in Error Control. arXiv:1905.13494 [Math, Stat].
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22(13), 2113–2126.
Thompson, B. (2007). Effect sizes, confidence intervals, and confidence intervals for effect sizes. Psychology in the Schools, 44(5), 423–432.
Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105–110.
Ulrich, R., & Miller, J. (2018). Some properties of p-curves, with an application to gradual publication bias. Psychological Methods, 23(3), 546–560.
Uygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist Treatment of Auxiliary Hypotheses in Social and Behavioral Sciences: Systematic Replications Framework. Meta-Psychology.
Uygun Tunç, D., Tunç, M. N., & Lakens, D. (2021). The Epistemic and Pragmatic Function of Dichotomous Claims Based on Statistical Hypothesis Tests. PsyArXiv.
Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How Many Studies Do You Need?: A Primer on Statistical Power for Meta-Analysis. Journal of Educational and Behavioral Statistics, 35(2), 215–247.
van Aert, R. C. M. (2021). Puniform: Meta-analysis methods correcting for publication bias.
van de Schoot, R., Winter, S. D., Griffioen, E., Grimmelikhuijsen, S., Arts, I., Veen, D., Grandfield, E. M., & Tummers, L. G. (2021). The Use of Questionable Research Practices to Survive in Academia Examined With Expert Elicitation, Prior-Data Conflicts, Bayes Factors for Replication Effects, and the Bayes Truth Serum. Frontiers in Psychology, 12.
Van Fraassen, B. C. (1980). The scientific image. Clarendon Press ; Oxford University Press.
van ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12.
Varkey, B. (2021). Principles of Clinical Ethics and Their Application to Practice. Medical Principles and Practice: International Journal of the Kuwait University, Health Science Centre, 30(1), 17–28.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with s (Fourth). Springer.
Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R., McCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B. E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018). Registered Replication Report on Mazar, Amir, and Ariely (2008). Advances in Methods and Practices in Psychological Science, 1(3), 299–317.
Viamonte, S. M., Ball, K. K., & Kilgore, M. (2006). A Cost-Benefit Analysis of Risk-Reduction Strategies Targeted at Older Drivers. Traffic Injury Prevention, 7(4), 352–359.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. J Stat Softw, 36(3), 1–48.
Viechtbauer, W. (2021). Metafor: Meta-analysis package for r.
Vohs, K. D., Schmeichel, B. J., Lohmann, S., Gronau, Q. F., Finley, A. J., Ainsworth, S. E., Alquist, J. L., Baker, M. D., Brizi, A., Bunyi, A., Butschek, G. J., Campbell, C., Capaldi, J., Cau, C., Chambers, H., Chatzisarantis, N. L. D., Christensen, W. J., Clay, S. L., Curtis, J., … Albarracín, D. (2021). A Multisite Preregistered Paradigmatic Test of the Ego-Depletion Effect. Psychological Science, 32(10), 1566–1581.
Vosgerau, J., Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2019). 99% impossible: A valid, or falsifiable, internal meta-analysis. Journal of Experimental Psychology. General, 148(9), 1628–1639.
Vuorre, M., & Curley, J. P. (2018). Curating Research Assets: A Tutorial on the Git Version Control System. Advances in Methods and Practices in Psychological Science, 1(2), 219–236.
Wacholder, S., Chanock, S., Garcia-Closas, M., El ghormli, L., & Rothman, N. (2004). Assessing the Probability That a Positive Report is False: An Approach for Molecular Epidemiology Studies. JNCI Journal of the National Cancer Institute, 96(6), 434–442.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D., Blouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R. J., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A., Connell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928.
Wald, A. (1945). Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2), 117–186.
Waldron, S., & Allen, C. (2022). Not all pre-registrations are equal. Neuropsychopharmacology, 47(13), 2181–2183.
Wassmer, G., & Brannath, W. (2016). Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer International Publishing.
Wassmer, G., & Pahlke, F. (2019). Rpact: Confirmatory adaptive clinical trial design and analysis.
Wassmer, G., & Pahlke, F. (2022). Rpact: Confirmatory adaptive clinical trial design and analysis.
Weinshall-Margel, K., & Shapard, J. (2011). Overlooked factors in the analysis of parole decisions. Proceedings of the National Academy of Sciences, 108(42), E833–E833.
Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd ed). CRC Press.
Westberg, M. (1985). Combining Independent Statistical Tests. Journal of the Royal Statistical Society. Series D (The Statistician), 34(3), 287–296.
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020–2045.
Westlake, W. J. (1972). Use of Confidence Intervals in Analysis of Comparative Bioavailability Trials. Journal of Pharmaceutical Sciences, 61(8), 1340–1341.
Whitney, S. N. (2016). Balanced Ethics Review. Springer International Publishing.
Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., Aert, V., M, R. C., Assen, V., & M, M. A. L. (2016). Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking. Frontiers in Psychology, 7.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Wickham, H. (2021). Tidyverse: Easily install and load the tidyverse.
Wiebels, K., & Moreau, D. (2021). Leveraging Containers for Reproducible Psychological Research. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211017853.
Wigboldus, D. H. J., & Dotsch, R. (2016). Encourage Playing with Data and Discourage Questionable Reporting Practices. Psychometrika, 81(1), 27–32.
Williams, R. H., Zimmerman, D. W., & Zumbo, B. D. (1995). Impact of Measurement Error on Statistical Power: Review of an Old Paradox. The Journal of Experimental Education, 63(4), 363–370.
Wilson, E. C. F. (2015). A Practical Guide to Value of Information Analysis. PharmacoEconomics, 33(2), 105–121.
Wilson VanVoorhis, C. R., & Morgan, B. L. (2007). Understanding power and rules of thumb for determining sample sizes. Tutorials in Quantitative Methods for Psychology, 3(2), 43–50.
Winer, B. J. (1962). Statistical principles in experimental design. New York : McGraw-Hill.
Wingen, T., Berkessel, J. B., & Englich, B. (2020). No Replication, No Trust? How Low Replicability Influences Trust in Psychology. Social Psychological and Personality Science, 11(4), 454–463.
Wiseman, R., Watt, C., & Kornbrot, D. (2019). Registered reports: An early example and analysis. PeerJ, 7, e6232.
Wittes, J., & Brittain, E. (1990). The role of internal pilot studies in increasing the efficiency of clinical trials. Statistics in Medicine, 9(1-2), 65–72.
Wong, T. K., Kiers, H., & Tendeiro, J. (2022). On the Potential Mismatch Between the Function of the Bayes Factor and ResearchersExpectations. Collabra: Psychology, 8(1), 36357.
Wynants, L., Calster, B. V., Collins, G. S., Riley, R. D., Heinze, G., Schuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P. A., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M. O., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M. van. (2020). Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. BMJ, 369, m1328.
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman; Hall/CRC.
Xie, Y. (2022). Bookdown: Authoring books and technical documents with r markdown.
Xie, Y., Dervieux, C., & Riederer, E. (2020). R markdown cookbook. Chapman; Hall/CRC.
Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122.
Yuan, K.-H., & Maxwell, S. (2005). On the Post Hoc Power in Testing Mean Differences. Journal of Educational and Behavioral Statistics, 30(2), 141–167.
Zabell, S. L. (1992). R. A. Fisher and Fiducial Argument. Statistical Science, 7(3), 369–387.
Zhu, H. (2021). kableExtra: Construct complex table with kable and pipe syntax.
Zumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions concerning prospective and retrospective power. Journal of the Royal Statistical Society: Series D (The Statistician), 47(2), 385–388.