References
Abelson, P. (2003). The Value of Life and
Health for Public Policy. Economic
Record, 79, S2–S13. https://doi.org/10.1111/1475-4932.00087
Aberson, C. L. (2019). Applied Power Analysis for the
Behavioral Sciences (2nd ed.). Routledge.
Aert, R. C. M. van, & Assen, M. A. L. M. van. (2018). Correcting
for Publication Bias in a Meta-Analysis with
the P-uniform* Method. https://doi.org/10.31222/osf.io/zqjr9
Agnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., &
Cubelli, R. (2017). Questionable research practices among Italian
research psychologists. PLOS ONE, 12(3), e0172792. https://doi.org/10.1371/journal.pone.0172792
Akker, O. van den, Bakker, M., Assen, M. A. L. M. van, Pennington, C.
R., Verweij, L., Elsherif, M., Claesen, A., Gaillard, S. D. M., Yeung,
S. K., Frankenberger, J.-L., Krautter, K., Cockcroft, J. P., Kreuer, K.
S., Evans, T. R., Heppel, F., Schoch, S. F., Korbmacher, M., Yamada, Y.,
Albayrak-Aydemir, N., … Wicherts, J. (2023, May 10). The
effectiveness of preregistration in psychology: Assessing
preregistration strictness and preregistration-study consistency.
https://doi.org/10.31222/osf.io/h8xjw
Albers, C. J., Kiers, H. A. L., & Ravenzwaaij, D. van. (2018).
Credible Confidence: A Pragmatic View on the
Frequentist vs Bayesian Debate. Collabra:
Psychology, 4(1), 31. https://doi.org/10.1525/collabra.149
Albers, C. J., & Lakens, D. (2018). When power analyses based on
pilot data are biased: Inaccurate effect size estimators
and follow-up bias. Journal of Experimental Social Psychology,
74, 187–195. https://doi.org/10.1016/j.jesp.2017.09.004
Aldhous, P. (2011). Journal rejects studies contradicting
precognition. New Scientist. https://www.newscientist.com/article/dn20447-journal-rejects-studies-contradicting-precognition/
Aldrich, J. (1997). R.A. Fisher and the making
of maximum likelihood 1912–1922. Statistical Science,
12(3), 162–176. https://doi.org/10.1214/ss/1030037906
Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F., &
Pi-Sunyer, F. X. (1997). Power and money: Designing
statistically powerful studies while minimizing financial costs.
Psychological Methods, 2(1), 20–33. https://doi.org/10.1037/1082-989X.2.1.20
Altman, D. G., & Bland, J. M. (1995). Statistics notes:
Absence of evidence is not evidence of absence.
BMJ, 311(7003), 485. https://doi.org/10.1136/bmj.311.7003.485
Altoè, G., Bertoldo, G., Zandonella Callegher, C., Toffalini, E.,
Calcagnì, A., Finos, L., & Pastore, M. (2020). Enhancing
Statistical Inference in Psychological
Research via Prospective and Retrospective
Design Analysis. Frontiers in Psychology, 10.
https://www.frontiersin.org/articles/10.3389/fpsyg.2019.02893
Anderson, M. S., Martinson, B. C., & De Vries, R. (2007). Normative
dissonance in science: Results from a national survey of
US scientists. Journal of Empirical Research on Human
Research Ethics, 2(4), 3–14. http://jre.sagepub.com/content/2/4/3.short
Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C.
(2007). The perverse effects of competition on scientists’ work and
relationships. Science and Engineering Ethics, 13(4),
437–461. http://link.springer.com/article/10.1007/s11948-007-9042-5
Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size
planning for more accurate statistical power: A method
adjusting sample effect sizes for publication bias and uncertainty.
Psychological Science, 28(11), 1547–1562. https://doi.org/10.1177/0956797617723724
Anderson, S. F., & Maxwell, S. E. (2016). There’s more than one way
to conduct a replication study: Beyond statistical
significance. Psychological Methods, 21(1), 1–12. https://doi.org/10.1037/met0000051
Anvari, F., Kievit, R., Lakens, D., Pennington, C. R., Przybylski, A.
K., Tiokhin, L., Wiernik, B. M., & Orben, A. (2021). Not all effects
are indispensable: Psychological science requires
verifiable lines of reasoning for whether an effect matters.
Perspectives on Psychological Science. https://doi.org/10.31234/osf.io/g3vtr
Anvari, F., & Lakens, D. (2018). The replicability crisis and public
trust in psychological science. Comprehensive Results in Social
Psychology, 3(3), 266–286. https://doi.org/10.1080/23743603.2019.1684822
Anvari, F., & Lakens, D. (2021). Using anchor-based methods to
determine the smallest effect size of interest. Journal of
Experimental Social Psychology, 96, 104159. https://doi.org/10.1016/j.jesp.2021.104159
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M.,
& Rao, S. M. (2018). Journal article reporting standards for
quantitative research in psychology: The APA Publications
and Communications Board task force report. American
Psychologist, 73(1), 3. https://doi.org/10.1037/amp0000191
Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated
significance tests on accumulating data. Journal of the Royal
Statistical Society: Series A (General), 132(2), 235–244.
Arslan, R. C. (2019). How to Automatically Document Data
With the codebook Package to Facilitate Data
Reuse. Advances in Methods and Practices in Psychological
Science, 2515245919838783. https://doi.org/10.1177/2515245919838783
Azrin, N. H., Holz, W., Ulrich, R., & Goldiamond, I. (1961). The
control of the content of conversation through reinforcement.
Journal of the Experimental Analysis of Behavior, 4,
25–30. https://doi.org/10.1901/jeab.1961.4-25
Babbage, C. (1830). Reflections on the Decline of
Science in England: And on
Some of Its Causes. B. Fellowes. http://archive.org/details/reflectionsonde00mollgoog
Bacchetti, P. (2010). Current sample size conventions:
Flaws, harms, and alternatives. BMC Medicine,
8(1), 17. https://doi.org/10.1186/1741-7015-8-17
Baguley, T. (2004). Understanding statistical power in the context of
applied research. Applied Ergonomics, 35(2), 73–80. https://doi.org/10.1016/j.apergo.2004.01.002
Baguley, T. (2009). Standardized or simple effect size:
What should be reported? British Journal of
Psychology, 100(3), 603–617. https://doi.org/10.1348/000712608X377117
Baguley, T. (2012). Serious stats: A guide to advanced statistics
for the behavioral sciences. Palgrave Macmillan.
Bakan, D. (1966). The test of significance in psychological research.
Psychological Bulletin, 66(6), 423–437. https://doi.org/10.1037/h0020412
Bakan, D. (1967). On method: Toward a reconstruction of
psychological investigation. Jossey-Bass. http://archive.org/details/onmethodtowardre0000baka_y0b7
Bakker, B. N., Kokil, J., Dörr, T., Fasching, N., & Lelkes, Y.
(2021). Questionable and Open Research Practices:
Attitudes and Perceptions among
Quantitative Communication Researchers. Journal of
Communication, 71(5), 715–738. https://doi.org/10.1093/joc/jqab031
Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D.,
Marsiske, M., Morris, J. N., Rebok, G. W., Smith, D. M., &
Tennstedt, S. L. (2002). Effects of cognitive training interventions
with older adults: A randomized controlled trial. JAMA,
288(18), 2271–2281.
Barber, T. X. (1976). Pitfalls in Human Research:
Ten Pivotal Points. Pergamon Press. https://books.google.com?id=UBN9AAAAMAAJ
Bartoš, F., & Schimmack, U. (2020). Z-Curve.2.0:
Estimating Replication Rates and Discovery
Rates. https://doi.org/10.31234/osf.io/urgtn
Bauer, P., & Kieser, M. (1996). A unifying approach for confidence
intervals and testing of equivalence and difference.
Biometrika, 83(4), 934–937. http://biomet.oxfordjournals.org/content/83/4/934.short
Bausell, R. B., & Li, Y.-F. (2002). Power Analysis
for Experimental Research: A Practical Guide
for the Biological, Medical and Social
Sciences (1st ed.). Cambridge University Press.
Beck, W. S. (1957). Modern Science and the nature of
life (1st ed.). Harcourt, Brace.
Becker, B. J. (2005). Failsafe N or File-Drawer
Number. In Publication Bias in
Meta-Analysis (pp. 111–125). John Wiley & Sons,
Ltd. https://doi.org/10.1002/0470870168.ch7
Bem, D. J. (2011). Feeling the future: Experimental evidence for
anomalous retroactive influences on cognition and affect. Journal of
Personality and Social Psychology, 100(3), 407–425. https://doi.org/10.1037/a0021524
Bem, D. J., Utts, J., & Johnson, W. O. (2011). Must psychologists
change the way they analyze their data? Journal of Personality and
Social Psychology, 101(4), 716–719. https://doi.org/10.1037/a0024777
Bender, R., & Lange, S. (2001). Adjusting for multiple testing—when
and how? Journal of Clinical Epidemiology, 54(4),
343–349. http://www.sciencedirect.com/science/article/pii/S0895435600003140
Benjamini, Y. (2016). It’s Not the p-values’
Fault. The American Statistician: Supplemental Material
to the ASA Statement on P-Values and Statistical Significance,
70, 1–2. http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108/suppl_file/utas_a_1154108_sm5354.pdf
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false
discovery rate: A practical and powerful approach to multiple testing.
Journal of the Royal Statistical Society. Series B
(Methodological), 289–300. http://www.jstor.org/stable/2346101
Ben-Shachar, M. S., Lüdecke, D., & Makowski, D. (2020). Effectsize:
Estimation of Effect Size Indices and
Standardized Parameters. Journal of Open Source
Software, 5(56), 2815. https://doi.org/10.21105/joss.02815
Berger, J. O., & Bayarri, M. J. (2004). The Interplay
of Bayesian and Frequentist Analysis.
Statistical Science, 19(1), 58–80. https://doi.org/10.1214/088342304000000116
Berkeley, G. (1735). A defence of free-thinking in mathematics, in
answer to a pamphlet of Philalethes Cantabrigiensis
entitled Geometry No Friend to Infidelity.
Also an appendix concerning Mr. Walton’s
Vindication of the principles of fluxions against the
objections contained in The analyst. By the
author of The minute philosopher (Vol. 3). https://www.maths.tcd.ie/pub/HistMath/People/Berkeley/Defence/Defence.html
Bird, S. B., & Sivilotti, M. L. A. (2008). Self-plagiarism,
recycling fraud, and the intent to mislead. Journal of Medical
Toxicology, 4(2), 69–70. https://doi.org/10.1007/BF03160957
Bishop, D. V. M. (2018). Fallibility in Science:
Responding to Errors in the Work
of Oneself and Others. Advances in Methods
and Practices in Psychological Science, 2515245918776632. https://doi.org/10.1177/2515245918776632
Bland, M. (2015). An introduction to medical statistics (4th ed.). Oxford University Press.
Bonett, D. G. (2012). Replication-Extension Studies.
Current Directions in Psychological Science, 21(6),
409–412. https://doi.org/10.1177/0963721412459512
Borenstein, M. (Ed.). (2009). Introduction to meta-analysis.
John Wiley & Sons.
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A.
(2015). Correlational effect size benchmarks. The Journal of Applied
Psychology, 100(2), 431–449. https://doi.org/10.1037/a0038047
Bozarth, J. D., & Roberts, R. R. (1972). Signifying significant
significance. American Psychologist, 27(8), 774.
Bretz, F., Hothorn, T., & Westfall, P. H. (2011). Multiple
comparisons using R. CRC Press.
Bross, I. D. (1971). Critical levels, statistical language and
scientific inference. In Foundations of statistical inference
(pp. 500–513). Holt, Rinehart and Winston.
Brown, G. W. (1983). Errors, Types I and II.
American Journal of Diseases of Children, 137(6),
586–591. https://doi.org/10.1001/archpedi.1983.02140320062014
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM
Test: A Simple Technique Detects Numerous Anomalies
in the Reporting of Results in
Psychology. Social Psychological and Personality
Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876
Brunner, J., & Schimmack, U. (2020). Estimating Population
Mean Power Under Conditions of Heterogeneity and
Selection for Significance.
Meta-Psychology, 4. https://doi.org/10.15626/MP.2018.874
Bryan, C. J., Tipton, E., & Yeager, D. S. (2021). Behavioural
science is unlikely to change the world without a heterogeneity
revolution. Nature Human Behaviour, 1–10. https://doi.org/10.1038/s41562-021-01143-3
Brysbaert, M. (2019). How many participants do we have to include in
properly powered experiments? A tutorial of power analysis
with reference tables. Journal of Cognition, 2(1), 16.
https://doi.org/10.5334/joc.72
Brysbaert, M., & Stevens, M. (2018). Power Analysis and
Effect Size in Mixed Effects Models: A
Tutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10
Buchanan, E. M., Scofield, J., & Valentine, K. D. (2017).
MOTE: Effect Size and Confidence
Interval Calculator (Version 0.0.0.9100).
Bulus, M., & Dong, N. (2021). Bound Constrained
Optimization of Sample Sizes Subject to
Monetary Restrictions in Planning Multilevel
Randomized Trials and Regression Discontinuity
Studies. The Journal of Experimental Education,
89(2), 379–401. https://doi.org/10.1080/00220973.2019.1636197
Bulus, M., & Polat, C. (2023). pwrss R paketi ile istatistiksel
güç analizi [Statistical power analysis with pwrss R package]. https://osf.io/ua5fc
Burriss, R. P., Troscianko, J., Lovell, P. G., Fulford, A. J. C.,
Stevens, M., Quigley, R., Payne, J., Saxton, T. K., & Rowland, H. M.
(2015). Changes in women’s facial skin color over the ovulatory cycle
are not detectable by the human visual system. PLOS ONE,
10(7), e0130093. https://doi.org/10.1371/journal.pone.0130093
Caplan, A. L. (2021). How Should We Regard Information
Gathered in Nazi Experiments? AMA Journal of
Ethics, 23(1), 55–58. https://doi.org/10.1001/amajethics.2021.55
Carter, E. C., & McCullough, M. E. (2014). Publication bias and the
limited strength model of self-control: Has the evidence for ego
depletion been overestimated? Frontiers in Psychology,
5. https://doi.org/10.3389/fpsyg.2014.00823
Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J.
(2019). Correcting for Bias in Psychology:
A Comparison of Meta-Analytic Methods.
Advances in Methods and Practices in Psychological Science,
2(2), 115–144. https://doi.org/10.1177/2515245919847196
Cascio, W. F., & Zedeck, S. (1983). Open a New Window
in Rational Research Planning: Adjust Alpha to
Maximize Statistical Power. Personnel Psychology,
36(3), 517–526. https://doi.org/10.1111/j.1744-6570.1983.tb02233.x
Ceci, S. J., & Bjork, R. A. (2000). Psychological
Science in the Public Interest: The
Case for Juried Analyses. Psychological
Science, 11(3), 177–178. https://doi.org/10.1111/1467-9280.00237
Cevolani, G., Crupi, V., & Festa, R. (2011). Verisimilitude and
belief change for conjunctive theories. Erkenntnis,
75(2), 183. http://link.springer.com/article/10.1007/s10670-011-9290-2
Chalmers, I., & Glasziou, P. (2009). Avoidable waste in the
production and reporting of research evidence. The Lancet,
374(9683), 86–89.
Chamberlin, T. C. (1890). The Method of Multiple
Working Hypotheses. Science, ns-15(366), 92–96.
https://doi.org/10.1126/science.ns-15.366.92
Chambers, C. D., & Tzavella, L. (2022). The past, present and future
of Registered Reports. Nature Human Behaviour,
6(1), 29–42. https://doi.org/10.1038/s41562-021-01193-7
Chang, H. (2022). Realism for Realistic People: A
New Pragmatist Philosophy of Science. Cambridge
University Press. https://doi.org/10.1017/9781108635738
Chang, M. (2016). Adaptive Design Theory and
Implementation Using SAS and R (2nd ed.). Chapman and Hall/CRC.
Chatziathanasiou, K. (2022). Beware the Lure of
Narratives: “Hungry Judges”
Should not Motivate the Use of
“Artificial Intelligence” in
Law (SSRN Scholarly Paper ID 4011603). Social Science
Research Network. https://doi.org/10.2139/ssrn.4011603
Chin, J. M., Pickett, J. T., Vazire, S., & Holcombe, A. O. (2021).
Questionable Research Practices and Open
Science in Quantitative Criminology. Journal of
Quantitative Criminology. https://doi.org/10.1007/s10940-021-09525-6
Cho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional
research hypotheses tests legitimate? Journal of Business
Research, 66(9), 1261–1266. https://doi.org/10.1016/j.jbusres.2012.02.023
Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd ed.). L. Erlbaum Associates.
Cohen, J. (1990). Things I have learned (so far).
American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304
Cohen, J. (1994). The earth is round (p < .05). American
Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997
Coles, N. A., March, D. S., Marmolejo-Ramos, F., Larsen, J. T., Arinze,
N. C., Ndukaihe, I. L. G., Willis, M. L., Foroni, F., Reggev, N.,
Mokady, A., Forscher, P. S., Hunter, J. F., Kaminski, G., Yüvrük, E.,
Kapucu, A., Nagy, T., Hajdu, N., Tejada, J., Freitag, R. M. K., …
Liuzza, M. T. (2022). A multi-lab test of the facial feedback hypothesis
by the Many Smiles Collaboration. Nature Human
Behaviour, 6(12), 1731–1742. https://doi.org/10.1038/s41562-022-01458-9
Colling, L. J., Szűcs, D., De Marco, D., Cipora, K., Ulrich, R., Nuerk,
H.-C., Soltanlou, M., Bryce, D., Chen, S.-C., Schroeder, P. A., Henare,
D. T., Chrystall, C. K., Corballis, P. M., Ansari, D., Goffin, C.,
Sokolowski, H. M., Hancock, P. J. B., Millen, A. E., Langton, S. R. H.,
… McShane, B. B. (2020). Registered Replication Report on
Fischer, Castel, Dodd, and
Pratt (2003). Advances in Methods and Practices in
Psychological Science, 3(2), 143–162. https://doi.org/10.1177/2515245920903079
Colquhoun, D. (2019). The False Positive Risk: A
Proposal Concerning What to Do About
p-Values. The American Statistician, 73,
192–201. https://doi.org/10.1080/00031305.2018.1529622
Cook, J., Hislop, J., Adewuyi, T., Harrild, K., Altman, D., Ramsay, C.,
Fraser, C., Buckley, B., Fayers, P., Harvey, I., Briggs, A., Norrie, J.,
Fergusson, D., Ford, I., & Vale, L. (2014). Assessing methods to
specify the target difference for a randomised controlled trial:
DELTA (Difference ELicitation in
TriAls) review. Health Technology Assessment,
18(28). https://doi.org/10.3310/hta18280
Cook, T. D. (2002). P-Value Adjustment in Sequential
Clinical Trials. Biometrics, 58(4), 1005–1011.
Cooper, H. (2020). Reporting quantitative research in psychology:
How to meet APA Style Journal Article Reporting
Standards (2nd ed.). American Psychological Association. https://doi.org/10.1037/0000178-000
Cooper, H. M., Hedges, L. V., & Valentine, J. C. (Eds.). (2009).
The handbook of research synthesis and meta-analysis (2nd ed.).
Russell Sage Foundation.
Copay, A. G., Subach, B. R., Glassman, S. D., Polly, D. W., &
Schuler, T. C. (2007). Understanding the minimum clinically important
difference: A review of concepts and methods. The Spine
Journal, 7(5), 541–546. https://doi.org/10.1016/j.spinee.2007.01.008
Corneille, O., Havemann, J., Henderson, E. L., IJzerman, H., Hussey, I.,
Orban de Xivry, J.-J., Jussim, L., Holmes, N. P., Pilacinski, A.,
Beffara, B., Carroll, H., Outa, N. O., Lush, P., & Lotter, L. D.
(2023). Beware “persuasive communication devices” when
writing and reading scientific articles. eLife, 12,
e88654. https://doi.org/10.7554/eLife.88654
Correll, J., Mellinger, C., McClelland, G. H., & Judd, C. M. (2020).
Avoid Cohen’s “Small,”
“Medium,” and
“Large” for Power Analysis.
Trends in Cognitive Sciences, 24(3), 200–207. https://doi.org/10.1016/j.tics.2019.12.009
Cousineau, D., & Chiasson, F. (2019). Superb:
Computes standard error and confidence interval of means
under various designs and sampling schemes [Manual].
Cowles, M., & Davis, C. (1982). On the origins of the .05 level of
statistical significance. American Psychologist,
37(5), 553. http://psycnet.apa.org/journals/amp/37/5/553/
Cox, D. R. (1958). Some Problems Connected with
Statistical Inference. Annals of Mathematical
Statistics, 29(2), 357–372. https://doi.org/10.1214/aoms/1177706618
Cribbie, R. A., Gruman, J. A., & Arpin-Cribbie, C. A. (2004).
Recommendations for applying tests of equivalence. Journal of
Clinical Psychology, 60(1), 1–10.
Crusius, J., Gonzalez, M. F., Lange, J., & Cohen-Charash, Y. (2020).
Envy: An Adversarial Review and Comparison of
Two Competing Views. Emotion Review,
12(1), 3–21. https://doi.org/10.1177/1754073919873131
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger,
S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S.,
Zaneva, M., & Brown, N. J. L. (2023). What’s in a
Badge? A Computational Reproducibility
Investigation of the Open Data Badge Policy in
One Issue of Psychological Science.
Psychological Science, 09567976221140828. https://doi.org/10.1177/09567976221140828
Cumming, G. (2008). Replication and p
Intervals: p Values
Predict the Future Only Vaguely, but
Confidence Intervals Do Much Better. Perspectives on
Psychological Science, 3(4), 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x
Cumming, G. (2013). Understanding the new statistics:
Effect sizes, confidence intervals, and meta-analysis.
Routledge. https://books.google.com?id=1W6laNc7Xt8C
Cumming, G. (2014). The New Statistics: Why
and How. Psychological Science, 25(1),
7–29. https://doi.org/10.1177/0956797613504966
Cumming, G., & Calin-Jageman, R. (2016). Introduction to the
New Statistics: Estimation, Open
Science, and Beyond. Routledge. https://books.google.com?id=KR8xDQAAQBAJ
Cumming, G., & Maillardet, R. (2006). Confidence intervals and
replication: Where will the next mean fall?
Psychological Methods, 11(3), 217–227. https://doi.org/10.1037/1082-989X.11.3.217
Danziger, S., Levav, J., & Avnaim-Pesso, L. (2011). Extraneous
factors in judicial decisions. Proceedings of the National Academy
of Sciences, 108(17), 6889–6892. https://doi.org/10.1073/pnas.1018033108
de Groot, A. D. (1969). Methodology (Vol. 6). Mouton & Co.
de Heide, R., & Grünwald, P. D. (2017). Why optional stopping is
a problem for Bayesians. http://arxiv.org/abs/1708.08278
DeBruine, L. M., & Barr, D. J. (2021). Understanding
Mixed-Effects Models Through Data Simulation. Advances
in Methods and Practices in Psychological Science, 4(1),
2515245920965119. https://doi.org/10.1177/2515245920965119
Delacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021).
Why Hedges’ g*s based on the non-pooled standard
deviation should be reported with Welch’s t-test. https://doi.org/10.31234/osf.io/tu6mp
Delacre, M., Lakens, D., & Leys, C. (2017). Why Psychologists
Should by Default Use Welch’s
t-test Instead of
Student’s t-test. International
Review of Social Psychology, 30(1). https://doi.org/10.5334/irsp.82
Detsky, A. S. (1990). Using cost-effectiveness analysis to improve the
efficiency of allocating funds to clinical trials. Statistics in
Medicine, 9(1-2), 173–184. https://doi.org/10.1002/sim.4780090124
Dienes, Z. (2008). Understanding psychology as a science:
An introduction to scientific and statistical
inference. Palgrave Macmillan.
Dienes, Z. (2014). Using Bayes to get the most out of
non-significant results. Frontiers in Psychology, 5.
https://doi.org/10.3389/fpsyg.2014.00781
Dmitrienko, A., & D’Agostino Sr, R. (2013). Traditional multiplicity
adjustment methods in clinical trials. Statistics in Medicine,
32(29), 5172–5218. https://doi.org/10.1002/sim.5990
Dodge, H. F., & Romig, H. G. (1929). A Method of
Sampling Inspection. Bell System Technical
Journal, 8(4), 613–631. https://doi.org/10.1002/j.1538-7305.1929.tb01240.x
Dongen, N. N. N. van, Doorn, J. B. van, Gronau, Q. F., Ravenzwaaij, D.
van, Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D.,
Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019).
Multiple Perspectives on Inference for
Two Simple Statistical Scenarios. The American
Statistician, 73, 328–339. https://doi.org/10.1080/00031305.2019.1565553
Douglas, H. E. (2009). Science, policy, and the value-free
ideal. University of Pittsburgh Press.
Dubin, R. (1969). Theory building. Free Press. http://catalog.hathitrust.org/api/volumes/oclc/160506.html
Duhem, P. (1954). The aim and structure of physical theory.
Princeton University Press.
Dupont, W. D. (1983). Sequential stopping rules and sequentially
adjusted P values: Does one require the other?
Controlled Clinical Trials, 4(1), 3–10. https://doi.org/10.1016/S0197-2456(83)80003-8
Duyx, B., Urlings, M. J. E., Swaen, G. M. H., Bouter, L. M., &
Zeegers, M. P. (2017). Scientific citations favor positive results: A
systematic review and meta-analysis. Journal of Clinical
Epidemiology, 88, 92–101. https://doi.org/10.1016/j.jclinepi.2017.06.002
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M.,
Allen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio,
D. B. V., Boucher, L., Brown, E. R., Budiman, N. I., Cairo, A. H.,
Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman,
J. A., Conway, J. G., … Nosek, B. A. (2016). Many Labs 3:
Evaluating participant pool quality across the academic
semester via replication. Journal of Experimental Social
Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012
Ebersole, C. R., Mathur, M. B., Baranski, E., Bart-Plange, D.-J.,
Buttrick, N. R., Chartier, C. R., Corker, K. S., Corley, M., Hartshorne,
J. K., IJzerman, H., Lazarević, L. B., Rabagliati, H., Ropovik, I.,
Aczel, B., Aeschbach, L. F., Andrighetto, L., Arnal, J. D., Arrow, H.,
Babincak, P., … Nosek, B. A. (2020). Many Labs 5:
Testing Pre-Data-Collection Peer Review as an
Intervention to Increase Replicability.
Advances in Methods and Practices in Psychological Science,
3(3), 309–331. https://doi.org/10.1177/2515245920958687
Eckermann, S., Karnon, J., & Willan, A. R. (2010). The
Value of Value of Information.
PharmacoEconomics, 28(9), 699–709. https://doi.org/10.2165/11537370-000000000-00000
Edwards, M. A., & Roy, S. (2017). Academic Research in
the 21st Century: Maintaining Scientific
Integrity in a Climate of Perverse
Incentives and Hypercompetition. Environmental
Engineering Science, 34(1), 51–61. https://doi.org/10.1089/ees.2016.0223
Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T.
(2014). Press CRTT to measure aggressive behavior: The
unstandardized use of the competitive reaction time task in aggression
research. Psychological Assessment, 26(2), 419–432. https://doi.org/10.1037/a0035569
Epstein, S. (1980). The stability of behavior: II.
Implications for psychological research. American
Psychologist, 35(9), 790–806. https://doi.org/10.1037/0003-066X.35.9.790
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER:
A general power analysis program. Behavior Research
Methods, Instruments, & Computers, 28(1), 1–11. https://doi.org/10.3758/BF03203630
Eysenck, H. J. (1978). An exercise in mega-silliness. American
Psychologist, 33(5), 517–517. https://doi.org/10.1037/0003-066X.33.5.517.a
Fanelli, D. (2010). “Positive” Results
Increase Down the Hierarchy of the
Sciences. PLoS ONE, 5(4). https://doi.org/10.1371/journal.pone.0010068
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007).
G*Power 3: A flexible statistical power
analysis program for the social, behavioral, and biomedical sciences.
Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Ferguson, C. J. (2014). Comment: Why meta-analyses rarely
resolve ideological debates. Emotion Review, 6(3),
251–252. http://journals.sagepub.com/doi/abs/10.1177/1754073914523046
Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead
theories: Publication bias and psychological science’s aversion to the
null. Perspectives on Psychological Science, 7(6),
555–561. http://pps.sagepub.com/content/7/6/555.short
Ferguson, C. J., & Heene, M. (2021). Providing a lower-bound
estimate for psychology’s “crud factor”: The
case of aggression. Professional Psychology: Research and
Practice, 52(6), 620–626. https://doi.org/10.1037/pro0000386
Ferguson, C., Marcus, A., & Oransky, I. (2014). Publishing:
The peer-review scam. Nature, 515(7528), 480–482. https://doi.org/10.1038/515480a
Ferron, J., & Onghena, P. (1996). The Power of
Randomization Tests for Single-Case Phase
Designs. The Journal of Experimental Education,
64(3), 231–239. https://doi.org/10.1080/00220973.1996.9943805
Feyerabend, P. (1993). Against method (3rd ed.). Verso.
Feynman, R. P. (1974). Cargo cult science. Engineering and
Science, 37(7), 10–13.
Fiedler, K. (2004). Tools, toys, truisms, and theories:
Some thoughts on the creative cycle of theory formation.
Personality and Social Psychology Review, 8(2),
123–131. https://doi.org/10.1207/s15327957pspr0802_5
Fiedler, K., & Schwarz, N. (2016). Questionable Research
Practices Revisited. Social Psychological and Personality
Science, 7(1), 45–52. https://doi.org/10.1177/1948550615612150
Field, S. A., Tyre, A. J., Jonzén, N., Rhodes, J. R., & Possingham,
H. P. (2004). Minimizing the cost of environmental management decisions
by optimizing statistical thresholds. Ecology Letters,
7(8), 669–675. https://doi.org/10.1111/j.1461-0248.2004.00625.x
Fisher, R. A. (1935). The design of experiments. Oliver and Boyd.
Fisher, R. A. (1936). Has Mendel’s work been
rediscovered? Annals of Science, 1(2), 115–137.
Fisher, R. A. (1956). Statistical methods and scientific
inference. Hafner Publishing Co.
Fishman, D. B., & Neigher, W. D. (1982). American psychology in the
eighties: Who will buy? American Psychologist,
37(5), 533–546. https://doi.org/10.1037/0003-066X.37.5.533
Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor:
Evaluating the Quality of Empirical
Journals with Respect to Sample Size
and Statistical Power. PLOS ONE, 9(10),
e109019. https://doi.org/10.1371/journal.pone.0109019
Francis, G. (2014). The frequency of excess success for articles in
Psychological Science. Psychonomic Bulletin &
Review, 21(5), 1180–1187. https://doi.org/10.3758/s13423-014-0601-x
Francis, G. (2016). Equivalent statistics and data interpretation.
Behavior Research Methods, 1–15. https://doi.org/10.3758/s13428-016-0812-3
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias
in the social sciences: Unlocking the file drawer.
Science, 345(6203), 1502–1505. https://doi.org/10.1126/science.1255484
Frankenhuis, W. E., Panchanathan, K., & Smaldino, P. E. (2022).
Strategic ambiguity in the social sciences. Social Psychological
Bulletin. https://www.psycharchives.org/en/item/e5bb9192-80a4-4ae4-9cda-5d144008196e
Fraser, H., Parker, T., Nakagawa, S., Barnett, A., & Fidler, F.
(2018). Questionable research practices in ecology and evolution.
PLOS ONE, 13(7), e0200303. https://doi.org/10.1371/journal.pone.0200303
Freedman, J. L., & Fraser, S. C. (1966). Compliance without
pressure: The foot-in-the-door technique. Journal of
Personality and Social Psychology, 4(2), 195–202. https://doi.org/10.1037/h0023552
Freiman, J. A., Chalmers, T. C., Smith, H., & Kuebler, R. R. (1978).
The importance of beta, the type II error and sample size
in the design and interpretation of the randomized control trial.
Survey of 71 "negative" trials. The New England Journal
of Medicine, 299(13), 690–694. https://doi.org/10.1056/NEJM197809282991304
Frick, R. W. (1996). The appropriate use of null hypothesis testing.
Psychological Methods, 1(4), 379–390. https://doi.org/10.1037/1082-989X.1.4.379
Fricker, R. D., Burke, K., Han, X., & Woodall, W. H. (2019).
Assessing the Statistical Analyses Used in
Basic and Applied Social Psychology After
Their p-Value Ban. The American
Statistician, 73, 374–384. https://doi.org/10.1080/00031305.2018.1537892
Fried, B. J., Boers, M., & Baker, P. R. (1993). A method for
achieving consensus on rheumatoid arthritis outcome measures: The
OMERACT conference process. The Journal of
Rheumatology, 20(3), 548–551.
Friede, T., & Kieser, M. (2006). Sample size recalculation in
internal pilot study designs: A review. Biometrical Journal: Journal
of Mathematical Methods in Biosciences, 48(4), 537–555. https://doi.org/10.1002/bimj.200510238
Friedlander, F. (1964). Type I and Type II
Bias. American Psychologist, 19(3), 198–199. https://doi.org/10.1037/h0038977
Fugard, A. J. B., & Potts, H. W. W. (2015). Supporting thinking on
sample sizes for thematic analyses: A quantitative tool.
International Journal of Social Research Methodology,
18(6), 669–684. https://doi.org/10.1080/13645579.2015.1005453
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in
psychological research: Sense and nonsense. Advances in
Methods and Practices in Psychological Science, 2(2),
156–168. https://doi.org/10.1177/2515245919847202
Gannon, M. A., de Bragança Pereira, C. A., & Polpo, A. (2019).
Blending Bayesian and Classical Tools to
Define Optimal Sample-Size-Dependent Significance Levels.
The American Statistician, 73, 213–222. https://doi.org/10.1080/00031305.2018.1518268
Gelman, A., & Carlin, J. (2014). Beyond Power
Calculations: Assessing Type S (Sign)
and Type M (Magnitude) Errors.
Perspectives on Psychological Science, 9(6), 641–651.
https://doi.org/10.1177/1745691614551642
Gerring, J. (2012). Mere Description. British Journal
of Political Science, 42(4), 721–746. https://doi.org/10.1017/S0007123412000130
Gillon, R. (1994). Medical ethics: Four principles plus attention to
scope. BMJ, 309(6948), 184. https://doi.org/10.1136/bmj.309.6948.184
Glöckner, A. (2016). The irrational hungry judge effect revisited:
Simulations reveal that the magnitude of the effect is
overestimated. Judgment and Decision Making, 11(6),
601–610.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A
simple and flexible statistic for empirical psychologists.
Psychonomic Bulletin & Review, 11(5), 791–806.
Goldacre, B., DeVito, N. J., Heneghan, C., Irving, F., Bacon, S.,
Fleminger, J., & Curtis, H. (2018). Compliance with requirement to
report results on the EU Clinical Trials Register: Cohort
study and web resource. BMJ, 362, k3218. https://doi.org/10.1136/bmj.k3218
Good, I. J. (1992). The Bayes/Non-Bayes
compromise: A brief review. Journal of the American
Statistical Association, 87(419), 597–606. https://doi.org/10.2307/2290192
Goodyear-Smith, F. A., van Driel, M. L., Arroll, B., & Del Mar, C.
(2012). Analysis of decisions made in meta-analyses of depression
screening and the risk of confirmation bias: A case study.
BMC Medical Research Methodology, 12, 76. https://doi.org/10.1186/1471-2288-12-76
Gopalakrishna, G., Riet, G. ter, Vink, G., Stoop, I., Wicherts, J. M.,
& Bouter, L. M. (2022). Prevalence of questionable research
practices, research misconduct and their potential explanatory factors:
A survey among academic researchers in The
Netherlands. PLOS ONE, 17(2), e0263023. https://doi.org/10.1371/journal.pone.0263023
Gosset, W. S. (1904). The Application of the
"Law of Error" to the Work of the
Brewery (Vol. 8, pp. 3–16). Arthur Guinness &
Son, Ltd.
Green, P., & MacLeod, C. J. (2016). SIMR: An
R package for power analysis of generalized linear mixed
models by simulation. Methods in Ecology and Evolution,
7(4), 493–498. https://doi.org/10.1111/2041-210X.12504
Green, S. B. (1991). How Many Subjects Does It Take To Do A
Regression Analysis. Multivariate Behavioral Research,
26(3), 499–510. https://doi.org/10.1207/s15327906mbr2603_7
Greenwald, A. G. (1975). Consequences of prejudice against the null
hypothesis. Psychological Bulletin, 82(1), 1–20. http://psycnet.apa.org/journals/bul/82/1/1/
Greenwald, A. G. (Ed.). (1976). An editorial. Journal of Personality
and Social Psychology, 33(1), 1–7. https://doi.org/10.1037/h0078635
Grünwald, P., de Heide, R., & Koolen, W. (2019). Safe
Testing. http://arxiv.org/abs/1906.07801
Gupta, S. K. (2011). Intention-to-treat concept: A review.
Perspectives in Clinical Research, 2(3), 109–112. https://doi.org/10.4103/2229-3485.83221
Hacking, I. (1965). Logic of Statistical
Inference. Cambridge University Press.
Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O.,
Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G.,
Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci,
M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D.,
Dewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered
Replication of the Ego-Depletion Effect.
Perspectives on Psychological Science, 11(4), 546–573.
https://doi.org/10.1177/1745691616652873
Hallahan, M., & Rosenthal, R. (1996). Statistical power:
Concepts, procedures, and applications. Behaviour
Research and Therapy, 34(5), 489–499. https://doi.org/10.1016/0005-7967(95)00082-8
Hallinan, D., Boehm, F., Külpmann, A., & Elson, M. (2023).
Information Provision for Informed Consent
Procedures in Psychological Research Under the
General Data Protection Regulation: A Practical
Guide. Advances in Methods and Practices in Psychological
Science, 6(1), 25152459231151944. https://doi.org/10.1177/25152459231151944
Halpern, J., Brown Jr, B. W., & Hornberger, J. (2001). The sample
size for a clinical trial: A Bayesian decision theoretic
approach. Statistics in Medicine, 20(6), 841–858. https://doi.org/10.1002/sim.703
Halpern, S. D., Karlawish, J. H., & Berlin, J. A. (2002). The
continuing unethical conduct of underpowered clinical trials.
JAMA, 288(3), 358–362. https://doi.org/10.1001/jama.288.3.358
Hand, D. J. (1994). Deconstructing Statistical Questions.
Journal of the Royal Statistical Society. Series A (Statistics in
Society), 157(3), 317–356. https://doi.org/10.2307/2983526
Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G.
C., Kidwell, M. C., Mohr, A. H., Clayton, E., Yoon, E. J., Tessler, M.
H., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data
availability, reusability, and analytic reproducibility: Evaluating the
impact of a mandatory open data policy at the journal
Cognition. Royal Society Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448
Harms, C., & Lakens, D. (2018). Making ’null effects’ informative:
Statistical techniques and inferential frameworks. Journal of
Clinical and Translational Research, 3, 382–393. https://doi.org/10.18053/jctres.03.2017S2.007
Harrer, M., Cuijpers, P., Furukawa, T. A., & Ebert, D. D. (2021).
Doing Meta-Analysis with R: A
Hands-On Guide. Chapman and Hall/CRC. https://doi.org/10.1201/9781003107347
Hauck, W. W., & Anderson, S. (1984). A new statistical procedure
for testing equivalence in two-group comparative bioavailability trials.
Journal of Pharmacokinetics and Biopharmaceutics,
12(1), 83–91. https://doi.org/10.1007/BF01063612
Hedges, L. V., & Pigott, T. D. (2001). The power of statistical
tests in meta-analysis. Psychological Methods, 6(3),
203–217. https://doi.org/10.1037/1082-989X.6.3.203
Hempel, C. G. (1966). Philosophy of natural science (Reprint). Prentice-Hall.
Hilgard, J. (2021). Maximal positive controls: A method for
estimating the largest plausible effect size. Journal of
Experimental Social Psychology, 93, 104082. https://doi.org/10.1016/j.jesp.2020.104082
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008).
Empirical Benchmarks for Interpreting Effect
Sizes in Research. Child Development
Perspectives, 2(3), 172–177. https://doi.org/10.1111/j.1750-8606.2008.00061.x
Hodges, J. L., & Lehmann, E. L. (1954). Testing the
Approximate Validity of Statistical
Hypotheses. Journal of the Royal Statistical Society. Series
B (Methodological), 16(2), 261–268. https://doi.org/10.1111/j.2517-6161.1954.tb00169.x
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The
pervasive fallacy of power calculations for data analysis. The
American Statistician, 55(1), 19–24. https://doi.org/10.1198/000313001300339897
Huedo-Medina, T. B., Sánchez-Meca, J., Marín-Martínez, F., &
Botella, J. (2006). Assessing heterogeneity in meta-analysis:
Q statistic or I² index? Psychological
Methods, 11(2), 193. http://psycnet.apa.org/journals/met/11/2/193/
Hung, H. M. J., O’Neill, R. T., Bauer, P., & Kohne, K. (1997). The
Behavior of the P-Value When the
Alternative Hypothesis is True.
Biometrics, 53(1), 11–22. https://doi.org/10.2307/2533093
Hunt, K. (1975). Do we really need more replications? Psychological
Reports, 36(2), 587–593.
Hyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A. B., & Williams,
C. C. (2008). Gender Similarities Characterize Math
Performance. Science, 321(5888), 494–495. https://doi.org/10.1126/science.1160364
Ioannidis, J. P. A. (2005). Why Most Published Research Findings
Are False. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test
for an excess of significant findings. Clinical Trials,
4(3), 245–253. https://doi.org/10.1177/1740774507079441
Isager, P. M., van Aert, R. C. M., Bahník, Š., Brandt, M. J., DeSoto, K.
A., Giner-Sorolla, R., Krueger, J. I., Perugini, M., Ropovik, I., van ’t
Veer, A. E., Vranka, M., & Lakens, D. (2023). Deciding what to
replicate: A decision model for replication study selection
under resource and knowledge constraints. Psychological
Methods, 28(2), 438–451. https://doi.org/10.1037/met0000438
Iyengar, S., & Greenhouse, J. B. (1988). Selection
Models and the File Drawer Problem.
Statistical Science, 3(1), 109–117. http://www.jstor.org/stable/2245925
Jaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of
health status: Ascertaining the minimal clinically
important difference. Controlled Clinical Trials,
10(4), 407–415. https://doi.org/10.1016/0197-2456(89)90005-6
Jeffreys, H. (1939). Theory of probability (1st ed.). Oxford
University Press.
Jennison, C., & Turnbull, B. W. (2000). Group sequential methods
with applications to clinical trials. Chapman & Hall/CRC.
Johansson, T. (2011). Hail the impossible: P-values, evidence, and
likelihood. Scandinavian Journal of Psychology, 52(2),
113–125. https://doi.org/10.1111/j.1467-9450.2010.00852.x
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the
prevalence of questionable research practices with incentives for truth
telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Johnson, V. E. (2013). Revised standards for statistical evidence.
Proceedings of the National Academy of Sciences,
110(48), 19313–19317. https://doi.org/10.1073/pnas.1313476110
Jones, L. V. (1952). Test of hypotheses: One-sided vs. two-sided
alternatives. Psychological Bulletin, 49(1), 43–46.
https://doi.org/10.1037/h0056832
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2009). Weight as an
Embodiment of Importance. Psychological
Science, 20(9), 1169–1174. https://doi.org/10.1111/j.1467-9280.2009.02426.x
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2016). A short
history of the weight-importance effect and a recommendation for
pre-testing: Commentary on Ebersole et al.
(2016). Journal of Experimental Social Psychology, 67,
93–94. https://doi.org/10.1016/j.jesp.2015.12.001
Julious, S. A. (2004). Sample sizes for clinical trials with normal
data. Statistics in Medicine, 23(12), 1921–1986. https://doi.org/10.1002/sim.1783
Junk, T., & Lyons, L. (2020). Reproducibility and
Replication of Experimental Particle Physics
Results. Harvard Data Science Review, 2(4). https://doi.org/10.1162/99608f92.250f995b
Kaiser, H. F. (1960). Directional statistical decisions.
Psychological Review, 67(3), 160–167. https://doi.org/10.1037/h0047595
Kaplan, R. M., & Irvin, V. L. (2015). Likelihood of Null
Effects of Large NHLBI Clinical Trials Has Increased
over Time. PLOS ONE, 10(8), e0132382. https://doi.org/10.1371/journal.pone.0132382
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of
the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572
Keefe, R. S. E., Kraemer, H. C., Epstein, R. S., Frank, E., Haynes, G.,
Laughren, T. P., Mcnulty, J., Reed, S. D., Sanchez, J., & Leon, A.
C. (2013). Defining a Clinically Meaningful Effect for the
Design and Interpretation of Randomized
Controlled Trials. Innovations in Clinical Neuroscience,
10, 4S–19S. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3719483/
Kelley, K. (2007). Confidence Intervals for
Standardized Effect Sizes: Theory,
Application, and Implementation. Journal
of Statistical Software, 20(8). https://doi.org/10.18637/jss.v020.i08
Kelley, K., & Preacher, K. J. (2012). On effect size.
Psychological Methods, 17(2), 137–152. https://doi.org/10.1037/a0028086
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the
standardized mean difference: Accuracy in parameter estimation via
narrow confidence intervals. Psychological Methods,
11(4), 363–385. https://doi.org/10.1037/1082-989X.11.4.363
Kelter, R. (2021). Analysis of type I and II
error rates of Bayesian and frequentist parametric and
nonparametric two-sample hypothesis tests under preliminary assessment
of normality. Computational Statistics, 36(2),
1263–1288. https://doi.org/10.1007/s00180-020-01034-7
Kenett, R. S., & Shmueli, G. (2016). Information
Quality: The Potential of Data
and Analytics to Generate Knowledge (1st
ed.). Wiley.
Kennedy-Shaffer, L. (2019). Before p < 0.05 to Beyond p
< 0.05: Using
History to Contextualize p-Values and
Significance Testing. The American Statistician,
73, 82–90. https://doi.org/10.1080/00031305.2018.1537891
Kenny, D. A., & Judd, C. M. (2019). The unappreciated heterogeneity
of effect sizes: Implications for power, precision,
planning of research, and replication. Psychological Methods,
24(5), 578–589. https://doi.org/10.1037/met0000209
Keppel, G. (1991). Design and analysis: A researcher’s
handbook (3rd ed.). Prentice-Hall.
Kerr, N. L. (1998). HARKing: Hypothesizing
After the Results are Known.
Personality and Social Psychology Review, 2(3),
196–217. https://doi.org/10.1207/s15327957pspr0203_4
King, M. T. (2011). A point of minimal important difference
(MID): A critique of terminology and methods. Expert
Review of Pharmacoeconomics & Outcomes Research,
11(2), 171–184. https://doi.org/10.1586/erp.11.9
Kish, L. (1959). Some Statistical Problems in
Research Design. American Sociological Review,
24(3), 328–338. https://doi.org/10.2307/2089381
Kish, L. (1965). Survey Sampling. Wiley.
Komić, D., Marušić, S. L., & Marušić, A. (2015). Research
Integrity and Research Ethics in
Professional Codes of Ethics:
Survey of Terminology Used by
Professional Organizations across Research
Disciplines. PLOS ONE, 10(7), e0133662. https://doi.org/10.1371/journal.pone.0133662
Koole, S. L., & Lakens, D. (2012). Rewarding replications:
A sure and simple way to improve psychological science.
Perspectives on Psychological Science, 7(6), 608–614.
https://doi.org/10.1177/1745691612462586
Kraft, M. A. (2020). Interpreting effect sizes of education
interventions. Educational Researcher, 49(4), 241–253.
https://doi.org/10.3102/0013189X20912798
Kruschke, J. K. (2011). Bayesian assessment of null values via parameter
estimation and model comparison. Perspectives on Psychological
Science, 6(3), 299–312.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test.
Journal of Experimental Psychology: General, 142(2),
573–603. https://doi.org/10.1037/a0029146
Kruschke, J. K. (2014). Doing Bayesian Data Analysis:
A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.
Kruschke, J. K. (2018). Rejecting or Accepting Parameter
Values in Bayesian Estimation. Advances in
Methods and Practices in Psychological Science, 1(2),
270–280. https://doi.org/10.1177/2515245918771304
Kruschke, J. K., & Liddell, T. M. (2017). The Bayesian New
Statistics: Hypothesis testing, estimation,
meta-analysis, and power analysis from a Bayesian
perspective. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-016-1221-4
Kuhn, T. S. (1962). The Structure of Scientific
Revolutions. University of Chicago Press.
Kuipers, T. A. F. (2016). Models, postulates, and generalized nomic
truth approximation. Synthese, 193(10), 3057–3077. https://doi.org/10.1007/s11229-015-0916-9
Lakatos, I. (1978). The methodology of scientific research
programmes: Volume 1: Philosophical
papers. Cambridge University Press.
Lakens, D. (2013). Calculating and reporting effect sizes to
facilitate cumulative science: A practical primer for t-tests and
ANOVAs. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00863
Lakens, D. (2014). Performing high-powered studies efficiently with
sequential analyses: Sequential analyses. European
Journal of Social Psychology, 44(7), 701–710. https://doi.org/10.1002/ejsp.2023
Lakens, D. (2017). Equivalence Tests: A
Practical Primer for t Tests,
Correlations, and Meta-Analyses. Social
Psychological and Personality Science, 8(4), 355–362. https://doi.org/10.1177/1948550617697177
Lakens, D. (2019). The value of preregistration for psychological
science: A conceptual analysis. Japanese Psychological
Review, 62(3), 221–230. https://doi.org/10.24602/sjpr.62.3_221
Lakens, D. (2020). Pandemic researchers — recruit your own best
critics. Nature, 581(7807), 121. https://doi.org/10.1038/d41586-020-01392-8
Lakens, D. (2021). The practical alternative to the p value is the
correctly used p value. Perspectives on Psychological Science,
16(3), 639–648. https://doi.org/10.1177/1745691620958012
Lakens, D. (2022a). Sample Size Justification.
Collabra: Psychology. https://doi.org/10.31234/osf.io/9d3yf
Lakens, D. (2022b). Why P values are not measures of
evidence. Trends in Ecology & Evolution, 37(4),
289–290. https://doi.org/10.1016/j.tree.2021.12.006
Lakens, D. (2023). Is my study useless? Why
researchers need methodological review boards. Nature,
613(7942), 9. https://doi.org/10.1038/d41586-022-04504-8
Lakens, D. (2023, December 18). When and How to
Deviate from a Preregistration. https://doi.org/10.31234/osf.io/ha29k
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A.
J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D.,
Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Calster, B.,
Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S.,
Crook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human
Behaviour, 2, 168–171. https://doi.org/10.1038/s41562-018-0311-x
Lakens, D., & Caldwell, A. R. (2021). Simulation-Based
Power Analysis for Factorial Analysis of
Variance Designs. Advances in Methods and Practices in
Psychological Science, 4(1). https://doi.org/10.1177/2515245920951503
Lakens, D., & DeBruine, L. (2020). Improving
Transparency, Falsifiability, and
Rigour by Making Hypothesis Tests Machine
Readable. https://doi.org/10.31234/osf.io/5xcda
Lakens, D., & Etz, A. J. (2017). Too True to be
Bad: When Sets of Studies With
Significant and Nonsignificant Findings Are Probably
True. Social Psychological and Personality Science,
8(8), 875–881. https://doi.org/10.1177/1948550617693058
Lakens, D., Hilgard, J., & Staaks, J. (2016). On the
reproducibility of meta-analyses: Six practical recommendations. BMC
Psychology, 4, 24. https://doi.org/10.1186/s40359-016-0126-3
Lakens, D., McLatchie, N., Isager, P. M., Scheel, A. M., &
Dienes, Z. (2020). Improving Inferences About Null Effects With
Bayes Factors and Equivalence Tests. The
Journals of Gerontology: Series B, 75(1), 45–57. https://doi.org/10.1093/geronb/gby065
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence
testing for psychological research: A tutorial.
Advances in Methods and Practices in Psychological Science,
1(2), 259–269. https://doi.org/10.1177/2515245918770963
Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential
Boundaries for Clinical Trials. Biometrika,
70(3), 659. https://doi.org/10.2307/2336502
Langmuir, I., & Hall, R. N. (1989). Pathological
Science. Physics Today, 42(10), 36–48. https://doi.org/10.1063/1.881205
Latan, H., Chiappetta Jabbour, C. J., Lopes de Sousa Jabbour, A. B.,
& Ali, M. (2021). Crossing the Red Line?
Empirical Evidence and Useful Recommendations
on Questionable Research Practices among Business
Scholars. Journal of Business Ethics, 1–21. https://doi.org/10.1007/s10551-021-04961-7
Laudan, L. (1981). Science and Hypothesis.
Springer Netherlands. https://doi.org/10.1007/978-94-015-7288-0
Laudan, L. (1986). Science and Values: The
Aims of Science and Their Role in
Scientific Debate. University of California Press.
Lawrence, J. M., Meyerowitz-Katz, G., Heathers, J. A. J., Brown, N. J.
L., & Sheldrick, K. A. (2021). The lesson of ivermectin:
Meta-analyses based on summary data alone are inherently unreliable.
Nature Medicine, 27(11), 1853–1854. https://doi.org/10.1038/s41591-021-01535-y
Leamer, E. E. (1978). Specification Searches: Ad
Hoc Inference with Nonexperimental Data (1st ed.). Wiley.
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical
hypotheses (3rd ed.). Springer.
Lenth, R. V. (2001). Some practical guidelines for effective sample size
determination. The American Statistician, 55(3),
187–193. https://doi.org/10.1198/000313001317098149
Lenth, R. V. (2007). Post hoc power: Tables and commentary. Iowa
City: Department of Statistics and Actuarial Science, University of
Iowa. https://pdfs.semanticscholar.org/fbfb/cab4b59e54c6a3ed39ba3656f35ef86c5ee3.pdf
Leon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The
Role and Interpretation of Pilot
Studies in Clinical Research. Journal of
Psychiatric Research, 45(5), 626–629. https://doi.org/10.1016/j.jpsychires.2010.10.008
Letrud, K., & Hernes, S. (2019). Affirmative citation bias in
scientific myth debunking: A three-in-one case study.
PLOS ONE, 14(9), e0222213. https://doi.org/10.1371/journal.pone.0222213
Leung, P. T. M., Macdonald, E. M., Stanbrook, M. B., Dhalla, I. A.,
& Juurlink, D. N. (2017). A 1980 Letter on the
Risk of Opioid Addiction. New England
Journal of Medicine, 376(22), 2194–2195. https://doi.org/10.1056/NEJMc1700150
Levine, T. R., Weber, R., Park, H. S., & Hullett, C. R. (2008). A
communication researchers’ guide to null hypothesis significance testing
and alternatives. Human Communication Research, 34(2),
188–209. http://onlinelibrary.wiley.com/doi/10.1111/j.1468-2958.2008.00318.x/full
Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019).
How to Classify, Detect, and Manage
Univariate and Multivariate Outliers, With
Emphasis on Pre-Registration. International
Review of Social Psychology, 32(1), 5. https://doi.org/10.5334/irsp.289
Linden, A. H., & Hönekopp, J. (2021). Heterogeneity of
Research Results: A New Perspective From Which
to Assess and Promote Progress in
Psychological Science. Perspectives on Psychological
Science, 16(2), 358–376. https://doi.org/10.1177/1745691620964193
Lindley, D. V. (1957). A statistical paradox. Biometrika,
44(1/2), 187–192.
Lindsay, D. S. (2015). Replication in Psychological
Science. Psychological Science, 26(12),
1827–1832. https://doi.org/10.1177/0956797615616374
Loevinger, J. (1968). The "information explosion." American
Psychologist, 23(6), 455. https://doi.org/10.1037/h0020800
Longino, H. E. (1990). Science as Social Knowledge:
Values and Objectivity in Scientific
Inquiry. Princeton University Press. https://books.google.com?id=S8fIbD19BisC
Louis, T. A., & Zeger, S. L. (2009). Effective communication of
standard errors and confidence intervals. Biostatistics,
10(1), 1–2. https://doi.org/10.1093/biostatistics/kxn014
Lovakov, A., & Agadullina, E. R. (2021). Empirically derived
guidelines for effect size interpretation in social psychology.
European Journal of Social Psychology, 51(3), 485–504.
https://doi.org/10.1002/ejsp.2752
Lubin, A. (1957). Replicability as a publication criterion. American
Psychologist, 12, 519–520. https://doi.org/10.1037/h0039746
Luttrell, A., Petty, R. E., & Xu, M. (2017). Replicating and fixing
failed replications: The case of need for cognition and
argument quality. Journal of Experimental Social Psychology,
69, 178–183. https://doi.org/10.1016/j.jesp.2016.09.006
Lykken, D. T. (1968). Statistical significance in psychological
research. Psychological Bulletin, 70, 151–159. https://doi.org/10.1037/h0026141
Lyons, I. M., Nuerk, H.-C., & Ansari, D. (2015). Rethinking the
implications of numerical ratio effects for understanding the
development of representational precision and numerical processing
across formats. Journal of Experimental Psychology: General,
144(5), 1021–1035. https://doi.org/10.1037/xge0000094
MacCoun, R., & Perlmutter, S. (2015). Blind analysis:
Hide results to seek the truth. Nature,
526(7572), 187–189. https://doi.org/10.1038/526187a
Mack, R. W. (1951). The Need for Replication
Research in Sociology. American Sociological
Review, 16(1), 93–94. https://doi.org/10.2307/2087978
Mahoney, M. J. (1979). Psychology of the scientist: An
evaluative review. Social Studies of Science, 9(3),
349–375. https://doi.org/10.1177/030631277900900304
Maier, M., & Lakens, D. (2022). Justify your alpha: A
primer on two practical approaches. Advances in Methods and
Practices in Psychological Science. https://doi.org/10.31234/osf.io/ts4r6
Makel, M. C., Hodges, J., Cook, B. G., & Plucker, J. A. (2021). Both
Questionable and Open Research Practices Are
Prevalent in Education Research. Educational
Researcher, 50(8), 493–504. https://doi.org/10.3102/0013189X211001356
Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does
Sample Size Matter in Qualitative Research?:
A Review of Qualitative Interviews in is
Research. Journal of Computer Information Systems,
54(1), 11–22. https://doi.org/10.1080/08874417.2013.11645667
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments
and analyzing data: A model comparison perspective (2nd ed.).
Lawrence Erlbaum Associates.
Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing
Experiments and Analyzing Data: A Model
Comparison Perspective (3rd ed.). Routledge.
Maxwell, S. E., & Kelley, K. (2011). Ethics and sample size
planning. In Handbook of ethics in quantitative methodology
(pp. 179–204). Routledge.
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample
Size Planning for Statistical Power and
Accuracy in Parameter Estimation. Annual
Review of Psychology, 59(1), 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735
Mayo, D. G. (1996). Error and the growth of experimental
knowledge. University of Chicago Press.
Mayo, D. G. (2018). Statistical inference as severe testing: How to
get beyond the statistics wars. Cambridge University Press.
Mayo, D. G., & Spanos, A. (2011). Error statistics. Philosophy
of Statistics, 7, 152–198.
Mazzolari, R., Porcelli, S., Bishop, D. J., & Lakens, D. (2022).
Myths and methodologies: The use of equivalence and
non-inferiority tests for interventional studies in exercise physiology
and sport science. Experimental Physiology, 107(3),
201–212. https://doi.org/10.1113/EP090171
McCarthy, R. J., Skowronski, J. J., Verschuere, B., Meijer, E. H., Jim,
A., Hoogesteyn, K., Orthey, R., Acar, O. A., Aczel, B., Bakos, B. E.,
Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R., Blatz,
L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E. (2018).
Registered Replication Report on Srull and
Wyer (1979). Advances in Methods and Practices in
Psychological Science, 1(3), 321–336. https://doi.org/10.1177/2515245918777487
McElreath, R. (2016). Statistical Rethinking: A
Bayesian Course with Examples in R and
Stan (Vol. 122). CRC Press.
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree:
The case of r and d. Psychological Methods,
11(4), 386–401. https://doi.org/10.1037/1082-989X.11.4.386
McGraw, K. O., & Wong, S. P. (1992). A common language effect size
statistic. Psychological Bulletin, 111(2), 361–365. https://doi.org/10.1037/0033-2909.111.2.361
McGuire, W. J. (2004). A Perspectivist Approach to
Theory Construction. Personality and Social Psychology
Review, 8(2), 173–182. https://doi.org/10.1207/s15327957pspr0802_11
McIntosh, R. D., & Rittmo, J. Ö. (2021). Power calculations in
single-case neuropsychology: A practical primer.
Cortex, 135, 146–158. https://doi.org/10.1016/j.cortex.2020.11.005
Meehl, P. E. (1967). Theory-testing in psychology and physics:
A methodological paradox. Philosophy of Science,
34(2), 103–115. http://www.jstor.org/stable/186099
Meehl, P. E. (1978). Theoretical Risks and Tabular
Asterisks: Sir Karl, Sir Ronald, and
the Slow Progress of Soft Psychology.
Journal of Consulting and Clinical Psychology, 46(4),
806–834. https://doi.org/10.1037/0022-006X.46.4.806
Meehl, P. E. (1990a). Appraising and amending theories: The
strategy of Lakatosian defense and two principles that
warrant it. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1
Meehl, P. E. (1990b). Why Summaries of
Research on Psychological Theories are
Often Uninterpretable. Psychological Reports,
66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195
Meehl, P. E. (2004). Cliometric metatheory III:
Peircean consensus, verisimilitude and asymptotic method.
The British Journal for the Philosophy of Science,
55(4), 615–643.
Melara, R. D., & Algom, D. (2003). Driven by information:
A tectonic theory of Stroop effects.
Psychological Review, 110(3), 422–471. https://doi.org/10.1037/0033-295X.110.3.422
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency
representations eliminate conjunction effects? An exercise
in adversarial collaboration. Psychological Science,
12(4), 269–275. https://doi.org/10.1111/1467-9280.00350
Merton, R. K. (1942). A Note on Science and
Democracy. Journal of Legal and Political
Sociology, 1, 115–126. https://heinonline.org/HOL/Page?handle=hein.journals/jolegpo1&id=115&div=&collection=
Meyners, M. (2012). Equivalence tests – A review. Food
Quality and Preference, 26(2), 231–245. https://doi.org/10.1016/j.foodqual.2012.05.003
Meyvis, T., & Van Osselaer, S. M. J. (2018). Increasing the
Power of Your Study by Increasing
the Effect Size. Journal of Consumer Research,
44(5), 1157–1173. https://doi.org/10.1093/jcr/ucx110
Millar, R. B. (2011). Maximum likelihood estimation and inference:
With examples in R, SAS, and
ADMB. Wiley.
Miller, J. (2009). What is the probability of replicating a
statistically significant effect? Psychonomic Bulletin &
Review, 16(4), 617–640. https://doi.org/10.3758/PBR.16.4.617
Miller, J., & Ulrich, R. (2019). The quest for an optimal alpha.
PLOS ONE, 14(1), e0208631. https://doi.org/10.1371/journal.pone.0208631
Mitroff, I. I. (1974). Norms and Counter-Norms in a
Select Group of the Apollo Moon Scientists:
A Case Study of the Ambivalence of
Scientists. American Sociological Review,
39(4), 579–595. https://doi.org/10.2307/2094423
Moe, K. (1984). Should the Nazi Research Data Be Cited?
The Hastings Center Report, 14(6), 5–7. https://doi.org/10.2307/3561733
Moran, C., Richard, A., Wilson, K., Twomey, R., & Coroiu, A. (2022).
I know it’s bad, but I have been pressured into it: Questionable
research practices among psychology students in Canada. Canadian
Psychology/Psychologie Canadienne. https://doi.org/10.1037/cap0000326
Morey, R. D. (2020, June 12). Power and precision [Blog].
https://medium.com/@richarddmorey/power-and-precision-47f644ddea5e
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., &
Wagenmakers, E.-J. (2016). The fallacy of placing confidence in
confidence intervals. Psychonomic Bulletin & Review,
23(1), 103–123. http://link.springer.com/article/10.3758/s13423-015-0947-8
Morey, R. D., Kaschak, M. P., Díez-Álamo, A. M., Glenberg, A. M.,
Zwaan, R. A., Lakens, D., Ibáñez, A., García, A., Gianelli, C., Jones,
J. L., Madden, J., Alifano, F., Bergen, B., Bloxsom, N. G., Bub, D. N.,
Cai, Z. G., Chartier, C. R., Chatterjee, A., Conwell, E., … Ziv-Crispel,
N. (2021). A pre-registered, multi-lab non-replication of the
action-sentence compatibility effect (ACE). Psychonomic
Bulletin & Review. https://doi.org/10.3758/s13423-021-01927-8
Morey, R. D., & Lakens, D. (2016). Why most of psychology
is statistically unfalsifiable. https://raw.githubusercontent.com/richarddmorey/psychology_resolution/master/paper/response.pdf
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using
simulation studies to evaluate statistical methods. Statistics in
Medicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086
Morse, J. M. (1995). The Significance of
Saturation. Qualitative Health Research,
5(2), 147–149. https://doi.org/10.1177/104973239500500201
Moscovici, S. (1972). Society and theory in social psychology. In
Context of social psychology (pp. 17–81).
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L.,
Forscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., &
Antfolk, J. (2018). The Psychological Science Accelerator:
Advancing psychology through a distributed collaborative
network. Advances in Methods and Practices in Psychological
Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607
Motyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J.,
Mueller, A. B., Prims, J. P., Sun, J., Washburn, A. N., Wong, K. M.,
Yantis, C., & Skitka, L. J. (2017). The state of social and
personality science: Rotten to the core, not so bad,
getting better, or getting worse? Journal of Personality and Social
Psychology, 113, 34–58. https://doi.org/10.1037/pspa0000084
Mrozek, J. R., & Taylor, L. O. (2002). What determines the value of
life? A meta-analysis. Journal of Policy Analysis and
Management, 21(2), 253–270. https://doi.org/10.1002/pam.10026
Mudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E. (2012).
Setting an Optimal α That Minimizes Errors in
Null Hypothesis Significance Tests. PLOS ONE,
7(2), e32734. https://doi.org/10.1371/journal.pone.0032734
Mullan, F., & Jacoby, I. (1985). The town meeting for technology:
The maturation of consensus conferences. JAMA,
254(8), 1068–1072. https://doi.org/10.1001/jama.1985.03360080080035
Mulligan, A., Hall, L., & Raphael, E. (2013). Peer review in a
changing world: An international study measuring the
attitudes of researchers. Journal of the American Society for
Information Science and Technology, 64(1), 132–161. https://doi.org/10.1002/asi.22798
Murphy, K. R., & Myors, B. (1999). Testing the hypothesis that
treatments have negligible effects: Minimum-effect tests in the general linear model.
Journal of Applied Psychology, 84(2), 234–248. https://doi.org/10.1037/0021-9010.84.2.234
Murphy, K. R., Myors, B., & Wolach, A. H. (2014). Statistical
power analysis: A simple and general model for traditional and modern
hypothesis tests (Fourth edition). Routledge, Taylor & Francis
Group.
National Academy of Sciences, National Academy of Engineering, &
Institute of Medicine. (2009). On being a scientist: A
guide to responsible conduct in research: Third
edition. The National Academies Press. https://doi.org/10.17226/12192
Neher, A. (1967). Probability Pyramiding, Research
Error and the Need for Independent
Replication. The Psychological Record, 17(2),
257–262. https://doi.org/10.1007/BF03393713
Nemeth, C., Brown, K., & Rogers, J. (2001). Devil’s advocate versus
authentic dissent: Stimulating quantity and quality. European
Journal of Social Psychology, 31(6), 707–720. https://doi.org/10.1002/ejsp.58
Neyman, J. (1957). "Inductive Behavior" as a Basic
Concept of Philosophy of Science.
Revue de l’Institut International de Statistique / Review of the
International Statistical Institute, 25(1/3), 7. https://doi.org/10.2307/1401671
Neyman, J., & Pearson, E. S. (1933). On the problem of the most
efficient tests of statistical hypotheses. Philosophical
Transactions of the Royal Society of London A: Mathematical, Physical
and Engineering Sciences, 231(694-706), 289–337. https://doi.org/10.1098/rsta.1933.0009
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous
phenomenon in many guises. Review of General Psychology,
2(2), 175–220.
Nickerson, R. S. (2000). Null hypothesis significance testing:
A review of an old and continuing controversy.
Psychological Methods, 5(2), 241–301. https://doi.org/10.1037//1082-989X.5.2.241
Niiniluoto, I. (1998). Verisimilitude: The Third Period.
The British Journal for the Philosophy of Science, 49,
1–29. https://academic.oup.com/bjps/article-abstract/49/1/1/1448878/Verisimilitude-The-Third-Period
Niiniluoto, I. (1999). Critical Scientific
Realism. Oxford University Press. https://books.google.com?id=Ng_p_3XCHxAC
Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2004). The truly
remarkable universality of half a standard deviation: Confirmation
through another look. Expert Review of Pharmacoeconomics &
Outcomes Research, 4(5), 581–585.
Nosek, B. A., & Errington, T. M. (2020). What is replication?
PLOS Biology, 18(3), e3000691. https://doi.org/10.1371/journal.pbio.3000691
Nosek, B. A., & Lakens, D. (2014). Registered reports:
A method to increase the credibility of published results.
Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp,
S., & Wicherts, J. M. (2015). The prevalence of statistical
reporting errors in psychology (1985–2013). Behavior Research
Methods. https://doi.org/10.3758/s13428-015-0664-2
Nuijten, M. B., & Wicherts, J. (2023, January 31). The
effectiveness of implementing statcheck in the peer review process to
avoid statistical reporting errors. https://doi.org/10.31234/osf.io/bxau9
Nunnally, J. (1960). The place of statistics in psychology.
Educational and Psychological Measurement, 20(4),
641–650. https://doi.org/10.1177/001316446002000401
O’Donnell, M., Nelson, L. D., Ackermann, E., Aczel, B., Akhtar, A.,
Aldrovandi, S., Alshaif, N., Andringa, R., Aveyard, M., Babincak, P.,
Balatekin, N., Baldwin, S. A., Banik, G., Baskin, E., Bell, R.,
Białobrzeska, O., Birt, A. R., Boot, W. R., Braithwaite, S. R., …
Zrubka, M. (2018). Registered Replication Report:
Dijksterhuis and van Knippenberg (1998).
Perspectives on Psychological Science, 13(2), 268–294.
https://doi.org/10.1177/1745691618755704
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A.
(2020). Analysis of Open Data and Computational
Reproducibility in Registered Reports in
Psychology. Advances in Methods and Practices in
Psychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872
Oddie, G. (2013). The content, consequence and likeness approaches to
verisimilitude: Compatibility, trivialization, and underdetermination.
Synthese, 190(9), 1647–1687. https://doi.org/10.1007/s11229-011-9930-8
Okada, K. (2013). Is Omega Squared Less Biased? A
Comparison of Three Major Effect Size Indices
in One-Way ANOVA. Behaviormetrika, 40(2),
129–147. https://doi.org/10.2333/bhmk.40.129
Olejnik, S., & Algina, J. (2003). Generalized Eta and
Omega Squared Statistics: Measures of
Effect Size for Some Common Research Designs.
Psychological Methods, 8(4), 434–447. https://doi.org/10.1037/1082-989X.8.4.434
Olsson-Collentine, A., Wicherts, J. M., & van Assen, M. A. L. M.
(2020). Heterogeneity in direct replications in psychology and its
association with effect size. Psychological Bulletin,
146(10), 922–940. https://doi.org/10.1037/bul0000294
Open Science Collaboration. (2015). Estimating the reproducibility of
psychological science. Science, 349(6251),
aac4716–aac4716. https://doi.org/10.1126/science.aac4716
Orben, A., & Lakens, D. (2020). Crud
(Re)Defined. Advances in Methods and
Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961
Parker, R. A., & Berman, N. G. (2003). Sample Size.
The American Statistician, 57(3), 166–170. https://doi.org/10.1198/0003130031919
Parkhurst, D. F. (2001). Statistical significance tests:
Equivalence and reverse tests should reduce
misinterpretation. Bioscience, 51(12), 1051–1057. https://doi.org/10.1641/0006-3568(2001)051[1051:SSTEAR]2.0.CO;2
Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological
Science Needs a Standard Practice of
Reporting the Reliability of
Cognitive-Behavioral Measurements. Advances in Methods
and Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695
Pawitan, Y. (2001). In all likelihood: Statistical modelling and
inference using likelihood. Clarendon Press; Oxford University
Press.
Pemberton, M., Hall, S., Moskovitz, C., & Anson, C. M. (2019). Text
recycling: Views of North American journal
editors from an interview-based study. Learned Publishing,
32(4), 355–366. https://doi.org/10.1002/leap.1259
Pereboom, A. C. (1971). Some Fundamental Problems in
Experimental Psychology: An Overview.
Psychological Reports, 28(2). https://doi.org/10.2466/pr0.1971.28.2.439
Perneger, T. V. (1998). What’s wrong with Bonferroni
adjustments. BMJ, 316(7139), 1236–1238. http://www.bmj.com/content/316/7139/1236.short
Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power
as a protection against imprecise power estimates. Perspectives on
Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
Perugini, M., Gallucci, M., & Costantini, G. (2018). A
Practical Primer To Power Analysis for Simple
Experimental Designs. International Review of Social
Psychology, 31(1), 20. https://doi.org/10.5334/irsp.181
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., &
Rushton, L. (2007). Performance of the trim and fill method in the
presence of publication bias and between-study heterogeneity.
Statistics in Medicine, 26(25), 4544–4562. https://doi.org/10.1002/sim.2889
Phillips, B. M., Hunt, J. W., Anderson, B. S., Puckett, H. M., Fairey,
R., Wilson, C. J., & Tjeerdema, R. (2001). Statistical significance
of sediment toxicity test results: Threshold values derived
by the detectable significance approach. Environmental Toxicology
and Chemistry, 20(2), 371–373. https://doi.org/10.1002/etc.5620200218
Pickett, J. T., & Roche, S. P. (2017). Questionable,
Objectionable or Criminal? Public
Opinion on Data Fraud and Selective
Reporting in Science. Science and Engineering
Ethics, 1–21. https://doi.org/10.1007/s11948-017-9886-2
Platt, J. R. (1964). Strong Inference: Certain
systematic methods of scientific thinking may produce much more rapid
progress than others. Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347
Pocock, S. J. (1977). Group sequential methods in the design and
analysis of clinical trials. Biometrika, 64(2),
191–199. https://doi.org/10.1093/biomet/64.2.191
Polanin, J. R., Hennessy, E. A., & Tsuji, S. (2020). Transparency
and Reproducibility of Meta-Analyses in
Psychology: A Meta-Review. Perspectives on
Psychological Science, 15(4), 1026–1041. https://doi.org/10.1177/1745691620906416
Popper, K. R. (2002). The logic of scientific discovery.
Routledge.
Primbs, M., Pennington, C. R., Lakens, D., Silan, M. A., Lieck, D. S.
N., Forscher, P., Buchanan, E. M., & Westwood, S. J. (2022). Are
Small Effects the Indispensable Foundation for
a Cumulative Psychological Science? A Reply to
Götz et al. (2022). Perspectives on Psychological
Science. https://doi.org/10.31234/osf.io/6s8bj
Proschan, M. A. (2005). Two-Stage Sample Size Re-Estimation
Based on a Nuisance Parameter: A
Review. Journal of Biopharmaceutical Statistics,
15(4), 559–574. https://doi.org/10.1081/BIP-200062852
Proschan, M. A., Lan, K. K. G., & Wittes, J. T. (2006).
Statistical monitoring of clinical trials: A unified approach.
Springer.
Psillos, S. (1999). Scientific realism: How science tracks
truth. Routledge.
Quertemont, E. (2011). How to Statistically Show the
Absence of an Effect. Psychologica
Belgica, 51(2), 109–127. https://doi.org/10.5334/pb-51-2-109
Rabelo, A. L. A., Farias, J. E. M., Sarmet, M. M., Joaquim, T. C. R.,
Hoersting, R. C., Victorino, L., Modesto, J. G. N., & Pilati, R.
(2020). Questionable research practices among Brazilian
psychological researchers: Results from a replication study
and an international comparison. International Journal of
Psychology, 55(4), 674–683. https://doi.org/10.1002/ijop.12632
Radick, G. (2022). Mendel the fraud? A social history of
truth in genetics. Studies in History and Philosophy of
Science, 93, 39–46. https://doi.org/10.1016/j.shpsa.2021.12.012
Reif, F. (1961). The Competitive World of the Pure
Scientist. Science, 134(3494), 1957–1962. https://doi.org/10.1126/science.134.3494.1957
Rice, W. R., & Gaines, S. D. (1994). ’Heads I win,
tails you lose’: Testing directional alternative hypotheses in
ecological and evolutionary research. Trends in Ecology &
Evolution, 9(6), 235–237. https://doi.org/10.1016/0169-5347(94)90258-5
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One
Hundred Years of Social Psychology Quantitatively
Described. Review of General Psychology, 7(4),
331–363. https://doi.org/10.1037/1089-2680.7.4.331
Richardson, J. T. E. (2011). Eta squared and partial eta squared as
measures of effect size in educational research. Educational
Research Review, 6(2), 135–147. https://doi.org/10.1016/j.edurev.2010.12.001
Rijnsoever, F. J. van. (2017). (I Can’t Get
No) Saturation: A simulation and
guidelines for sample sizes in qualitative research. PLOS ONE,
12(7), e0181689. https://doi.org/10.1371/journal.pone.0181689
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using
significance tests to evaluate equivalence between two experimental
groups. Psychological Bulletin, 113(3), 553–565.
https://doi.org/10.1037/0033-2909.113.3.553
Rogers, S. (1992–1993). How a publicity blitz created the myth of
subliminal advertising. Public Relations Quarterly,
37(4), 12. https://www.proquest.com/docview/222493951/abstract/99E0E14044C24A96PQ/1
Ropovik, I., Adamkovic, M., & Greger, D. (2021). Neglect of
publication bias compromises meta-analyses of educational research.
PLOS ONE, 16(6), e0252415. https://doi.org/10.1371/journal.pone.0252415
Rosenthal, R. (1966). Experimenter effects in behavioral
research. Appleton-Century-Crofts.
Rosnow, R. L., & Rosenthal, R. (2009). Effect Sizes:
Why, When, and How to Use
Them. Zeitschrift Für Psychologie / Journal of
Psychology, 217(1), 6–14. https://doi.org/10.1027/0044-3409.217.1.6
Ross-Hellauer, T., Deppe, A., & Schmidt, B. (2017). Survey on open
peer review: Attitudes and experience amongst editors,
authors and reviewers. PLOS ONE, 12(12), e0189311. https://doi.org/10.1371/journal.pone.0189311
Rouder, J. N. (2014). Optional stopping: No problem for
Bayesians. Psychonomic Bulletin & Review,
21(2), 301–308. http://link.springer.com/article/10.3758/s13423-014-0595-4
Rouder, J. N., Haaf, J. M., & Snyder, H. K. (2019). Minimizing
Mistakes in Psychological Science.
Advances in Methods and Practices in Psychological Science,
2(1), 3–11. https://doi.org/10.1177/2515245918801915
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G.
(2009). Bayesian t tests for accepting and rejecting the null
hypothesis. Psychonomic Bulletin & Review, 16(2),
225–237. https://doi.org/10.3758/PBR.16.2.225
Royall, R. (1997). Statistical Evidence: A
Likelihood Paradigm. Chapman and Hall/CRC.
Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance
test. Psychological Bulletin, 57(5), 416–428. https://doi.org/10.1037/h0042040
Rücker, G., Schwarzer, G., Carpenter, J. R., & Schumacher, M.
(2008). Undue reliance on I² in assessing heterogeneity
may mislead. BMC Medical Research Methodology, 8, 79.
https://doi.org/10.1186/1471-2288-8-79
Samelson, F. (1980). J. B. Watson’s Little
Albert, Cyril Burt’s twins, and the need for a
critical science. American Psychologist, 35(7),
619–625. https://doi.org/10.1037/0003-066X.35.7.619
Sarafoglou, A., Kovacs, M., Bakos, B., Wagenmakers, E.-J., & Aczel,
B. (2022). A survey on how preregistration affects the research
workflow: Better science but more work. Royal Society Open
Science, 9(7), 211997. https://doi.org/10.1098/rsos.211997
Scheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An
Excess of Positive Results:
Comparing the Standard Psychology Literature With
Registered Reports. Advances in Methods and Practices in
Psychological Science, 4(2), 25152459211007467. https://doi.org/10.1177/25152459211007467
Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why
Hypothesis Testers Should Spend Less Time Testing
Hypotheses. Perspectives on Psychological Science,
16(4), 744–755. https://doi.org/10.1177/1745691620966795
Schimmack, U. (2012). The ironic effect of significant results on the
credibility of multiple-study articles. Psychological Methods,
17(4), 551–566. https://doi.org/10.1037/a0029487
Schmidt, S. (2009). Shall we really do it again? The
powerful concept of replication is neglected in the social sciences.
Review of General Psychology, 13(2), 90–100. https://doi.org/10.1037/a0015108
Schnuerch, M., & Erdfelder, E. (2020). Controlling decision errors
with minimal costs: The sequential probability ratio t
test. Psychological Methods, 25(2), 206–226. https://doi.org/10.1037/met0000234
Schoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining
Power and Sample Size for Simple
and Complex Mediation Models. Social Psychological and
Personality Science, 8(4), 379–386. https://doi.org/10.1177/1948550617715068
Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini,
M. (2017). Sequential hypothesis testing with Bayes
factors: Efficiently testing mean differences.
Psychological Methods, 22(2), 322–339. https://doi.org/10.1037/MET0000061
Schuirmann, D. J. (1987). A comparison of the two one-sided tests
procedure and the power approach for assessing the equivalence of
average bioavailability. Journal of Pharmacokinetics and
Biopharmaceutics, 15(6), 657–680. http://link.springer.com/article/10.1007/BF01068419
Schulz, K. F., & Grimes, D. A. (2005). Sample size calculations in
randomised trials: Mandatory and mystical. The Lancet,
365(9467), 1348–1353. https://doi.org/10.1016/S0140-6736(05)61034-3
Schumi, J., & Wittes, J. T. (2011). Through the looking glass:
Understanding non-inferiority. Trials, 12(1), 106. https://doi.org/10.1186/1745-6215-12-106
Schweder, T., & Hjort, N. L. (2016). Confidence,
Likelihood, Probability: Statistical
Inference with Confidence Distributions.
Cambridge University Press. https://doi.org/10.1017/CBO9781139046671
Scull, A. (2023). Rosenhan revisited: Successful scientific fraud.
History of Psychiatry, 0957154X221150878. https://doi.org/10.1177/0957154X221150878
Seaman, M. A., & Serlin, R. C. (1998). Equivalence confidence
intervals for two-group comparisons of means. Psychological
Methods, 3(4), 403–411. https://doi.org/10.1037/1082-989X.3.4.403
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical
power have an effect on the power of studies? Psychological
Bulletin, 105(2), 309–316. https://doi.org/10.1037/0033-2909.105.2.309
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001).
Experimental and quasi-experimental designs for generalized causal
inference. Houghton Mifflin.
Shafer, G. (1976). A mathematical theory of evidence. Princeton
University Press.
Shmueli, G. (2010). To explain or to predict? Statistical
Science, 25(3), 289–310.
Sidman, M. (1960). Tactics of Scientific Research:
Evaluating Experimental Data in
Psychology (New edition). Cambridge Center for
Behavioral Studies.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011).
False-Positive Psychology: Undisclosed
Flexibility in Data Collection and Analysis
Allows Presenting Anything as Significant.
Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013, January
17–19). Life after P-Hacking. Meeting of the
Society for Personality and Social
Psychology, New Orleans, LA. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205186
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on
Generality (COG): A Proposed
Addition to All Empirical Papers. Perspectives
on Psychological Science, 12(6), 1123–1128. https://doi.org/10.1177/1745691617708630
Simonsohn, U. (2015). Small telescopes: Detectability and
the evaluation of replication results. Psychological Science,
26(5), 559–569. https://doi.org/10.1177/0956797614567341
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve:
A key to the file-drawer. Journal of Experimental
Psychology: General, 143(2), 534. http://psycnet.apa.org/journals/xge/143/2/534/
Smart, R. G. (1964). The importance of negative results in psychological
research. Canadian Psychologist / Psychologie Canadienne,
5a(4), 225–232. https://doi.org/10.1037/h0083036
Smith, N. C. (1970). Replication studies: A neglected
aspect of psychological research. American Psychologist,
25(10), 970–975. https://doi.org/10.1037/h0029774
Smithson, M. (2003). Confidence intervals. Sage Publications.
Sotola, L. K. (2022). Garbage In, Garbage Out?
Evaluating the Evidentiary Value of Published Meta-analyses Using Z-Curve Analysis.
Collabra: Psychology, 8(1), 32571. https://doi.org/10.1525/collabra.32571
Spanos, A. (1999). Probability theory and statistical inference:
Econometric modeling with observational data. Cambridge University
Press.
Spanos, A. (2013). Who should be afraid of the
Jeffreys-Lindley paradox? Philosophy of Science,
80(1), 73–93. https://doi.org/10.1086/668875
Spellman, B. A. (2015). A Short (Personal)
Future History of Revolution 2.0.
Perspectives on Psychological Science, 10(6), 886–899.
https://doi.org/10.1177/1745691615609918
Spence, J. R., & Stanley, D. J. (2024). Tempered
Expectations: A Tutorial for
Calculating and Interpreting Prediction
Intervals in the Context of
Replications. Advances in Methods and Practices in
Psychological Science, 7(1), 25152459231217932. https://doi.org/10.1177/25152459231217932
Spiegelhalter, D. (2019). The Art of
Statistics: How to Learn from
Data (Illustrated edition). Basic Books.
Spiegelhalter, D. J., Freedman, L. S., & Blackburn, P. R. (1986).
Monitoring clinical trials: Conditional or predictive power?
Controlled Clinical Trials, 7(1), 8–17. https://doi.org/10.1016/0197-2456(86)90003-6
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression
approximations to reduce publication selection bias. Research
Synthesis Methods, 5(1), 60–78. https://doi.org/10.1002/jrsm.1095
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. A. (2017).
Finding the power to reduce publication bias. Statistics in
Medicine. https://doi.org/10.1002/sim.7228
Steiger, J. H. (2004). Beyond the F Test: Effect Size
Confidence Intervals and Tests of Close
Fit in the Analysis of Variance and
Contrast Analysis. Psychological Methods,
9(2), 164–182. https://doi.org/10.1037/1082-989X.9.2.164
Sterling, T. D. (1959). Publication Decisions and
Their Possible Effects on Inferences Drawn
from Tests of Significance–Or Vice Versa.
Journal of the American Statistical Association,
54(285), 30–34. https://doi.org/10.2307/2282137
Stewart, L. A., & Tierney, J. F. (2002). To IPD or not
to IPD?: Advantages and
Disadvantages of Systematic Reviews Using Individual
Patient Data. Evaluation & the Health Professions,
25(1), 76–97. https://doi.org/10.1177/0163278702025001006
Stodden, V., Seiler, J., & Ma, Z. (2018). An empirical analysis of
journal policy effectiveness for computational reproducibility.
Proceedings of the National Academy of Sciences,
115(11), 2584–2589. https://doi.org/10.1073/pnas.1708290115
Strand, J. F. (2023). Error tight: Exercises for lab groups
to prevent research mistakes. Psychological Methods. https://doi.org/10.1037/met0000547
Stroebe, W., & Strack, F. (2014). The Alleged Crisis
and the Illusion of Exact Replication.
Perspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450
Stroop, J. R. (1935). Studies of interference in serial verbal
reactions. Journal of Experimental Psychology, 18(6),
643–662.
Swift, J. K., Christopherson, C. D., Bird, M. O., Zöld, A., &
Goode, J. (2022). Questionable research practices among
faculty and students in APA-accredited
clinical and counseling psychology doctoral programs. Training and
Education in Professional Psychology, 16(3), 299–305. https://doi.org/10.1037/tep0000322
Taper, M. L., & Lele, S. R. (2011). Evidence, evidence functions,
and error probabilities. In P. S. Bandyopadhyay & M. R. Forster
(Eds.), Philosophy of Statistics (pp. 513–531). Elsevier, USA.
Taylor, D. J., & Muller, K. E. (1996). Bias in linear model power
and sample size calculation due to estimating noncentrality.
Communications in Statistics-Theory and Methods,
25(7), 1595–1610. https://doi.org/10.1080/03610929608831787
Teare, M. D., Dimairo, M., Shephard, N., Hayman, A., Whitehead, A.,
& Walters, S. J. (2014). Sample size requirements to estimate key
design parameters from external pilot randomised controlled trials: A
simulation study. Trials, 15(1), 264. https://doi.org/10.1186/1745-6215-15-264
Tendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about
null hypothesis Bayesian testing. Psychological
Methods. https://doi.org/10.1037/met0000221
Tendeiro, J. N., Kiers, H. A. L., Hoekstra, R., Wong, T. K., &
Morey, R. D. (2024). Diagnosing the Misuse of the
Bayes Factor in Applied Research. Advances
in Methods and Practices in Psychological Science, 7(1),
25152459231213371. https://doi.org/10.1177/25152459231213371
ter Schure, J., & Grünwald, P. D. (2019). Accumulation
Bias in Meta-Analysis: The Need
to Consider Time in Error Control. http://arxiv.org/abs/1905.13494
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting
for publication bias in the presence of heterogeneity. Statistics in
Medicine, 22(13), 2113–2126. https://doi.org/10.1002/sim.1461
Thompson, B. (2007). Effect sizes, confidence intervals, and confidence
intervals for effect sizes. Psychology in the Schools,
44(5), 423–432. https://doi.org/10.1002/pits.20234
Tunç, D. U., & Tunç, M. N. (2023). A Falsificationist
Treatment of Auxiliary Hypotheses in
Social and Behavioral Sciences:
Systematic Replications Framework.
Meta-Psychology, 7. https://doi.org/10.15626/MP.2021.2756
Tversky, A. (1977). Features of similarity. Psychological
Review, 84(4), 327–352. https://doi.org/10.1037/0033-295X.84.4.327
Tversky, A., & Kahneman, D. (1971). Belief in the law of small
numbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322
Ulrich, R., & Miller, J. (2018). Some properties of p-curves, with
an application to gradual publication bias. Psychological
Methods, 23(3), 546–560. https://doi.org/10.1037/met0000125
Uygun Tunç, D., & Tunç, M. N. (2022). A Falsificationist
Treatment of Auxiliary Hypotheses in
Social and Behavioral Sciences:
Systematic Replications Framework.
Meta-Psychology. https://doi.org/10.31234/osf.io/pdm7y
Uygun Tunç, D., Tunç, M. N., & Lakens, D. (2023). The epistemic and
pragmatic function of dichotomous claims based on statistical hypothesis
tests. Theory & Psychology, 33(3), 403–423. https://doi.org/10.1177/09593543231160112
Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How
Many Studies Do You Need?: A Primer on
Statistical Power for Meta-Analysis.
Journal of Educational and Behavioral Statistics,
35(2), 215–247. https://doi.org/10.3102/1076998609346961
van de Schoot, R., Winter, S. D., Griffioen, E., Grimmelikhuijsen, S.,
Arts, I., Veen, D., Grandfield, E. M., & Tummers, L. G. (2021). The
Use of Questionable Research Practices to
Survive in Academia Examined With Expert
Elicitation, Prior-Data Conflicts, Bayes
Factors for Replication Effects, and the Bayes
Truth Serum. Frontiers in Psychology, 12. https://www.frontiersin.org/article/10.3389/fpsyg.2021.621547
van de Schoot, R., Winter, S. D., Ryan, O., Zondervan-Zwijnenburg, M.,
& Depaoli, S. (2017). A systematic review of Bayesian
articles in psychology: The last 25 years.
Psychological Methods, 22(2), 217–239. https://doi.org/10.1037/met0000100
Van Fraassen, B. C. (1980). The scientific image. Clarendon
Press; Oxford University Press.
van ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in
social psychology—A discussion and suggested template.
Journal of Experimental Social Psychology, 67, 2–12.
https://doi.org/10.1016/j.jesp.2016.03.004
Varkey, B. (2021). Principles of Clinical Ethics and
Their Application to Practice. Medical
Principles and Practice: International Journal of the Kuwait University,
Health Science Centre, 30(1), 17–28. https://doi.org/10.1159/000509119
Vazire, S. (2017). Quality Uncertainty Erodes Trust in
Science. Collabra: Psychology, 3(1), 1.
https://doi.org/10.1525/collabra.74
Vazire, S., & Holcombe, A. O. (2022). Where Are the
Self-Correcting Mechanisms in Science?
Review of General Psychology, 26(2), 212–223. https://doi.org/10.1177/10892680211033912
Verschuere, B., Meijer, E. H., Jim, A., Hoogesteyn, K., Orthey, R.,
McCarthy, R. J., Skowronski, J. J., Acar, O. A., Aczel, B., Bakos, B.
E., Barbosa, F., Baskin, E., Bègue, L., Ben-Shakhar, G., Birt, A. R.,
Blatz, L., Charman, S. D., Claesen, A., Clay, S. L., … Yıldız, E.
(2018). Registered Replication Report on
Mazar, Amir, and Ariely (2008).
Advances in Methods and Practices in Psychological Science,
1(3), 299–317. https://doi.org/10.1177/2515245918781032
Viamonte, S. M., Ball, K. K., & Kilgore, M. (2006). A
Cost-Benefit Analysis of Risk-Reduction Strategies
Targeted at Older Drivers. Traffic Injury
Prevention, 7(4), 352–359. https://doi.org/10.1080/15389580600791362
Viechtbauer, W. (2010). Conducting meta-analyses in R with
the metafor package. Journal of Statistical Software, 36(3),
1–48. https://doi.org/10.18637/jss.v036.i03
Vohs, K. D., Schmeichel, B. J., Lohmann, S., Gronau, Q. F., Finley, A.
J., Ainsworth, S. E., Alquist, J. L., Baker, M. D., Brizi, A., Bunyi,
A., Butschek, G. J., Campbell, C., Capaldi, J., Cau, C., Chambers, H.,
Chatzisarantis, N. L. D., Christensen, W. J., Clay, S. L., Curtis, J., …
Albarracín, D. (2021). A Multisite Preregistered Paradigmatic
Test of the Ego-Depletion Effect. Psychological
Science, 32(10), 1566–1581. https://doi.org/10.1177/0956797621989733
Vosgerau, J., Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2019).
99% impossible: A valid, or falsifiable, internal
meta-analysis. Journal of Experimental Psychology. General,
148(9), 1628–1639. https://doi.org/10.1037/xge0000663
Vuorre, M., & Curley, J. P. (2018). Curating Research
Assets: A Tutorial on the Git Version Control
System. Advances in Methods and Practices in Psychological
Science, 1(2), 219–236. https://doi.org/10.1177/2515245918754826
Wacholder, S., Chanock, S., Garcia-Closas, M., El ghormli, L., &
Rothman, N. (2004). Assessing the Probability That a
Positive Report is False: An
Approach for Molecular Epidemiology Studies.
JNCI Journal of the National Cancer Institute, 96(6),
434–442. https://doi.org/10.1093/jnci/djh075
Wagenmakers, E.-J. (2007). A practical solution to the pervasive
problems of p values. Psychonomic Bulletin & Review,
14(5), 779–804. https://doi.org/10.3758/BF03194105
Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A.,
Adams, R. B., Albohn, D. N., Allard, E. S., Benning, S. D.,
Blouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R.
J., Capaldi, C. A., Carfagno, N. S., Chasten, K. T., Cleeremans, A.,
Connell, L., DeCicco, J. M., … Zwaan, R. A. (2016). Registered
Replication Report: Strack,
Martin, & Stepper (1988). Perspectives
on Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L.
J. (2011). Why psychologists must change the way they analyze their
data: The case of psi: Comment on Bem (2011). Journal
of Personality and Social Psychology, 100(3), 426–432. https://doi.org/10.1037/a0022790
Wald, A. (1945). Sequential tests of statistical hypotheses. The
Annals of Mathematical Statistics, 16(2), 117–186.
https://www.jstor.org/stable/2240273
Waldron, S., & Allen, C. (2022). Not all pre-registrations are
equal. Neuropsychopharmacology, 47(13), 2181–2183.
https://doi.org/10.1038/s41386-022-01418-x
Wang, B., Zhou, Z., Wang, H., Tu, X. M., & Feng, C. (2019). The
p-value and model specification in statistics. General
Psychiatry, 32(3), e100081. https://doi.org/10.1136/gpsych-2019-100081
Wason, P. C. (1960). On the failure to eliminate hypotheses in a
conceptual task. Quarterly Journal of Experimental Psychology,
12(3), 129–140. https://doi.org/10.1080/17470216008416717
Wassmer, G., & Brannath, W. (2016). Group
Sequential and Confirmatory Adaptive Designs
in Clinical Trials. Springer International Publishing.
https://doi.org/10.1007/978-3-319-32562-0
Weinshall-Margel, K., & Shapard, J. (2011). Overlooked factors in
the analysis of parole decisions. Proceedings of the National
Academy of Sciences, 108(42), E833–E833. https://doi.org/10.1073/pnas.1110910108
Wellek, S. (2010). Testing statistical hypotheses of equivalence and
noninferiority (2nd ed.). CRC Press.
Westberg, M. (1985). Combining Independent Statistical
Tests. Journal of the Royal Statistical Society. Series D
(The Statistician), 34(3), 287–296. https://doi.org/10.2307/2987655
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power
and optimal design in experiments in which samples of participants
respond to samples of stimuli. Journal of Experimental Psychology:
General, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014
Westlake, W. J. (1972). Use of Confidence Intervals in
Analysis of Comparative Bioavailability
Trials. Journal of Pharmaceutical Sciences,
61(8), 1340–1341. https://doi.org/10.1002/JPS.2600610845
Whitney, S. N. (2016). Balanced Ethics Review.
Springer International Publishing. https://doi.org/10.1007/978-3-319-20705-6
Wicherts, J. M. (2011). Psychology must learn a lesson from fraud case.
Nature, 480(7375), 7. https://doi.org/10.1038/480007a
Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M.,
Aert, R. C. M. van, & Assen, M. A. L. M. van. (2016). Degrees of
Freedom in Planning, Running,
Analyzing, and Reporting Psychological
Studies: A Checklist to Avoid
p-Hacking. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.01832
Wiebels, K., & Moreau, D. (2021). Leveraging Containers
for Reproducible Psychological Research. Advances in
Methods and Practices in Psychological Science, 4(2),
25152459211017853. https://doi.org/10.1177/25152459211017853
Wigboldus, D. H. J., & Dotsch, R. (2016). Encourage
Playing with Data and Discourage
Questionable Reporting Practices. Psychometrika,
81(1), 27–32. https://doi.org/10.1007/s11336-015-9445-1
Williams, R. H., Zimmerman, D. W., & Zumbo, B. D. (1995). Impact of
Measurement Error on Statistical Power:
Review of an Old Paradox. The Journal of
Experimental Education, 63(4), 363–370. https://doi.org/10.1080/00220973.1995.9943470
Wilson, E. C. F. (2015). A Practical Guide to
Value of Information Analysis.
PharmacoEconomics, 33(2), 105–121. https://doi.org/10.1007/s40273-014-0219-x
Wilson VanVoorhis, C. R., & Morgan, B. L. (2007). Understanding
power and rules of thumb for determining sample sizes. Tutorials in
Quantitative Methods for Psychology, 3(2), 43–50. https://doi.org/10.20982/tqmp.03.2.p043
Winer, B. J. (1962). Statistical principles in experimental
design. New York: McGraw-Hill. https://trove.nla.gov.au/version/39914160
Wingen, T., Berkessel, J. B., & Englich, B. (2020). No
Replication, No Trust? How Low
Replicability Influences Trust in Psychology.
Social Psychological and Personality Science, 11(4),
454–463. https://doi.org/10.1177/1948550619877412
Wiseman, R., Watt, C., & Kornbrot, D. (2019). Registered reports: An
early example and analysis. PeerJ, 7, e6232. https://doi.org/10.7717/peerj.6232
Wittes, J., & Brittain, E. (1990). The role of internal pilot
studies in increasing the efficiency of clinical trials. Statistics
in Medicine, 9(1-2), 65–72. https://doi.org/10.1002/sim.4780090113
Wong, T. K., Kiers, H., & Tendeiro, J. (2022). On the
Potential Mismatch Between the Function of the
Bayes Factor and Researchers’
Expectations. Collabra: Psychology, 8(1),
36357. https://doi.org/10.1525/collabra.36357
Wynants, L., Calster, B. V., Collins, G. S., Riley, R. D., Heinze, G.,
Schuit, E., Bonten, M. M. J., Dahly, D. L., Damen, J. A., Debray, T. P.
A., Jong, V. M. T. de, Vos, M. D., Dhiman, P., Haller, M. C., Harhay, M.
O., Henckaerts, L., Heus, P., Kammer, M., Kreuzberger, N., … Smeden, M.
van. (2020). Prediction models for diagnosis and prognosis of covid-19:
Systematic review and critical appraisal. BMJ, 369,
m1328. https://doi.org/10.1136/bmj.m1328
Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over
Explanation in Psychology: Lessons From
Machine Learning. Perspectives on Psychological Science,
12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
Yuan, K.-H., & Maxwell, S. (2005). On the Post Hoc
Power in Testing Mean Differences. Journal of
Educational and Behavioral Statistics, 30(2), 141–167. https://doi.org/10.3102/10769986030002141
Zabell, S. L. (1992). R. A. Fisher and the
Fiducial Argument. Statistical Science,
7(3), 369–387. https://doi.org/10.1214/ss/1177011233
Zenko, M. (2015). Red Team: How to
Succeed By Thinking Like the Enemy (1st
ed.). Basic Books.
Zumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions
concerning prospective and retrospective power. Journal of the Royal
Statistical Society: Series D (The Statistician), 47(2),
385–388. https://doi.org/10.1111/1467-9884.00139