References
Abelson, P. (2003). The Value of Life and
Health for Public Policy. Economic
Record, 79, S2–S13. https://doi.org/10.1111/1475-4932.00087
Aberson, C. L. (2019). Applied Power Analysis for the
Behavioral Sciences (2nd ed.). Routledge.
Aert, R. C. M. van, & Assen, M. A. L. M. van. (2018). Correcting
for Publication Bias in a Meta-Analysis with
the P-uniform* Method.
MetaArXiv. https://doi.org/10.31222/osf.io/zqjr9
Agnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., &
Cubelli, R. (2017). Questionable research practices among Italian
research psychologists. PLOS ONE, 12(3), e0172792. https://doi.org/10.1371/journal.pone.0172792
Albers, C. J., Kiers, H. A. L., & Ravenzwaaij, D. van. (2018).
Credible Confidence: A Pragmatic View on the
Frequentist vs Bayesian Debate. Collabra:
Psychology, 4(1), 31. https://doi.org/10.1525/collabra.149
Albers, C. J., & Lakens, D. (2018). When power analyses based on
pilot data are biased: Inaccurate effect size estimators
and follow-up bias. Journal of Experimental Social Psychology,
74, 187–195. https://doi.org/10.1016/j.jesp.2017.09.004
Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F., &
Pi-Sunyer, F. X. (1997). Power and money: Designing
statistically powerful studies while minimizing financial costs.
Psychological Methods, 2(1), 20–33. https://doi.org/10.1037/1082-989X.2.1.20
Altman, D. G., & Bland, J. M. (1995). Statistics notes:
Absence of evidence is not evidence of absence.
BMJ, 311(7003), 485. https://doi.org/10.1136/bmj.311.7003.485
Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C.
(2007). The perverse effects of competition on scientists’ work and
relationships. Science and Engineering Ethics, 13(4),
437–461.
Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size
planning for more accurate statistical power: A method
adjusting sample effect sizes for publication bias and uncertainty.
Psychological Science, 28(11), 1547–1562. https://doi.org/10.1177/0956797617723724
Anderson, S. F., & Maxwell, S. E. (2016). There’s more than one way
to conduct a replication study: Beyond statistical
significance. Psychological Methods, 21(1), 1–12. https://doi.org/10.1037/met0000051
Anvari, F., Kievit, R., Lakens, D., Pennington, C. R., Przybylski, A.
K., Tiokhin, L., Wiernik, B. M., & Orben, A. (2021). Not all effects
are indispensable: Psychological science requires
verifiable lines of reasoning for whether an effect matters.
Perspectives on Psychological Science. https://doi.org/10.31234/osf.io/g3vtr
Anvari, F., & Lakens, D. (2018). The replicability crisis and public
trust in psychological science. Comprehensive Results in Social
Psychology, 3(3), 266–286. https://doi.org/10.1080/23743603.2019.1684822
Anvari, F., & Lakens, D. (2021). Using anchor-based methods to
determine the smallest effect size of interest. Journal of
Experimental Social Psychology, 96, 104159. https://doi.org/10.1016/j.jesp.2021.104159
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M.,
& Rao, S. M. (2018). Journal article reporting standards for
quantitative research in psychology: The APA Publications
and Communications Board task force report. American
Psychologist, 73(1), 3. https://doi.org/10.1037/amp0000191
Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated
significance tests on accumulating data. Journal of the Royal
Statistical Society: Series A (General), 132(2), 235–244.
Bacchetti, P. (2010). Current sample size conventions:
Flaws, harms, and alternatives. BMC Medicine,
8(1), 17. https://doi.org/10.1186/1741-7015-8-17
Baguley, T. (2004). Understanding statistical power in the context of
applied research. Applied Ergonomics, 35(2), 73–80. https://doi.org/10.1016/j.apergo.2004.01.002
Baguley, T. (2009). Standardized or simple effect size:
What should be reported? British Journal of
Psychology, 100(3), 603–617. https://doi.org/10.1348/000712608X377117
Bakker, B. N., Kokil, J., Dörr, T., Fasching, N., & Lelkes, Y.
(2021). Questionable and Open Research Practices:
Attitudes and Perceptions among
Quantitative Communication Researchers. Journal of
Communication, 71(5), 715–738. https://doi.org/10.1093/joc/jqab031
Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D.,
Marsiske, M., Morris, J. N., Rebok, G. W., Smith, D. M., &
Tennstedt, S. L. (2002). Effects of cognitive training interventions
with older adults: A randomized controlled trial. JAMA,
288(18), 2271–2281.
Bartoš, F., & Schimmack, U. (2020). Z-curve 2.0:
Estimating Replication Rates and Discovery
Rates. https://doi.org/10.31234/osf.io/urgtn
Bauer, P., & Kieser, M. (1996). A unifying approach for confidence
intervals and testing of equivalence and difference.
Biometrika, 83(4), 934–937.
Bausell, R. B., & Li, Y.-F. (2002). Power Analysis
for Experimental Research: A Practical Guide
for the Biological, Medical and Social
Sciences (1st ed.). Cambridge University
Press.
Becker, B. J. (2005). Failsafe N or File-Drawer
Number. In Publication Bias in
Meta-Analysis (pp. 111–125). John Wiley &
Sons, Ltd. https://doi.org/10.1002/0470870168.ch7
Bem, D. J. (2011). Feeling the future: Experimental evidence for
anomalous retroactive influences on cognition and affect. Journal of
Personality and Social Psychology, 100(3), 407–425. https://doi.org/10.1037/a0021524
Berkeley, G. (1735). A defence of free-thinking in mathematics, in
answer to a pamphlet of Philalethes Cantabrigiensis
entitled Geometry No Friend to Infidelity.
Also an appendix concerning Mr. Walton’s
Vindication of the principles of fluxions against the
objections contained in The analyst. By the
author of The minute philosopher (Vol. 3).
Bird, S. B., & Sivilotti, M. L. A. (2008). Self-plagiarism,
recycling fraud, and the intent to mislead. Journal of Medical
Toxicology, 4(2), 69–70. https://doi.org/10.1007/BF03160957
Bishop, D. V. M. (2018). Fallibility in Science:
Responding to Errors in the Work
of Oneself and Others. Advances in Methods
and Practices in Psychological Science, 2515245918776632. https://doi.org/10.1177/2515245918776632
Bland, M. (2015). An introduction to medical statistics (4th
ed.). Oxford University Press.
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A.
(2015). Correlational effect size benchmarks. The Journal of Applied
Psychology, 100(2), 431–449. https://doi.org/10.1037/a0038047
Brown, G. W. (1983). Errors, Types I and II.
American Journal of Diseases of Children, 137(6),
586–591. https://doi.org/10.1001/archpedi.1983.02140320062014
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM
Test: A Simple Technique Detects Numerous Anomalies
in the Reporting of Results in
Psychology. Social Psychological and Personality
Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876
Brunner, J., & Schimmack, U. (2020). Estimating Population
Mean Power Under Conditions of Heterogeneity and
Selection for Significance.
Meta-Psychology, 4. https://doi.org/10.15626/MP.2018.874
Brysbaert, M. (2019). How many participants do we have to include in
properly powered experiments? A tutorial of power analysis
with reference tables. Journal of Cognition, 2(1), 16.
https://doi.org/10.5334/joc.72
Brysbaert, M., & Stevens, M. (2018). Power Analysis and
Effect Size in Mixed Effects Models: A
Tutorial. Journal of Cognition, 1(1). https://doi.org/10.5334/joc.10
Bulus, M., & Dong, N. (2021). Bound Constrained
Optimization of Sample Sizes Subject to
Monetary Restrictions in Planning Multilevel
Randomized Trials and Regression Discontinuity
Studies. The Journal of Experimental Education,
89(2), 379–401. https://doi.org/10.1080/00220973.2019.1636197
Burriss, R. P., Troscianko, J., Lovell, P. G., Fulford, A. J. C.,
Stevens, M., Quigley, R., Payne, J., Saxton, T. K., & Rowland, H. M.
(2015). Changes in women’s facial skin color over the ovulatory cycle
are not detectable by the human visual system. PLOS ONE,
10(7), e0130093. https://doi.org/10.1371/journal.pone.0130093
Caplan, A. L. (2021). How Should We Regard Information
Gathered in Nazi Experiments? AMA Journal of
Ethics, 23(1), 55–58. https://doi.org/10.1001/amajethics.2021.55
Carter, E. C., & McCullough, M. E. (2014). Publication bias and the
limited strength model of self-control: Has the evidence for ego
depletion been overestimated? Frontiers in Psychology,
5. https://doi.org/10.3389/fpsyg.2014.00823
Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J.
(2019). Correcting for Bias in Psychology:
A Comparison of Meta-Analytic Methods.
Advances in Methods and Practices in Psychological Science,
2(2), 115–144. https://doi.org/10.1177/2515245919847196
Cascio, W. F., & Zedeck, S. (1983). Open a New Window
in Rational Research Planning: Adjust Alpha to
Maximize Statistical Power. Personnel Psychology,
36(3), 517–526. https://doi.org/10.1111/j.1744-6570.1983.tb02233.x
Chalmers, I., & Glasziou, P. (2009). Avoidable waste in the
production and reporting of research evidence. The Lancet,
374(9683), 86–89.
Chambers, C. D., & Tzavella, L. (2022). The past, present and future
of Registered Reports. Nature Human Behaviour,
6(1), 29–42. https://doi.org/10.1038/s41562-021-01193-7
Chang, M. (2016). Adaptive Design Theory and
Implementation Using SAS and R (2nd
ed.). Chapman and Hall/CRC.
Chin, J. M., Pickett, J. T., Vazire, S., & Holcombe, A. O. (2021).
Questionable Research Practices and Open
Science in Quantitative Criminology. Journal of
Quantitative Criminology. https://doi.org/10.1007/s10940-021-09525-6
Cho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional
research hypotheses tests legitimate? Journal of Business
Research, 66(9), 1261–1266. https://doi.org/10.1016/j.jbusres.2012.02.023
Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd ed.). L. Erlbaum Associates.
Cook, J., Hislop, J., Adewuyi, T., Harrild, K., Altman, D., Ramsay, C.,
Fraser, C., Buckley, B., Fayers, P., Harvey, I., Briggs, A., Norrie, J.,
Fergusson, D., Ford, I., & Vale, L. (2014). Assessing methods to
specify the target difference for a randomised controlled trial:
DELTA (Difference ELicitation in
TriAls) review. Health Technology Assessment,
18(28). https://doi.org/10.3310/hta18280
Cook, T. D. (2002). P-Value Adjustment in Sequential
Clinical Trials. Biometrics, 58(4), 1005–1011.
Correll, J., Mellinger, C., McClelland, G. H., & Judd, C. M. (2020).
Avoid Cohen’s “Small,”
“Medium,” and
“Large” for Power Analysis.
Trends in Cognitive Sciences, 24(3), 200–207. https://doi.org/10.1016/j.tics.2019.12.009
Cousineau, D., & Chiasson, F. (2019). Superb:
Computes standard error and confidence interval of means
under various designs and sampling schemes [Manual].
Cox, D. R. (1958). Some Problems Connected with
Statistical Inference. Annals of Mathematical
Statistics, 29(2), 357–372. https://doi.org/10.1214/aoms/1177706618
Cribbie, R. A., Gruman, J. A., & Arpin-Cribbie, C. A. (2004).
Recommendations for applying tests of equivalence. Journal of
Clinical Psychology, 60(1), 1–10.
Cumming, G. (2008). Replication and p
Intervals: p Values
Predict the Future Only Vaguely, but
Confidence Intervals Do Much Better. Perspectives on
Psychological Science, 3(4), 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x
Cumming, G. (2013). Understanding the new statistics:
Effect sizes, confidence intervals, and meta-analysis.
Routledge.
Cumming, G. (2014). The New Statistics: Why
and How. Psychological Science, 25(1),
7–29. https://doi.org/10.1177/0956797613504966
Cumming, G., & Calin-Jageman, R. (2016). Introduction to the
New Statistics: Estimation, Open
Science, and Beyond. Routledge.
DeBruine, L. M., & Barr, D. J. (2021). Understanding
Mixed-Effects Models Through Data Simulation. Advances
in Methods and Practices in Psychological Science, 4(1),
2515245920965119. https://doi.org/10.1177/2515245920965119
Delacre, M., Lakens, D., & Leys, C. (2017). Why Psychologists
Should by Default Use Welch’s
t-test Instead of
Student’s t-test. International
Review of Social Psychology, 30(1). https://doi.org/10.5334/irsp.82
Detsky, A. S. (1990). Using cost-effectiveness analysis to improve the
efficiency of allocating funds to clinical trials. Statistics in
Medicine, 9(1-2), 173–184. https://doi.org/10.1002/sim.4780090124
Dienes, Z. (2008). Understanding psychology as a science:
An introduction to scientific and statistical
inference. Palgrave Macmillan.
Dienes, Z. (2014). Using Bayes to get the most out of
non-significant results. Frontiers in Psychology, 5.
https://doi.org/10.3389/fpsyg.2014.00781
Dodge, H. F., & Romig, H. G. (1929). A Method of
Sampling Inspection. Bell System Technical
Journal, 8(4), 613–631. https://doi.org/10.1002/j.1538-7305.1929.tb01240.x
Dupont, W. D. (1983). Sequential stopping rules and sequentially
adjusted P values: Does one require the other?
Controlled Clinical Trials, 4(1), 3–10. https://doi.org/10.1016/S0197-2456(83)80003-8
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M.,
Allen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio,
D. B. V., Boucher, L., Brown, E. R., Budiman, N. I., Cairo, A. H.,
Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman,
J. A., Conway, J. G., … Nosek, B. A. (2016). Many Labs 3:
Evaluating participant pool quality across the academic
semester via replication. Journal of Experimental Social
Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012
Eckermann, S., Karnon, J., & Willan, A. R. (2010). The
Value of Value of Information.
PharmacoEconomics, 28(9), 699–709. https://doi.org/10.2165/11537370-000000000-00000
Edwards, M. A., & Roy, S. (2017). Academic Research in
the 21st Century: Maintaining Scientific
Integrity in a Climate of Perverse
Incentives and Hypercompetition. Environmental
Engineering Science, 34(1), 51–61. https://doi.org/10.1089/ees.2016.0223
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER:
A general power analysis program. Behavior Research
Methods, Instruments, & Computers, 28(1), 1–11. https://doi.org/10.3758/BF03203630
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007).
GPower 3: A flexible statistical power
analysis program for the social, behavioral, and biomedical sciences.
Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead
theories: Publication bias and psychological science’s aversion to the
null. Perspectives on Psychological Science, 7(6),
555–561.
Ferguson, C. J., & Heene, M. (2021). Providing a lower-bound
estimate for psychology’s “crud factor”: The
case of aggression. Professional Psychology: Research and
Practice, 52(6), 620–626. https://doi.org/10.1037/pro0000386
Ferron, J., & Onghena, P. (1996). The Power of
Randomization Tests for Single-Case Phase
Designs. The Journal of Experimental Education,
64(3), 231–239. https://doi.org/10.1080/00220973.1996.9943805
Fiedler, K., & Schwarz, N. (2016). Questionable Research
Practices Revisited. Social Psychological and Personality
Science, 7(1), 45–52. https://doi.org/10.1177/1948550615612150
Field, S. A., Tyre, A. J., Jonzén, N., Rhodes, J. R., & Possingham,
H. P. (2004). Minimizing the cost of environmental management decisions
by optimizing statistical thresholds. Ecology Letters,
7(8), 669–675. https://doi.org/10.1111/j.1461-0248.2004.00625.x
Fisher, R. A. (1935). The design of experiments.
Oliver and Boyd.
Fisher, R. A. (1956). Statistical methods and scientific
inference. Hafner Publishing Co.
Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor:
Evaluating the Quality of Empirical
Journals with Respect to Sample Size
and Statistical Power. PLOS ONE, 9(10),
e109019. https://doi.org/10.1371/journal.pone.0109019
Francis, G. (2014). The frequency of excess success for articles in
Psychological Science. Psychonomic Bulletin &
Review, 21(5), 1180–1187. https://doi.org/10.3758/s13423-014-0601-x
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias
in the social sciences: Unlocking the file drawer.
Science, 345(6203), 1502–1505. https://doi.org/10.1126/SCIENCE.1255484
Fraser, H., Parker, T., Nakagawa, S., Barnett, A., & Fidler, F.
(2018). Questionable research practices in ecology and evolution.
PLOS ONE, 13(7), e0200303. https://doi.org/10.1371/journal.pone.0200303
Fried, B. J., Boers, M., & Baker, P. R. (1993). A method for
achieving consensus on rheumatoid arthritis outcome measures: The
OMERACT conference process. The Journal of
Rheumatology, 20(3), 548–551.
Friede, T., & Kieser, M. (2006). Sample size recalculation in
internal pilot study designs: A review. Biometrical Journal: Journal
of Mathematical Methods in Biosciences, 48(4), 537–555. https://doi.org/10.1002/bimj.200510238
Fugard, A. J. B., & Potts, H. W. W. (2015). Supporting thinking on
sample sizes for thematic analyses: A quantitative tool.
International Journal of Social Research Methodology,
18(6), 669–684. https://doi.org/10.1080/13645579.2015.1005453
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in
psychological research: Sense and nonsense. Advances in
Methods and Practices in Psychological Science, 2(2),
156–168. https://doi.org/10.1177/2515245919847202
Gillon, R. (1994). Medical ethics: Four principles plus attention to
scope. BMJ, 309(6948), 184. https://doi.org/10.1136/bmj.309.6948.184
Good, I. J. (1992). The Bayes/Non-Bayes
compromise: A brief review. Journal of the American
Statistical Association, 87(419), 597–606. https://doi.org/10.2307/2290192
Gopalakrishna, G., Riet, G. ter, Vink, G., Stoop, I., Wicherts, J. M.,
& Bouter, L. M. (2022). Prevalence of questionable research
practices, research misconduct and their potential explanatory factors:
A survey among academic researchers in The
Netherlands. PLOS ONE, 17(2), e0263023. https://doi.org/10.1371/journal.pone.0263023
Gosset, W. S. (1904). The Application of the
"Law of Error" to the Work of the
Brewery (Vol. 8, pp. 3–16). Arthur Guinness
& Son, Ltd.
Green, P., & MacLeod, C. J. (2016). SIMR: An
R package for power analysis of generalized linear mixed
models by simulation. Methods in Ecology and Evolution,
7(4), 493–498. https://doi.org/10.1111/2041-210X.12504
Green, S. B. (1991). How Many Subjects Does It Take To Do A
Regression Analysis. Multivariate Behavioral Research,
26(3), 499–510. https://doi.org/10.1207/s15327906mbr2603_7
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C.,
Goodman, S. N., & Altman, D. G. (2016). Statistical tests,
P values, confidence intervals, and power: A guide to
misinterpretations. European Journal of Epidemiology,
31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
Greenwald, A. G. (1975). Consequences of prejudice against the null
hypothesis. Psychological Bulletin, 82(1), 1–20.
Grünwald, P., de Heide, R., & Koolen, W. (2019). Safe
Testing. arXiv:1906.07801 [cs, math, stat]. https://arxiv.org/abs/1906.07801
Gupta, S. K. (2011). Intention-to-treat concept: A review.
Perspectives in Clinical Research, 2(3), 109–112. https://doi.org/10.4103/2229-3485.83221
Hacking, I. (1965). Logic of Statistical
Inference. Cambridge University Press.
Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O.,
Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G.,
Bruyneel, S., Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci,
M., Carruth, N. P., Cheung, T., Crowell, A., De Ridder, D. T. D.,
Dewitte, S., … Zwienenberg, M. (2016). A Multilab Preregistered
Replication of the Ego-Depletion Effect.
Perspectives on Psychological Science, 11(4), 546–573.
https://doi.org/10.1177/1745691616652873
Hallahan, M., & Rosenthal, R. (1996). Statistical power:
Concepts, procedures, and applications. Behaviour
Research and Therapy, 34(5), 489–499. https://doi.org/10.1016/0005-7967(95)00082-8
Halpern, J., Brown Jr, B. W., & Hornberger, J. (2001). The sample
size for a clinical trial: A Bayesian decision theoretic
approach. Statistics in Medicine, 20(6), 841–858. https://doi.org/10.1002/sim.703
Halpern, S. D., Karlawish, J. H., & Berlin, J. A. (2002). The
continuing unethical conduct of underpowered clinical trials.
JAMA, 288(3), 358–362. https://doi.org/10.1001/jama.288.3.358
Harms, C., & Lakens, D. (2018). Making ‘null effects’ informative:
Statistical techniques and inferential frameworks. Journal of
Clinical and Translational Research, 3, 382–393. https://doi.org/10.18053/jctres.03.2017S2.007
Hauck, D. W. W., & Anderson, S. (1984). A new statistical procedure
for testing equivalence in two-group comparative bioavailability trials.
Journal of Pharmacokinetics and Biopharmaceutics,
12(1), 83–91. https://doi.org/10.1007/BF01063612
Hempel, C. G. (1966). Philosophy of natural science (Reprint).
Prentice-Hall.
Hilgard, J. (2021). Maximal positive controls: A method for
estimating the largest plausible effect size. Journal of
Experimental Social Psychology, 93. https://doi.org/10.1016/j.jesp.2020.104082
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008).
Empirical Benchmarks for Interpreting Effect
Sizes in Research. Child Development
Perspectives, 2(3), 172–177. https://doi.org/10.1111/j.1750-8606.2008.00061.x
Hodges, J. L., & Lehmann, E. L. (1954). Testing the
Approximate Validity of Statistical
Hypotheses. Journal of the Royal Statistical Society. Series
B (Methodological), 16(2), 261–268. https://doi.org/10.1111/j.2517-6161.1954.tb00169.x
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The
pervasive fallacy of power calculations for data analysis. The
American Statistician, 55(1), 19–24. https://doi.org/10.1198/000313001300339897
Hung, H. M. J., O’Neill, R. T., Bauer, P., & Kohne, K. (1997). The
Behavior of the P-Value When the
Alternative Hypothesis is True.
Biometrics, 53(1), 11–22. https://doi.org/10.2307/2533093
Hyde, J. S., Lindberg, S. M., Linn, M. C., Ellis, A. B., & Williams,
C. C. (2008). Gender Similarities Characterize Math
Performance. Science, 321(5888), 494–495. https://doi.org/10.1126/science.1160364
Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test
for an excess of significant findings. Clinical Trials,
4(3), 245–253. https://doi.org/10.1177/1740774507079441
Iyengar, S., & Greenhouse, J. B. (1988). Selection
Models and the File Drawer Problem.
Statistical Science, 3(1), 109–117. https://www.jstor.org/stable/2245925
Jaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of
health status: Ascertaining the minimal clinically
important difference. Controlled Clinical Trials,
10(4), 407–415. https://doi.org/10.1016/0197-2456(89)90005-6
Jeffreys, H. (1939). Theory of probability (1st ed.).
Oxford University Press.
Jennison, C., & Turnbull, B. W. (2000). Group sequential methods
with applications to clinical trials. Chapman &
Hall/CRC.
Johansson, T. (2011). Hail the impossible: P-values, evidence, and
likelihood. Scandinavian Journal of Psychology, 52(2),
113–125. https://doi.org/10.1111/j.1467-9450.2010.00852.x
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the
prevalence of questionable research practices with incentives for truth
telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Johnson, V. E. (2013). Revised standards for statistical evidence.
Proceedings of the National Academy of Sciences,
110(48), 19313–19317. https://doi.org/10.1073/pnas.1313476110
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2009). Weight as an
Embodiment of Importance. Psychological
Science, 20(9), 1169–1174. https://doi.org/10.1111/j.1467-9280.2009.02426.x
Jostmann, N. B., Lakens, D., & Schubert, T. W. (2016). A short
history of the weight-importance effect and a recommendation for
pre-testing: Commentary on Ebersole et al.
(2016). Journal of Experimental Social Psychology, 67,
93–94. https://doi.org/10.1016/j.jesp.2015.12.001
Julious, S. A. (2004). Sample sizes for clinical trials with normal
data. Statistics in Medicine, 23(12), 1921–1986. https://doi.org/10.1002/sim.1783
Keefe, R. S. E., Kraemer, H. C., Epstein, R. S., Frank, E., Haynes, G.,
Laughren, T. P., Mcnulty, J., Reed, S. D., Sanchez, J., & Leon, A.
C. (2013). Defining a
Clinically Meaningful Effect for the Design
and Interpretation of Randomized Controlled
Trials. Innovations in Clinical Neuroscience,
10(5-6 Suppl A), 4S–19S.
Kelley, K. (2007). Confidence Intervals for
Standardized Effect Sizes: Theory,
Application, and Implementation. Journal
of Statistical Software, 20(8). https://doi.org/10.18637/JSS.V020.I08
Kelley, K., & Preacher, K. J. (2012). On effect size.
Psychological Methods, 17(2), 137–152. https://doi.org/10.1037/a0028086
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the
standardized mean difference: Accuracy in parameter estimation via
narrow confidence intervals. Psychological Methods,
11(4), 363–385. https://doi.org/10.1037
Kenny, D. A., & Judd, C. M. (2019). The unappreciated heterogeneity
of effect sizes: Implications for power, precision,
planning of research, and replication. Psychological Methods,
24(5), 578–589. https://doi.org/10.1037/met0000209
King, M. T. (2011). A point of minimal important difference
(MID): A critique of terminology and methods. Expert
Review of Pharmacoeconomics & Outcomes Research,
11(2), 171–184. https://doi.org/10.1586/erp.11.9
Kish, L. (1965). Survey Sampling.
Wiley.
Komić, D., Marušić, S. L., & Marušić, A. (2015). Research
Integrity and Research Ethics in
Professional Codes of Ethics:
Survey of Terminology Used by
Professional Organizations across Research
Disciplines. PLOS ONE, 10(7), e0133662. https://doi.org/10.1371/journal.pone.0133662
Kraft, M. A. (2020). Interpreting effect sizes of education
interventions. Educational Researcher, 49(4), 241–253.
https://doi.org/10.3102/0013189X20912798
Kruschke, J. K. (2011). Bayesian assessment of null values via parameter
estimation and model comparison. Perspectives on Psychological
Science, 6(3), 299–312.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test.
Journal of Experimental Psychology: General, 142(2),
573–603. https://doi.org/10.1037/a0029146
Kruschke, J. K. (2014). Doing Bayesian Data Analysis,
Second Edition: A Tutorial with
R, JAGS, and Stan (2nd
ed.). Academic Press.
Kruschke, J. K. (2018). Rejecting or Accepting Parameter
Values in Bayesian Estimation. Advances in
Methods and Practices in Psychological Science, 1(2),
270–280. https://doi.org/10.1177/2515245918771304
Kruschke, J. K., & Liddell, T. M. (2017). The Bayesian New
Statistics: Hypothesis testing, estimation,
meta-analysis, and power analysis from a Bayesian
perspective. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-016-1221-4
Lakens, D. (2014). Performing high-powered studies efficiently with
sequential analyses. European
Journal of Social Psychology, 44(7), 701–710. https://doi.org/10.1002/ejsp.2023
Lakens, D. (2017). Equivalence Tests: A Practical
Primer for t Tests, Correlations, and
Meta-Analyses. Social Psychological and Personality
Science, 8(4), 355–362. https://doi.org/10.1177/1948550617697177
Lakens, D. (2019). The value of preregistration for psychological
science: A conceptual analysis. Japanese Psychological
Review, 62(3), 221–230. https://doi.org/10.24602/sjpr.62.3_221
Lakens, D. (2021). The practical alternative to the p value is the
correctly used p value. Perspectives on Psychological Science,
16(3), 639–648. https://doi.org/10.1177/1745691620958012
Lakens, D. (2022). Why P values are not measures of
evidence. Trends in Ecology & Evolution, 37(4),
289–290. https://doi.org/10.1016/j.tree.2021.12.006
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J.,
Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D.
E., Buchanan, E. M., Caldwell, A. R., Calster, B., Carlsson, R., Chen,
S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., … Zwaan, R.
A. (2018). Justify your alpha. Nature Human Behaviour,
2, 168–171. https://doi.org/10.1038/s41562-018-0311-x
Lakens, D., & Caldwell, A. R. (2021). Simulation-Based Power
Analysis for Factorial Analysis of Variance
Designs. Advances in Methods and Practices in Psychological
Science, 4(1). https://doi.org/10.1177/2515245920951503
Lakens, D., & Etz, A. J. (2017). Too True to be
Bad: When Sets of Studies With
Significant and Nonsignificant Findings Are Probably
True. Social Psychological and Personality Science,
8(8), 875–881. https://doi.org/10.1177/1948550617693058
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence
testing for psychological research: A tutorial.
Advances in Methods and Practices in Psychological Science,
1(2), 259–269. https://doi.org/10.1177/2515245918770963
Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential
Boundaries for Clinical Trials. Biometrika,
70(3), 659. https://doi.org/10.2307/2336502
Latan, H., Chiappetta Jabbour, C. J., Lopes de Sousa Jabbour, A. B.,
& Ali, M. (2021). Crossing the Red Line?
Empirical Evidence and Useful Recommendations
on Questionable Research Practices among Business
Scholars. Journal of Business Ethics, 1–21. https://doi.org/10.1007/s10551-021-04961-7
Leamer, E. E. (1978). Specification Searches: Ad
Hoc Inference with Nonexperimental Data (1st
ed.). Wiley.
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical
hypotheses (3rd ed.). Springer.
Lenth, R. V. (2001). Some practical guidelines for effective sample size
determination. The American Statistician, 55(3),
187–193. https://doi.org/10.1198/000313001317098149
Lenth, R. V. (2007). Post hoc power: Tables and commentary. Iowa
City: Department of Statistics and Actuarial Science, University of
Iowa.
Leon, A. C., Davis, L. L., & Kraemer, H. C. (2011). The
Role and Interpretation of Pilot
Studies in Clinical Research. Journal of
Psychiatric Research, 45(5), 626–629. https://doi.org/10.1016/j.jpsychires.2010.10.008
Levine, T. R., Weber, R., Park, H. S., & Hullett, C. R. (2008). A
communication researchers’ guide to null hypothesis significance testing
and alternatives. Human Communication Research, 34(2),
188–209.
Lindley, D. V. (1957). A statistical paradox. Biometrika,
44(1/2), 187–192.
Lindsay, D. S. (2015). Replication in Psychological
Science. Psychological Science, 26(12),
1827–1832. https://doi.org/10.1177/0956797615616374
Lovakov, A., & Agadullina, E. R. (2021). Empirically derived
guidelines for effect size interpretation in social psychology.
European Journal of Social Psychology, 51(3), 485–504.
https://doi.org/10.1002/ejsp.2752
Maier, M., & Lakens, D. (2022). Justify your alpha: A
primer on two practical approaches. Advances in Methods and
Practices in Psychological Science. https://doi.org/10.31234/osf.io/ts4r6
Makel, M. C., Hodges, J., Cook, B. G., & Plucker, J. A. (2021). Both
Questionable and Open Research Practices Are
Prevalent in Education Research. Educational
Researcher, 50(8), 493–504. https://doi.org/10.3102/0013189X211001356
Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does
Sample Size Matter in Qualitative Research?:
A Review of Qualitative Interviews in is
Research. Journal of Computer Information Systems,
54(1), 11–22. https://doi.org/10.1080/08874417.2013.11645667
Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing
Experiments and Analyzing Data: A Model
Comparison Perspective (3rd
ed.). Routledge.
Maxwell, S. E., & Kelley, K. (2011). Ethics and sample size
planning. In Handbook of ethics in quantitative methodology
(pp. 179–204). Routledge.
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample
Size Planning for Statistical Power and
Accuracy in Parameter Estimation. Annual
Review of Psychology, 59(1), 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735
Mayo, D. G. (2018). Statistical inference as severe testing: How to
get beyond the statistics wars. Cambridge University
Press.
Mazzolari, R., Porcelli, S., Bishop, D. J., & Lakens, D. (2022).
Myths and methodologies: The use of equivalence and
non-inferiority tests for interventional studies in exercise physiology
and sport science. Experimental Physiology, 107(3),
201–212. https://doi.org/10.1113/EP090171
McElreath, R. (2016). Statistical Rethinking: A
Bayesian Course with Examples in R and
Stan (Vol. 122). CRC Press.
McIntosh, R. D., & Rittmo, J. Ö. (2021). Power calculations in
single-case neuropsychology: A practical primer.
Cortex, 135, 146–158. https://doi.org/10.1016/j.cortex.2020.11.005
Meehl, P. E. (1990). Appraising and amending theories: The
strategy of Lakatosian defense and two principles that
warrant it. Psychological Inquiry, 1(2), 108–141. https://doi.org/10.1207/s15327965pli0102_1
Meyners, M. (2012). Equivalence tests – A review. Food
Quality and Preference, 26(2), 231–245. https://doi.org/10.1016/j.foodqual.2012.05.003
Meyvis, T., & Van Osselaer, S. M. J. (2018). Increasing the
Power of Your Study by Increasing
the Effect Size. Journal of Consumer Research,
44(5), 1157–1173. https://doi.org/10.1093/jcr/ucx110
Miller, J. (2009). What is the probability of replicating a
statistically significant effect? Psychonomic Bulletin &
Review, 16(4), 617–640. https://doi.org/10.3758/PBR.16.4.617
Miller, J., & Ulrich, R. (2019). The quest for an optimal alpha.
PLOS ONE, 14(1), e0208631. https://doi.org/10.1371/journal.pone.0208631
Moe, K. (1984). Should the Nazi Research Data Be Cited?
The Hastings Center Report, 14(6), 5–7. https://doi.org/10.2307/3561733
Moran, C., Richard, A., Wilson, K., Twomey, R., & Coroiu, A. (2022).
I know it’s bad, but I have been pressured into it: Questionable
research practices among psychology students in Canada.
Canadian Psychology/Psychologie Canadienne. https://doi.org/10.1037/cap0000326
Morey, R. D. (2020). Power and precision [Blog].
https://medium.com/@richarddmorey/power-and-precision-47f644ddea5e.
Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using
simulation studies to evaluate statistical methods. Statistics in
Medicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086
Morse, J. M. (1995). The Significance of
Saturation. Qualitative Health Research,
5(2), 147–149. https://doi.org/10.1177/104973239500500201
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L.,
Forscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., &
Antfolk, J. (2018). The Psychological Science Accelerator:
Advancing psychology through a distributed collaborative
network. Advances in Methods and Practices in Psychological
Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607
Motyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J.,
Mueller, A. B., Prims, J. P., Sun, J., Washburn, A. N., Wong, K. M.,
Yantis, C., & Skitka, L. J. (2017). The state of social and
personality science: Rotten to the core, not so bad,
getting better, or getting worse? Journal of Personality and Social
Psychology, 113, 34–58. https://doi.org/10.1037/pspa0000084
Mrozek, J. R., & Taylor, L. O. (2002). What determines the value of
life? A meta-analysis. Journal of Policy Analysis and
Management, 21(2), 253–270. https://doi.org/10.1002/pam.10026
Mudge, J. F., Baker, L. F., Edge, C. B., & Houlahan, J. E. (2012).
Setting an Optimal α That Minimizes
Errors in Null Hypothesis Significance Tests.
PLOS ONE, 7(2), e32734. https://doi.org/10.1371/journal.pone.0032734
Mullan, F., & Jacoby, I. (1985). The town meeting for technology:
The maturation of consensus conferences. JAMA,
254(8), 1068–1072. https://doi.org/10.1001/jama.1985.03360080080035
Murphy, K. R., & Myors, B. (1999). Testing the hypothesis that
treatments have negligible effects: Minimum-effect tests in the general linear model.
Journal of Applied Psychology, 84(2), 234–248. https://doi.org/10.1037/0021-9010.84.2.234
Murphy, K. R., Myors, B., & Wolach, A. H. (2014). Statistical
power analysis: A simple and general model for traditional and modern
hypothesis tests (4th ed.). Routledge.
Neyman, J. (1957). "Inductive Behavior" as a Basic
Concept of Philosophy of Science.
Revue de l’Institut International de Statistique / Review of the
International Statistical Institute, 25(1/3), 7. https://doi.org/10.2307/1401671
Neyman, J., & Pearson, E. S. (1933). On the problem of the most
efficient tests of statistical hypotheses. Philosophical
Transactions of the Royal Society of London A: Mathematical, Physical
and Engineering Sciences, 231(694-706), 289–337. https://doi.org/10.1098/rsta.1933.0009
Nickerson, R. S. (2000). Null hypothesis significance testing:
A review of an old and continuing controversy.
Psychological Methods, 5(2), 241–301. https://doi.org/10.1037//1082-989X.5.2.241
Niiniluoto, I. (1998). Verisimilitude: The Third Period.
The British Journal for the Philosophy of Science, 49,
1–29.
Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2004). The truly
remarkable universality of half a standard deviation: Confirmation
through another look. Expert Review of Pharmacoeconomics &
Outcomes Research, 4(5), 581–585.
Nosek, B. A., & Lakens, D. (2014). Registered reports:
A method to increase the credibility of published results.
Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp,
S., & Wicherts, J. M. (2015). The prevalence of statistical
reporting errors in psychology (1985–2013). Behavior Research
Methods. https://doi.org/10.3758/s13428-015-0664-2
Nunnally, J. (1960). The place of statistics in psychology.
Educational and Psychological Measurement, 20(4),
641–650. https://doi.org/10.1177/001316446002000401
Olsson-Collentine, A., Wicherts, J. M., & van Assen, M. A. L. M.
(2020). Heterogeneity in direct replications in psychology and its
association with effect size. Psychological Bulletin,
146(10), 922–940. https://doi.org/10.1037/bul0000294
Orben, A., & Lakens, D. (2020). Crud
(Re)Defined. Advances in Methods and
Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961
Parker, R. A., & Berman, N. G. (2003). Sample Size.
The American Statistician, 57(3), 166–170. https://doi.org/10.1198/0003130031919
Parkhurst, D. F. (2001). Statistical significance tests:
Equivalence and reverse tests should reduce
misinterpretation. Bioscience, 51(12), 1051–1057. https://doi.org/10.1641/0006-3568(2001)051[1051:SSTEAR]2.0.CO;2
Parsons, S., Kruijt, A.-W., & Fox, E. (2019). Psychological
Science Needs a Standard Practice of
Reporting the Reliability of
Cognitive-Behavioral Measurements. Advances in Methods
and Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695
Pemberton, M., Hall, S., Moskovitz, C., & Anson, C. M. (2019). Text
recycling: Views of North American journal
editors from an interview-based study. Learned Publishing,
32(4), 355–366. https://doi.org/10.1002/leap.1259
Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power
as a protection against imprecise power estimates. Perspectives on
Psychological Science, 9(3), 319–332. https://doi.org/10.1177/1745691614528519
Perugini, M., Gallucci, M., & Costantini, G. (2018). A
Practical Primer To Power Analysis for Simple
Experimental Designs. International Review of Social
Psychology, 31(1), 20. https://doi.org/10.5334/irsp.181
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., &
Rushton, L. (2007). Performance of the trim and fill method in the
presence of publication bias and between-study heterogeneity.
Statistics in Medicine, 26(25), 4544–4562. https://doi.org/10.1002/sim.2889
Phillips, B. M., Hunt, J. W., Anderson, B. S., Puckett, H. M., Fairey,
R., Wilson, C. J., & Tjeerdema, R. (2001). Statistical significance
of sediment toxicity test results: Threshold values derived
by the detectable significance approach. Environmental Toxicology
and Chemistry, 20(2), 371–373. https://doi.org/10.1002/etc.5620200218
Pickett, J. T., & Roche, S. P. (2017). Questionable,
Objectionable or Criminal? Public
Opinion on Data Fraud and Selective
Reporting in Science. Science and Engineering
Ethics, 1–21. https://doi.org/10.1007/s11948-017-9886-2
Pocock, S. J. (1977). Group sequential methods in the design and
analysis of clinical trials. Biometrika, 64(2),
191–199. https://doi.org/10.1093/biomet/64.2.191
Polanin, J. R., Hennessy, E. A., & Tsuji, S. (2020). Transparency
and Reproducibility of Meta-Analyses in
Psychology: A Meta-Review. Perspectives on
Psychological Science, 15(4), 1026–1041. https://doi.org/10.1177/1745691620906416
Popper, K. R. (2002). The logic of scientific
discovery. Routledge.
Proschan, M. A. (2005). Two-Stage Sample Size Re-Estimation
Based on a Nuisance Parameter: A
Review. Journal of Biopharmaceutical Statistics,
15(4), 559–574. https://doi.org/10.1081/BIP-200062852
Proschan, M. A., Lan, K. K. G., & Wittes, J. T. (2006).
Statistical monitoring of clinical trials: A unified approach.
Springer.
Quertemont, E. (2011). How to Statistically Show the
Absence of an Effect. Psychologica
Belgica, 51(2), 109–127. https://doi.org/10.5334/pb-51-2-109
Rabelo, A. L. A., Farias, J. E. M., Sarmet, M. M., Joaquim, T. C. R.,
Hoersting, R. C., Victorino, L., Modesto, J. G. N., & Pilati, R.
(2020). Questionable research practices among Brazilian
psychological researchers: Results from a replication study
and an international comparison. International Journal of
Psychology, 55(4), 674–683. https://doi.org/10.1002/ijop.12632
Rice, W. R., & Gaines, S. D. (1994). ‘Heads I win,
tails you lose’: Testing directional alternative hypotheses in
ecological and evolutionary research. Trends in Ecology &
Evolution, 9(6), 235–237. https://doi.org/10.1016/0169-5347(94)90258-5
Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One
Hundred Years of Social Psychology Quantitatively
Described. Review of General Psychology, 7(4),
331–363. https://doi.org/10.1037/1089-2680.7.4.331
Richardson, J. T. E. (2011). Eta squared and partial eta squared as
measures of effect size in educational research. Educational
Research Review, 6(2), 135–147. https://doi.org/10.1016/j.edurev.2010.12.001
Rijnsoever, F. J. van. (2017). (I Can’t Get
No) Saturation: A simulation and
guidelines for sample sizes in qualitative research. PLOS ONE,
12(7), e0181689. https://doi.org/10.1371/journal.pone.0181689
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using
significance tests to evaluate equivalence between two experimental
groups. Psychological Bulletin, 113(3), 553–565.
https://doi.org/10.1037/0033-2909.113.3.553
Rogers, S. (1992). How a publicity blitz created the myth of subliminal
advertising. Public Relations Quarterly, 37(4), 12.
Ropovik, I., Adamkovic, M., & Greger, D. (2021). Neglect of
publication bias compromises meta-analyses of educational research.
PLOS ONE, 16(6), e0252415. https://doi.org/10.1371/journal.pone.0252415
Scheel, A. M., Schijen, M. R. M. J., & Lakens, D. (2021). An
Excess of Positive Results:
Comparing the Standard Psychology Literature With
Registered Reports. Advances in Methods and Practices in
Psychological Science, 4(2), 25152459211007467. https://doi.org/10.1177/25152459211007467
Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why
Hypothesis Testers Should Spend Less Time Testing
Hypotheses. Perspectives on Psychological Science,
16(4), 744–755. https://doi.org/10.1177/1745691620966795
Schimmack, U. (2012). The ironic effect of significant results on the
credibility of multiple-study articles. Psychological Methods,
17(4), 551–566. https://doi.org/10.1037/a0029487
Schnuerch, M., & Erdfelder, E. (2020). Controlling decision errors
with minimal costs: The sequential probability ratio t
test. Psychological Methods, 25(2), 206–226. https://doi.org/10.1037/met0000234
Schoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining
Power and Sample Size for Simple
and Complex Mediation Models. Social Psychological and
Personality Science, 8(4), 379–386. https://doi.org/10.1177/1948550617715068
Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini,
M. (2017). Sequential hypothesis testing with Bayes
factors: Efficiently testing mean differences.
Psychological Methods, 22(2), 322–339. https://doi.org/10.1037/MET0000061
Schuirmann, D. J. (1987). A comparison of the two one-sided tests
procedure and the power approach for assessing the equivalence of
average bioavailability. Journal of Pharmacokinetics and
Biopharmaceutics, 15(6), 657–680.
Schulz, K. F., & Grimes, D. A. (2005). Sample size calculations in
randomised trials: Mandatory and mystical. The Lancet,
365(9467), 1348–1353. https://doi.org/10.1016/S0140-6736(05)61034-3
Schumi, J., & Wittes, J. T. (2011). Through the looking glass:
Understanding non-inferiority. Trials, 12(1), 106. https://doi.org/10.1186/1745-6215-12-106
Schweder, T., & Hjort, N. L. (2016). Confidence,
Likelihood, Probability: Statistical
Inference with Confidence Distributions.
Cambridge University Press. https://doi.org/10.1017/CBO9781139046671
Scull, A. (2023). Rosenhan revisited: Successful scientific fraud.
History of Psychiatry, 0957154X221150878. https://doi.org/10.1177/0957154X221150878
Seaman, M. A., & Serlin, R. C. (1998). Equivalence confidence
intervals for two-group comparisons of means. Psychological
Methods, 3(4), 403–411. https://doi.org/10.1037/1082-989X.3.4.403
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical
power have an effect on the power of studies? Psychological
Bulletin, 105(2), 309–316. https://doi.org/10.1037/0033-2909.105.2.309
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011).
False-Positive Psychology: Undisclosed
Flexibility in Data Collection and Analysis
Allows Presenting Anything as Significant.
Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life
after P-Hacking.
Simonsohn, U. (2015). Small telescopes: Detectability and
the evaluation of replication results. Psychological Science,
26(5), 559–569. https://doi.org/10.1177/0956797614567341
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve:
A key to the file-drawer. Journal of Experimental
Psychology: General, 143(2), 534.
Smithson, M. (2003). Confidence intervals. Sage
Publications.
Sotola, L. K. (2022). Garbage In, Garbage Out?
Evaluating the Evidentiary Value of Published Meta-analyses Using Z-Curve Analysis.
Collabra: Psychology, 8(1), 32571. https://doi.org/10.1525/collabra.32571
Spanos, A. (2013). Who should be afraid of the
Jeffreys-Lindley paradox? Philosophy of Science,
80(1), 73–93. https://doi.org/10.1086/668875
Spiegelhalter, D. J., Freedman, L. S., & Blackburn, P. R. (1986).
Monitoring clinical trials: Conditional or predictive power?
Controlled Clinical Trials, 7(1), 8–17. https://doi.org/10.1016/0197-2456(86)90003-6
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression
approximations to reduce publication selection bias. Research
Synthesis Methods, 5(1), 60–78. https://doi.org/10.1002/jrsm.1095
Stanley, T. D., Doucouliagos, H., & Ioannidis, J. P. A. (2017).
Finding the power to reduce publication bias.
Statistics in Medicine. https://doi.org/10.1002/sim.7228
Sterling, T. D. (1959). Publication Decisions and
Their Possible Effects on Inferences Drawn
from Tests of Significance–Or Vice Versa.
Journal of the American Statistical Association,
54(285), 30–34. https://doi.org/10.2307/2282137
Swift, J. K., Christopherson, C. D., Bird, M. O., Zöld, A., &
Goode, J. (2022). Questionable research practices among
faculty and students in APA-accredited
clinical and counseling psychology doctoral programs. Training and
Education in Professional Psychology, 16(3), 299–305. https://doi.org/10.1037/tep0000322
Taylor, D. J., & Muller, K. E. (1996). Bias in linear model power
and sample size calculation due to estimating noncentrality.
Communications in Statistics-Theory and Methods,
25(7), 1595–1610. https://doi.org/10.1080/03610929608831787
Teare, M. D., Dimairo, M., Shephard, N., Hayman, A., Whitehead, A.,
& Walters, S. J. (2014). Sample size requirements to estimate key
design parameters from external pilot randomised controlled trials: A
simulation study. Trials, 15(1), 264. https://doi.org/10.1186/1745-6215-15-264
ter Schure, J., & Grünwald, P. D. (2019). Accumulation
Bias in Meta-Analysis: The Need
to Consider Time in Error Control.
arXiv:1905.13494 [Math, Stat]. https://arxiv.org/abs/1905.13494
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting
for publication bias in the presence of heterogeneity. Statistics in
Medicine, 22(13), 2113–2126. https://doi.org/10.1002/sim.1461
Tversky, A. (1977). Features of similarity. Psychological
Review, 84(4), 327–352. https://doi.org/10.1037/0033-295X.84.4.327
Tversky, A., & Kahneman, D. (1971). Belief in the law of small
numbers. Psychological Bulletin, 76(2), 105–110. https://doi.org/10.1037/h0031322
Ulrich, R., & Miller, J. (2018). Some properties of p-curves, with
an application to gradual publication bias. Psychological
Methods, 23(3), 546–560. https://doi.org/10.1037/met0000125
van de Schoot, R., Winter, S. D., Griffioen, E., Grimmelikhuijsen, S.,
Arts, I., Veen, D., Grandfield, E. M., & Tummers, L. G. (2021). The
Use of Questionable Research Practices to
Survive in Academia Examined With Expert
Elicitation, Prior-Data Conflicts, Bayes
Factors for Replication Effects, and the Bayes
Truth Serum. Frontiers in Psychology, 12.
Varkey, B. (2021). Principles of Clinical Ethics and
Their Application to Practice. Medical
Principles and Practice: International Journal of the Kuwait University,
Health Science Centre, 30(1), 17–28. https://doi.org/10.1159/000509119
Viamonte, S. M., Ball, K. K., & Kilgore, M. (2006). A
Cost-Benefit Analysis of Risk-Reduction Strategies
Targeted at Older Drivers. Traffic Injury
Prevention, 7(4), 352–359. https://doi.org/10.1080/15389580600791362
Vohs, K. D., Schmeichel, B. J., Lohmann, S., Gronau, Q. F., Finley, A.
J., Ainsworth, S. E., Alquist, J. L., Baker, M. D., Brizi, A., Bunyi,
A., Butschek, G. J., Campbell, C., Capaldi, J., Cau, C., Chambers, H.,
Chatzisarantis, N. L. D., Christensen, W. J., Clay, S. L., Curtis, J., …
Albarracín, D. (2021). A Multisite Preregistered Paradigmatic
Test of the Ego-Depletion Effect. Psychological
Science, 32(10), 1566–1581. https://doi.org/10.1177/0956797621989733
Wald, A. (1945). Sequential tests of statistical hypotheses. The
Annals of Mathematical Statistics, 16(2), 117–186.
https://www.jstor.org/stable/2240273
Wassmer, G., & Brannath, W. (2016). Group
Sequential and Confirmatory Adaptive Designs
in Clinical Trials. Springer International
Publishing. https://doi.org/10.1007/978-3-319-32562-0
Wellek, S. (2010). Testing statistical hypotheses of equivalence and
noninferiority (2nd ed.). CRC Press.
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power
and optimal design in experiments in which samples of participants
respond to samples of stimuli. Journal of Experimental Psychology:
General, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014
Westlake, W. J. (1972). Use of Confidence Intervals in
Analysis of Comparative Bioavailability
Trials. Journal of Pharmaceutical Sciences,
61(8), 1340–1341. https://doi.org/10.1002/JPS.2600610845
Whitney, S. N. (2016). Balanced Ethics Review.
Springer International Publishing. https://doi.org/10.1007/978-3-319-20705-6
Wigboldus, D. H. J., & Dotsch, R. (2016). Encourage
Playing with Data and Discourage
Questionable Reporting Practices. Psychometrika,
81(1), 27–32. https://doi.org/10.1007/s11336-015-9445-1
Williams, R. H., Zimmerman, D. W., & Zumbo, B. D. (1995). Impact of
Measurement Error on Statistical Power:
Review of an Old Paradox. The Journal of
Experimental Education, 63(4), 363–370. https://doi.org/10.1080/00220973.1995.9943470
Wilson, E. C. F. (2015). A Practical Guide to
Value of Information Analysis.
PharmacoEconomics, 33(2), 105–121. https://doi.org/10.1007/s40273-014-0219-x
Wilson VanVoorhis, C. R., & Morgan, B. L. (2007). Understanding
power and rules of thumb for determining sample sizes. Tutorials in
Quantitative Methods for Psychology, 3(2), 43–50. https://doi.org/10.20982/tqmp.03.2.p043
Winer, B. J. (1962). Statistical principles in experimental
design. McGraw-Hill.
Wingen, T., Berkessel, J. B., & Englich, B. (2020). No
Replication, No Trust? How Low
Replicability Influences Trust in Psychology.
Social Psychological and Personality Science, 11(4),
454–463. https://doi.org/10.1177/1948550619877412
Wittes, J., & Brittain, E. (1990). The role of internal pilot
studies in increasing the efficiency of clinical trials. Statistics
in Medicine, 9(1-2), 65–72. https://doi.org/10.1002/sim.4780090113
Yuan, K.-H., & Maxwell, S. (2005). On the Post Hoc
Power in Testing Mean Differences. Journal of
Educational and Behavioral Statistics, 30(2), 141–167. https://doi.org/10.3102/10769986030002141
Zabell, S. L. (1992). R. A. Fisher and the
Fiducial Argument. Statistical Science,
7(3), 369–387. https://doi.org/10.1214/ss/1177011233
Zumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions
concerning prospective and retrospective power. Journal of the Royal
Statistical Society: Series D (The Statistician), 47(2),
385–388. https://doi.org/10.1111/1467-9884.00139