New Paper by Robbie van Aert

Title: Multistep estimators of the between‐study variance: The relationship with the Paule‐Mandel estimator

Authors: Robbie C. M. van Aert & Dan Jackson

Published in: Statistics in Medicine


A wide variety of estimators of the between‐study variance are available in random‐effects meta‐analysis. Many, but not all, of these estimators are based on the method of moments. The DerSimonian‐Laird estimator is widely used in applications, but the Paule‐Mandel estimator is an alternative that is now recommended. Recently, DerSimonian and Kacker have developed two‐step moment‐based estimators of the between‐study variance. We extend these two‐step estimators so that multiple (more than two) steps are used. We establish the surprising result that the multistep estimator tends towards the Paule‐Mandel estimator as the number of steps becomes large. Hence, the iterative scheme underlying our new multistep estimator provides a hitherto unknown relationship between two‐step estimators and Paule‐Mandel estimator. Our analysis suggests that two‐step estimators are not necessarily distinct estimators in their own right; instead, they are quantities that are closely related to the usual iterative scheme that is used to calculate the Paule‐Mandel estimate. The relationship that we establish between the multistep and Paule‐Mandel estimator is another justification for the use of the latter estimator. Two‐step and multistep estimators are perhaps best conceptualized as approximate Paule‐Mandel estimators.


New Paper on “Bayesian evaluation of effect size after replicating an original study”


Title: Bayesian evaluation of effect size after replicating an original study

Authors: Robbie C. M. van Aert & Marcel A. L. M. van Assen

Published in: PLOS One


The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and replication were statistically significant in 36.1% in RPP and 68.8% in EE-RP suggesting many null effects among the replicated studies.

However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant.

We first analytically approximate the methods performance, and demonstrate the necessity to control for the original study’s significance to enable the accumulation of evidence for a true zero effect. Then we applied the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of especially the included studies in RPP are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication akin to power analysis in null hypothesis significance testing and present an easy to use web application ( and R code for applying the method.


Postprint "Who Believes in the Storybook Image of the Scientist?"

A new manuscript by the Meta Research group is in press at Accountability in Research; primary author Coosje Veldkamp. The final paper will be available Open Access, but in the meantime find the abstract below and the postprint on PsyArxiv. Abstract:

Do lay people and scientists themselves recognize that scientists are human and therefore prone to human fallibilities such as error, bias, and even dishonesty? In a series of three experimental studies and one correlational study (total N = 3,278) we found that the ‘storybook image of the scientist’ is pervasive: American lay people and scientists from over 60 countries attributed considerably more objectivity, rationality, open-mindedness, intelligence, integrity, and communality to scientists than other highly-educated people. Moreover, scientists perceived even larger differences than lay people did. Some groups of scientists also differentiated between different categories of scientists: established scientists attributed higher levels of the scientific traits to established scientists than to early-career scientists and PhD students, and higher levels to PhD students than to early-career scientists. Female scientists attributed considerably higher levels of the scientific traits to female scientists than to male scientists. A strong belief in the storybook image and the (human) tendency to attribute higher levels of desirable traits to people in one’s own group than to people in other groups may decrease scientists’ willingness to adopt recently proposed practices to reduce error, bias and dishonesty in science.

New Paper on Researcher's Intuitions About Statistical Power

Our team member Marjan Bakker has just published a paper in Psychological Science, together with Chris Hartgerink, Jelte Wicherts and Han van der Maas. The abstract: Many psychology studies are statistically underpowered. In part, this may be because many researchers rely on intuition, rules of thumb, and prior practice (along with practical considerations) to determine the number of subjects to test. In Study 1, we surveyed 291 published research psychologists and found large discrepancies between their reports of their preferred amount of power and the actual power of their studies (calculated from their reported typical cell size, typical effect size, and acceptable alpha). Furthermore, in Study 2, 89% of the 214 respondents overestimated the power of specific research designs with a small expected effect size, and 95% underestimated the sample size needed to obtain .80 power for detecting a small effect. Neither researchers’ experience nor their knowledge predicted the bias in their self-reported power intuitions. Because many respondents reported that they based their sample sizes on rules of thumb or common practice in the field, we recommend that researchers conduct and report formal power analyses for their studies.

The paper is available here (Open Access).

Preprint of a New Paper Comparing p-curve and p-uniform

Our team member Robbie van Aert recently got his paper accepted for publication in Perspectives on Psychological Science, together with Jelte Wicherts and Marcel van Assen. The abstract: Because evidence of publication bias in psychology is overwhelming, it is important to develop techniques that correct meta-analytic estimates for publication bias. Van Assen, Van Aert, and Wicherts (2015) and Simonsohn, Nelson, and Simmons (2014a) developed p-uniform and p-curve, respectively. The methodology on which these methods are based has great promise for providing accurate meta-analytic estimates in the presence of publication bias. However, we show that in some situations p-curve behaves erratically while p-uniform may yield implausible negative effect size estimates. Moreover, we show that (and explain why) p-curve and p-uniform overestimate effect size under moderate to large heterogeneity, and may yield unpredictable bias when researchers employ p-hacking. We offer hands-on recommendations on applying and interpreting results of meta-analysis in general and p-uniform and p-curve in particular. Both methods as well as traditional methods are applied to a meta-analysis on the effect of weight on judgments of importance. We offer guidance for applying p-uniform or p-curve using R and a user-friendly web application for applying p-uniform (

An interesting read for anyone using these methods or interested in applying these methods! The paper will be published in a special issue on Methods and Practices.

Download the preprint here.