Meta-Research Center at ICPS Paris

From March 7 to 9, 2019, the International Convention of Psychological Science (ICPS) of the Association for Psychological Science (APS) was held in Paris, France. The Meta-research group Tilburg (co-)organized three sessions at the ICPS. Here is a short overview of the three sessions and their presentations, including links to the presentations.

Preregistration: Common Issues and Best Practices (Chair: Marjan Bakker)

Preregistration has been lauded as one of the key solutions to the many issues in the field of psychology (Nosek, Ebersole, DeHaven, & Mellor, 2018). For example, researchers have argued that preregistration tackles the problems of publication bias, reporting bias, and the opportunistic use of researcher degrees of freedom in data analysis (also called questionable research practices or p-hacking). However, skeptics have put forward a broad list of concerns about preregistration. For example, they have argued that preregistration stifles researchers’ creativity, is not effective in the case of secondary data or qualitative data, and is only intended for confirmatory research. In this symposium we aimed to touch upon some of these issues.

Andrea Stoevenbelt, in her talk “Challenges to Preregistering a Direct Replication - Experiences from Conducting an RRR on Stereotype Threat”, described the challenges surrounding the preregistration of direct replication studies, drawing on her experience of conducting a registered replication report of the seminal study by Johns, Schmader, and Martens (2005) on stereotype threat.

Olmo van den Akker, in his talk “The Do’s and Don'ts of Preregistering Secondary Data Analyses”, presented a tutorial for a template that can be used to preregister secondary data analyses. Preregistering secondary data analysis differs from preregistering primary data analysis mainly because researchers already have some knowledge about the data (through their own work using the data or through reading other people’s work using the data). Olmo’s take-home message from this talk is: "Specify your prior knowledge of the data set from your own previous use of the data and from other researchers’ previous use of the data, preferably for each author separately."

In all, this symposium touched upon many of the issues that have been raised about preregistration and hopefully encouraged researchers from a wide range of fields to give preregistration a try.

Issues with Meta-Analysis: Bias, Heterogeneity, Reproducibility (Chair: Jelte Wicherts)

The popularity of meta-analysis has been increasing over the last decades, as reflected by the rapid increase in the relative number of published meta-analyses. One question of meta-research is what we learn from all these meta-analyses: about a certain research topic, systematic biases, meta-analytic outcomes, or quality of coding. All talks in this symposium addressed these meta-questions on meta-analysis.

Jelte Wicherts, in his talk “Effect Sizes, Power, and Biases in Intelligence Research: A Meta-Meta-Analysis”, presented the results of a meta-meta-analysis to estimate the average effect size, median power, and evidence of bias (publication bias, decline effect, early extremes effect, citation bias) in the field of intelligence research.

Anton Olsson Collentine presented on “Limited evidence for widespread heterogeneity in psychology”. He examined the heterogeneity of all meta-analyses of Many Labs studies and registered multi-lab replication studies, both of which are presumably not affected by publication bias or other biases. This research is important because many researchers stress the potential effect of moderators when trying to explain the failure of replication studies.

Esther Maassen, in her talk “Reproducibility of Psychological Meta-analyses”, systematically assessed the prevalence of reporting errors and inaccuracy of computations within meta-analyses. She documented whether coding errors affected meta-analytic effect sizes and heterogeneity estimates, as well as how issues related to heterogeneity, outlying primary studies, and signs of publication bias were dealt with.

Meta-analysis: Informative Tools (Chair: Marcel van Assen)

Meta-analysis is a statistical technique that statistically combines effect sizes from independent primary studies on the same topic, and is now seen as the “gold standard” for synthesizing and summarizing the results from multiple primary studies. Main research objectives of a meta-analysis are (i) estimating the average effect, (ii) assessing heterogeneity of true effect size, and if true effect size differs across studies (iii) incorporating moderator variables in the meta-analysis to explain this heterogeneity. Many different tools, visual (e.g., the funnel plot) or purely statistical (e.g., techniques to estimate heterogeneity or adjust for publication bias), have been developed to reach these objectives.

In this symposium, four speakers explain visual and statistical tools helping researchers to make sense of information in the meta-analysis and provide recommendations for applying these tools in practice. The focus is more on application than on the statistical background of the tools.

Xinru Li from Leiden University will explain how classification and regression trees (CART) can be used to explain heterogeneity in effect size in a meta-analysis. The current meta-analysis methodology lacks appropriate methods to identify interactions between multiple moderators when no a priori hypotheses have been specified. The proposed meta-CART approach has the advantage that it can deal with many moderators and is able to identify interaction effects between them.

Hilde Augusteijn, in her talk “Posterior Probabilities in Meta-Analysis: An Intuitive Approach of Dealing with Publication Bias”, introduced a new meta-analytical method that makes use of both Bayesian and frequentist statistics. This method evaluates the probability of the true effect size being zero, small, medium or large, and the probability of true heterogeneity being zero, small, medium or large, while correcting for publication bias. The approach, which intuitively provides an evaluation of uncertainty in the estimates of effect size and heterogeneity, is illustrated with real-life examples.

Robbie van Aert, in his talk “P-uniform*: A new meta-analytic method to correct for publication bias”, presented a new method to correct for publication bias in a meta-analysis. In contrast to the vast majority of existing methods to correct for publication bias, the proposed p-uniform* method can also be applied if the true effect size in a meta-analysis is heterogeneous. Moreover, the method enables meta-analysts to estimate and test for the presence of heterogeneity while taking into account publication bias. An easy-to-use web application will be presented for applying p-uniform* and recommendations for assessing the impact of publication bias will be given.

Marcel van Assen, in his talk “The Meta-plot: A Descriptive Tool for Meta-analysis”, explained and illustrated the meta-plot using real-life meta-analyses. The meta-plot improves on the funnel plot and shows in one figure the overall effect size and its confidence interval, the quality of primary studies with respect to their power to detect small, medium, or larger effects, and evidence of publication bias.

Presentation on Teaching Open Science: Turning Students into Skeptics, not Cynics (Presenter: Michèle Nuijten)

Michèle Nuijten, in her presentation “Teaching Open Science: Turning Students into Skeptics, not Cynics”, focused on strategies to teach undergraduates about replicability and open science. Psychology’s “replication crisis” has led to many methodological changes, including preregistration, larger samples, and increased transparency. Nuijten argued that psychology students should learn these open science practices from the start. They should adopt a skeptical attitude – but not a cynical one.

Michèle Nuijten was also discussant at two sessions:

  • “What can you do with nothing? Informative null results in hard-to-reach populations” (discussant). In hard-to-reach populations, it is especially difficult and time consuming to collect data, resulting in smaller sample sizes and inconclusive results. Therefore it is particularly important to understand what null results can mean. In this symposium, we discussed results from our own experimental data and how meta-analyses and Bayes factors can increase informativeness.

  • “Improving the transparency of your research one step at a time” (chair & discussant). Many solutions have been proposed to increase the quality and replicability of psychological science. All these options can be a bit overwhelming, so in this symposium, we focused on some easy-to-implement, pragmatic strategies and tools, including preprints, Bayesian statistics, and multi-lab collaboration.

Plan S: Are the Concerns Warranted?

Blog by Olmo van den Akker. A Dutch version has been published by ScienceGuide.

Plan S is the ambitious plan of eleven national funding agencies together with the European Commission (cOAlition S) to make all research funded by these organisations publicly accessible from 2020 onward. Since its announcement on September 4th 2018 the plan’s contents and consequences have been widely debated. When the guidelines for the implementation of the plan were presented at the end of November some aspects were clarified, but it also became apparent that a lot of details are still unclear. Here, I will give my thoughts on four main themes surrounding Plan S: early career researchers, researchers with less financial backing, scholarly societies, and academic freedom.

The consequences of Plan S for early career researchers

Because of the low job security in the early stage of an academic career, it is possible that early career researchers will be negatively affected by Plan S. Plan S currently involves 14 national funding agencies (including India, which announced its participation on January 12th) and draws support from big private funds like the Wellcome Trust and the Bill & Melinda Gates Foundation. Combined, these funds represent no more than 15% of the available research money in the world.

This relatively small market share could hurt young researchers dependent on Plan S funders as they will not be allowed to publish in some prestigious, but closed access journals. When researchers funded by other agencies can put these publications on their CV they would have an unfair advantage on the academic labour market. Only when Plan S or similar initiatives would cover a critical mass of the world’s research output would the playing field be levelled.

A crucial assumption underlying this reasoning is the continuation of the prestige model of scientific journals. However, Plan S specifically expresses the ambition to change the way researchers are being evaluated. Instead of looking at the number of publications in prestigious journals researchers should be evaluated on the quality of their work. This point has been emphasized in the San Francisco Declaration on Research Assessment (DORA).

DORA has been signed by more than 1,000 research organizations and more than 13,500 individuals worldwide, indicating that the scientific community wants to get rid of classical quality indicators like the impact factor and the h-index in favour of a new system of research assessment. One way to evaluate researchers is to look at the extent to which their work is open and reproducible. Plan S strongly supports open science and could therefore even be beneficial to early career researchers. However, it should be noted that cOAlition S should play a proactive role in this culture change. The fact that so many people signed DORA does not mean that they will act on its principles.

The consequences of Plan S for researchers with less financial backing

It is expected that Plan S will cause many journals that currently have a closed subscription model to transition to an author-pays model where the author pays so-called article processing charges (APCs) to get their work published open access. Many researchers have raised concerns that Plan S would make publishers increase their profits by increasing their APCs. Because researchers are forced to publish open access they are also forced to pay these higher APCs. For researchers with less financial backing (for example from smaller institutions or developing countries) the increased APCs may be unaffordable, which would crowd them out of science. However, there are several counterpoints to this scenario.

First, Plan S involves the condition that journals make their APCs reasonable and transparent. If this condition is met, journal APCs are expected to go down. This is illustrated by the fact that many open access journals have no or very low APCs. It was also underscored by a white paper of the Max Planck Society showing that an open access system with APCs comes at a significantly lower cost than the current system. To attain this scenario, it is important that cOAlition S monitors whether journal APCs are indeed reasonable and transparent. Commercial publishers have a lot of market power and will undoubtedly try to artificially increase their APCs. cOAlition S has already announced that they will develop a database like the Directory of Open Access Journals, in which researchers can find journals that comply with the demands set out in Plan S. Hopefully, the necessity for journals to be included in that database will make sure that they set affordable APCs.

Second, representatives of cOAlition S have already clarified that they will set up a fund that can help researchers pay the APCs that are due. This fund will be available for funded researchers as well as non-funded researchers who cannot reasonably be expected to pay APCs. The way this APC fund will be financed is as yet unclear, but it is clear that individual researchers do not need to come up with the costs of open access themselves.

The consequences of Plan S for scholarly societies

Like regular journals, journals from scholarly societies will have to move from a subscription model to an author-pays model. Representatives of scholarly societies fear that this will be the end of them. Societies would face high investments to make the open access transition. For example, to be Plan S compliant, journals need to make their articles fully machine-readable by transforming them into a JATS XML format. In addition, they need to create an Application Programming Interface (API). Developing a digital infrastructure like this is costly and can be problematic given that societies lose their subscription fees from January 1st 2020.

Therefore, it is essential that cOAlition S plays a proactive role and tries to facilitate the open access transition for society journals on a case-by-case basis. A starting point for cOAlition S could be the results of a study by Wellcome Trust that will investigate how scholarly societies can transition to a Plan S compliant model as efficiently as possible. One possibility is that cOAlition S (partly) subsidizes the transition costs of journals and guides them in developing the required digital infrastructure.

The consequences of Plan S for academic freedom 

One common concern about Plan S is that it restricts the freedom of researchers to determine what and how they do research, and how they disseminate their research results. This academic freedom is guaranteed by governments and academic institutions with the aim of insulating researchers from censorship and other negative consequences of their work. In this way, researchers can focus on their research without having to worry about any outside influence. When Plan S is implemented, researchers can no longer publish in paywalled journals. This would hamper researchers’ freedom to disseminate their research in the way they see fit.


However, one can raise doubts about the extent to which researchers currently do have the freedom to choose where and how to publish their work as researchers’ hands are generally tied by demands from scientific journals. They must abide by strict word limits and specific layout standards, and usually have to hand over their copyright to the commercial publisher. Moreover, to move up in academia, they are almost forced to publish in prestigious journals. Therefore, appealing to academic freedom to criticize Plan S is unconvincing, especially given that Plan S does not place any restrictions on research contents and on the methods researchers employ.

A more ideological point against the academic freedom argument is that academic freedom is part of an unofficial reciprocal arrangement between researchers and society. Researchers receive funding and freedom from society, but in return they should incorporate the interests of society into their decision-making. Publishing in a prestigious but closed journal does not fit with this reciprocal arrangement. Currently, many researchers have access to closed journals because university libraries pay a subscription fee to the publishers of those journals. However, not all researchers can take advantage of these subscriptions because their organisation cannot afford them or because the negotiations about subscription fees were unsuccessful.

Because of the limited access to research results, scientific progress slows down. This is problematic in itself, but can have major consequences for research about climate change or contagious diseases. In addition, the subscription fees demanded by publishers are disproportionately high. In 2018, the Netherlands paid more than 12 million euros to one of the main scientific publishers, Elsevier. A big chunk of that money ended up as profit for Elsevier and was not reinvested into science. Obviously, this practice does not fit with the reciprocal arrangement between researchers and society either.

Conclusion

After their call for feedback, cOAlition S was flooded by a wave of comments and ideas about Plan S, of which the main ones are outlined above. Even alternative plans were proposed with names like Plan U and Plan T, which were often even more radical than Plan S. Although such initiatives are very valuable to the scientific community, it is hard to create a new infrastructure for scholarly communication without a large budget and without the support of a critical mass. cOAlition S does have a large budget and is getting increasing support from the scientific community. That’s why I think that Plan S is currently the most efficient way forward, especially because the potential issues with the plan are relatively straightforward to prevent. I have faith that cOAlition S will take the responsibility that follows from initiating this ambitious plan. Let us place our trust as a research community and back cOAlition S toward a more open science.

Open Science: The Way Forward

Blog by Michèle Nuijten for Tilburg University on the occasion of World Digital Preservation Day.

We have all seen headlines about scientific findings that sounded too good to be true. Think about the headline “a glass of red wine is the equivalent of an hour in the gym”. A headline like this may make you skeptical right away, and rightly so. In this particular case, it turned out that several journalists got carried away, and the researchers never made such claims.

However, sometimes the exaggeration of an effect already takes place in the scientific article itself. Indeed, increasing evidence shows that many published results might be overestimated, or even false.

This excess of overestimated results is probably caused by a complex interaction of different factors, but there are several leads as to what the important problems might be.

The first problem is publication bias: studies that “find something” have a larger probability to be published than studies that don’t find anything. You can imagine that if we only present the success stories, the overall picture gets distorted and overly optimistic.

This publication bias may lead to the second problem: flexible data analysis. Scientists can start showing strategic behavior to increase their chances to publish their findings: “if I leave out this participant, or if I try a different analysis, maybe my data will show me the result I was looking for.” This can even happen completely unconsciously: in hindsight, all these decisions may seem completely justified.

The third problem that can distort scientific results is statistical errors. Unfortunately, it seems that statistical errors in publications are widespread (see, e.g., the prevalence of errors in psychology).

The fact that we make mistakes and have human biases doesn’t make us bad scientists. However, it does mean that we have to come up with ways to avoid or detect these mistakes, and that we need to protect ourselves from our own biases.

I believe that the best way of doing that is through open science.

One of the most straightforward examples of open science is sharing data. If raw data are available, you can see exactly what the conclusions in an article are based on. This way, any errors or questionable analytical choices can be corrected or discussed. Maybe the data can even be used to answer new research questions.

Sharing data can seem as simple as posting them on your own personal website, but this has proven to be rather unstable: URLs die, people move institutions, or they might leave academia altogether. A much better way to share data is via certified data repositories. That way, your data are safely stored for the long run.

Open data is only one example of open science. Another option is to openly preregister research plans before you actually start doing the research. You can also make materials and analysis code open, publish open access, or write public peer reviews.

Of course, it is not always possible to make everything open in every research project. Practical issues such as privacy can restrict how open you can be. However, you might be surprised by how many other things you can make open, even if you can’t share your data.

I would like to encourage you to think about ways to make your own research more open. Maybe you can preregister your plans, maybe you can publish open access, maybe you can share your data. No matter how small the change is, opening things up will make our science better, one step at a time.

This blog has been posted on the website of Tilburg University: https://www.tilburguniversity.edu/current/news/blog-michele-nuijten-open-science/

statcheck – A Spellchecker for Statistics

Guest blog for LSE Impact Blog by Michèle Nuijten

If you’re a non-native English speaker (like me), but you often have to write in English (like me), you will probably agree that the spellchecker is an invaluable tool. And even when you do speak English fluently, I’m sure that you’ve used the spellchecker to filter out any typos or other mistakes.

When you’re writing a scientific paper, there are many more things that can go wrong than just spelling. One thing that is particularly error-prone is the reporting of statistical findings.

Statistical errors in published papers

Unfortunately, we have plenty of reasons to assume that copying the results from a statistical program into a manuscript doesn’t always go well. Published papers often contain impossible means, coefficients that don’t add up, or ratios that don’t match their confidence intervals.

In psychology, my field, we found a high prevalence of inconsistencies in reported statistical test results (although these problems are by no means unique to psychology). Most conclusions in psychology are based on “null hypothesis significance testing” (NHST) and look roughly like this:

“The experimental group scored significantly higher than the control group, t(58) = 1.91, p < .05”.

This is a t-test with 58 degrees of freedom, a test statistic of 1.91, and a p-value that is smaller than .05. A p-value smaller than .05 is usually considered “statistically significant”.

This example is, in fact, inconsistent. If I recalculate the p-value based on the reported degrees of freedom and the test statistic, I would get p = .06, which is not statistically significant anymore. In psychology, we found that roughly half of papers contain at least one inconsistent p-value, and in one in eight papers this may have influenced the statistical conclusion.
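The recalculation in this example can be reproduced with a few lines of code. The following is only a minimal sketch in Python using the standard library, not statcheck's own code: the function names `t_pdf` and `two_tailed_p` are illustrative, and the p-value is obtained by numerically integrating the density of the t distribution (Simpson's rule) rather than by calling a statistics library.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value for a t statistic, via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area


# The example from the text: t(58) = 1.91, reported as p < .05.
p = two_tailed_p(1.91, 58)
print(f"recalculated p = {p:.2f}")
print("significant at .05:", p < 0.05)
```

The recalculated value lies above .05, so the reported “p < .05” does not match the reported test statistic and degrees of freedom.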

Even though most inconsistencies we found were small and likely to be the result of innocent copy-paste mistakes, they can substantively distort conclusions. Errors in papers make results unreliable, because they become “irreproducible”: if other researchers would perform the same analyses on the same data, a different conclusion would roll out. This, of course, affects the level of trust we place in these results.

statcheck

The inconsistencies I’m talking about are obvious. Obvious, in the sense that you don’t need raw data to see that certain reported numbers don’t match. The fact that these inconsistencies do arise in the literature means that peer review did not filter them out. I think it could be useful to have an automated procedure to flag inconsistent numbers. Basically, we need a spellchecker for stats. To that end, we developed statcheck.

statcheck is a free, open-source tool that automatically extracts reported statistical results from papers and recalculates p-values. It is available as an R package and as a user-friendly web app at http://statcheck.io.

statcheck roughly works as follows. First, it converts articles to plain-text files. Next, it searches the text for statistical results. This is possible in psychology, because of the very strict reporting style (APA); stats are always reported in the same way. When statcheck detects a statistical result, it uses the reported degrees of freedom and test statistic to recompute the p-value. Finally, it compares the reported p-value with the recalculated one, to see if they match. If not, the result is flagged as an inconsistency. If the reported p-value is significant and the recalculated one is not, or vice versa, it is flagged as a gross inconsistency. More details about how statcheck works can be found in the manual.
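The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not statcheck's actual implementation (which is an R package): it only handles t-tests written as `t(df) = value, p < value` (or `=`, `>`), and the regular expression `APA_T_TEST`, the function `check_text`, the rounding rule, and the significance rule are all hypothetical choices made for this example.

```python
import math
import re

# Hypothetical pattern for APA-style t-tests such as "t(58) = 1.91, p < .05".
APA_T_TEST = re.compile(
    r"\bt\((?P<df>\d+(?:\.\d+)?)\)\s*=\s*(?P<t>-?\d*\.?\d+),\s*"
    r"p\s*(?P<cmp>[<>=])\s*(?P<p>\d*\.\d+)"
)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value of a t statistic (Simpson's rule on the t density)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - 2.0 * total * h / 3


def check_text(text, alpha=0.05):
    """Extract t-tests from plain text and flag (gross) inconsistencies."""
    results = []
    for m in APA_T_TEST.finditer(text):
        df, t = float(m["df"]), float(m["t"])
        reported, cmp = float(m["p"]), m["cmp"]
        computed = two_tailed_p(t, df)
        if cmp == "<":
            consistent = computed < reported
        elif cmp == ">":
            consistent = computed > reported
        else:
            # Assumption: compare at the number of decimals that was reported.
            decimals = len(m["p"].split(".")[1])
            consistent = math.isclose(round(computed, decimals), reported)
        # Assumption: "p < alpha" and "p = value below alpha" count as significant.
        reported_sig = (cmp == "<" and reported <= alpha) or \
                       (cmp == "=" and reported < alpha)
        gross = (not consistent) and (reported_sig != (computed < alpha))
        results.append({"raw": m.group(0), "computed_p": computed,
                        "consistent": consistent, "gross": gross})
    return results


sentence = ("The experimental group scored significantly higher than "
            "the control group, t(58) = 1.91, p < .05.")
for r in check_text(sentence):
    print(r["raw"], "| consistent:", r["consistent"], "| gross:", r["gross"])
```

Run on the example sentence from earlier, this sketch flags the result as a gross inconsistency, because the reported p-value is significant and the recalculated one is not.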

statcheck’s accuracy

It is important that we know how accurate statcheck is in flagging inconsistencies. We don’t want statcheck to mark large numbers of correct results as inconsistent, and, conversely, we also don’t want statcheck to wrongly classify results as correct when they are actually inconsistent. We investigated statcheck’s accuracy by running it on a set of articles for which inconsistencies were also manually coded.

When we compared statcheck’s results with the manual codings, we found two main things. First, statcheck detects roughly 60% of all reported stats. It missed the statistics that were not reported completely according to APA style. Second, statcheck did a very good job in flagging the detected statistics as inconsistencies and gross inconsistencies. We found an overall accuracy of 96.2% to 99.9%, depending on the specific settings. (There has been some debate about this accuracy analysis. A summary of this discussion can be found here.)

Even though statcheck seems to perform well, its classifications are not 100% accurate. But, to be fair, I doubt whether any automated algorithm could achieve this (yet). And again, the comparison with the spellchecker still holds; mine keeps telling me I misspelled my own name, and that it should be “Michelle” (it really shouldn’t be).

One major advantage of using statcheck (or any algorithm) for statistical checks is its efficiency. It will take only seconds to flag potential problems in a paper, rather than going through all the reported stats and checking them manually.

An increasing number of researchers seem convinced of statcheck’s merits; the R package has been downloaded more than 8,000 times, while the web app has been visited over 23,000 times. Additionally, two flagship psychology journals have started to use statcheck as a standard part of their peer review process. Testimonies on Twitter illustrate the ease and speed with which papers can be checked before they’re submitted:

Just statcheck-ed my first co-authored manuscript. On my phone while brushing my teeth. Great stuff @MicheleNuijten @SachaEpskamp @seanrife!

— Anne Scheel (@annemscheel) October 22, 2016

Automate the error-checking process

More of these “quick and dirty spellchecks” for stats are being developed (e.g. GRIM to spot inconsistencies in means; or p-checker to analyse the consistency and other properties of p-values), and an increasing number of papers and projects make use of automated scans to retrieve statistics from large numbers of papers (e.g. here, here, here, and here).

In an era where scientists are pressed for time, automated tools such as statcheck can be very helpful. As an author you can make sure you didn’t mistype your key results, and as a peer reviewer you can quickly check if there are obvious problems in the statistics of a paper. Reporting statistics can just as easily go wrong as grammar and spelling; so when you’re typing up a research paper, why not also check your stats?

More information about statcheck can be found at: http://statcheck.io

Journal Policies that Encourage Data Sharing Prove Extremely Effective

Guest blog for LSE Impact Blog by Michèle Nuijten

For science to work well we should move towards opening it up. That means sharing research plans, materials, code, and raw data. If everything is openly shared, all steps in a study can be checked, replicated, or extended. By sharing everything we let the facts speak for themselves, and that’s what science is all about.

Unfortunately, in my own field of psychology, raw data are notoriously hard to come by. Statements in papers such as “all data are available upon request” are often void, and data may get lost if a researcher retires, switches university, or even buys a new computer. We need to somehow incentivise researchers to archive their data online in a stable repository. But how?

Currently it is not in a scientist’s interests to put effort into making data and materials available. Scientists are evaluated based on how much they publish and how often they’re cited. If they don’t receive credit for sharing all details of their work, but instead run the risk colleagues will criticise their choices (or worse: find errors!), why would they do it?

So now for the good news: incentivising researchers to share their data may be a lot easier than it seems. It could be enough for journals to simply ask for it! In our recent preprint, we found journal policies that encourage data sharing are extremely effective. Journals that require data sharing showed a steep increase in the percentage of articles with open data from the moment these policies came into effect.

In our study we looked at five journals. First, we compared two journals in decision making research: Judgment and Decision Making (JDM), which started to require data sharing from 2011; and the Journal of Behavioral Decision Making (JBDM), which does not require data sharing. Figure 1 shows a rapidly increasing percentage of articles in JDM sharing data (up to 100%!), whereas nothing happens in JBDM. The same pattern holds for psychology articles from open access publisher PLOS (with its data-sharing policy taking effect in 2014) and the open access journal Frontiers in Psychology (FP; no such data policy).

Similarly, the journal Psychological Science (PS) also contained increasing numbers of articles with open data after it introduced its Open Practice Badges in 2014. You can earn a badge for sharing data, sharing materials, or preregistering your study. A badge is basically a sticker for good behaviour on your paper. Although this may sound a little kindergarten, believe me: you don’t want to be the one without a sticker!

Figure 1: Percentage of articles per journal to have open data. A solid circle indicates no open-data policy; an open circle indicates an open-data policy. Source: Nuijten, M. B., Borghuis, J., Veldkamp, C. L. S., Alvarez, L. D., van Assen, M. A. L. M., & Wicherts, J. M. (2017) “Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology”, PsyArXiv Preprints. This work is licensed under a CC0 1.0 Universal license.

The increase in articles with available data is encouraging and has important consequences. With raw data we are able to explore different hypotheses from the same dataset, or combine information of similar studies in an Individual Participant Data (IPD) meta-analysis. We could also use the data to check if conclusions are robust to changes in the analyses.

The availability of research data would increase the quality of science as a whole. With raw data we have the possibility to find and correct mistakes. On top of that, the probability of making a mistake is likely to be lower once you have gone to the effort of archiving your data in such a way that another person can understand it. The process of archiving data for future users could also provide a barrier to taking advantage of the flexibility in data analysis that could lead to false positive results. Enforcing data sharing might even deter fraud.

Of course, data-sharing policy is not a “one-size-fits-all” solution. In some fields of psychological research (e.g. sexology or psychopathology) data can be very personal and sensitive, and can’t simply be posted online. Luckily there are increasingly sophisticated techniques to anonymise data, and often materials and analysis plans can still be shared to increase transparency.

It is also important to acknowledge the time and effort it took to collect the original data. One way to do this is to set a fixed period of time during which only the original researchers have access to the data. That way they get a head start in publishing studies based on the data. When this period is over and others can also use the data, the original authors should, of course, be properly acknowledged through citations, or even, in some cases, co-authorship.

There are many different ways to encourage openness in science. My hope is that more journals will soon follow and start implementing an open-data policy. But aside from merely requiring data sharing, journals should also check if the data are actually available. To illustrate the importance of this: our study found that one third of the PLOS articles claiming to have open data did not actually deliver (for similar numbers, see the data by Chris Chambers).

And many (including myself) would even like to go one step further. Datasets should not only be available, they should also be stored in such a way that others can use them (see the FAIR Data Principles). A good way to influence the usability of open data might be the use of the Open Practice Badges. It turned out that in PS, the badges not only increased the availability of data, but also the relevance, usability, and completeness of the data. Another way of ensuring data quality, but also recognition for your work, is to publish your data in a special data journal, such as the Journal of Open Psychology Data.

Even though data sharing in psychology is not yet the status quo, several journals are already helping our field take a step in the right direction. As a matter of fact, the American Psychological Association (APA) has recently announced it will give its editors the option of awarding badges. It is very encouraging that journal policies on data sharing, or even an intervention as simple as a badge to reward good practice, can cause such a surge in open data. Therefore, I hereby encourage all editors in all fields to start requiring data. And while we’re at it, why not ask for research plans, materials, and analysis code too?

I would like to thank Marcel van Assen for his helpful comments while drafting this blog.

This blog post is based on the author’s co-written article, “Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology”, available at http://doi.org/10.1525/collabra.102

The Replication Paradox

Guest blog for The Replication Network by Michèle Nuijten

Lately, a lot of attention has been paid to the excess of false positive and exaggerated findings in the published scientific literature. In many different fields there are reports of an impossibly high rate of statistically significant findings, and studies of meta-analyses in various fields have shown overwhelming evidence for overestimated effect sizes.

The suggested solution for this excess of false positive findings and exaggerated effect size estimates in the literature is replication. The idea is that if we just keep replicating published studies, the truth will come to light eventually.

This intuition also showed in a small survey I conducted among psychology students, social scientists, and quantitative psychologists. I offered them different hypothetical combinations of large and small published studies that were identical except for the sample size – they could be considered replications of each other. I asked them how they would evaluate this information if their goal was to obtain the most accurate estimate of a certain effect. In almost all of the situations I offered, the answer was almost unanimously: combine the information of both studies.

This makes a lot of sense: the more information the better, right? Unfortunately this is not necessarily the case.

The problem is that the respondents forgot to take into account the influence of publication bias: statistically significant results have a higher probability of being published than non-significant results. And only publishing significant effects leads to overestimated effect sizes in the literature.

But wasn’t this exactly the reason to take replication studies into account? To solve this problem and obtain more accurate effect sizes?

Unfortunately, there is evidence from multi-study papers and meta-analyses that replication studies suffer from the same publication bias as original studies (see below for references). This means that both types of studies in the literature contain overestimated effect sizes.

The implication of this is that combining the results of an original study with those of a replication study could actually worsen the effect size estimate. This works as follows.

Bias in published effect size estimates depends on two factors: publication bias and power (the probability that you will reject the null hypothesis, given that it is false). Studies with low power (usually due to a small sample size) contain a lot of noise, and the effect size estimate will be all over the place, ranging from severe underestimations to severe overestimations.

This in itself is not necessarily a problem; if you took the average of all these estimates (e.g., in a meta-analysis) you would end up with an accurate estimate of the effect. However, if because of publication bias only the significant studies are published, only the severe overestimations of the effect will end up in the literature. If you calculate an average effect size based on these estimates, you will end up with an overestimation.

Studies with high power do not have this problem. Their effect size estimates are much more precise: they will be centered more closely on the true effect size. Even when there is publication bias, and only the significant (maybe slightly overestimated) effects are published, the distortion would not be as large as with underpowered, noisier studies.
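The difference between low and high power can be made concrete with a small calculation. The sketch below is an illustration, not the analysis from the paper. It assumes a two-group design with a standardized effect, a normal approximation in which the standard error equals sqrt(2/n) for n participants per group, and complete publication bias in which only results that are significant in the expected direction get published. The true effect of 0.3 and the sample sizes of 20 and 200 per group are made-up values.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def standard_error(n_per_group):
    """Approximate standard error of a standardized mean difference (assumption)."""
    return sqrt(2.0 / n_per_group)


def power(true_d, n_per_group, alpha=0.05):
    """Probability of a significant result in the expected direction."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 1 - STD_NORMAL.cdf(z_crit - true_d / standard_error(n_per_group))


def expected_published_estimate(true_d, n_per_group, alpha=0.05):
    """Mean estimate among significant studies (mean of a truncated normal)."""
    se = standard_error(n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    a = z_crit - true_d / se  # standardized significance threshold
    return true_d + se * STD_NORMAL.pdf(a) / (1 - STD_NORMAL.cdf(a))


if __name__ == "__main__":
    true_d = 0.3  # hypothetical true effect
    for n in (20, 200):  # hypothetical sample sizes per group
        print(
            f"n = {n:3d}: power = {power(true_d, n):.2f}, "
            f"mean published estimate = {expected_published_estimate(true_d, n):.2f}"
        )
```

Under these assumptions, the small studies have low power and their published average lies far above 0.3, whereas the published average of the large studies stays close to the true effect.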

Now consider again a replication scenario such as the one mentioned above. In the literature you come across a large original study and a smaller replication study. Assuming that both studies are affected by publication bias, the original study will probably have a somewhat overestimated effect size. However, since the replication study is smaller and has lower power, it will contain an effect size that is even more overestimated. Combining the information of these two studies then basically comes down to adding bias to the effect size estimate of the original study. In this scenario you would obtain a more accurate estimate of the effect if you evaluated only the original study and ignored the replication study.

In short: even though a replication will increase the precision of the effect size estimate (a smaller confidence interval around the estimate), it will add bias if its sample size is smaller than that of the original study. This only happens if there is publication bias and the power is not high enough.
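A quick simulation can illustrate this scenario. The sketch below is not the simulation from the paper. It makes simplifying assumptions: estimates are drawn from a normal distribution with standard error sqrt(2/n), both studies are published only if they are significant in the expected direction, and the two studies are combined with a fixed-effect (inverse-variance) weighted average. The true effect (0.3) and the sample sizes (200 and 20 per group) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = .05


def draw_published_estimate(rng, true_d, n_per_group):
    """Redraw a study until it is significant in the expected direction."""
    se = sqrt(2.0 / n_per_group)
    while True:
        estimate = rng.gauss(true_d, se)
        if estimate / se > Z_CRIT:
            return estimate


def simulate(true_d=0.3, n_original=200, n_replication=20, n_pairs=20000, seed=1):
    """Return the mean original-only estimate and the mean pooled estimate."""
    rng = random.Random(seed)
    originals, pooled = [], []
    for _ in range(n_pairs):
        d_orig = draw_published_estimate(rng, true_d, n_original)
        d_rep = draw_published_estimate(rng, true_d, n_replication)
        # Inverse-variance weights are proportional to n because se^2 = 2 / n.
        d_pool = (n_original * d_orig + n_replication * d_rep) / (
            n_original + n_replication
        )
        originals.append(d_orig)
        pooled.append(d_pool)
    return fmean(originals), fmean(pooled)


if __name__ == "__main__":
    mean_original, mean_pooled = simulate()
    print(f"true effect:                 {0.3:.3f}")
    print(f"original study only:         {mean_original:.3f}")
    print(f"original + small replication: {mean_pooled:.3f}")
    # The pooled standard error is smaller (more precision) despite the added bias.
    print(f"SE original: {sqrt(2 / 200):.3f}, SE pooled: {sqrt(2 / 220):.3f}")
```

In this setup the pooled estimate has a smaller standard error than the original study alone, yet on average it lies further from the true effect. Removing the significance filter in `draw_published_estimate` (no publication bias) should make both averages centre on the true effect.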

There are two main solutions to the problem of overestimated effect sizes.

The first solution would be to eliminate publication bias; if there is no selective publishing of significant effects, the whole “replication paradox” would disappear. One way to eliminate publication bias is to preregister your research plan and hypotheses before collecting the data. Some journals will even review this preregistration, and can give you an “in principle acceptance” – completely independent of the results. In this case, studies with significant and non-significant findings have an equal probability of being published, and published effect sizes will not be systematically overestimated.  Another way is for journals to commit to publishing replication results independent of whether the results are significant.  Indeed, this is the stated replication policy of some journals already.

The second solution is to only evaluate (and perform) studies with high power. If a study has high power, the effect size will be estimated more precisely and the estimate will be less affected by publication bias. Roughly speaking: if you discard all studies with low power, your effect size estimate will be more accurate.

A good example of an initiative that implements both solutions is the recently published Reproducibility Project, in which 100 psychological effects were replicated in studies that were preregistered and high powered. Initiatives such as this one eliminate systematic bias in the literature and advance the scientific system immensely.

However, until preregistered, highly powered replications are the new standard, researchers who want to play it safe should change their intuition from “the more information, the higher the accuracy,” to “the more power, the higher the accuracy.”

This blog is based on the paper “The replication paradox: Combining studies can decrease the accuracy of effect size estimate” by Nuijten, van Assen, Veldkamp, and Wicherts (2015), Review of General Psychology, 19(2), 172-182.

Literature on How Replications Suffer From Publication Bias

  • Francis, G. (2012). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review, 19(6), 975-991.
  • Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychological Methods, 17, 120-128.

Data sharing not only helps facilitate the process of psychology research, it is also a reflection of rigour

Originally Published on LSE Impact Blog

Guest blog for LSE Impact Blog by Jelte Wicherts

Data sharing in scientific psychology has not been particularly successful and it is high time we change that situation. Before I explain how we hope to get rid of the secrecy surrounding research data in my field of psychology, let me explain how I got here.

Ten years ago, I was working on a PhD thesis for which I wanted to submit old and new IQ data from different cohorts to novel psychometric techniques. These techniques would enable us to better understand the remarkable gain in average IQ that has been documented in most western countries over the course of the 20th century. These new analyses had the potential to shed light on why it is that more recent cohorts of test-takers (say, folks born between 1975 and 1985) scored so much higher on IQ tests than older cohorts (say, baby boomers). In search of useful data from the millions of yearly IQ test administrations, I started emailing psychologists in academia and the test-publishing world. Although my colleagues acknowledged that indeed there must be a lot of data around, most of their data were not in any useful format or could no longer be found.

Raven Matrix – IQ Test Image credit: Life of Riley [CC-BY-SA-3.0]

After a persistent search I ended up getting five useful data sets that had been lying in a nearly-destroyed file-cabinet at some library in Belgium, were saved on old floppy disks, were reported as a data table in published articles, or were in a data repository (because data collection had been financed by the Dutch Ministry of Education under the assumption that these data would perhaps be valuable for future use). Our analyses of the available data showed that the gain in average IQ was in part an artefact of testing. So a handful of psychologists back in the 1960s kept their data, which decades later helped show that their rebellious generation was not simply less intelligent than generations X (born 1960-1980) or Y (born 1980-2000). The moral of the story is that often we do not know about all potential uses of the data that we as researchers collect. Keeping the data and sharing them can be scientifically valuable.

Psychologists used to be quite bad at storing and sharing their research data. In 2005, we contacted 141 corresponding authors of papers that had been published in top-ranked psychology journals. In our study, we found that 73% of corresponding authors of papers published 18 months earlier were unable or unwilling to share data upon request. They did so despite the fact that they had signed a form stipulating that they would share data for verification purposes. In a follow-up study, we found that researchers who failed to share data upon request reported more statistical errors and less convincing results than researchers who did share data. In other words, sharing data is a reflection of rigour. We in psychology have learned a hard lesson when it comes to researchers being secretive about their data. Secrecy enables all sorts of problems, including biases in the reporting of results, honest errors, and even fraud.

So it is high time that we as psychologists become more open with our research data. For this reason, an international group of researchers from different subfields in psychology and I have established an open access journal, published by Ubiquity Press, that rewards the sharing of psychological research data. The journal is called Journal of Open Psychology Data and in it we publish so-called data papers. Data papers are relatively short, peer-reviewed papers that describe an interesting and potentially useful data set that has been shared with the scientific community in an established data repository.

We aim to publish three types of data papers. First, a data paper in the Journal of Open Psychology Data may describe the data from research that has been published in traditional journals. For instance, our first data paper reports raw data from a study of cohort differences in personality factors over the period 1982-2007, which was previously published in the Journal of Personality and Social Psychology. Second, we seek data papers from unpublished work that may be of interest for future work because they can be submitted to alternative analyses or can be enriched later. Third, we publish papers that report data from replications of earlier findings in the psychological literature. Such replication efforts are often hard to publish in traditional journals, but we consider them to be important for progress. So the Journal of Open Psychology Data helps psychologists to find interesting data sets that can be used for educational purposes (learning of statistical analyses), data sets that can be included in meta-analyses, or data sets that can be submitted to secondary analyses. More information can be found in the editorial I wrote for the first issue.

In order to remain open access, the Journal of Open Psychology Data charges authors a publication fee. But our article processing charge is currently only 25 pounds or 30 euros.  So if you are a psychologist and have data lying around that will probably vanish as soon as your new computer arrives, don’t hesitate. Put your data in a safe place in a data repository, download the paper template, describe how the data were collected (and/or where they were previously reported), explain why they are interesting, and submit your data paper to the Journal of Open Psychology Data. We will quickly review your data paper, determine whether the data are interesting and useful, and check the documentation and accessibility of the data. If all is well, you can add a data paper to your resume and let the scientific community know that you have shared your interesting data. Who knows how your data may be used in the future.

This post is part of a wider collection on Open Access Perspectives in the Humanities and Social Sciences (#HSSOA) and is cross-posted at SAGE Connection. We will be featuring new posts from the collection each day leading up to the Open Access Futures in the Humanities and Social Sciences conference on the 24th October, with a full electronic version to be made openly available then.