When Good Tools Fail: The Missing Step in Scientific Reform

June 16, 2025 by Iris Willigers

This blogpost was written by Anouk Bouma. Anouk's PhD project focuses on investigating the trustworthiness of simulation studies from a meta-scientific perspective, supervised by Marcel van Assen, Robbie van Aert, and Lieke Voncken.

Prologue
If you're a researcher, like me, you've probably had an experience that goes something like this: say you want to conduct a power analysis—perhaps for a mediation model—so you head to the internet in search of a suitable tool to help you. You dig through pages of search results, and eventually, you find something promising: a Shiny app that claims to do exactly what you need. Perfect! You click on it, eager to get started.

But then… the enthusiasm quickly fades.

You try to input the required information, only to be met with cryptic labels and unclear fields. You wanted to calculate the required sample size, but the box labeled “N” seems to be asking you to enter a sample size instead. Wait…what?

Still, you don't give up easily. So, you try out different inputs to see what they do. Eventually, some output appears. Actually, a lot of output appears. But instead of clarity, you're left staring at a wall of unexplained numbers, unsure what any of it means. Feeling frustrated, you close the tab and return to your search.

Sure, maybe you haven’t had the experience of being frustrated over Shiny apps for power analyses. Maybe for you, frustration came from trying to navigate a confusing preregistration template, the OSF website, or an R package with cryptic, underexplained arguments. Whatever the specific case, the underlying issue is the same.

This isn’t just a story about power analyses or my personal struggle with conducting them. It’s a story about how researchers, especially metascientists, are increasingly stepping into a role they haven’t been trained for: that of product designers. In this blog I will argue that metascientists could benefit from adopting a marketing perspective once in a while—especially when developing tools, guidelines, and reforms aimed at improving scientific practices.

A little bit of backstory
First, let me tell you a little bit about myself. I joined the Tilburg University Meta-Research Center as a PhD candidate in September 2024, almost ten months ago. My background, like that of many in our group, is primarily in psychology (I completed both a bachelor's and a master's degree in that field). But unlike my fellow lab mates, I also obtained a bachelor's degree in marketing management (at a university of applied sciences, for those familiar with the Dutch education system).

For a long time, I didn’t think the marketing part of my education was particularly meaningful or had much impact on my way of thinking—especially in the context of my goal to become a metascientist.

It turns out I was wrong.

I’ve come to see how surprisingly relevant that background can be, especially when thinking about one of metascience’s key challenges: not just understanding how science works, but actually improving it. Which brings us to the bigger picture.

The two goals of metascience
Metascience, at its core, has two goals: to study science and to improve it (Ioannidis et al., 2015). When we're in "improvement" mode, we are often developing 'products' (like a preregistration template, a data-sharing guideline, or a statistical tool) designed for specific 'users' (e.g., researchers, journal editors, or institutions). And for our 'products' to succeed, they need to be used as intended by the users they are developed for.

This mindset, focusing first and foremost on the needs and perspective of your user, was a core lesson in my marketing education. You don’t just build something and assume it solves the user’s problem. You talk to your target audience, observe how they work, and check whether your solution actually fits their needs and constraints.

As scientists, we mostly focus on the correctness of our tools and on whether they are based on sound empirical evidence. But when developing a product, our responsibility doesn’t end with a solution that’s technically correct or theoretically justified. It also includes ensuring that our solutions are received, understood, and applied effectively in real-world settings. If the tools or reforms we create are confusing, time-consuming, or poorly aligned with researchers’ workflows, they risk being ignored, rejected, or misused. That doesn’t just limit our impact; it directly undermines our goal of improving scientific practice.

An example from the metascientific simulation literature
Let me share an example from the topic of my own PhD project: a metascientific investigation of simulation studies. There’s a growing body of work documenting concerns about simulation-based method evaluations. These metascientists raise important issues, such as how questionable research practices in simulation studies can hamper the validity of the conclusions (Pawel et al., 2024). Various metascientists propose solutions like preregistration templates (Siepe et al., 2024), guidelines (e.g., Kelter, 2024), and improved reporting standards (Morris et al., 2019). In doing so, they seem to tick off both of metascience’s core boxes: they study how simulation research is currently done, and they offer ways to improve it.

These papers make valuable contributions toward evaluating and strengthening the robustness of simulation studies. At the same time, this line of research could benefit from systematic usability testing of the proposed solutions, as well as more attention to the needs and perspectives of the intended users. This might mean focusing on understanding the challenges researchers face when conducting simulation studies, the tools they currently rely on, and the barriers that may prevent them from adopting new practices.

We should be careful not to design a key without ever having seen the lock.

Let’s think about our users
So, what would it look like if we developed metascientific solutions from a user-centered perspective? Let’s go back to the example of power analyses for mediation models.
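To make the running example a bit more concrete before walking through the steps, here is a minimal sketch of what a simulation-based power estimate for a simple mediation model could look like. Everything in it is an assumption chosen for illustration: a single-mediator X -> M -> Y model with standardized paths, arbitrary default effect sizes, a hypothetical function name, and the joint-significance criterion for detecting the indirect effect. It is not the implementation of any existing app or package.

# A minimal, illustrative Monte Carlo power sketch for a simple mediation
# model (X -> M -> Y). All defaults (paths a and b, alpha, the number of
# replications, the joint-significance decision rule) are assumptions
# made for illustration only.
import numpy as np
import statsmodels.api as sm

def mediation_power(n, a=0.3, b=0.3, alpha=0.05, reps=2000, seed=1):
    """Estimate power to detect the indirect effect a*b with n participants."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        m = a * x + rng.standard_normal(n) * np.sqrt(1 - a**2)  # standardized M
        y = b * m + rng.standard_normal(n) * np.sqrt(1 - b**2)  # standardized Y
        # a-path: regress M on X; b-path: regress Y on M, controlling for X
        p_a = sm.OLS(m, sm.add_constant(x)).fit().pvalues[1]
        p_b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit().pvalues[1]
        hits += (p_a < alpha) and (p_b < alpha)  # joint-significance criterion
    return hits / reps

# e.g., estimated power for n = 100 with small-to-medium paths
print(mediation_power(n=100))

Other decision rules (a Sobel test, a bootstrap confidence interval for the indirect effect) would be equally defensible here; which of these choices, inputs, and outputs the intended users can actually work with is exactly what the steps below are meant to uncover.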

1. Define your goal
Be specific. What change do you want to see? The actual end goal of a project is often not to develop a working tool, but to solve a particular problem. Clearly defining your goal will help you evaluate whether that goal has been met at later stages.

Example: “I want more applied researchers to conduct power analyses when they run mediation models.”

2. Identify your target audience
Who are you designing your solution for? What field are they in? What type of research do they conduct?

It’s tempting to assume that your audience is “all researchers” or even “researchers like me,” but this often leads to tools that are too broad or mismatched. A clear definition of your audience helps guide design decisions and determines who to involve in the next steps.

Example: “Researchers in applied clinical psychological science who regularly conduct mediation analyses.”

3. Get to know your audience
Once you know who your target audience is, get to know them. What tools do they use? What’s their level of expertise? What’s stopping them from doing what you want them to do? Lack of time? Confusion? Skepticism? Overwhelm?

Conduct user interviews, survey potential users, or observe how they work. Don't assume. Ask.

If your audience struggles with statistical jargon, then improving the readability of your solution’s instructions might matter more than adding new features.

If they prefer SPSS, then creating an R package, even a brilliant one, will be a mismatch. A Shiny app with a simple interface, or a decision tree embedded in a tutorial, might be more effective.

Example: “My target audience primarily uses SPSS and is familiar with mediation analysis but not necessarily with programming. They generally know how to decide on the inputs required to conduct a power analysis, but are unsure about conducting the analysis itself.”

4. Design, test, and iterate
Build a prototype. Then test it! Don’t just test it with your collaborators, but with actual members of your target audience.

Sit next to them as they use your solution (e.g., using the cognitive interviewing technique; Balza et al., 2022). Watch where they get stuck. Do they understand what each input field requires, what buttons to click, and where to find information? Can they interpret the output correctly? Do they need external help to use it?

You might also need to produce supplementary materials—short explainer videos, example use cases, and FAQ sheets to help new users get started.

Use that feedback to revise. Don’t stop until most users can use your tool successfully on their own.

5. Plan for dissemination
Even the best solution won’t help anyone if nobody knows it exists. Think carefully about how and where to promote it.

Which journals are most relevant to your audience? What conferences do they attend? Are they active on social media? Do they go to workshops where you can introduce your solution?

And when you produce output to disseminate, make sure it fits the language and level of technicality that your audience expects.

Some wonderful examples
The good news is that some projects have already taken a user-centered approach to developing their solutions. One strong example is Spitzer et al. (2024), who ran two consecutive studies to evaluate the usability of their preregistration template and researchers’ intention to use it. This allowed them to iteratively refine the template based on real user feedback. Another example comes from Haven et al. (2020); its first author, a member of our lab group, conducted a Delphi study to explore what a preregistration template for qualitative research should look like by consulting qualitative researchers themselves.

Conclusion
In the end, this isn’t just about power analysis apps, preregistration templates, or simulation guidelines. It’s about how we, as metascientists, approach the challenge of improving science. If we want our tools and reforms to actually make a difference, we need to stop thinking like toolmakers and start thinking like product designers. That means being curious about the people we’re trying to help, understanding their workflows, testing our assumptions, and iterating based on their feedback, not just our theory.

My hope is that one day, when a researcher sits down to calculate the power for a mediation model, the process will be straightforward. Not confusing or frustrating. That they’ll have access to a tool that’s clear, intuitive, and designed with them in mind. Because when we remove unnecessary hurdles, we make space for better research.

References

Balza, J. S., Cusatis, R. N., McDonnell, S. M., Basir, M. A., & Flynn, K. E. (2022). Effective questionnaire design: How to use cognitive interviews to refine questionnaire items. Journal of Neonatal-Perinatal Medicine, 15(2), 345–349. https://doi.org/10.3233/NPM-210848

Haven, T. L., Errington, T. M., Gleditsch, K. S., van Grootel, L., Jacobs, A. M., Kern, F. G., Piñeiro, R., Rosenblatt, F., & Mokkink, L. B. (2020). Preregistering Qualitative Research: A Delphi Study. International Journal of Qualitative Methods, 19, 1609406920976417. https://doi.org/10.1177/1609406920976417

Ioannidis, J. P. A., Fanelli, D., Dunne, D. D., & Goodman, S. N. (2015). Meta-research: Evaluation and Improvement of Research Methods and Practices. PLoS Biology, 13(10), e1002264. https://doi.org/10.1371/journal.pbio.1002264

Kelter, R. (2024). The Bayesian simulation study (BASIS) framework for simulation studies in statistical and methodological research. Biometrical Journal, 66(1), 2200095. https://doi.org/10.1002/bimj.202200095

Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086

Pawel, S., Kook, L., & Reeve, K. (2024). Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method. Biometrical Journal, 66(1), 2200091. https://doi.org/10.1002/bimj.202200091

Siepe, B. S., Bartoš, F., Morris, T. P., Boulesteix, A.-L., Heck, D. W., & Pawel, S. (2024). Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000695

Spitzer, L., Bosnjak, M., & Mueller, S. (2024). Testing the Usability of the Psychological Research Preregistration-Quantitative (PRP-QUANT) Template. Meta-Psychology, 8. https://doi.org/10.15626/MP.2023.4039
