Harnessing AI to accelerate research won’t work without inspection, introspection, and integrity. AI won’t solve our problems on its own—it might even make things worse by scaling up the spread of unverifiable content. In this way, it risks turning parascience into something that seems legitimate.
Large, linked datasets fuel AI tools, and that’s exactly why trusted data matters. Without it, we can't expect trustworthy outputs.
Trusted Tools and Blind Trust
All datasets—whether highly curated or broadly inclusive—have flaws. Incomplete, inconsistent, or inaccurate data adds cognitive load at a time when researchers are already stretched thin. The demand for trustworthy tools is growing, from researchers to policymakers, and the tools that facilitate research should ease current burdens, not add to them.
AI tools can look precise, cranking out clean, detailed outputs, but if trained on messy or incorrect data, they simply scale bad decisions. Precision without trust is a dangerous illusion.
Systematic Reviews on Shaky Foundations
I recently read a case study about an AI-assisted tool for systematic reviews (SRs)—an essential, but often gamed, part of scientific research.
First, what are systematic reviews? “Systematic reviews, syntheses of existing scientific evidence that address a focused question in an unbiased manner and using explicit methods, have gained momentum as an effective solution. Systematic reviews play an important role in uncovering problems in preclinical research, informing best practice guidelines, reducing research waste, promoting reproducibility, and guiding translational research” (Ineichen et al., 2024).
Second, SRs are hard to do right. Inclusion/exclusion criteria must be rigorous, but studies often don’t fit neatly into categories: a paper may be labeled one way yet contain data better suited to another classification, and ambiguous criteria can introduce bias or inconsistency across the included studies. Search strategies are also tricky; too narrow and you miss important work, too broad and you are inundated with noise (the toy queries below illustrate the trade-off). Here are some resources to guide SRs in animal models and preclinical studies. They are not light reading, which highlights the complexity of good SR research.
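To make that trade-off concrete, here is a minimal sketch that compares hit counts for a broad versus a narrow PubMed search using NCBI’s public E-utilities esearch endpoint. The query strings are hypothetical placeholders, not a real search strategy; an actual SR search string is far longer and is typically reviewed by an information specialist before screening begins.

```python
# Minimal sketch: compare PubMed hit counts for a broad vs. a narrow query via
# NCBI E-utilities (esearch). Query strings below are illustrative placeholders only.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hit_count(query: str) -> int:
    """Return the number of PubMed records matching `query`."""
    resp = requests.get(
        ESEARCH,
        params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 0},
        timeout=30,
    )
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

if __name__ == "__main__":
    broad = 'stroke AND "animal model"'                            # likely buries you in noise
    narrow = 'ischemic stroke AND mouse AND gait[Title/Abstract]'  # likely misses relevant work
    print("broad query hits:", pubmed_hit_count(broad))
    print("narrow query hits:", pubmed_hit_count(narrow))
```

Neither count is “right”; the point is that small wording changes swing the result set by orders of magnitude, which is why SR search strategies deserve so much care.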
Last, and the point of this blog, another crucial challenge is that you don’t know the quality of the studies a tool (AI or not) has presented to you. The scholarly corpus is being corrected, but it still contains too much junk. If we want to make meta-level decisions through systematic reviews, we must be able to trust the literature from which those reviews are built.
The Case of a Software Systematic Review
Elicit just released this case study on how to decrease the time to complete a systematic review using their software. (Note that it is not the only company moving in this direction; it is just the one whose case study drew my attention.) The case states they evaluated nearly 500 papers and could ask 40 questions of the studies. Honestly, the software layout looks nice, and they seem to understand the complexities of SRs. And that is an impressive number of papers to comb through so quickly and methodically.
But I wanted to know how trustworthy the papers that made up their systematic review were. Because I couldn’t find the complete list of articles reviewed, I followed up on one paper highlighted in the case study using three methods (a small code sketch after the list shows how part of this kind of check can be scripted):
Author Check - a tool to look at authors, their publication and grant history, and networks (Disclaimer: This software is owned by my employer, Digital Science, and I was the originator of the tool)
PubPeer - a public portal where experts can provide post-publication article reviews
Expert sleuth in forensic scientometrics (who asked to remain unnamed)
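On the scripted side, here is the sketch mentioned above: a minimal example that asks the public Crossref REST API whether any retraction, correction, or erratum notices point at a given DOI. The `updates` filter and the response field names are my assumptions based on Crossref’s documented schema, and the DOI shown is a placeholder; a hit is a prompt for deeper scrutiny, not a verdict on the paper.

```python
# Minimal sketch: look for Crossref records (retractions, corrections, errata)
# that update a given DOI. Filter and field names assumed from Crossref's docs.
import requests

def find_update_notices(doi: str) -> list[dict]:
    """Return basic details of any Crossref update notices pointing at `doi`."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"filter": f"updates:{doi}", "rows": 20},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [
        {
            "notice_doi": item.get("DOI"),
            "update_types": [u.get("type") for u in item.get("update-to", [])],
            "title": (item.get("title") or [""])[0],
        }
        for item in items
    ]

if __name__ == "__main__":
    # Placeholder DOI for illustration only
    for notice in find_update_notices("10.1000/example-doi"):
        print(notice)
```

A script like this only covers formal, indexed corrections; it sees nothing of authorship changes or protocol deviations, which is exactly why human checks like the three above still matter.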
The conclusions:
Author Check highlighted that one of the paper’s authors has a retracted publication due to image duplication.
PubPeer had no comments.
Sleuth conclusion: “Okay [paper], not great. Short timeline, questionable authorship addition, and a protocol that originally stated no intent to publish—yet the paper was published. Sometimes you can do a SR to get a good overview of the field.”
None of these conclusions means the paper should be ignored entirely—but it should trigger deeper scrutiny. And that's precisely the point: automation doesn’t mean trusted information.
AI Isn’t a Lie Detector
Even with smart tools and big data, we still need humans—especially those trained in forensic scientometrics—to assess research quality and integrity. We can’t blindly trust software to make decisions for us. My assumption is that these tools can eventually build checks on research trustworthiness into their systems, but they are not there yet.
The take-home message? AI can help, but only when grounded in curated, transparent, and scrutinized data. Trust in science, and in the tools built around it, starts with knowing what we’re feeding into the machine, because poor information in can become disinformation out. Let’s not ruin a good meal with bad ingredients. Trust needs to be baked in from the start.