Computers Aid Science, But Can’t Comprehend It
Artificial intelligence is reshaping how research gets done, but it shouldn’t reshape what science stands for. That’s the message from Dr Héloïse Stevance, a Schmidt AI in Science Fellow in Oxford University’s Physics Department and a computational astrophysicist building intelligent recommendation systems for sky surveys. Her work—sifting cosmic images to find fleeting stellar explosions—makes a compelling test case for where automation helps and where it can quietly erode rigor.
Recent events underline the stakes. One of the world’s premier AI conferences, NeurIPS, was found to have accepted over 100 submissions with hallucinated citations. The board’s response suggested that if roughly 1.1% of papers had incorrect references, the papers might still stand on their own. For many researchers, that reads like a troubling trade-off between speed and standards—an attitude that risks normalizing soft ethics under the banner of “scientific rigor.”
Stevance argues that we don’t need to choose between AI and principles. We do, however, need to be deliberate about what we automate, because those choices shape the scientific record we hand to the next generation. In data-heavy fields like astronomy, the pressures are obvious: massive volumes, tight deadlines, and not enough human hours.
When the sky changes faster than you can blink
Stevance studies how explosive deaths of distant stars forge new elements—the stuff that eventually makes planets, people, and smartphones. These cosmic eruptions brighten and fade within days to weeks. Capturing them demands rapid detection and rapid follow-up.
Surveys like ATLAS scan the sky night after night, comparing fresh images to reference frames in a colossal, continual game of “spot the difference.” On a dark night the human eye might pick out a few thousand stars, but ATLAS records around a billion bright sources. One night’s haul would take a researcher a year to check by eye. Automation isn’t optional; it’s survival.
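To make that "spot the difference" step concrete, here is a minimal sketch in Python with NumPy: subtract a reference frame from a fresh exposure and flag pixels that brighten well beyond the noise. Everything here (array sizes, the threshold, the injected source) is illustrative rather than ATLAS's actual pipeline, which also aligns the frames and matches their point-spread functions before subtracting.

```python
import numpy as np

def flag_transient_candidates(new_image, reference_image, n_sigma=5.0):
    """Flag pixels that have brightened significantly since the reference epoch.

    A toy version of difference imaging: a real pipeline first aligns the two
    frames and matches their point-spread functions before subtracting.
    """
    diff = new_image - reference_image           # the "spot the difference" step
    noise = np.std(diff)                         # crude estimate of the background scatter
    candidates = np.argwhere(diff > n_sigma * noise)
    return diff, candidates

# Illustrative data: a flat patch of sky, with one new bright source in the fresh frame.
rng = np.random.default_rng(42)
reference = rng.normal(100.0, 3.0, size=(256, 256))
fresh = reference + rng.normal(0.0, 3.0, size=(256, 256))
fresh[120, 80] += 200.0                          # a "transient" appears between epochs

_, candidates = flag_transient_candidates(fresh, reference)
print(f"{len(candidates)} candidate pixel(s) flagged")   # expected: the injected source
```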
But necessity is not a blank check. Whether the tool is a linear model or a large language model, the core question remains: how will this choice influence the legacy and longevity of my findings?
Three principles for delegating to machines
1) “Open” means the data is open too
Beware open-washing. Sharing code without accessible, documented training data is not reproducibility—it’s re-execution. Long-term scientific integrity requires that independent teams can retrain, inspect, and stress-test models. If the data and training pipeline are locked away, the community can’t validate how conclusions were reached. For model builders, that means releasing well-documented datasets alongside code, for example via repositories such as Zenodo, to ensure others can actually reproduce the training, not just the outputs.
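As a small illustration of what that looks like in practice, the sketch below pulls the file listing of a published dataset straight from a Zenodo record via its public REST API, so a retraining run can start from the archived data rather than a private copy. The record ID is a placeholder, and the exact JSON field names can vary between Zenodo API versions, so the code reads them defensively; treat it as a sketch, not a reference client.

```python
import requests

ZENODO_API = "https://zenodo.org/api/records"
RECORD_ID = "1234567"   # placeholder: the record ID of the published training dataset

def list_record_files(record_id):
    """Print the files attached to a public Zenodo record.

    Field names follow the commonly documented response layout but may differ
    between Zenodo API versions, so they are read defensively.
    """
    response = requests.get(f"{ZENODO_API}/{record_id}", timeout=30)
    response.raise_for_status()
    record = response.json()
    for entry in record.get("files", []):
        name = entry.get("key", "<unnamed>")
        url = entry.get("links", {}).get("self", "<no download link>")
        print(f"{name}: {url}")

if __name__ == "__main__":
    list_record_files(RECORD_ID)
```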
2) Use the simplest tool that works
There’s constant pressure to deploy the newest “state-of-the-art” model. Yet starting simple is often the most powerful move. If a straightforward approach meets the scientific requirement, stop there; if not, analyze its failure modes to guide the next step up in complexity.
Simple tools reduce intellectual debt: the intricate, tacit knowledge needed to understand a method's caveats. In industry, such debt erodes velocity; in science, it undermines reproducibility and, ultimately, knowledge. Simplicity also preserves research sovereignty. Relying on third-party AI agents can tether your results to a vendor's pricing, model versions, or survival. A sudden price hike, a deprecated model that behaves differently, or a shuttered service can jeopardize months or years of publicly funded work.
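Here is a sketch of what "start simple" can look like for a transient-versus-artifact screening task: fit a plain, fully inspectable logistic regression on a few summary features, read its report, and only reach for something heavier if the measured shortfall demands it. The features and data below are invented for illustration; this is not Stevance's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Invented summary features a survey might compute per detection:
# peak brightness, rate of change, and distance to the nearest catalogued star.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
# Synthetic labels from a toy rule: "real transient" if bright and fast-changing.
y = ((X[:, 0] + X[:, 1]) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The simplest tool that might work: a linear, fully inspectable baseline.
baseline = LogisticRegression().fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))

# Decision point: if this meets the scientific requirement, stop here.
# If not, the per-class failures above say what a more complex model must fix,
# which keeps any added complexity accountable.
```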
3) Be skeptical of what you don’t understand
Large language models have lowered the barrier to building complex pipelines. That’s a boon—and a trap. It’s easy to prompt your way to a solution that looks right and stop there. But that’s confirmation bias, not science. Scientific practice demands asking why something works, then actively probing where it breaks. Report both—strengths and failure cases—so others can trust, replicate, and extend the result.
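One concrete way to probe where a model breaks is to slice its performance by the regimes that matter scientifically, for instance faint versus bright sources, instead of quoting a single aggregate score. The toy data below is constructed so a linear classifier looks fine overall yet fails in the faint regime; the features and thresholds are invented, and the point is the reporting habit, not the model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy setup: two invented features per candidate, "brightness" and one other measurement.
rng = np.random.default_rng(1)
n = 6000
brightness = rng.uniform(0.0, 1.0, size=n)
other = rng.normal(size=n)
X = np.column_stack([brightness, other])

# Ground truth follows a simple rule for bright sources but a different,
# nonlinear rule in the faint regime, exactly the kind of edge a linear model misses.
bright = brightness >= 0.2
y = np.where(bright, other > 0.0, np.abs(other) > 1.0).astype(int)

model = LogisticRegression().fit(X[:4000], y[:4000])
X_test, y_test = X[4000:], y[4000:]
pred = model.predict(X_test)

# A single headline number hides the failure mode...
print("overall accuracy:", round(accuracy_score(y_test, pred), 3))

# ...so slice by a scientifically meaningful regime and report both regimes.
faint = X_test[:, 0] < 0.2
print("faint accuracy: ", round(accuracy_score(y_test[faint], pred[faint]), 3))
print("bright accuracy:", round(accuracy_score(y_test[~faint], pred[~faint]), 3))
```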
Fear of missing out is real. Will slower, more careful validation leave you behind while others sprint ahead with automated literature reviews and AI-generated drafts? Stevance’s counterweight is purpose: the goal is understanding the natural world, not merely producing papers. AI can help do the science, but it can’t comprehend it for us. If results aren’t reproducible, we drift from astronomy to astrology by another name.
Automation with accountability
Modern research faces two relentless forces: exploding data and limited time. Delegating to machines is not only reasonable—it’s essential. But every delegated decision imprints assumptions into our datasets and models. Those choices won’t vanish; they’ll echo through future analyses, amplifying both insight and error.
That’s why the bar for AI in science must be higher than convenience. “Open” must mean retrainable. “Advanced” must mean necessary. “Working” must mean understood. Upholding these standards doesn’t slow discovery; it safeguards it, ensuring today’s breakthroughs remain tomorrow’s knowledge rather than tomorrow’s retractions.
Computers can accelerate the hunt for cosmic transients and many other phenomena. They can triage, recommend, flag anomalies, and compress timelines. What they can’t do is take responsibility for the integrity of the scientific record or truly grasp the meanings behind the numbers. That task—and that comprehension—still belongs to us.