Plot twist: the same basic family of technology that helps your phone guess “on my way” after you type “omw” is now being asked to suggest biomedical hypotheses and experiments, which is rather like finding out your toaster has been quietly auditing organic chemistry at night.
Olivier Elemento’s Nature News & Views article, “AI systems devise hypotheses and ways to test them,” looks at two new Nature papers that push AI from “helpful research assistant” toward “junior lab partner with a suspiciously full calendar” [1]. The two systems are Google DeepMind’s Co-Scientist and FutureHouse’s Robin. Both use teams of AI agents, each with a job: read literature, propose ideas, criticize those ideas, rank them, design tests, or analyze results [2,3].
Now, back in my day, if you wanted a hypothesis, you read papers until your coffee went cold and your eyes developed their own weather system. These systems try to compress that slow shuffle. They do not replace the scientist. They do, however, volunteer to do the kind of literature-sifting that makes even brave postdocs stare into the middle distance.
A Little Committee of Machines
A multi-agent AI system is basically a committee, but without the conference room muffins. One agent hunts through papers. Another suggests possible mechanisms. Another argues with it. Another scores the ideas. In Co-Scientist, the agents generate, critique, and refine hypotheses using a kind of tournament process, where stronger ideas survive more rounds of machine skepticism [2].
That matters because large language models are good at pattern-matching across giant piles of text, but science is not just “say a plausible thing with confidence.” Your uncle can already do that at Thanksgiving, and nobody is giving him a pipette. Science needs ideas that can be tested, broken, revised, and, occasionally, dragged outside and buried with dignity.
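For the curious, the tournament idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not Co-Scientist's actual code: the Elo-style rating is borrowed from chess for illustration, `toy_judge` is a hypothetical stand-in for a critic agent, and the `quality` numbers are invented so the example has something to argue about.

```python
import itertools
import random

def expected_score(rating_a, rating_b):
    """Elo-style chance that A beats B, given their current ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def toy_judge(hyp_a, hyp_b, rng):
    """Hypothetical critic: returns True if hyp_a wins the head-to-head.

    A made-up 'quality' number sets the odds. A real system would compare
    the hypotheses themselves, not a hidden score.
    """
    p_a = hyp_a["quality"] / (hyp_a["quality"] + hyp_b["quality"])
    return rng.random() < p_a

def run_tournament(hypotheses, judge, rounds=20, k=32.0, seed=0):
    """Play repeated round-robin matches; return (names ranked, ratings)."""
    rng = random.Random(seed)
    ratings = {h["name"]: 1000.0 for h in hypotheses}
    for _ in range(rounds):
        for a, b in itertools.combinations(hypotheses, 2):
            exp_a = expected_score(ratings[a["name"]], ratings[b["name"]])
            score_a = 1.0 if judge(a, b, rng) else 0.0
            # Zero-sum update: whatever A gains, B loses.
            delta = k * (score_a - exp_a)
            ratings[a["name"]] += delta
            ratings[b["name"]] -= delta
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    return ranked, ratings

# Invented hypotheses; the names and quality values are placeholders.
hypotheses = [
    {"name": "H1", "quality": 0.9},
    {"name": "H2", "quality": 0.5},
    {"name": "H3", "quality": 0.1},
]
ranked, ratings = run_tournament(hypotheses, toy_judge)
print(ranked)
```

The judge is passed in as a parameter on purpose: swap the coin-flipping critic for a stricter one and the ranking logic stays the same, which is roughly the appeal of having a committee with separate jobs.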
Co-Scientist was tested in biomedical settings, including drug repurposing for acute myeloid leukemia. The system suggested candidates and combinations, some of which researchers then tested in cell experiments [2]. Robin went further into the loop: it generated hypotheses, proposed experiments, interpreted data, and then suggested follow-up work for dry age-related macular degeneration. It identified ripasudil and KL001 as promising candidates in vitro, then analyzed RNA-seq follow-up data and pointed to ABCA1 as a possible target [3].
That is the interesting part. Not “the AI had an idea,” but “the AI had an idea, told humans how to test it, looked at the results, and proposed what to try next.” That is closer to the old scientific rhythm: wonder, test, squint, repeat.
The Bench Still Gets the Final Word
Here is where we put another log on the fire and calm ourselves down. These systems did not magically create approved medicines. They helped generate leads. The hard parts remain: reproducibility, toxicology, animal studies, clinical trials, and all the grim paperwork that keeps civilization from becoming a supplement aisle.
Nature’s own coverage stresses that these tools are meant to assist researchers, not replace them [4]. C&EN noted the same caution: the systems look promising, but novelty, toxicity, and clinical usefulness still need hard proof [5]. Biology moves slower than software because cells are tiny wet divas. You can prompt an AI in seconds; you still have to wait for cells to grow, reagents to arrive, and experiments to behave like they read the protocol.
There is also the “monoculture” worry. If many scientists use similar AI systems trained on similar literature, will everyone start asking the same tidy questions? A recent Nature World View argued that AI’s benefit will depend partly on whether science rewards originality, not just speed [6]. Quite right. A faster horse is handy, but not if every horse gallops toward the same fence.
Why This Feels Different
Earlier AI-for-science systems often handled one slice of the workflow: predict a protein shape, rank molecules, classify cells, or summarize papers. Those are useful tools. But Co-Scientist and Robin aim at the loop itself. That loop is the hearth of science: observe, hypothesize, test, analyze, revise. The scientific method, in its old boots and spectacles, has always depended on testable claims [7].
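If it helps to see that loop as something less poetic, here is a minimal Python sketch of hypothesize, test, analyze, revise. Everything in it is an assumption for illustration: the `Notebook` class, the compound names, the plausibility scores, and the crude "double it after a hit, shrink it after a miss" update are all invented, and `run_experiment` is a placeholder for the part where humans and cells do the actual work.

```python
from dataclasses import dataclass, field

@dataclass
class Notebook:
    """Running record of current beliefs and what has been tried so far."""
    beliefs: dict                                  # candidate -> plausibility
    history: list = field(default_factory=list)    # (candidate, worked) pairs

def hypothesize(notebook):
    """Pick the most plausible candidate that has not been tested yet."""
    tested = {name for name, _ in notebook.history}
    untested = {n: s for n, s in notebook.beliefs.items() if n not in tested}
    return max(untested, key=untested.get) if untested else None

def revise(notebook, candidate, worked):
    """Record the result and nudge the belief: up after a hit, down after a miss."""
    notebook.history.append((candidate, worked))
    notebook.beliefs[candidate] *= 2.0 if worked else 0.1

def discovery_loop(notebook, run_experiment, max_rounds=5):
    """Hypothesize -> test -> analyze -> revise, until a hit or the budget ends."""
    for _ in range(max_rounds):
        candidate = hypothesize(notebook)
        if candidate is None:
            break                                  # nothing left to try
        worked = run_experiment(candidate)         # the bench answers here
        revise(notebook, candidate, worked)
        if worked:
            return candidate
    return None

# Toy bench where only "compound_B" works. Names and scores are made up.
notebook = Notebook(beliefs={"compound_A": 0.6, "compound_B": 0.3, "compound_C": 0.1})
hit = discovery_loop(notebook, run_experiment=lambda name: name == "compound_B")
print(hit, notebook.history)
```

Note where the bench sits: the loop cannot advance until `run_experiment` returns, which is the code-shaped version of waiting for cells to grow.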
Other recent work points the same way. Coscientist, a separate 2023 system despite the near-identical name, used GPT-4 with tools and lab automation for chemistry experiments [8]. The Virtual Lab used AI agents to design SARS-CoV-2 nanobodies that were experimentally tested [9]. Survey papers now describe agentic AI for scientific discovery as a fast-growing field, with open problems around evaluation, reliability, and safe deployment [10].
For readers trying to make sense of these reasoning chains, this is where visual maps help. A tool like mapb2.io is not doing the biology for you, bless its sensible little heart, but mapping “paper evidence -> hypothesis -> experiment -> result -> next question” is exactly the kind of thing that keeps a complicated research story from turning into spaghetti.
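The same chain can also be kept as a tiny graph in plain Python, which makes it easy to spot threads that stop before anyone asks a next question. This is only a sketch with placeholder node names; it has nothing to do with how any particular mapping tool stores things, and the `paths` and `loose_ends` helpers are invented for the example.

```python
def paths(edges, start):
    """Yield every path from start to a node with no outgoing edges."""
    children = edges.get(start, [])
    if not children:
        yield [start]
        return
    for child in children:
        for rest in paths(edges, child):
            yield [start] + rest

def loose_ends(edges, start):
    """Return the paths that stop before reaching a 'next question' node."""
    return [p for p in paths(edges, start)
            if not p[-1].startswith("next question")]

# Placeholder map: hypothesis A has been followed all the way through,
# hypothesis B stalls at its result.
edges = {
    "paper evidence": ["hypothesis A", "hypothesis B"],
    "hypothesis A": ["experiment A1"],
    "hypothesis B": ["experiment B1"],
    "experiment A1": ["result A1"],
    "experiment B1": ["result B1"],
    "result A1": ["next question A"],
}

for path in paths(edges, "paper evidence"):
    print(" -> ".join(path))
```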
The Takeaway by the Fire
The charm of this work is not that machines are becoming lone geniuses. Science has never really worked that way, no matter how many portraits of solemn men with beards suggest otherwise. The charm is that AI systems might become tireless idea-sorters: reading more than any one person can, proposing testable paths, and helping scientists spend less time wandering the library stacks with a candle.
But the lab bench still outranks the chatbot. If these tools keep producing hypotheses that survive real experiments, they could make early drug discovery less wasteful and help smaller teams explore more ideas. If not, they will still be very fancy autocomplete with a clipboard.
Either way, the next time your phone finishes your sentence, give it a wary nod. Its larger cousins are now trying to finish the sentence that begins, “Maybe this disease works like...”
References
[1] Elemento, O. “AI systems devise hypotheses and ways to test them.” Nature (2026). DOI: 10.1038/d41586-026-01873-2. PMID: 42380272
[2] Gottweis, J. et al. “Accelerating scientific discovery with Co-Scientist.” Nature (2026). DOI: 10.1038/s41586-026-10644-y
[3] Ghareeb, A. E. et al. “A multi-agent system for automating scientific discovery.” Nature (2026). DOI: 10.1038/s41586-026-10652-y. arXiv: 2505.13400
[4] Ledford, H. “Teams of AI agents boost speed of research.” Nature (2026). DOI: 10.1038/d41586-026-01596-4
[5] Oldach, L. & Pratap, A. “AI companies introduce new agent-based tools for scientific discovery.” Chemical & Engineering News (2026).
[6] Zhang, X. “Will AI spark a scientific renaissance - or a diffuse monoculture?” Nature (2026). DOI: 10.1038/d41586-026-01954-2
[7] “Scientific method.” Wikipedia. https://en.wikipedia.org/wiki/Scientific_method
[8] Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. “Autonomous chemical research with large language models.” Nature 624, 570-578 (2023). DOI: 10.1038/s41586-023-06792-0
[9] Swanson, K. et al. “The Virtual Lab of AI agents designs new SARS-CoV-2 nanobodies.” Nature 646, 716-723 (2025). DOI: 10.1038/s41586-025-09442-9
[10] Gridach, M. et al. “Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions.” arXiv (2025). arXiv: 2503.08979
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.