Last year, the FDA approved eight new cancer drugs in the first half alone - and that was a slow six months. For oncologists trying to match the right targeted therapy to the right genetic mutation in the right patient, the knowledge landscape isn't just expanding. It's detonating.
So a team at Dana-Farber Cancer Institute did something clever: they gave an AI a cheat sheet.
The Problem With Googling Cancer Treatments (Even If You're a Doctor)
Here's the uncomfortable math. There are now over 100 precision cancer medicines approved by the FDA, each one tied to specific molecular biomarkers - genetic mutations, protein expressions, fusion events - that determine whether a drug will actually work for a given patient. The information connecting mutation X to therapy Y lives scattered across clinical trials, FDA labels, NCCN guidelines, and a small mountain of journal articles that grows taller every week.
Oncologists, who are busy doing things like treating cancer patients, have described keeping up with this firehose of approvals as "far from seamless." Which is doctor-speak for "impossible."
Large language models seemed like an obvious solution. Ask GPT about a BRAF V600E mutation and it'll give you a confident, articulate answer. The problem? That answer might be outdated, incomplete, or just plain wrong. A recent meta-analysis found that roughly one in five LLM responses to oncology questions contained inaccurate information (Cheng et al., JCO 2025). When you're deciding on chemotherapy protocols, a 20% error rate isn't quirky - it's terrifying.
Enter the Almanac: Teaching an AI to Read the Right Textbook
Hyeji Jun, Eliezer Van Allen, and colleagues at Dana-Farber and the Broad Institute tackled this by pairing GPT-4o with the Molecular Oncology Almanac (MOAlmanac), a curated database of 152 molecular features linked to 1,009 therapeutic assertions (Jun et al., Cancer Cell 2026). Instead of letting the LLM riff on whatever it absorbed during training, they used retrieval-augmented generation (RAG) to force-feed it verified, current precision oncology data before it answered each question.
Think of it this way: an LLM without RAG is like a medical student taking a closed-book exam three years after graduation. An LLM with RAG is the same student, except now they get to bring the textbook - and somebody highlighted the relevant pages.
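For readers who want to see the moving parts, here is a minimal Python sketch of the retrieve-then-prompt pattern described above. It assumes a hypothetical record shape (gene, alteration, disease, therapy, source) and uses placeholder therapy names; it is not MOAlmanac's actual schema, not the authors' pipeline, and contains no clinical content. The call to GPT-4o itself is left out: the code only builds the augmented prompt that would be sent to the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical record shape; NOT the actual MOAlmanac schema."""
    gene: str
    alteration: str
    disease: str
    therapy: str
    source: str


# Toy placeholder data for illustration only: NOT clinical recommendations.
KNOWLEDGE_BASE = [
    Assertion("BRAF", "V600E", "melanoma", "example-therapy-1", "placeholder FDA label"),
    Assertion("BRAF", "V600E", "colorectal cancer", "example-therapy-2", "placeholder guideline"),
    Assertion("EGFR", "L858R", "non-small cell lung cancer", "example-therapy-3", "placeholder FDA label"),
]


def retrieve(gene: str, alteration: str, kb: list[Assertion] = KNOWLEDGE_BASE) -> list[Assertion]:
    """Structured retrieval: case-insensitive exact match on gene and alteration."""
    g, a = gene.upper(), alteration.upper()
    return [x for x in kb if x.gene.upper() == g and x.alteration.upper() == a]


def build_prompt(question: str, gene: str, alteration: str) -> str:
    """Prepend retrieved assertions to the question; the LLM call itself is omitted."""
    hits = retrieve(gene, alteration)
    if hits:
        context = "\n".join(
            f"- {h.gene} {h.alteration} in {h.disease}: {h.therapy} (source: {h.source})"
            for h in hits
        )
    else:
        context = "- No matching assertions found in the knowledge base."
    return (
        "Answer using ONLY the context below. If the context does not cover "
        "the question, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt("What targeted options are listed for BRAF V600E?", "BRAF", "V600E"))
```

Retrieval here is a plain exact match on two fields, so a query for a variant the table does not contain comes back empty and the prompt says so, rather than leaving the model to fill the gap from memory. The retrieval and prompting strategies the team actually tested are described in Jun et al.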
The results were dramatic. The LLM-only approach scored 62-75% accuracy on biomarker-driven treatment recommendations. The RAG-LLM with a structured database hit 94-95% on synthetic queries and 93% on real-world clinical questions collected from practicing oncologists. That gain of roughly 20 to 33 percentage points is the difference between "interesting research toy" and "something a doctor might actually trust."
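A quick check of the arithmetic, using only the accuracy ranges reported above. Comparing error rates at the midpoint of each range is an assumption of this sketch, not a figure from the paper.

```python
# Accuracy ranges as reported in this post (percent).
llm_only = (62, 75)
rag_structured = (94, 95)

# Smallest gap: best LLM-only score vs. worst RAG score; largest gap: the reverse.
min_gain = rag_structured[0] - llm_only[1]   # 94 - 75 = 19
max_gain = rag_structured[1] - llm_only[0]   # 95 - 62 = 33

# Assumption: compare error rates at the midpoint of each range.
err_llm = 100 - sum(llm_only) / 2            # 100 - 68.5 = 31.5
err_rag = 100 - sum(rag_structured) / 2      # 100 - 94.5 = 5.5
reduction = 1 - err_rag / err_llm            # share of errors eliminated

print(min_gain, max_gain, round(reduction, 3))  # -> 19 33 0.825
```

Read this way, the gain spans roughly 19 to 33 percentage points depending on which ends of the ranges you compare, and at the midpoints the error rate falls from about 31.5% to 5.5%, a reduction of a little over 80%.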
What Makes This More Than Just Another AI Paper
A few things stand out. First, the researchers didn't just test on neat, pre-formatted questions. They collected messy, real-world queries from actual oncologists - the kind of questions that come up during tumor boards and patient consultations. The system held up.
Second, they systematically explored different prompting and retrieval strategies, essentially building a playbook for how to squeeze the best performance out of a RAG-LLM in clinical settings. Not all retrieval approaches are equal: structured databases outperformed unstructured ones (94-95% accuracy vs. 79-91%), which tells you that how you organize the knowledge matters as much as having it.
Third - and Van Allen was explicit about this - the goal isn't to replace oncologists. "We wanted to see if we could create an assistant that can be helpful to an oncologist without taking away the autonomy, decision making, and relationship between the patient and provider," he told Dana-Farber's news team. This is an AI copilot, not an AI pilot.
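The second point above, that structured knowledge beat unstructured text, is easier to picture with a toy example. The Python sketch below illustrates one plausible failure mode and is not the paper's analysis: the snippets, records, and therapy names are hypothetical placeholders. A bag-of-words retriever scores free-text chunks by shared tokens, while the structured version only returns records whose gene, alteration, and disease fields all match.

```python
import re

# Hypothetical free-text snippets standing in for unstructured documents (placeholder content).
CHUNKS = [
    "BRAF V600K melanoma treatment: example-therapy-4.",
    "BRAF V600E in colorectal cancer: example-therapy-2.",
]

# The same hypothetical facts stored as structured records.
RECORDS = [
    {"gene": "BRAF", "alteration": "V600K", "disease": "melanoma", "therapy": "example-therapy-4"},
    {"gene": "BRAF", "alteration": "V600E", "disease": "colorectal cancer", "therapy": "example-therapy-2"},
]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, e.g. 'V600E' -> 'v600e'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(query: str, chunks: list[str] = CHUNKS) -> str:
    """Unstructured retrieval: return the chunk sharing the most tokens with the query."""
    q = tokens(query)
    return max(chunks, key=lambda chunk: len(q & tokens(chunk)))


def structured_retrieve(gene: str, alteration: str, disease: str,
                        records: list[dict] = RECORDS) -> list[dict]:
    """Structured retrieval: return only records whose fields all match exactly."""
    key = (gene, alteration, disease)
    return [r for r in records if (r["gene"], r["alteration"], r["disease"]) == key]


print(keyword_retrieve("BRAF V600E melanoma treatment"))
# -> BRAF V600K melanoma treatment: example-therapy-4.
print(structured_retrieve("BRAF", "V600E", "melanoma"))
# -> []
```

Here the keyword retriever rewards the chunk that shares "melanoma" and "treatment" with the query and hands back the V600K entry, a different alteration. The field-level lookup finds no V600E melanoma record and returns nothing, which an LLM can then report honestly. Real unstructured RAG systems usually rank by embedding similarity rather than raw token overlap, so treat this as an intuition pump, not a benchmark.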
The Bigger Picture (and the Fine Print)
This work joins a growing body of evidence that RAG is the secret sauce for making LLMs actually useful in specialized medicine. Similar approaches have shown promise in clinical trial matching for head and neck cancer (Bhayana et al., JAMA Network Open 2024) and guideline retrieval across oncology specialties (Sorin et al., NEJM AI 2024).
But let's not get ahead of ourselves. This is still a research tool, not a deployed clinical product. The team is planning clinical trials before any real-world implementation. The hallucination problem hasn't been eliminated, just dramatically reduced. And the system depends on MOAlmanac being kept current - which requires ongoing human curation. AI might be doing the heavy lifting, but humans still need to stock the shelves.
Still, 93% accuracy on real-world oncology queries is nothing to shrug at. For a field where the knowledge base grows faster than anyone can read it, giving doctors an AI assistant with an up-to-date, curated cheat sheet might be exactly what precision medicine needs to actually be, well, precise.
References:
- Jun, H., Tanaka, Y., Johri, S., et al. (2026). A context-augmented large language model for accurate precision oncology medicine recommendations. Cancer Cell. DOI: 10.1016/j.ccell.2025.12.017
- Reardon, B., Moore, N.D., Moore, N.S., et al. (2021). Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology. Nature Cancer, 2, 1149-1161. DOI: 10.1038/s43018-021-00243-3
- Bhayana, R., et al. (2024). Performance of Retrieval-Augmented Large Language Models to Recommend Head and Neck Cancer Clinical Trials. JAMA Network Open. PMCID: PMC11522650
- Sorin, V., et al. (2024). GPT-4 for Information Retrieval and Comparison of Medical Oncology Guidelines. NEJM AI. DOI: 10.1056/AIcs2300235
- Cheng, K., et al. (2025). Navigating artificial intelligence accuracy: A meta-analysis of hallucination incidence in LLM responses to oncology questions. Journal of Clinical Oncology, 43(16_suppl), e13686. DOI: 10.1200/JCO.2025.43.16_suppl.e13686
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.