When Cancer R&D Trips Over Its Own Data

A cancer drug can survive years of chemistry, tissue slides, animal studies, and enough meetings to qualify as psychological warfare, then still fall apart because the right clue was sitting in the wrong database. That failure mode is the backdrop for Richard Goodwin and colleagues’ new Cancer Discovery article on using AI to connect the messy, sprawling evidence trail inside oncology R&D instead of letting it die in separate folders with heroic file names like final_v7_reallyfinal.xlsx (Goodwin et al., 2026).

Precision oncology already runs on data: genomics, pathology images, biomarker assays, clinical records, trial outcomes, safety signals, the whole buffet. The problem is that more data does not automatically produce more understanding. Sometimes it just gives you a larger digital attic.

Goodwin and coauthors make a straightforward argument. If drug companies want to build better cancer medicines faster, they need integrated, large-scale, multidomain datasets plus AI tools that are actually matched to the job. Not “AI” in the corporate-slide-deck sense. Actual fit-for-purpose models, good infrastructure, and data that do not look like they were collected by five different civilizations.

The Onion Has Layers, and So Does the Tumor

Precision oncology means tailoring treatment to the biology of a person’s tumor, not just the organ it showed up in. The National Cancer Institute defines it as using specific information about a person’s tumor to guide diagnosis, treatment, or prognosis. AI becomes useful here because cancer biology is multimodal. One patient can generate DNA data, RNA data, pathology images, radiology, lab tests, and doctor notes. A human expert can reason across that stack, but not at industrial scale and definitely not before lunch.

Modern AI systems, especially transformer-based and multimodal models, are good at finding patterns across giant mixed-format datasets. Wikipedia’s plain-English summary of transformers gets at the core idea: attention mechanisms help a model weigh how much each piece of information matters relative to the others. In oncology, that matters because the important signal might be hiding in the relationship between a mutation, a histology pattern, and a treatment response three systems away.
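For readers who want to see the "weighing" idea in miniature, here is a rough sketch of scaled dot-product attention, the basic operation inside transformers. It is not taken from the Goodwin paper or any model mentioned here. The two-number "embeddings" and the labels (mutation, histology, lab_test) are made up purely for illustration; real models learn vectors with hundreds or thousands of dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity between the query and each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical toy embeddings (assumed for illustration, not real data).
embeddings = {
    "mutation": [1.0, 0.0],
    "histology": [0.8, 0.6],
    "lab_test": [0.0, 1.0],
}
names = list(embeddings)
vectors = [embeddings[name] for name in names]

# Ask: from the mutation's point of view, which inputs matter most?
weights, output = attention(embeddings["mutation"], vectors, vectors)
for name, weight in zip(names, weights):
    print(f"{name}: {weight:.2f}")
```

In this toy setup the histology vector points in nearly the same direction as the mutation vector, so it gets more weight than the unrelated lab test. That is the whole trick, repeated across many layers and far larger inputs.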

That is why recent papers are zooming toward foundation models and multimodal systems for cancer research. Truhn and colleagues argue that large language models and multimodal foundation models could support biomarker discovery, workflow automation, and decision support in precision oncology, while warning that validation and regulation are non-negotiable (Truhn et al., 2024). In other words, cool model, now prove it works on real patients.

AI Is Not Magic. It Is Expensive Pattern Hunting.

This is where it gets interesting. The Goodwin paper is less about one shiny model and more about rewiring the whole oncology R&D pipeline. That includes target discovery, translational science, clinical development, and patient stratification. The idea is not “replace scientists with chatbot energy.” The idea is to help scientists ask better questions and spot useful patterns earlier.

That broader view lines up with a 2025 Nature Medicine review that describes AI as increasingly relevant across target identification, molecule design, preclinical work, clinical trials, and post-market surveillance (Zhang et al., 2025). It also matches what is happening in pathology, where foundation models trained on huge slide collections are starting to show strong performance across cancer detection and biomarker-related tasks. A 2024 Nature Medicine paper on the Virchow model reported pan-cancer detection performance across common and rare cancers, which is the kind of result that makes pathologists interested and statisticians immediately ask for external validation, as they should (Zimmermann et al., 2024).

And yes, the real world is already inching in this direction. The FDA launched its Oncology Artificial Intelligence Program in 2023, explicitly responding to growing use of AI in cancer drug development, and in 2024 Bayer announced a precision-oncology collaboration with Aignostics to use multimodal AI for target discovery and clinical development. That does not mean the problem is solved. It means the industry has stopped treating this as science fiction.

The Catch, Because There Is Always a Catch

If you tried sketching this ecosystem in a notebook or a tool like mapb2.io, it would look like a detective board with genomics, pathology, and trial data all connected by red string and mild panic.

The hard part is not building a model demo. The hard part is data quality, interoperability, bias, reproducibility, and clinical usefulness. A recent systematic review of biomarker-driven adaptive phase II precision-oncology trials shows just how complex trial design gets once you try to match therapies to smaller molecularly defined patient groups (Ha et al., 2024). If your training data are skewed, your labels are noisy, or your model only works on one institution’s scanner setup, congratulations - you have built a very sophisticated trap.
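One common guard against the single-institution trap is to hold out an entire site during evaluation, so a model is always tested on data from a hospital it never saw in training. The sketch below shows only the splitting step, under stated assumptions: the sample records, the "site" field, and the hospital names are hypothetical, and this is not a method described in any of the papers cited here.

```python
from collections import defaultdict

def leave_one_site_out(samples):
    """Yield (held_out_site, train, test) with one whole site held out each time."""
    by_site = defaultdict(list)
    for sample in samples:
        by_site[sample["site"]].append(sample)
    for held_out in sorted(by_site):
        # Train on every other site; test only on the held-out one.
        train = [s for site, group in by_site.items() if site != held_out for s in group]
        test = list(by_site[held_out])
        yield held_out, train, test

# Hypothetical slide records (assumed for illustration only).
samples = [
    {"id": "slide-1", "site": "hospital_a"},
    {"id": "slide-2", "site": "hospital_a"},
    {"id": "slide-3", "site": "hospital_b"},
    {"id": "slide-4", "site": "hospital_c"},
    {"id": "slide-5", "site": "hospital_c"},
]

splits = list(leave_one_site_out(samples))
for held_out, train, test in splits:
    print(f"held out {held_out}: train on {len(train)}, test on {len(test)}")
```

If performance collapses whenever a particular site is held out, that is a hint the model learned the scanner, not the tumor.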

That is why Goodwin and colleagues land in a sensible place. AI in oncology R&D works best when it sits on top of strong data infrastructure and domain expertise. The overworked GPUs can crunch the patterns, but they still need oncologists, pathologists, statisticians, and trial designers to tell them which patterns matter and which ones are just math wearing a fake mustache.

References

Goodwin RJA, Barry ST, Weatherall J, Platz SJ, Reis-Filho JS. Enabling AI to Drive Innovation and Precision across Oncology R&D. Cancer Discovery. 2026. DOI: 10.1158/2159-8290.CD-26-0271

Truhn D, Eckardt JN, Ferber D, et al. Large language models and multimodal foundation models for precision oncology. npj Precision Oncology. 2024;8:72. DOI: 10.1038/s41698-024-00573-2

Zhang K, Yang X, Wang Y, et al. Artificial intelligence in drug development. Nature Medicine. 2025;31(1):45-59. DOI: 10.1038/s41591-024-03434-4

Zimmermann E, Sali R, Simon C, et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine. 2024. DOI: 10.1038/s41591-024-03141-0

Ha H, Lee HY, Kim JH, et al. Precision Oncology Clinical Trials: A Systematic Review of Phase II Clinical Trials with Biomarker-Driven, Adaptive Design. Cancer Research and Treatment. 2024;56(4):991-1013. DOI: 10.4143/crt.2024.128. PMCID: PMC11491240

FDA Oncology Center of Excellence. OCE Oncology Artificial Intelligence Program. Updated 2025. Available at: https://www.fda.gov/about-fda/oncology-center-excellence/oce-oncology-artificial-intelligence-program

Bayer. Bayer and Aignostics to collaborate on next generation precision oncology. March 14, 2024. Available at: https://www.bayer.com/en/us/news-stories/next-generation-precision-oncology

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.
