AIb2.io - AI Research Decoded

Can AI Read Your Mammogram Better Than a Risk Calculator? It's Complicated.

A neural network walks into a radiology clinic and says, "I can predict breast cancer risk better than your fancy questionnaires." The doctors look intrigued. "But," the AI adds sheepishly, "I might be a little... dramatic about it."

That's essentially what researchers at Mayo Clinic discovered when they put MIRAI - an AI model that analyzes mammogram images directly - head-to-head against three established clinical risk prediction tools. The results? A masterclass in why "better" in medicine is rarely a simple word.

The Contenders

In one corner, we have the clinical risk models: Gail, Tyrer-Cuzick v8, and the Breast Cancer Surveillance Consortium (BCSC) v3. These are the seasoned veterans - they crunch numbers about your age, family history, breast density, genetics, and other factors to estimate your five-year breast cancer risk. They've been validated, refined, and trusted for years.

In the other corner: MIRAI, an AI system that looks directly at your mammogram images and makes predictions. No questionnaire. No manual density measurements. Just pixels in, risk score out. It's the kind of end-to-end deep learning approach that has been shaking up medical imaging since convolutional neural networks proved they could actually see things humans miss [1].

What They Actually Tested

The Mayo team pulled mammograms from 12,308 women in their biobank, then waited to see who developed breast cancer within five years. (Spoiler: 250 women did, 176 with invasive cancer.) Then they asked each model: could you have seen this coming?

Two metrics matter here. First, discriminatory accuracy - can the model correctly rank women by risk? If Woman A gets cancer and Woman B doesn't, did the model give Woman A a higher risk score? This is measured by the C-index, where 0.5 is random guessing and 1.0 is perfect.
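To make the ranking idea concrete, here is a minimal Python sketch of a concordance calculation. It is a simplification, not the study's method: it treats the outcome as a plain yes/no within five years and ignores follow-up time and censoring, which a proper survival C-index would account for. The function name and the toy numbers are made up for illustration.

```python
from itertools import product


def c_index(scores, outcomes):
    """Share of (case, non-case) pairs in which the case got the higher score.

    Tied scores count as half a concordant pair.
    Simplifying assumption: binary outcomes, no censoring.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: five women, two of whom developed cancer.
toy_scores = [0.08, 0.02, 0.05, 0.01, 0.03]
toy_outcomes = [1, 0, 0, 0, 1]
toy_c = c_index(toy_scores, toy_outcomes)
print(round(toy_c, 3))
```

In the toy data, five of the six case/non-case pairs are ranked correctly, so the result sits well above the 0.5 you would expect from guessing.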

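Second, calibration - when the model says "you have a 3% risk," do roughly 3 out of 100 similar women actually get cancer? This is summarized by the observed-to-expected ratio (O/E): 1.0 means the model is well calibrated, values below 1.0 mean it predicted more cancers than actually occurred, and values above 1.0 mean it predicted too few.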
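The arithmetic behind O/E is short enough to sketch in Python. This assumes the simplest possible version: "expected" is the sum of each woman's predicted risk and "observed" is the count of cancers. The function name and the 100-woman example are hypothetical, and the paper may compute the ratio with more care (for instance, handling incomplete follow-up).

```python
def observed_to_expected(predicted_risks, outcomes):
    """O/E ratio: observed cancers divided by the sum of predicted risks."""
    expected = sum(predicted_risks)
    observed = sum(outcomes)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# 100 hypothetical women, each told "you have a 3% risk":
# the model expects about 3 cancers in the group.
risks = [0.03] * 100

three_cancers = [1] * 3 + [0] * 97   # matches the prediction
two_cancers = [1] * 2 + [0] * 98     # fewer cancers than predicted

oe_matched = observed_to_expected(risks, three_cancers)
oe_overestimated = observed_to_expected(risks, two_cancers)
print(round(oe_matched, 2), round(oe_overestimated, 2))
```

When only two cancers show up against three expected, the ratio drops below 1.0, which is the signature of a model that overestimates risk.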

The Plot Twist

MIRAI won the discrimination game handily. Its C-index of 0.71 for overall breast cancer beat the clinical models (which ranged from 0.59 to 0.68). When it comes to ranking risk, the AI genuinely sees something in those images that our current clinical variables don't fully capture.

But here's where it gets interesting: MIRAI had a habit of overestimating risk.

For about half of women - those in the lowest risk categories - MIRAI was crying wolf, predicting more cancers than actually occurred. For invasive breast cancer specifically (the type everyone cares most about), MIRAI overestimated risk at every level, with an O/E ratio of 0.68. That means for every 100 cancers MIRAI predicted, only about 68 actually happened.

The clinical models, meanwhile? Their invasive cancer predictions were boringly accurate, with O/E ratios between 0.86 and 0.99.

Why This Actually Matters

Imagine telling a woman she has elevated breast cancer risk when she doesn't. She might undergo unnecessary additional screening, biopsies, or preventive interventions. She carries psychological burden. Healthcare resources get stretched. This isn't hypothetical hand-wringing - it's the real-world consequence of poor calibration [2].

The researchers were refreshingly blunt in their conclusion: AI-based risk models "should consider discriminatory accuracy and calibration for invasive cancer before implementation." Translation: being good at ranking isn't enough if your actual numbers are wrong.

This finding echoes broader concerns in the machine learning community about models that optimize for one metric while neglecting others. A model can achieve an impressive AUC (the yes/no-outcome cousin of the C-index) while being dangerously overconfident or underconfident in its actual predictions [3].
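A toy Python example shows why the two metrics can disagree. Ranking only cares about the order of the scores, so multiplying every predicted risk by the same factor leaves the concordance untouched while wrecking calibration. Everything here is invented for illustration: the ten risk values, the 1.5x inflation, and the helper names. It uses a binary-outcome concordance with no censoring, not the study's analysis.

```python
def c_index(scores, outcomes):
    """Binary-outcome concordance; tied scores count as half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    total = 0.0
    for c in cases:
        for k in controls:
            total += 1.0 if c > k else 0.5 if c == k else 0.0
    return total / (len(cases) * len(controls))


def oe_ratio(risks, outcomes):
    """Observed cancers divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


# Ten hypothetical women; the second one develops cancer.
outcomes = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
calibrated = [0.30, 0.20, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

# Same ordering, but every risk inflated by 50% (capped at 100%).
inflated = [min(1.0, 1.5 * r) for r in calibrated]

c_before, c_after = c_index(calibrated, outcomes), c_index(inflated, outcomes)
oe_before, oe_after = oe_ratio(calibrated, outcomes), oe_ratio(inflated, outcomes)
print(round(c_before, 3), round(c_after, 3))    # ranking unchanged
print(round(oe_before, 2), round(oe_after, 2))  # calibration is not
```

The inflated model ranks women exactly as well as the original one, yet it predicts half again as many cancers as actually occur.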

The Bigger Picture

MIRAI isn't a failure - far from it. The fact that an AI can look at a mammogram and extract risk information that traditional clinical variables miss is genuinely exciting. The model was trained on over 200,000 mammograms and has shown strong performance across multiple populations [4].

The path forward likely involves combining AI's pattern-recognition abilities with the calibration strengths of traditional models. Some researchers are already exploring hybrid approaches that use AI-extracted features as inputs to clinical risk calculators [5]. Others are working on recalibration techniques to adjust AI predictions to match real-world outcomes.
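As a rough idea of what recalibration can look like, here is a Python sketch of one of the simplest options: rescale every predicted risk by the O/E ratio measured on a reference group. This is an illustrative assumption, not the approach used by MIRAI's authors or the Mayo team. The function names and numbers are hypothetical, and in practice the scaling factor would be estimated on one dataset and checked on another; other methods, such as logistic recalibration, adjust the risks on a different scale.

```python
def fit_scale_factor(risks, outcomes):
    """Scaling factor = observed cancers / expected cancers (the O/E ratio)."""
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return sum(outcomes) / expected


def recalibrate(risks, factor):
    """Multiply each risk by the factor, capping at 100%."""
    return [min(1.0, r * factor) for r in risks]


# Hypothetical reference group: the model expects about 2 cancers, 1 occurs.
risks = [0.10] * 10 + [0.05] * 20
outcomes = [1] + [0] * 29

factor = fit_scale_factor(risks, outcomes)
adjusted = recalibrate(risks, factor)

oe_before = sum(outcomes) / sum(risks)
oe_after = sum(outcomes) / sum(adjusted)
print(round(factor, 2), round(oe_before, 2), round(oe_after, 2))
```

Because every risk is multiplied by the same positive number, the ordering of women by risk is untouched, so a fix like this would leave the model's ranking ability as it was.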

For now, the message is clear: mammography AI is impressive at seeing risk patterns, but it needs to learn some humility about the numbers it spits out. The doctors should keep their questionnaires handy - at least until the neural network learns to count more accurately.

References

  1. McKinney, S. M., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577, 89-94. DOI: 10.1038/s41586-019-1799-6

  2. Kerlikowske, K., et al. (2022). Combining quantitative and qualitative breast density measures to assess breast cancer risk. Breast Cancer Research, 24, 52. DOI: 10.1186/s13058-022-01545-6

  3. Van Calster, B., et al. (2019). Calibration: the Achilles heel of predictive analytics. BMC Medicine, 17, 230. DOI: 10.1186/s12916-019-1466-7

  4. Yala, A., et al. (2021). Toward robust mammography-based models for breast cancer risk. Science Translational Medicine, 13(578). DOI: 10.1126/scitranslmed.aba4373

  5. Kaul, M., et al. (2025). Performance of clinical breast cancer risk prediction models versus a mammography-based artificial intelligence risk model. Journal of the National Cancer Institute. DOI: 10.1093/jnci/djag083

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.