Radiologists have spent years training their eyes to spot the subtle shadows of pneumonia, the telltale crack of a hairline fracture, the worrying mass that shouldn't be there. What they haven't trained for? Spotting when the entire X-ray is a fabrication conjured by ChatGPT in under a minute.
A new study published in Radiology just dropped a bombshell: deepfake X-rays are now good enough to fool trained radiologists and even the AI models asked to flag them. We're not talking about obvious fakes with extra ribs or backwards hearts. These synthetic images are so convincing that when seventeen radiologists from twelve centers across six countries sat down to evaluate them, fewer than half noticed anything amiss - at least until someone whispered "hey, some of these might be fake."
The Numbers That Should Make You Nervous
Here's where it gets uncomfortable. When radiologists didn't know they were looking for fakes, only 41% spontaneously noticed something was off. After being explicitly told the dataset contained AI-generated images, average accuracy climbed to 75%. Individual performance ranged widely, from 58% to 92%, and here's the kicker: years of experience made no measurable difference. A radiologist with four decades under their belt wasn't any better at spotting fakes than someone fresh out of residency.
The AI models didn't fare much better. Four multimodal large language models - GPT-4o, GPT-5, Gemini 2.5 Pro, and Llama 4 Maverick - scored between 57% and 85% accuracy. Even GPT-4o, the model behind the ChatGPT-generated deepfakes, couldn't reliably identify its own handiwork. It's like asking a forger to authenticate paintings and watching them shrug uncertainly at their own work.
The Telltale Signs of Synthetic Anatomy
Lead researcher Dr. Mickael Tordjman from Mount Sinai pointed out that deepfake medical images often suffer from what you might call "uncanny valley" syndrome. "Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform," he explained. Fractures appear "unusually clean and consistent, often limited to one side of the bone."
The problem is that these imperfections are subtle - the kind of thing you'd only notice if you were actively hunting for them. And radiologists reading through dozens of images daily aren't exactly in hunt-for-AI-artifacts mode. They're looking for pathology, not pixel-level perfection that betrays a machine's involvement.
Why This Matters Beyond Academic Curiosity
The security implications here aren't hypothetical thought experiments. Dr. Tordjman warned about "high-stakes vulnerability for fraudulent litigation if a fabricated fracture could be indistinguishable from a real one." Insurance fraud with synthetic injuries. Malpractice claims backed by evidence that never existed. Workers' comp scams with phantom broken bones.
Then there's the cybersecurity nightmare scenario: hackers infiltrating hospital networks and injecting fake images into patient records. Imagine a synthetic tumor appearing in someone's scan, triggering unnecessary biopsies and surgeries. Or the reverse - a real cancer digitally erased, leaving a patient untreated. The fundamental reliability of digital medical records suddenly looks a lot more fragile.
For AI diagnostic tools already deployed in hospitals, the threat is equally serious. These systems are trained, and often retrained, on the images that flow through clinical archives. Let enough synthetic pathologies slip into that training data, and accuracy can degrade. It's like training a guard dog with fake scents.
The Tech Behind the Deception
The study tested images from two sources: ChatGPT (yes, the chatbot you use for recipe ideas can apparently also manufacture convincing skeletal fractures) and RoentGen, an open-source diffusion model developed at Stanford specifically for chest X-ray generation. Both produced images realistic enough to pass expert scrutiny.
RoentGen represents a growing field of domain-specific medical image generators - tools originally designed for legitimate purposes like augmenting training datasets and helping AI models learn from synthetic data when real patient images are scarce or privacy-restricted. The same capabilities that make these tools useful for research make them concerning when deployed by bad actors.
What Comes Next
"We are potentially only seeing the tip of the iceberg," Dr. Tordjman noted. "The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI." If two-dimensional X-rays are this convincing, imagine what happens when generators tackle the volumetric complexity of cross-sectional imaging.
The researchers recommend implementing digital watermarks and cryptographic signatures that get embedded at the moment of image capture - essentially a tamper-evident seal linking every scan to the technologist who took it and the machine that captured it. It's not a perfect solution, but it's a start.
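To make the "tamper-evident seal" idea concrete, here is a minimal Python sketch. It is not the scheme the study proposes, and it simplifies heavily: it uses a shared-secret HMAC from the standard library as a stand-in for a true digital signature, and the names DEVICE_KEY, seal_image, and verify_image, along with the example IDs, are hypothetical. A real deployment would more likely use an asymmetric key pair held in the scanner's hardware.

```python
import hashlib
import hmac
import json

# Hypothetical device secret, for illustration only. A real system would
# more likely sign with a private key stored in the scanner's hardware.
DEVICE_KEY = b"example-scanner-secret"


def seal_image(pixel_bytes: bytes, technologist_id: str, device_id: str) -> dict:
    """Build a record binding the pixels to the person and machine that captured them."""
    metadata = {
        "technologist_id": technologist_id,
        "device_id": device_id,
        "sha256": hashlib.sha256(pixel_bytes).hexdigest(),
    }
    # Sorted keys give a canonical serialization, so the tag is reproducible.
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    tag = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return {"metadata": metadata, "tag": tag}


def verify_image(pixel_bytes: bytes, record: dict) -> bool:
    """Return True only if neither the pixels nor the metadata were altered."""
    metadata = record["metadata"]
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    # Constant-time comparisons avoid leaking how much of a tag matched.
    tag_ok = hmac.compare_digest(expected, record["tag"])
    pixels_ok = hmac.compare_digest(
        hashlib.sha256(pixel_bytes).hexdigest(), metadata["sha256"]
    )
    return tag_ok and pixels_ok
```

Sealing a scan with seal_image(scan_bytes, "tech-042", "xray-room-3") and later calling verify_image on the stored bytes should fail if even one pixel byte or one metadata field has changed. Note what this does and doesn't cover: it catches edits made after capture, but it says nothing about an image that was synthetic before it ever entered the system.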
The study's authors also released a curated deepfake dataset with interactive quizzes, turning the threat into a training opportunity. Musculoskeletal radiologists already showed significantly better detection rates than other specialists - suggesting that with the right education, humans can learn to spot the tells.
Until then, the next time you see an X-ray, you might want to ask: is this image showing what's actually inside someone, or is it just a really convincing guess about what bones should look like?
References
- Tordjman M, et al. (2026). Detection of AI-Generated Radiographs by Radiologists and Multimodal Large Language Models. Radiology. DOI: 10.1148/radiol.252094
- Chambon P, et al. (2022). RoentGen: Vision-Language Foundation Model for Chest X-ray Generation. arXiv: 2211.12737
- Stanford AIMI. RoentGen Project. https://stanfordmimi.github.io/RoentGen/
- RSNA News. (2026). Deepfake X-Rays Fool Radiologists and AI. https://www.rsna.org/news/2026/march/chatgpt-generated-radiographs
- Ahart J. (2026). These medical X-rays are all deepfakes - and they fool even radiologists. Nature. DOI: 10.1038/d41586-026-00892-3
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.