AIb2.io - AI Research Decoded

ChatGPT Took a Cadaver Anatomy Exam and Bombed It Spectacularly

If you ever wondered whether ChatGPT could pass medical school, researchers at Jagiellonian University in Krakow just gave us a definitive answer for the anatomy portion: absolutely not. They showed ChatGPT-4o photographs of labeled cadaveric specimens - the same kind used in practical anatomy exams - and asked it to name what it was looking at. The overall accuracy? A devastating 22.26%.

To put that in perspective, a common passing mark for practical anatomy exams is around 60%. ChatGPT didn't just fail. It failed so badly that a single blind guess on a four-option multiple-choice version (25% on average) would have beaten its score, even though it got three attempts.

The Setup

The researchers photographed 265 anatomical structures on cadaveric specimens from their anatomy department, labeled with markers in the exact same way they'd be presented during a real practical exam. Each image was fed to ChatGPT-4o with a standardized prompt, and the AI got up to three attempts per structure, with feedback after each wrong answer.
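To make the three-attempt scoring concrete, here is a minimal Python sketch of how "accuracy within k attempts" could be tallied under a protocol like this one. It is not the authors' code. The function names (attempt_of_first_correct, cumulative_accuracy) are hypothetical, and the case-insensitive exact string match is a stand-in assumption for what was, in the real study, grading by anatomists.

```python
from typing import Optional, Sequence

MAX_ATTEMPTS = 3  # the study allowed up to three attempts per structure


def attempt_of_first_correct(answers: Sequence[str], correct: str) -> Optional[int]:
    """Return the 1-based attempt on which the structure was named correctly,
    or None if it never was within MAX_ATTEMPTS.

    Assumption: answers are judged by case-insensitive exact match. The real
    study relied on expert judgment, not string comparison.
    """
    target = correct.strip().lower()
    for attempt, answer in enumerate(answers[:MAX_ATTEMPTS], start=1):
        if answer.strip().lower() == target:
            return attempt
        # In the study, the model received feedback here before its next try;
        # this sketch only records the answers it gave.
    return None


def cumulative_accuracy(first_correct: Sequence[Optional[int]], k: int) -> float:
    """Share of structures identified correctly within the first k attempts."""
    if not first_correct:
        raise ValueError("need at least one structure")
    hits = sum(1 for a in first_correct if a is not None and a <= k)
    return hits / len(first_correct)


# Hypothetical answers for three structures, for illustration only.
results = [
    attempt_of_first_correct(["Femur"], "femur"),                        # right on try 1
    attempt_of_first_correct(["Radius", "Ulna"], "ulna"),                # right on try 2
    attempt_of_first_correct(["Ulna", "Radius", "Humerus"], "scapula"),  # never right
]
print(cumulative_accuracy(results, 1))
print(cumulative_accuracy(results, 3))
```

Counting this way is why first-attempt and three-attempt figures differ: a structure named correctly on the second or third try counts toward the second number but not the first.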

That setup is generous. Medical students typically get one shot. ChatGPT got three chances with feedback, and still identified only about one in five structures correctly. First-attempt accuracy was even lower: just 33 correct identifications out of 265, or about 12.5%.
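The headline percentages can be checked with a few lines of arithmetic. In this Python sketch, 265 and 33 are the counts reported in the study. The within-three-attempts count is not quoted in this article; the code infers it as the only whole number of structures out of 265 that rounds to the reported 22.26%. The 25% figure is simply the expected score of one blind guess on a hypothetical four-option question, not something the study measured.

```python
TOTAL = 265              # structures photographed (reported)
FIRST_ATTEMPT_HITS = 33  # correct on the first attempt (reported)


def pct(hits: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals like the study's figures."""
    return round(100 * hits / total, 2)


# Which hit counts out of 265 round to the reported 22.26% overall accuracy?
candidates = [h for h in range(TOTAL + 1) if pct(h, TOTAL) == 22.26]
WITHIN_THREE_HITS = candidates[0]  # inferred from 22.26%, not quoted directly

first = pct(FIRST_ATTEMPT_HITS, TOTAL)
overall = pct(WITHIN_THREE_HITS, TOTAL)
random_guess = 100 / 4  # one blind guess on a hypothetical four-option question

print(f"candidates for 22.26%: {candidates}")
print(f"first attempt: {first}%")
print(f"within three attempts: {overall}%")
print(f"single four-option guess: {random_guess}%")
```

Run as written, it finds a single candidate (59 structures) and prints 12.45% first-attempt accuracy against 22.26% within three attempts, both below the 25% single-guess baseline.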

Where It Struggled (and the One Thing It Did Okay)

The model showed a bizarre performance distribution. It was halfway decent at identifying bones - osteological structures hit 64.71% accuracy within three attempts. Bones are distinctive, well-documented in textbooks, and have clear visual features. Fair enough.

Everything else was a disaster. Isolated thoracic organs? 8.82% accuracy. Once organs like the heart and lungs were removed from the body and laid out on a table, the model could rarely name what it was looking at. It also frequently misidentified entire anatomical regions, for instance calling a structure in the arm by the name of something that belongs in the abdomen.

And here's the part that should concern anyone excited about AI in medicine: ChatGPT occasionally generated anatomical terms that don't exist. It wasn't just wrong - it was confidently making up structures. Imagine a medical student pointing at a tendon and calling it the "lateral supracondylar ligamentous band." That's not a real thing. You just invented anatomy.

Why This Is Harder Than You Think

Text-based anatomy questions are a completely different beast from image-based ones. When you ask ChatGPT "what muscle flexes the forearm?", it can draw on millions of textbook passages. When you show it a photograph of an actual biceps brachii on an actual dead person, with fascia and connective tissue and the general messiness of real human anatomy, it has far less comparable material to go on.

Whatever GPT-4o's exact training mix, textbook text, diagrams, and clean medical illustrations are far more plentiful than photographs of dissected cadavers. Real cadaveric anatomy is wet, discolored, and looks nothing like the tidy diagrams in Netter's Atlas. It's the difference between recognizing a dog in a stock photo and identifying a specific breed at a muddy dog park from 50 feet away.

What This Means for Medical Education

Some medical schools have been exploring AI as a study tool for anatomy, and this study is a cold shower on those ambitions - at least for practical identification tasks. ChatGPT can still be useful for explaining anatomical relationships in text, generating quiz questions, or helping students review theoretical concepts. But if you're studying for a practical exam where you need to identify structures on real tissue, the AI is currently worse than useless - it might actively teach you wrong names for things.

The silver lining is that this is a specific, measurable benchmark. Future models can be tested against the same task, and we'll be able to track improvement over time. If GPT-6 or a specialized medical vision model can crack 80% on cadaveric identification, that would actually mean something.

The Bottom Line

AI is getting remarkably good at many medical tasks: summarizing research, suggesting differential diagnoses from text descriptions, even reading some types of medical images when properly trained. But identifying structures on cadaveric photographs requires a kind of visual-spatial reasoning that GPT-4o, at least, clearly lacks.

For now, anatomy lab still belongs to the humans. The robots will have to wait their turn with the bone saw.

If you're a medical student building anatomy study materials and need to annotate images or organize reference PDFs, tools like pdfb2.io can help you mark up and organize your study resources without the AI making up muscle names.

References

  • Melczewski P, Chowaniec M, Larysz W, et al. Accuracy of ChatGPT-4o in Identifying Anatomical Structures on Cadaveric Images: A Practical Anatomy Examination Study. Clinical Anatomy. 2026. DOI: 10.1002/ca.70115 | PMID: 41872690