That is the job Alon, Hadar Shoval, and Levkovich take on in this 2026 systematic review, and the answer is not especially comforting. They looked across 36 empirical studies of AI-generated images used in medical teaching, assessment, and patient education and found two recurring structural problems: bias in who gets shown, and fidelity problems in what gets shown [1]. In builder terms, the paint looks fresh, but some of the load-bearing walls are crooked.
A slick rendering is not a finished building
Text-to-image systems feel magical because they turn a prompt into a polished picture in seconds. Under the hood, they are basically blueprint interpreters tied to giant pattern libraries, often using diffusion-style generation to sculpt noise into images that look plausible. Plausible is doing a lot of overtime there.
In the review, 80.6% of included studies evaluated DALL-E-based tools. Seventy-five percent reported significant demographic skew. When studies checked race, 66.7% found bias; when they checked gender, 58.3% did. The default clinician image was often white and male [1]. That is not a harmless aesthetic quirk. If your study deck, patient handout, and exam prep all keep showing the same kind of doctor and the same kind of patient, you are not just decorating the walls. You are teaching who belongs in the room.
And then there is clinical fidelity. Nearly half the reviewed studies, 47.2%, reported problems like anatomical hallucinations or medical equipment that looked believable but wrong [1]. That last part matters most. A bad cartoon is easy to reject. A polished lie is harder. AI can generate the educational equivalent of a beautifully installed staircase that leads directly into drywall.
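For readers who prefer head counts to percentages, here is a back-of-the-envelope sketch. It assumes the three shares quoted above (80.6%, 75%, 47.2%) are fractions of all 36 included studies, which is how this summary states them. The counts it prints are inferred by rounding, not figures taken from the paper. The race and gender percentages are left out because they apply only to the studies that checked those attributes, and that denominator is not given here.

```python
TOTAL_STUDIES = 36  # number of included studies, as stated in this summary

# Shares quoted above that are described as fractions of all included studies.
# The labels are this sketch's own wording, not the paper's.
reported_shares = {
    "evaluated DALL-E-based tools": 80.6,
    "reported significant demographic skew": 75.0,
    "reported clinical fidelity problems": 47.2,
}


def implied_count(percent: float, total: int = TOTAL_STUDIES) -> int:
    """Nearest whole number of studies consistent with a reported percentage."""
    return round(percent / 100 * total)


# Inferred counts (an assumption of this sketch, not reported data).
implied = {label: implied_count(p) for label, p in reported_shares.items()}

for label, count in implied.items():
    # Convert back to a percentage to confirm the rounding is consistent.
    back = round(100 * count / TOTAL_STUDIES, 1)
    print(f"{label}: about {count} of {TOTAL_STUDIES} ({back}%)")
```

Each inferred count converts back to the quoted percentage at one decimal place, so the rounding is at least self-consistent. The original paper remains the source for exact numbers.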
Bias and bad anatomy are not separate leaks
One of the sharpest points in this paper is that representational bias and clinical fidelity problems often show up together [1]. The image can be socially misleading and medically misleading at the same time. That combination is nasty because visual realism makes people drop their guard. Your brain sees high-resolution shading and thinks, "Sure, inspector signed off on this." Meanwhile the oxygen mask is wrong, the lesion is off, and the demographics are skewed.
That fits with other recent work. A 2024 study in Frontiers in Artificial Intelligence found AI-generated anesthesiologist images were disproportionately White, with DALL-E 2 depicting 64.2% of subjects as White and Midjourney 83.0%, while also layering on stereotype-coded traits like "trustworthy" and "attractive" [2]. A 2025 study found DALL-E 3 could reproduce skewed diversity patterns and stereotypes across healthcare-provider imagery at scale, not just in one-off prompts [3]. The foundation problem is in the materials, not just the staging.
This also matches broader medical-imaging bias research. Stanley and colleagues argued in 2024 that bias evaluation in medical imaging needs to be systematic and scenario-based, because fairness patched in one setting may crack in another [4]. MIT researchers made the same practical point in June 2024: debias a model in one hospital, and the fix may not hold somewhere else [5]. In construction terms, passing inspection on one lot does not mean the same framing survives a different soil condition.
There is useful lumber here, but stop treating it like finished cabinetry
None of this means AI-generated medical imagery is worthless. A 2024 JAMA Network Open study found AI-generated images could help pediatric residents learn to recognize rare genetic conditions like Kabuki and Noonan syndromes, suggesting real educational upside when the images are carefully used as adjuncts rather than gospel [6]. So the material itself is not banned from the site. It just needs competent supervision.
That is where this review lands squarely on common sense: stop treating AI images as neutral educational stock art and start treating them like provisional drafts that require expert curation and visual AI literacy [1]. That means checking anatomy, checking equipment, checking demographic representation, and teaching students that "looks real" is not the same as "is reliable." Frankly, medicine should be pickier here than your average social media feed. The stakes are a little higher than posting a cursed pasta recipe.
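To make "provisional draft" concrete, here is a minimal sketch of that sign-off as a checklist. It is an illustration, not a tool described in the review. The class name `ImageReview`, the check identifiers, and the image ID are all hypothetical. The only idea it encodes is that an image stays unapproved until a human reviewer has passed every check named above.

```python
from dataclasses import dataclass, field

# The three checks named in the paragraph above; the identifiers are illustrative.
REQUIRED_CHECKS = ("anatomy", "equipment", "demographic_representation")


@dataclass
class ImageReview:
    """Hypothetical sign-off record for one AI-generated teaching image."""

    image_id: str
    passed: dict = field(default_factory=dict)  # check name -> reviewer verdict

    def record(self, check: str, ok: bool) -> None:
        """Store a reviewer's verdict for one of the required checks."""
        if check not in REQUIRED_CHECKS:
            raise ValueError(f"unknown check: {check}")
        self.passed[check] = ok

    def outstanding(self) -> list:
        """Checks that are missing or failed, in checklist order."""
        return [c for c in REQUIRED_CHECKS if not self.passed.get(c, False)]

    def approved(self) -> bool:
        """True only when every required check has been recorded as passed."""
        return not self.outstanding()


review = ImageReview("oxygen-mask-draft")  # hypothetical image ID
review.record("anatomy", True)
review.record("equipment", False)  # reviewer flagged the mask as wrong
print(review.approved())     # False
print(review.outstanding())  # ['equipment', 'demographic_representation']
```

The design choice that matters is the default. A check nobody has performed counts the same as a failed one, so "looks real" never passes inspection by omission.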
There is also a timing issue. The FDA's AI-enabled medical device program keeps expanding, and on January 6, 2025, the agency published draft lifecycle guidance for AI-enabled device software functions [7]. Translation: more AI is entering real clinical workflows, not less. If medical education trains students on biased or clinically sloppy visuals now, we are basically pouring a bent foundation for the workforce that will later use AI at the bedside.
The foreman's verdict
The foundation of this paper is solid. AI-generated medical images are fast, cheap, and visually persuasive. They are also fully capable of teaching the wrong lesson with a straight face. That makes them useful only if educators treat them like raw material: inspect every beam, reject warped pieces, and never confuse a glossy rendering with code-compliant construction.
If medical schools want to use these tools, fine. But bring a real inspector.
References
[1] Alon L, Hadar Shoval D, Levkovich I. Bias, representation, and clinical fidelity in AI-generated images for medical education: a systematic literature review. npj Digital Medicine. 2026. DOI: 10.1038/s41746-026-02608-3. PubMed: 42000932
[2] Gisselbaek M, Minsart L, Köselerli E, et al. Beyond the stereotypes: Artificial Intelligence image generation and diversity in anesthesiology. Front Artif Intell. 2024;7:1462819. DOI: 10.3389/frai.2024.1462819. PMCID: PMC11497631
[3] Uddagiri V, Isunuri A, et al. Evaluating diversity and stereotypes amongst AI generated representations of healthcare providers. Front Digit Health. 2025. DOI: 10.3389/fdgth.2025.1537907. PubMed: 40352327
[4] Stanley EAM, Wilms M, et al. Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging. J Am Med Inform Assoc. 2024;31(11):2613-2621. DOI: 10.1093/jamia/ocae165
[5] MIT News. Study reveals why AI models that analyze medical images can be biased. Published June 28, 2024. https://news.mit.edu/2024/study-reveals-why-ai-analyzed-medical-images-can-be-biased-0628
[6] Waikel RL, Othman AA, Patel T, et al. Recognition of Genetic Conditions After Learning With Images Created Using Generative Artificial Intelligence. JAMA Netw Open. 2024;7(3):e242609. DOI: 10.1001/jamanetworkopen.2024.2609. PubMed: 38488790
[7] U.S. Food and Drug Administration. Artificial Intelligence in Software as a Medical Device. Draft guidance published January 6, 2025. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-software-medical-device
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.