Good news: AI is getting weirdly good at medical reasoning. Bad news: if trainees let the robot do all the hard thinking, we may end up with a generation of clinicians who can click "accept suggestion" with real confidence and far less actual judgment.
That is the argument in a May 7, 2026 JAMA viewpoint, "Promoting Clinical Expertise in the Age of AI: No Struggle, No Mastery" by Ron Keren, Bimal R. Desai, and Daniel C. West [1]. The paper is not another chest-thumping benchmark parade. It is a warning label. The authors point out that AI is already moving beyond paperwork and simple support tasks into summarizing data, interpreting findings, proposing diagnoses, and suggesting treatments. Handy? Sure. Also slightly terrifying if you are still learning how to think like a doctor.
The Part Where the Struggle Actually Matters
Clinical expertise does not appear because someone read a lot and looked serious in a white coat. It comes from repeated wrestling matches with uncertainty: sorting relevant from irrelevant clues, building a differential diagnosis, revising it when new evidence shows up, and occasionally realizing your "obvious" answer was nonsense wearing a necktie.
That ugly, effortful process is the point.
Keren and colleagues argue that the biggest risk is not just deskilling, where experienced clinicians get rusty by outsourcing too much to AI. It is never-skilling. If a trainee gets the answer before doing the reasoning, they may miss the cognitive reps that build expertise in the first place [1]. No reps, no muscle. No struggle, no mastery. The gym analogy writes itself because medicine, like leg day, punishes cheating later.
Why This Is Not Just Professors Yelling at a Cloud
The awkward part is that the concern comes at the exact moment AI looks genuinely useful. A 2024 JAMA Internal Medicine study found GPT-4 scored higher than residents and attending physicians on one structured measure of clinical reasoning documentation, even though diagnostic accuracy looked similar across groups [2]. Another 2024 randomized clinical trial found that giving physicians access to an LLM did not clearly improve diagnostic reasoning over conventional resources alone [3]. Translation: the model can look sharp on some tasks, but plugging it into real clinical thinking does not magically produce House, M.D. with better uptime.
That pattern shows up elsewhere too. A 2024 Nature Medicine paper tested large language models on 2,400 real patient cases and found they underperformed physicians, struggled with guidelines and lab interpretation, and were too brittle for autonomous clinical decision-making [4]. In other words, the AI can ace a polished exam question and then wobble when reality shows up with missing information, messy context, and all the emotional elegance of a 3 a.m. consult.
Which, to be fair, is also how many humans feel at 3 a.m. The difference is that humans can grow into expertise through guided practice. A chatbot does not become a mentor just because it answers quickly.
The Real Risk Is Educational Ventriloquism
This is what makes the JAMA viewpoint interesting. The authors are not saying, "Ban AI, return to candlelight and clipboards." They are saying medical education has to change on purpose.
If AI writes the differential diagnosis before the resident commits to one, the learner may skip the uncomfortable middle where the real thinking happens. If AI summarizes the chart, suggests the next test, and wraps everything in fluent prose, it can create the illusion that the trainee understands more than they do. The machine becomes a very persuasive ventriloquist dummy, and the human mouth moves along.
That matters because clinical reasoning is not just fact recall. It is prioritization, doubt management, tradeoff handling, and knowing when the tidy answer is probably wrong. A 2024 narrative review in BMC Medical Education makes a similar point: clinicians will need new AI-era skills, but they still need the old foundations of judgment, safety, and bias awareness [5].
One practical fix is to force a "commit first, compare later" workflow. Make the trainee form a differential and plan before seeing the AI output. Then compare. Then argue. Then revise. Basically, use the model like a sparring partner, not a spell-checker for your brain. If you want to sketch that reasoning chain visually, a tool like mapb2.io actually makes more sense than pretending your frontal lobe enjoys juggling six tabs and a half-finished progress note.
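For anyone tempted to build that gate into a teaching tool, here is a minimal Python sketch of "commit first, compare later." Everything in it is hypothetical: the `CaseSession` class, its method names, and the example diagnoses are illustrations, not anything described in the JAMA viewpoint or taken from a real product. The only idea it encodes is that the AI output stays locked until the trainee has committed their own differential.

```python
from dataclasses import dataclass, field


@dataclass
class CaseSession:
    """Hypothetical commit-first gate for a single teaching case."""

    ai_differential: list[str]
    trainee_differential: list[str] = field(default_factory=list)
    committed: bool = False

    def commit(self, differential: list[str]) -> None:
        """Record the trainee's own differential before any AI output is shown."""
        if not differential:
            raise ValueError("Commit at least one diagnosis before comparing.")
        self.trainee_differential = list(differential)
        self.committed = True

    def reveal_ai(self) -> list[str]:
        """Return the AI differential, but only after the trainee has committed."""
        if not self.committed:
            raise PermissionError("Commit your own differential first.")
        return list(self.ai_differential)

    def compare(self) -> dict[str, list[str]]:
        """Split diagnoses into agreed, trainee-only, and AI-only for discussion."""
        ai = self.reveal_ai()
        mine = self.trainee_differential
        return {
            "agreed": [d for d in mine if d in ai],
            "trainee_only": [d for d in mine if d not in ai],
            "ai_only": [d for d in ai if d not in mine],
        }
```

Calling `reveal_ai()` or `compare()` before `commit()` raises an error, which is the whole point. The `trainee_only` and `ai_only` buckets are where the "then argue, then revise" part would happen with a supervisor, and that part deliberately stays outside the code.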
So What Should We Do With the Robot Intern?
Probably the same thing we do with any talented, overconfident assistant: supervise it, give it bounded jobs, and absolutely do not let it train the next generation by itself.
Recent reviews and evaluations show LLMs already help with patient education, summarization, translation, and documentation, but they also bring familiar baggage: bias, inconsistency, opacity, and a truly impressive ability to sound correct while being wrong in a calm, professional tone [4,6]. That makes them useful tools, not substitutes for apprenticeship.
The big idea from this JAMA piece is simple and annoyingly sensible. If medicine hands trainees polished AI answers too early, we may save time now and lose expertise later. And in clinical care, "we optimized the workflow" is not nearly as comforting as "the doctor actually knows what they are doing."
References
- Keren R, Desai BR, West DC. Promoting Clinical Expertise in the Age of AI: No Struggle, No Mastery. JAMA. Published online May 7, 2026. doi:10.1001/jama.2026.6097
- Cabral S, Restrepo D, Kanjee Z, et al. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians. JAMA Internal Medicine. 2024;184(5):581-583. doi:10.1001/jamainternmed.2024.0295. PMCID:PMC10985627
- Goh E, Gallo R, Hom J, et al. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Network Open. 2024;7(10):e2440969. doi:10.1001/jamanetworkopen.2024.40969. PMID:39466245
- Hager P, Jungmann F, Holland R, et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nature Medicine. 2024;30:2613-2622. doi:10.1038/s41591-024-03097-1
- McCoy LG, Ng FYC, Sauer CM, et al. Understanding and training for the impact of large language models and artificial intelligence in healthcare practice: a narrative review. BMC Medical Education. 2024;24:1096. doi:10.1186/s12909-024-06048-z
- Busch F, Hoffmann L, Rueger C, et al. Current applications and challenges in large language models for patient care: a systematic review. Communications Medicine. 2025;5:26. doi:10.1038/s43856-024-00717-2
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.