AIb2.io - AI Research Decoded

The Doctor Is In, Sort Of

If you have ever waited three months for a specialist, argued with a symptom checker at 1 a.m., or watched a doctor type like a caffeinated court stenographer, Mariana Lenharo's Nature piece on "AI doctors" is for you: it asks whether medical AI is becoming a careful assistant, a second opinion, or the world's most expensive autocomplete with a stethoscope.

Lenharo's article, "How good are 'AI doctors' - and will they take over medicine?", surveys a moment when large language models are no longer just passing medical exams like nervous robots in lab coats. They are being tested on messier tasks: emergency-department diagnosis, patient conversations, and clinical reasoning.

That matters because medicine is not a tidy multiple-choice quiz. It is coughs, half-remembered medication names, insurance constraints, missing lab values, family history, fear, pain, and someone saying "it started last Tuesday" when they mean maybe March. A good doctor works inside that mess. A model reads patterns in text and predicts what should come next. Sometimes that looks almost magical. Sometimes it looks like a toaster trying to do haiku.

The recent results are still worth taking seriously.

Sparse Attention, Full Waiting Room

One headline study in Science tested OpenAI's o1 model on physician reasoning tasks, including real emergency-department cases from a Boston hospital. In one comparison, the model produced an exact or close diagnosis in about 67% of triage cases, while two physicians scored around 50-55% under the same constrained setup (Brodeur et al., 2026).

That sounds dramatic. It is dramatic. But it also has ma, the Japanese idea of negative space. The empty parts matter.

The AI did not examine patients. It did not hear the tremor in a voice, notice confusion, smell ketones, or watch someone wince while pretending they are fine. It read clinical text. That is powerful because hospitals already run on text: notes, discharge summaries, lab reports, messages, more notes, notes about notes. Medicine has quietly become a paperwork monastery.

There is elegance in using an LLM where language already carries the work. Not every connection needs to exist for the whole to be useful. That is the wabi-sabi version of clinical AI: imperfect, bounded, helpful in the right light.

Bedside Manner By Text Message

The other major thread in Lenharo's article is Google's AMIE system, described in a 2026 arXiv preprint. AMIE chatted by text with 100 adult urgent-care patients before their appointments, collected histories, and proposed differential diagnoses. The final diagnosis appeared somewhere in AMIE's differential in 90% of cases, and in its top three in 75% (Brodeur et al., 2026, arXiv preprint).
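
For readers wondering how "somewhere in the differential" differs from "in the top three," here is a minimal Python sketch of the two hit rates. Everything in it is an assumption for illustration: the function name top_k_hit_rate, the toy cases, and the exact string matching are invented here, and none of it is the study's data or scoring method (real evaluations typically rely on clinician judgment about whether two diagnoses match).

```python
from typing import Optional, Sequence


def top_k_hit_rate(
    differentials: Sequence[Sequence[str]],
    final_diagnoses: Sequence[str],
    k: Optional[int] = None,
) -> float:
    """Fraction of cases whose final diagnosis appears in the first k entries
    of the ranked differential. k=None means anywhere in the list."""
    if len(differentials) != len(final_diagnoses):
        raise ValueError("need one final diagnosis per differential")
    if not differentials:
        raise ValueError("need at least one case")
    hits = 0
    for ranked, final in zip(differentials, final_diagnoses):
        # Restrict to the top k candidates when k is given.
        window = ranked if k is None else ranked[:k]
        if final in window:  # assumption: exact string match counts as a hit
            hits += 1
    return hits / len(differentials)


# Hypothetical toy cases, invented for illustration only (not study data).
cases = [
    (["migraine", "tension headache", "sinusitis"], "migraine"),
    (["viral pharyngitis", "strep throat", "mononucleosis", "allergic rhinitis"],
     "allergic rhinitis"),
    (["ankle sprain", "ankle fracture"], "ankle fracture"),
    (["gastroenteritis", "food poisoning"], "appendicitis"),
]
differentials = [ranked for ranked, _ in cases]
finals = [final for _, final in cases]

anywhere = top_k_hit_rate(differentials, finals)   # 3 of 4 cases -> 0.75
top3 = top_k_hit_rate(differentials, finals, k=3)  # 2 of 4 cases -> 0.5
print(f"anywhere: {anywhere:.0%}, top three: {top3:.0%}")
```

The gap between the two numbers is the interesting part: a long differential can contain the right answer without ranking it high enough to steer a busy clinician.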

That is not "the robot doctor has arrived." It is more like "the robot medical student read the chart carefully, asked decent questions, and did not spill coffee on the keyboard." Useful? Yes. Ready to practice alone? Please do not hand it prescription privileges and a tiny white coat.

The strongest near-term use may be pre-visit history taking, summarization, triage support, and second-opinion generation. These are not glamorous. They are the quiet tools that restore time. In Japanese design terms, their ikigai is not replacing the physician. It is making the clinical encounter less cluttered.

A doctor who spends less time wrestling the electronic health record may spend more time looking at the person in front of them. That is not a sci-fi fantasy. That is just humane workflow design, which in American healthcare counts as a minor miracle.

The Hard Part Is Trust

Clinical decision support systems have existed for decades. The new twist is that LLMs can handle free-form language, not just rigid checklists. As background, a clinical decision support system combines patient data and medical knowledge to help clinicians make choices. A large language model predicts and generates text after training on enormous corpora. Put those together, and you get something that can sound like a thoughtful consultant.

Sounding thoughtful is not the same as being accountable.

Researchers have warned that medical LLMs need serious validation, regulation, and monitoring because chatbots can be unreliable in exactly the places where unreliability is rude at best and dangerous at worst (Haupt & Marks, 2023). A Nature Medicine review notes that LLMs in medicine bring both promise and risk, including misinformation, bias, privacy concerns, and fuzzy responsibility when things go wrong (Thirunavukarasu et al., 2023).

Bias deserves special attention. If training data reflects unequal care, the model may learn the inequality with excellent grammar. A biased AI is not less biased because it says "based on the available evidence" before doing something foolish. That is just bad judgment wearing a linen suit.

So, Will AI Take Over Medicine?

No. Not cleanly. Not soon. And not in the way the headline monster wants.

AI will probably take over pieces: drafting notes, summarizing records, suggesting differential diagnoses, flagging missed risks, helping patients prepare for visits, and giving clinicians a second pair of very fast, very literal eyes. In underserved settings, cheap chatbots may widen access to basic medical guidance, though that only helps if safety, escalation, and local context are handled well.

The bigger shift is quieter. Doctors may become supervisors of a growing layer of machine-generated suggestions. Patients may arrive better informed, or more confidently misinformed. Hospitals may discover that adding AI without redesigning workflows is like placing a bonsai tree in a server rack: visually interesting, operationally confused.

Lenharo's piece lands in the right place: the evidence is impressive, but medicine remains beautifully inconvenient. Bodies are not spreadsheets. Patients are not prompts. The best future is not an AI doctor replacing the human one. It is a calmer room, a cleaner chart, and a clinician with more attention left for the strange, imperfect person who came in asking for help.

References

  1. Lenharo, M. "How good are 'AI doctors' - and will they take over medicine?" Nature (2026). DOI: 10.1038/d41586-026-01691-6. PMID: 42236602

  2. Brodeur, P. G. et al. "Performance of a large language model on the reasoning tasks of a physician." Science 392, 524-527 (2026). DOI: 10.1126/science.adz4433. PMID: 42060751

  3. Brodeur, P. et al. "A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic." arXiv (2026). DOI: 10.48550/arXiv.2603.08448

  4. Haupt, C. E. & Marks, M. "Large language model AI chatbots require approval as medical devices." Nature Medicine 29, 2396-2398 (2023). DOI: 10.1038/s41591-023-02412-6

  5. Thirunavukarasu, A. J. et al. "Large language models in medicine." Nature Medicine 29, 1930-1940 (2023). DOI: 10.1038/s41591-023-02448-8

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.