Who's Really Steering the Ship When AI Enters the Clinic?

MIMIC-IV, the large hospital-records dataset that many medical AI crews use to test their models, matters because strong performance on benchmarks built from data like it is how an algorithm earns its ticket from the research harbor toward the clinical sea. But Abulibdeh and colleagues ask the question that gets lost while everyone is cheering from the dock: once the AI sails into a real hospital, who is actually holding the wheel?

Their Lancet commentary, "Who's really in the loop? Rethinking oversight in AI-assisted health care," takes aim at one of healthcare AI's favorite safety charms: the "human in the loop." That phrase sounds sturdy, like a rope tied properly to the mast. A clinician sees the AI output, applies judgment, and harm is avoided. Neat. Clean. Almost suspiciously tidy.


The authors argue that, in practice, this setup often works less like meaningful oversight and more like a decorative life jacket on a ship that has already left port.

The Loop Has a Leak

"Human in the loop" means a person remains involved in an AI-assisted decision. In healthcare, that usually means a clinician reviews an algorithmic recommendation before acting. The idea has common-sense appeal. Nobody wants a black-box model making care decisions alone, especially if that model learned medicine from messy data, billing codes, and whatever clinical notes survived the copy-paste storms.

But the paper says this safeguard fails in three connected ways.

First, AI can scale old inequities at impressive speed. If a health system already underserves certain communities, an AI trained on its records may learn those patterns and then spread them with the efficiency of a well-funded kraken. Algorithmic bias is not just a technical glitch. It can reflect unequal access, unequal documentation, unequal treatment, and unequal power.

Second, harms do not always show up neatly for a single "neutral" reviewer. Intersectional harms, meaning effects that differ by combinations of race, gender, disability, language, insurance status, or geography, can vanish when oversight examines patients one category at a time. A lookout searching only for icebergs may miss the reef, the fog, and the fact that the compass is drunk.
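Here is a minimal Python sketch of how a harm can hide between categories. It uses invented toy numbers, not data from the paper. In this example, a model's error rate looks identical at 10% whether you slice by a hypothetical "group" attribute or by sex. Yet it is 20% for two of the four combinations and 0% for the other two. The record layout and the `error_rates` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy audit records: (group, sex, n_cases, n_model_errors).
# These numbers are invented purely for illustration; they are not from the paper.
RECORDS = [
    ("A", "F", 100, 20),
    ("A", "M", 100, 0),
    ("B", "F", 100, 0),
    ("B", "M", 100, 20),
]


def error_rates(records, key):
    """Pool cases by key(group, sex) and return {subgroup: error rate}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, sex, n_cases, n_errors in records:
        subgroup = key(group, sex)
        totals[subgroup] += n_cases
        errors[subgroup] += n_errors
    return {subgroup: errors[subgroup] / totals[subgroup] for subgroup in totals}


# One attribute at a time: every slice looks equally fine.
by_group = error_rates(RECORDS, lambda g, s: g)
by_sex = error_rates(RECORDS, lambda g, s: s)
# Both attributes together: the hidden disparity appears.
by_both = error_rates(RECORDS, lambda g, s: (g, s))

print(by_group)  # {'A': 0.1, 'B': 0.1}
print(by_sex)    # {'F': 0.1, 'M': 0.1}
print(by_both)   # {('A', 'F'): 0.2, ('A', 'M'): 0.0, ('B', 'F'): 0.0, ('B', 'M'): 0.2}
```

The same pooling function runs three times, and only the key changes. A reviewer who checks one column at a time would sign off on this model. A real audit would also need uncertainty estimates, because small intersectional subgroups produce noisy rates.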

Third, clinicians are busy. Very busy. Asking them to interrogate every AI output while juggling patients, alerts, documentation, insurance hurdles, and the emotional weather of medicine is like asking the ship's cook to inspect the hull during a hurricane. The problem is not that clinicians lack judgment. It is that the system often gives them too little time, authority, and information to use it well.

Accountability Is Not a Single Sailor

The authors draw from actor-network theory, feminist epistemology, and Iris Marion Young's social connection model of justice. In plain deckhand English: healthcare AI harm is produced by networks, not lone villains.

The model is trained by one group, bought by another, integrated by another, configured by another, used by clinicians, experienced by patients, and governed by institutions that may prefer a tidy signature line over a messy accountability map. When something goes wrong, the clinician at the end of the chain can become the convenient anchor point for blame.

That is the old liability model: find the person nearest the damage and pin responsibility there. But AI-assisted care is more like a convoy. If the route was bad, the maps outdated, the cargo unevenly loaded, and the admiral asleep in procurement, blaming the last sailor to touch the wheel is not justice. It is paperwork with a hat.

Recent evidence makes this concern less theoretical. A 2025 ONC brief found that 71% of U.S. non-federal acute care hospitals reported using predictive AI integrated with their electronic health record (EHR) in 2024, up from 66% in 2023. Many hospitals evaluated their models for accuracy and bias, but fewer did so for all or most of the models they used. That is progress, yes, but also a warning flare: adoption is outrunning deep governance.

Three Better Ways to Run the Vessel

The Lancet authors propose three sturdier routes.

First: co-reasoning. Instead of treating AI as an oracle that the doctor must approve or reject, treat it as one voice in clinical deliberation. The model offers evidence, the clinician weighs it, and the workflow supports disagreement. A good AI should be more like a seasoned navigator with charts, not a smug parrot yelling "starboard" because it saw starboard once in training.

Second: community-owned governance. Patients and affected communities need real authority, including the power to suspend harmful systems. Not advisory theater. Not a listening session with stale muffins. Actual control over whether the tool keeps running when harms appear.

Third: institutional liability. Responsibility should move upstream to the organizations that design, buy, deploy, monitor, and profit from these systems. Clinicians should not become legal sandbags for decisions made in boardrooms, vendor contracts, and implementation committees.

Why This Paper Matters

This piece is not anti-AI. That would be too easy, and frankly a bit landlubberish. AI can help with diagnosis, triage, documentation, imaging, scheduling, and a dozen other tasks where healthcare is currently held together by duct tape and caffeine.

But the authors are warning against fake safety. A human in the loop is not enough if the loop is rushed, underpowered, poorly informed, or designed mainly to launder accountability. Oversight must have teeth, time, data, and authority. Otherwise we are not governing AI. We are asking clinicians to bless the voyage after the ship has already hit the rocks.

References

Abulibdeh R, Osei Agyemang G, Celi LA, et al. "Who's really in the loop? Rethinking oversight in AI-assisted health care." The Lancet. 2026. DOI: 10.1016/S0140-6736(26)00204-7. PMID: 42070553.
Office of the National Coordinator for Health Information Technology. "Hospital Trends in the Use, Evaluation, and Governance of Predictive AI, 2023-2024." 2025. healthit.gov.
Freyer N, Groß D, Lipprandt M. "The ethical requirement of explainability for AI-DSS in healthcare: a systematic review of reasons." BMC Medical Ethics. 2024;25:104. DOI: 10.1186/s12910-024-01103-2.
Pfohl SR, Cole-Lewis H, Sayres R, et al. "A toolbox for surfacing health equity harms and biases in large language models." Nature Medicine. 2024. DOI: 10.1038/s41591-024-03258-2.
Yang Y, Zhang H, Gichoya JW, Katabi D, Ghassemi M. "The limits of fair medical imaging AI in real-world generalization." Nature Medicine. 2024. DOI: 10.1038/s41591-024-03113-4.

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.