AIb2.io - AI Research Decoded

Build the interview like it has to survive weather

Twenty years ago, researchers tried squeezing future doctors through standard admissions interviews. It didn't work. This paper explains why and fixes it.

Or at least it tightens the bolts.

The paper by Menon, Mahajan, and Powell is not a giant new clinical trial with smoke pouring out of the lab. It is a News & Views piece in npj Digital Medicine that looks at a newer problem: what happens when virtual admissions interviews meet generative AI, and every applicant suddenly has a very polished robot ghostwriter sitting just off camera? Their main argument is refreshingly practical. Stop treating this like a detective story and start treating it like a design problem. Build the interview so AI help is less useful in the first place, instead of playing whack-a-mole with surveillance software and suspicious vibes alone [Menon et al., 2026, DOI: https://doi.org/10.1038/s41746-026-02667-6].

The foundation here is the Multiple Mini-Interview, or MMI. That format uses a series of short stations instead of one long chat, which helps schools assess soft skills more reliably and reduces the chance that one charming handshake carries the whole job site [Wikipedia: Multiple mini-interview; Eva et al., 2004, DOI: https://doi.org/10.1111/j.1365-2929.2004.01796.x].
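One way to see why many short stations beat one long chat is the Spearman-Brown formula from classical test theory, which predicts how reliability grows when you average over more independent ratings. The sketch below is illustrative only: the single-station reliability of 0.25 is an assumed number, not a figure from Eva et al. or Menon et al., and the function name is ours.

```python
def spearman_brown(single_station_reliability: float, n_stations: int) -> float:
    """Predicted reliability of the mean score across n comparable stations."""
    r = single_station_reliability
    return n_stations * r / (1 + (n_stations - 1) * r)


# Assumed, illustrative value; NOT taken from any of the cited studies.
ASSUMED_SINGLE_STATION_R = 0.25

for k in (1, 4, 8, 12):
    predicted = spearman_brown(ASSUMED_SINGLE_STATION_R, k)
    print(f"{k:>2} stations -> predicted reliability {predicted:.2f}")
```

Under that assumption, a single noisy station is a weak beam, while a dozen of them averaged together carry real weight. The formula assumes the stations are roughly comparable and independently scored, which real interview circuits only approximate.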

Menon and colleagues lean on a 2025 randomized study by Eva and coauthors (version of record published in 2026) that tested exactly the question admissions committees were dreading: if applicants can use ChatGPT during a virtual MMI, do they gain an edge? Surprisingly, not much. In that study, giving candidates access to ChatGPT did not produce a statistically significant performance boost, and interview reliability stayed in the normal range. That matters. It suggests a well-structured interview is more load-bearing than people feared [Eva et al., 2026, DOI: https://doi.org/10.1038/s41746-025-02256-z].

The clever bit was timing. If you shorten the unsupervised prep window and shift more of the station into live interaction, AI becomes less useful. That makes sense. A chatbot can help you draft a polished answer. It is much worse at rescuing you mid-conversation when an interviewer pushes back, changes the angle, or asks, essentially, "Okay, but do you believe what you just said?" That is where the scaffolding ends and the actual structure has to stand.

The paper's real target is the security-theater industry

A lot of institutions reached for AI detectors, proctoring tools, and bans. That sounds tough, but the concrete may be mixed wrong.

Detection tools have well-known weaknesses. Weber-Wulff and colleagues tested several AI-text detectors and found unreliable performance across conditions [Weber-Wulff et al., 2023, DOI: https://doi.org/10.1007/s40979-023-00146-z]. Liang and colleagues found something worse: GPT detectors disproportionately flagged non-native English writing as AI-generated, which is the sort of fairness problem that should make any admissions office put down the hammer and step away slowly [Liang et al., 2023, DOI: https://doi.org/10.1016/j.patter.2023.100779; arXiv:2304.02819].

That is the heart of Menon et al.'s case. If your inspection tool mistakes honest applicants for cheaters, especially applicants already navigating language and access barriers, the whole frame goes out of plumb. You have not protected fairness. You have built a nicer-looking bias.
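The fairness check behind that argument is simple to state: take essays you know were written by humans, run the detector, and compare how often each group gets wrongly flagged. The sketch below shows the bookkeeping only. The records are made-up toy data, not results from Liang et al. or Weber-Wulff et al., and the function and group names are hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, flagged_as_ai) pairs for essays
    that are all known to be human-written, so every flag is a false positive."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, was_flagged in records:
        total[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / total[group] for group in total}


# Made-up toy data for illustration; NOT from any cited study.
toy_records = (
    [("native", False)] * 19 + [("native", True)] * 1
    + [("non_native", False)] * 12 + [("non_native", True)] * 8
)

rates = false_positive_rate_by_group(toy_records)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of human-written essays flagged as AI")
```

A gap like the one in this toy data is the warning sign: the detector's mistakes are not spread evenly, so any penalty built on it lands harder on one group of honest applicants.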

This also lines up with broader admissions research. A 2025 scoping review in Nurse Educator found growing use of AI by both applicants and reviewers, but stressed unresolved ethical questions around authenticity, policy, and equity [Lewis et al., 2025, DOI: https://doi.org/10.1097/NNE.0000000000001753]. Another 2025 paper on medical school interviews argued that human judgment still has to be the foreman on site, even if AI becomes part of preparation or process design [MacIntosh et al., 2025, DOI: https://doi.org/10.1007/s40670-025-02607-1].

What this changes in the real world

The practical takeaway is almost boring, which is usually a good sign. Better interview design beats fancier policing.

Programs can reduce solo prep time, keep prompts hidden until the live station begins, favor responsive conversation over canned speeches, and train interviewers to probe for reasoning, tradeoffs, and reflection. In plain English: stop asking questions a chatbot can turn into a five-paragraph LinkedIn post. Ask questions that require the applicant to do some live framing with their own brain.

This matters beyond medicine. Law schools, graduate programs, and employers are all staring at the same blueprint. AI-powered coaching products for interview prep are already all over the market, and text watermarking systems such as Google DeepMind's SynthID-Text are being explored for provenance, but even the best watermarking is only part of the story [Dathathri et al., 2024, DOI: https://doi.org/10.1038/s41586-024-08025-4]. You still need assessments that work when the tools get better.

Menon et al. are basically saying: quit pretending you can frisk the internet out of the room. Design better rooms.

That is solid construction logic. If you know wind exists, you do not lecture the wind. You reinforce the structure.

References

Menon T, Mahajan A, Powell D. Designing AI-resilient admissions interviews for health professions training in the age of generative AI. npj Digital Medicine. 2026. DOI: https://doi.org/10.1038/s41746-026-02667-6. PMID: 42026111.

Eva KW, Martin S, Macala C, Shirzad S. The impact of GenAI on applicant behaviour, performance, and interview reliability during virtual interviews for medical school admissions. npj Digital Medicine. 2026;9:75. DOI: https://doi.org/10.1038/s41746-025-02256-z.

Lewis LS, Hartman AM, Brennan-Cook J, Felsman IC, Colbert B, Ledbetter L, Gedzyk-Nieman SA. Artificial Intelligence and Admissions to Health Professions Educational Programs: A Scoping Review. Nurse Educator. 2025;50(1):E13-E18. DOI: https://doi.org/10.1097/NNE.0000000000001753. PMID: 39418331.

MacIntosh A, Roulin N, Amiri L, Koutroulis I. Artificial Intelligence and the Medical School Admissions Interview: Strategic Guidance, Risks, and Lessons from Industrial-Organizational Psychology. Medical Science Educator. 2025;36(1):39-46. DOI: https://doi.org/10.1007/s40670-025-02607-1. PMID: 41939071.

Weber-Wulff D, Anohina-Naumeca A, Bjelobaba S, et al. Testing of detection tools for AI-generated text. International Journal for Educational Integrity. 2023;19:26. DOI: https://doi.org/10.1007/s40979-023-00146-z.

Liang W, Yüksekgönül M, Mao Y, Wu E, Zou J. GPT detectors are biased against non-native English writers. Patterns. 2023;4(7):100779. DOI: https://doi.org/10.1016/j.patter.2023.100779. arXiv:2304.02819.

Dathathri S, See A, Ghaisas S, et al. Scalable watermarking for identifying large language model outputs. Nature. 2024;634:818-823. DOI: https://doi.org/10.1038/s41586-024-08025-4.
