If AI beats doctors at diagnosis, what happens to the humans still learning?
The part of the AI conversation we're not talking about enough — and the line we're building Medsage on.
If AI becomes better than doctors at diagnosis, what happens to the people still learning medicine? It's a question the AI conversation keeps skipping, and it starts with some genuinely good news.
First, the good news
AI in medicine really has improved. Microsoft's diagnostic system, MAI-DxO, recently solved 85.5% of the toughest New England Journal of Medicine case challenges, more than four times the 20% managed by 21 experienced physicians working unaided.¹ Large language models now comfortably pass all three steps of the US Medical Licensing Examination, often scoring above the human average.² The promise was simple and good: pair a practitioner with an assistant this sharp and they should think more, weighing its second opinion, catching their own blind spots, and reasoning better than either could alone. That's the world we expected.
It's not the one the evidence is showing.
Automation bias turns "think more" into "think less"
When a confident answer becomes effortless to get, people stop generating their own. This has a name, automation bias: instead of debating the machine, clinicians start deferring to it. The cost lands in two layers.
Layer one: even the experts slip
The debate is still alive for experienced doctors: they spent years building their own clinical reasoning and pattern recognition first, so they can push back. But even they aren't immune. When radiologists with more than fifteen years of experience were shown an incorrect AI suggestion on a mammogram, their accuracy collapsed from 82% (when the suggestion was correct) to 45.5%.³ And the effect can outlast any single suggestion: after endoscopists started using AI in colonoscopy, their adenoma detection rate in procedures done without it (the share of colonoscopies in which they found at least one precancerous growth) fell from 28.4% to 22.4%, a relative drop of about one-fifth in a skill they already had.⁴ Skilled doctors, quietly losing the edge they built.
Layer two: the next generation never builds it
The deeper worry is the doctors who never reach that debate at all. If students grow up with the machine handing them answers before they develop their own judgement, they have little independent reasoning to fall back on, and little reason to question what the AI says. Harvard Medical School faculty warn it would be "problematic for a physician to not have that training and competence in thinking skills,"⁵ and a 2025 BMJ Evidence-Based Medicine editorial calls it deskilling, a risk "especially important for medical students and newly qualified doctors who are learning the skill in the first place."⁶ The experienced can still argue with AI. The next generation might only know how to agree.
The line we draw
AI is one of the most powerful tools we've ever created. But the line for how we use it in medicine still needs to be carefully drawn. AI shouldn't replace human reasoning — it should strengthen it. Because once we stop practising how to think deeply, we risk losing the very skills that make us human. The most valuable clinician of the next decade won't be the one who knows the most — AI wins that contest. It'll be the one who can debate AI without surrendering their own judgement.
In Traditional Chinese Medicine (TCM), the diagnosis, the pattern recognition and the clinical reasoning stay human. Medsage records the consult and drafts the notes so your mind stays on the patient. It does not diagnose, prescribe, or recommend. You decide.
Because maybe the future of medicine isn't about building AI that thinks like doctors. Maybe it's about making sure doctors never stop thinking for themselves, so the next generation learns to think more deeply, not less.
We're at the very beginning of the AI era — and a tool this powerful can be pointed the wrong way just as easily as the right one. That's the choice we're making at Medsage: to get it right from the start, so it strengthens how clinicians think — never replaces it. 🌿
References
- Nori H, et al. "Sequential Diagnosis with Language Models." Microsoft AI, "The Path to Medical Superintelligence," Jun 2025. MAI-DxO solved 85.5% of 304 NEJM cases vs. 20% for 21 physicians (more than 4×).
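- Kung TH, et al. "Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models." PLOS Digital Health 2023; plus 2025 follow-up LLM benchmarks on the USMLE. Top models score ~88–93% across Steps 1–3, above the passing threshold and typical human averages.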
- Dratsch T, et al. "Automation Bias in Mammography." Radiology 2023;307(4):e222176. doi:10.1148/radiol.222176. Very experienced radiologists: 82.3% accuracy with a correct AI suggestion vs. 45.5% with an incorrect one.
- Budzyń K, Romańczyk M, et al. "Endoscopist deskilling risk after exposure to AI in colonoscopy." Lancet Gastroenterology & Hepatology, Aug 2025. doi:10.1016/S2468-1253(25)00133-5. Unaided adenoma detection rate 28.4% → 22.4% (~20% relative drop).
- Schwartzstein R. "How Generative AI Is Transforming Medical Education." Harvard Medicine Magazine, Oct 2024.
- Hough J, et al. "Potential risks of GenAI on medical education." BMJ Evidence-Based Medicine 2025;30(6):406. doi:10.1136/bmjebm-2025-114339.
See Medsage in your practice.
This is the line we build on — AI that documents the consult so your judgement stays yours. Watch the demos, then try it free for 21 days. No credit card.

