Somewhere in a hospital database, there's a patient whose medical records say "diverticular disease" and absolutely nothing else useful. Meanwhile, the CT scan report sitting in another digital drawer contains a goldmine of specific details about what's actually going on in their gut. For decades, these two systems have coexisted like roommates who never talk - and researchers just built a translator.
A team from Massachusetts General Hospital, Harvard, and the University of Washington trained a natural language processing (NLP) algorithm to read CT scan reports and accurately classify diverticulitis complications [1]. And here's the kicker: their specialized algorithm beat both traditional diagnostic codes AND a generalist large language model at the task. Turns out, sometimes the specialist knows their stuff.
The Problem With Medical Billing Codes
Diverticulitis - that condition where small pouches in your colon get inflamed and decide to ruin your week - affects millions of people. But when researchers want to study it using electronic health records, they hit a wall. The ICD diagnostic codes hospitals use are basically just "yep, diverticula exist" with no nuance about whether someone had a minor flare-up or ended up in surgery.
It's like trying to understand someone's driving record when the only information available is "owns a car." Not super helpful for predicting whether they'll need their insurance.
CT scan reports, on the other hand, contain the actual details radiologists observed. The problem? They're written in free text, not nice structured data. You can't easily query "show me all patients with perforated diverticulitis and abscess formation" when that information lives in paragraphs written by hundreds of different radiologists over 45 years.
Teaching an Algorithm to Read Doctor-Speak
The researchers pulled data from Mass General Brigham's patient registry spanning 1979 to 2024 - that's 45 years of CT reports, for those keeping score. They developed an NLP algorithm specifically trained to extract diverticulitis-related information from these unstructured reports.
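To make "extracting information from unstructured reports" concrete, here is a toy rule-based sketch. The post does not describe how the researchers' algorithm works internally, so this is not their method: the phrase patterns, the negation cues, and the sample report are all hypothetical, and a real clinical system would need far more than this.

```python
import re

# Hypothetical phrase patterns; real radiology vocabularies are much richer.
FEATURE_PATTERNS = {
    "diverticulitis": r"\bdiverticulitis\b",
    "abscess": r"\babscess(?:es)?\b",
    "perforation": r"\bperforat(?:ion|ed)\b",
}
# Hypothetical negation cues, checked only within the same sentence.
NEGATION = re.compile(r"\b(?:no|without|negative for)\b", re.IGNORECASE)


def extract_features(report: str) -> dict[str, bool]:
    """Flag a feature when a sentence mentions it with no negation cue before it."""
    found = {name: False for name in FEATURE_PATTERNS}
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for name, pattern in FEATURE_PATTERNS.items():
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match is None:
                continue
            # Crude negation check: any cue earlier in the same sentence.
            if NEGATION.search(sentence[: match.start()]):
                continue
            found[name] = True
    return found


# Made-up report text, for illustration only.
report = (
    "Sigmoid diverticulitis with pericolonic fat stranding. "
    "There is a 3 cm abscess adjacent to the sigmoid colon. "
    "No evidence of perforation or free air."
)
print(extract_features(report))
# {'diverticulitis': True, 'abscess': True, 'perforation': False}
```

Even this tiny example shows why the task is harder than keyword search: "No evidence of perforation" contains the word "perforation" but means the opposite.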
The results were impressive. Their algorithm achieved positive predictive values between 82.8% and 99.9% for detecting various diverticulitis features. More importantly, it outperformed both the crude ICD codes AND a general-purpose large language model [1].
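For readers who want the definition: positive predictive value is the share of flagged cases that turn out to be real. A minimal sketch, with made-up counts that are not taken from the paper:

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = true positives / everything the algorithm flagged as positive."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing was flagged")
    return true_positives / flagged


# Hypothetical chart-review result: 100 flagged reports, 95 confirmed by a human.
ppv = positive_predictive_value(true_positives=95, false_positives=5)
print(f"PPV = {ppv:.1%}")  # PPV = 95.0%
```

PPV says nothing about the cases the algorithm missed, so it is usually read alongside sensitivity.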
This isn't entirely surprising. General LLMs are trained on everything from Shakespeare to Reddit arguments about whether hot dogs are sandwiches. A specialized algorithm trained specifically on radiology reports for one condition should, in theory, understand the nuances better. And that's exactly what happened.
What the Algorithm Found
Once the researchers could actually classify patients accurately, they discovered something clinically useful: the severity at first diagnosis strongly predicted who would have severe recurrence later.
Among 16,349 patients with NLP-detected diverticulitis, they tracked outcomes over 76,736 person-years. Compared to folks with uncomplicated diverticulitis:
- Mild complications at diagnosis meant 39% higher risk of severe recurrence
- Severe initial complications jumped that to 202% higher risk
- Chronic complications? 441% higher risk of severe recurrence down the road [1]
These increases, which the paper reports as hazard ratios, are substantial. They suggest that knowing exactly what someone's CT showed at first presentation could meaningfully change how aggressively doctors monitor them afterward.
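A quick bit of arithmetic helps put those numbers in context. The sketch below assumes "X% higher risk" means a relative increase over the uncomplicated group, so the implied hazard ratio is 1 + X/100. This post does not list the hazard ratios themselves, so treat the output as a reading of the percentages above and check the original paper for the reported values. The mean follow-up is simply person-years divided by patients.

```python
def hazard_ratio_from_percent_increase(percent_higher: float) -> float:
    """Convert 'X% higher risk' into the ratio it implies (assumed reading)."""
    return 1 + percent_higher / 100


# Percent increases quoted above, relative to uncomplicated diverticulitis.
percent_higher = {"mild": 39, "severe": 202, "chronic": 441}
for group, pct in percent_higher.items():
    print(f"{group}: HR ~ {hazard_ratio_from_percent_increase(pct):.2f}")

# Average follow-up per patient, from the totals quoted above.
mean_follow_up_years = 76_736 / 16_349
print(f"mean follow-up ~ {mean_follow_up_years:.1f} years")
```

Under that reading, the three groups correspond to hazard ratios of roughly 1.39, 3.02, and 5.41, over an average of about 4.7 years of follow-up per patient.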
Why This Matters Beyond Academic Papers
Electronic health records contain massive amounts of unstructured clinical text - radiology reports, pathology notes, physician documentation. All that information has been largely untapped for research because nobody could efficiently extract structured data from millions of free-text documents.
NLP algorithms like this one change the game. Suddenly, researchers can build large, high-quality cohorts based on actual clinical findings rather than imprecise billing codes. For conditions like diverticulitis, this means better epidemiological studies, improved risk prediction models, and potentially personalized management strategies.
The prediction angle is particularly intriguing. The researchers used random forest models and found that NLP-detected features significantly improved prediction of severe recurrence compared to using codified variables alone [1]. If validated prospectively, this approach could help clinicians identify which patients need closer follow-up versus which ones can be reassured.
Tools that process medical documents privately and efficiently are becoming increasingly valuable in healthcare research. The ability to extract meaningful, structured information from clinical text - while keeping sensitive data secure - represents one of the more practical applications of NLP technology today.
The Bigger Picture
This study demonstrates something important about medical AI: sometimes the most useful applications aren't the flashy ones. An algorithm that accurately reads CT reports and classifies disease severity isn't going to make headlines the way chatbots do. But it solves a real problem that's been limiting medical research for decades.
The researchers also showed that their specialized approach beat a generalist LLM. In an era where everyone assumes bigger models are always better, this is a useful reminder that domain-specific training still matters. Your general-purpose AI assistant might write decent poetry, but for reading radiology reports about inflamed colonic pouches, you probably want the specialist.
References
[1] Ma W, Wu Y, Challa PK, et al. Natural language processing algorithm accurately classifies diverticulitis-related complications and predicts long-term outcomes. Clinical Gastroenterology and Hepatology. 2026. doi:10.1016/j.cgh.2026.03.009. PMID: 41881290
[2] Strate LL, Morris AM. Epidemiology, Pathophysiology, and Treatment of Diverticulitis. Gastroenterology. 2019;156(5):1282-1298.e1. doi:10.1053/j.gastro.2018.12.033
[3] Wu S, Roberts K, Datta S, et al. Deep learning in clinical natural language processing: a methodical review. Journal of the American Medical Informatics Association. 2020;27(3):457-470. doi:10.1093/jamia/ocz200
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.