At 7:12 a.m., a complete blood count analyzer starts its shift by counting cells in a tube of blood and trying very hard not to get dragged into oncology.
Usually, that machine has a humble job. Red cells, white cells, platelets, hemoglobin. A tidy little census of the bloodstream. But in this study, the CBC becomes the opening scene of a much bigger detective story: why do some people with extra monocytes, those immune cells that wander around like tiny cleanup crews with opinions, later develop serious blood disease while others do not?
Dunn and colleagues looked at 431,531 UK Biobank participants and asked a deceptively simple question: when clonal hematopoiesis and monocytosis show up together, is that just biological background noise, or is the smoke alarm actually smelling smoke?
The Weird Middle Zone Before Leukemia
First, a translation break.
Clonal hematopoiesis means a blood-forming stem cell picked up a mutation and started producing a genetically related “clone” of blood cells. This gets more common with age, because biology, like a neglected group chat, accumulates weird history over time. Clonal hematopoiesis is often harmless on its own, but it is associated with a higher risk of blood cancers and cardiovascular disease.
Monocytosis means there are more monocytes than expected in the blood. Chronic myelomonocytic leukemia, or CMML, is a rare blood cancer where persistent monocytosis is a key clue. But plenty of people sit in the gray zone: they have clonal blood mutations and elevated monocytes, yet they do not meet full CMML criteria.
That gray zone now has names: CMUS, clonal monocytosis of undetermined significance, and CCMUS, clonal cytopenia and monocytosis of undetermined significance. Medical naming committees do not get paid by the syllable, but you would be forgiven for wondering.
The Plot Twist: Not All Monocytosis Is Equal
The study found that CMUS with absolute monocytosis and CCMUS were not just awkward labels sitting in a classification manual. They were linked to higher risk of future myeloid neoplasia, cardiovascular disease, and kidney disease.
That matters because “undetermined significance” can sound like a polite shrug. Here, the shrug came with receipts.
The researchers also noticed two details that sharpened the picture. First, men generally had higher monocyte counts than women, so a single shared threshold may flag too many men and too few women. Second, isolated DNMT3A mutations, a common clonal hematopoiesis finding, seemed less ominous in this specific setting than other mutation patterns. When the authors adjusted the CMUS/CCMUS definition using sex-specific monocyte thresholds and excluded isolated DNMT3A mutations, the association with future myeloid neoplasia became stronger.
In plain English: the old rulebook caught some real risk, but it also scooped up people who may not belong in the same risk bucket. The new version uses a finer sieve.
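For readers who think in code, here is a minimal Python sketch of the old rulebook next to the finer sieve. It is an illustration under loud assumptions, not the authors' definition: this summary does not give the actual cutoffs, so `SINGLE_THRESHOLD` and `SEX_SPECIFIC_THRESHOLDS` are made-up placeholder values, and the `Participant` fields and function names are hypothetical. It also covers only the monocytosis half of the picture and ignores the cytopenia criterion that separates CCMUS from CMUS.

```python
from dataclasses import dataclass, field

# Placeholder cutoffs in units of 10^9 cells/L. These are illustrative
# assumptions, NOT the thresholds used by Dunn and colleagues.
SINGLE_THRESHOLD = 0.5
SEX_SPECIFIC_THRESHOLDS = {"male": 0.6, "female": 0.5}


@dataclass
class Participant:
    sex: str                     # "male" or "female"
    monocytes: float             # absolute monocyte count, 10^9/L
    mutated_genes: frozenset = field(default_factory=frozenset)


def original_flag(p: Participant) -> bool:
    """Old rulebook: any clonal mutation plus one shared monocyte cutoff."""
    return bool(p.mutated_genes) and p.monocytes >= SINGLE_THRESHOLD


def refined_flag(p: Participant) -> bool:
    """Finer sieve: sex-specific cutoff, and isolated DNMT3A does not count."""
    if not p.mutated_genes or p.mutated_genes == {"DNMT3A"}:
        return False
    return p.monocytes >= SEX_SPECIFIC_THRESHOLDS[p.sex]
```

With these placeholder numbers, a man with a monocyte count of 0.55 and only a DNMT3A mutation is flagged by `original_flag` but not by `refined_flag`, while a woman with the same count and an SRSF2 mutation is flagged by both. That is the whole idea: same blood counts, different risk bucket.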
Enter MoSAIC, the Blood Count Whisperer
Then the paper takes its machine-learning turn.
The team built MoSAIC, short for “Monocytosis with SRSF2 Automatically Inferred from Counts,” to predict SRSF2 mutation status using complete blood count indices alone. SRSF2 is a spliceosome gene often seen in CMML and higher-risk clonal states. If genes are the recipe book, spliceosome mutations are like a distracted editor moving commas around in every recipe. Suddenly the cake is soup.
MoSAIC used a random forest classifier, which is basically a crowd of decision trees voting on the answer. One tree might overreact like a first-year med student with WebMD access. A forest is calmer. It asks many slightly different questions, then takes the vote.
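The voting idea can be sketched in a few lines of Python. To be clear about what this is: a toy bagged ensemble of one-question "stumps" trained on synthetic numbers, written to show the vote, not a reconstruction of MoSAIC. The two features (monocyte count and hemoglobin), the data values, and every function name are assumptions made up for illustration. A real random forest grows deeper trees and samples features at every split, and in practice you would reach for an established library rather than hand-rolling one.

```python
import random
from collections import Counter

# Synthetic toy data: [monocytes (10^9/L), hemoglobin (g/dL)] -> label.
# Invented for illustration; not drawn from any cohort.
ROWS = [[0.3, 14.5], [0.4, 13.8], [0.5, 15.0], [0.4, 14.0],
        [1.1, 10.5], [1.3, 11.0], [0.9, 9.8], [1.2, 10.2]]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_stump(rows, labels, feature):
    """Find the cutoff on one feature that misclassifies the fewest rows."""
    best = None
    for cutoff in sorted({r[feature] for r in rows}):
        for above in (0, 1):  # label predicted when value >= cutoff
            errors = sum(
                (above if r[feature] >= cutoff else 1 - above) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, cutoff, above)
    return best[1:]  # (feature, cutoff, above)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        sample = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample, sample_labels, rng.randrange(n_features)))
    return forest


def predict(forest, row):
    """Each stump casts one vote; the majority label wins."""
    votes = Counter(
        above if row[feature] >= cutoff else 1 - above
        for feature, cutoff, above in forest
    )
    return votes.most_common(1)[0][0]


forest = fit_forest(ROWS, LABELS)
```

Calling `predict(forest, [1.5, 9.0])` asks all 25 stumps about a made-up sample with high monocytes and low hemoglobin and returns whichever label most of them chose. Any single stump may have learned a slightly odd cutoff from its bootstrap sample; the vote is what smooths that out.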
The practical promise is obvious: genomic testing is powerful, but it is not always available, cheap, or ordered early. CBCs, by contrast, are everywhere. They are the supermarket receipt of medicine: not glamorous, but surprisingly revealing if you know what to look for.
The authors then checked their findings in an independent cohort of 625,328 Danish primary care patients. That external validation is the part of the movie where the detective leaves the lab and tests the theory in another city.
Why This Is Useful, Not Magical
This paper does not say a CBC can diagnose hidden blood cancer by itself. It does not say everyone with high monocytes needs panic, sequencing, and a dramatic soundtrack.
It says risk is structured. Some combinations of cell counts, cytopenias, sex-specific thresholds, and mutation patterns point toward a higher-risk state. If these findings are reproduced in more health systems, they could help clinicians decide who needs closer monitoring, molecular testing, or specialist review.
The bigger idea is also very modern: machine learning may be most useful in medicine when it does not pretend to be a wizard. MoSAIC is not reading minds. It is squeezing extra signal from routine data clinicians already collect. That is less flashy than a robot doctor and much more believable.
And for patients, that is the real story. A boring blood test may become a smarter early-warning system. Still boring, ideally. In medicine, boring is often the luxury version.
References
- Dunn WG, Sachs MC, Maggi M, et al. “The prevalence and clinical significance of clonal monocytosis.” Blood. 2026. DOI: 10.1182/blood.2025031883. PMID: 41802133.
- Weeks LD, Niroula A, Neuberg D, et al. “Prediction of Risk for Myeloid Malignancy in Clonal Hematopoiesis.” NEJM Evidence. 2023;2(5). DOI: 10.1056/EVIDoa2200310. PMID: 37483562.
- Dunn WG, Withnell I, Gu M, et al. “CHIC: A machine learning framework for inferring the presence of high-risk clonal hematopoiesis using complete blood count data.” HemaSphere. 2025. DOI: 10.1002/hem3.70169.
- Patnaik MM, Tefferi A. “Chronic myelomonocytic leukemia: 2024 update on diagnosis, risk stratification and management.” American Journal of Hematology. 2024;99(6):1142-1165. DOI: 10.1002/ajh.27271. PMID: 38450850.
- Singh J, Li N, Ashrafi E, et al. “Clonal hematopoiesis of indeterminate potential as a prognostic factor: a systematic review and meta-analysis.” Blood Advances. 2024;8(15):3771-3784. DOI: 10.1182/bloodadvances.2024013228.
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.