If the Paper Had an Honest Title: “We Made Noisy DNA Reads Behave Well Enough to Build Whole Chromosomes, and Honestly We Are Also Slightly Nervous About the Repeats”

Telomere-to-telomere genome assembly sounds like a quest item, because it sort of is. The goal is to reconstruct each chromosome from one protective end-cap, the telomere, all the way to the other, without leaving mysterious gaps labeled “here be repetitive DNA dragons.” Stanojević and colleagues’ Nature paper introduces HERRO, a deep learning error-correction system for Oxford Nanopore Simplex reads. If I am reading it right, the pitch is: what if a single long-read sequencing technology could get us much closer to reference-grade, phased genomes, without the usual pile of extra sequencing platforms, chemistry, and lab-budget incense? [1]
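What does “telomere to telomere” look like in an actual sequence? Here is a rough, hypothetical heuristic in Python (not from the paper): call a contig T2T if both ends are dominated by telomeric repeat copies. The motif TTAGGG is the human/vertebrate one and is an assumption here, since other species differ (Arabidopsis uses TTTAGGG, for instance). The 1 kb window and 50% threshold are made up for illustration.

```python
# Hypothetical heuristic, not the paper's method. Assumes the human/vertebrate
# telomere motif TTAGGG, a 1 kb window per end, and a 50% coverage threshold.

def motif_fraction(seq: str, motif: str) -> float:
    """Fraction of seq covered by non-overlapping copies of motif."""
    if not seq:
        return 0.0
    return seq.count(motif) * len(motif) / len(seq)


def looks_telomere_to_telomere(contig: str, window: int = 1000,
                               min_frac: float = 0.5) -> bool:
    """True if both contig ends are dominated by telomeric repeat copies.

    A chromosome, as conventionally written, starts with CCCTAA repeats (the
    reverse complement of TTAGGG) and ends with TTAGGG repeats. Reverse-
    complementing the whole contig keeps that pattern, so this one check
    covers both orientations.
    """
    contig = contig.upper()
    start, end = contig[:window], contig[-window:]
    return (motif_fraction(start, "CCCTAA") >= min_frac
            and motif_fraction(end, "TTAGGG") >= min_frac)


chromosome = "CCCTAA" * 200 + "ACGT" * 1000 + "TTAGGG" * 200
broken_end = "CCCTAA" * 200 + "ACGT" * 1000
print(looks_telomere_to_telomere(chromosome))  # True
print(looks_telomere_to_telomere(broken_end))  # False: no telomere at the end
```

Real T2T checks also look at assembly gaps and alignment to a reference; this only captures the “capped at both ends” part.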

The Problem: DNA Is Long, Repeats Are Rude

Genome assembly is basically taking millions of shredded strips of a book and reconstructing the original. Easy, except the book is about three billion letters long, roughly half of it is repetitive (whole pages that say something like “TTAGGG TTAGGG TTAGGG”), and your scanner occasionally swaps letters like it had one espresso too many.

Short reads are accurate but too tiny to bridge long repetitive regions. Long reads can span the awkward bits, including centromeres and segmental duplications, but Oxford Nanopore Simplex reads have historically been noisier than researchers would like. That noise matters because assemblers need to know whether a difference is a real biological variant or just the sequencer doing jazz.

This is especially touchy in diploid genomes, like ours, where you have two chromosome copies: one from each parent. A haplotype-aware assembler must keep those two copies straight. Correct me if I am wrong, but the nightmare is “fixing” a real maternal-vs-paternal difference because it looks like an error. That is not proofreading. That is editing your mom out of the genome.
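To make the nightmare concrete, here is a toy sketch (plain Python, hypothetical pileup data, not HERRO’s code) of the naive fix: majority-vote each column of overlapping reads. It works on a random error and quietly erases a real heterozygous allele when the overlapping reads happen to lean toward the other parent.

```python
from collections import Counter


def majority_vote(column: list[str]) -> str:
    """Most frequent base in one pileup column (ties: first base seen wins)."""
    return Counter(column).most_common(1)[0][0]


# Hypothetical pileup columns. Index 0 is the read being corrected.
# 1) A random sequencing error: the target read says T, nine others say A.
error_column = ["T"] + ["A"] * 9
# 2) A real heterozygous site: the target read is maternal ("A"), and the
#    overlapping reads happen to include more paternal copies ("G").
het_column = ["A"] * 4 + ["G"] * 6

print(majority_vote(error_column))  # A (good: the error is fixed)
print(majority_vote(het_column))    # G (bad: the maternal allele is erased)
```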

HERRO’s Trick: Pay Attention to the Suspicious Bits

HERRO is short for Haplotype-aware ERRor cOrrection. The system builds piles of overlapping reads, then uses convolutional blocks plus a Transformer encoder to focus on “informative positions”: places where differences may distinguish haplotypes or repeat copies [1]. The boring, obvious positions get majority voting. The suspicious positions get the neural-network spotlight, which is basically the lab version of “everyone stop talking, this part matters.”
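The routing idea can be sketched in a few lines. Big assumptions, loudly: “informative” here means at least two different symbols each backed by three or more reads, a hypothetical threshold that may not match HERRO’s actual definition; the `-` gap symbol is an assumed convention; and `predict` is a placeholder where the real convolution-plus-Transformer model would sit, not the model itself.

```python
from collections import Counter
from typing import Callable

GAP = "-"  # assumed symbol for "this read has a deletion here"


def is_informative(column: list[str], min_support: int = 3) -> bool:
    """At least two different symbols are each backed by >= min_support reads."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= min_support) >= 2


def correct_read(columns: list[list[str]],
                 predict: Callable[[list[str]], str],
                 min_support: int = 3) -> str:
    """Majority-vote the boring columns; hand informative ones to `predict`."""
    corrected = []
    for column in columns:
        if is_informative(column, min_support):
            corrected.append(predict(column))  # where the neural model would go
        else:
            corrected.append(Counter(column).most_common(1)[0][0])
    # Drop gap symbols so deletions do not appear in the corrected read.
    return "".join(base for base in corrected if base != GAP)


def keep_target_base(column: list[str]) -> str:
    """Hypothetical stand-in for the model: trust the target read (index 0)."""
    return column[0]


# Toy pileup; index 0 of every column is the read being corrected.
columns = [
    ["A"] * 10,               # unanimous
    ["C"] * 9 + ["T"],        # a lone T looks like noise
    ["A"] * 5 + ["G"] * 5,    # 5 vs 5: could be a haplotype difference
    ["T"] * 8 + [GAP] * 2,    # two reads skipped this base
]
print(correct_read(columns, keep_target_base))  # ACAT
```

The design point is the split itself: cheap voting handles the bulk of positions, and the expensive, careful decision is reserved for the few columns where voting could erase real biology.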

That choice is the interesting AI bit. HERRO is not a chatbot for chromosomes. It is a very specialized proofreader trained to ask: is this base wrong, or is it a real difference that biology put there on purpose? That distinction is the whole sandwich.

The reported gains are not subtle. HERRO improved read accuracy up to 100-fold for diploid human genomes, reduced total errors by about 50-fold on average across tested datasets, and pushed human mismatch errors from more than 100 per 10 kb to fewer than 1 per 10 kb after correction [1]. I have read that sentence several times, and yes, it still feels like the sequencer got glasses.
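Sequencing folks usually talk about accuracy on the Phred scale, Q = -10 · log10(error rate), so each 10-fold drop in errors adds 10 Q points. A quick conversion of the mismatch numbers above (treating them as plain per-base error rates, which is a simplification since they count mismatches only):

```python
import math


def errors_per_10kb_to_q(errors_per_10kb: float) -> float:
    """Phred quality Q = -10 * log10(error rate)."""
    error_rate = errors_per_10kb / 10_000
    return -10 * math.log10(error_rate)


before = errors_per_10kb_to_q(100)  # about Q20, i.e. 99% of bases right
after = errors_per_10kb_to_q(1)     # about Q40, i.e. 99.99% of bases right
print(f"before: Q{before:.0f}, after: Q{after:.0f}, gain: {after - before:.0f}")
```

Because the reported figures are bounds (“more than 100”, “fewer than 1”), the fair reading is “below Q20 before, above Q40 after,” which lines up with the Q40-ish quality mentioned later.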

What They Built With It

When the team fed HERRO-corrected reads into Verkko, they reconstructed up to 32 human chromosomes telomere-to-telomere, including X and Y, with NGA50 values at or above 100 Mb across several human genomes [1]. For context, Verkko itself was a major step toward automated diploid T2T assembly, producing 20 of 46 HG002 chromosomes without gaps at very high accuracy [3]. HERRO is trying to reduce the equipment sprawl: fewer platforms, fewer workflows, less DNA input, fewer opportunities for your wet lab calendar to develop villain energy.
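NGA50 is a contiguity score. Its simpler cousin, NG50, is the length L such that pieces at least L long cover half the expected genome size. NGA50 runs the same calculation on aligned blocks, that is, contigs aligned to a reference and broken wherever they are misassembled (tools such as QUAST report it). A minimal NG50 sketch with made-up lengths:

```python
def ng50(lengths: list[int], genome_size: int) -> int:
    """Largest L such that pieces of length >= L cover half of genome_size."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the pieces cover less than half of the genome


# Hypothetical piece lengths in Mb for a hypothetical 400 Mb genome.
print(ng50([120, 110, 90, 50, 30], genome_size=400))  # 110
```

Using genome size rather than total assembly size keeps a bloated or duplicated assembly from inflating the score. Feeding it aligned-block lengths instead of contig lengths gives NGA50.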

The paper also tested non-human genomes, including zebrafish, Arabidopsis, and Drosophila, which matters because a method that only works on one celebrity human sample is useful, but also a little needy. The broader dream is cheaper, scalable genome assembly for pangenomes, biodiversity work, structural variation, rare disease research, and cancer genomics. If the results hold up as other groups reproduce and extend them, this could help researchers inspect genomic regions that older references treated like that one closet nobody opens.

The Asterisk, Because Biology Charges Rent

This is not “assembly solved, everyone go home.” The authors are refreshingly clear that the pipeline still has rough edges: possible undercorrection or overcorrection, homopolymers that remain annoying, alignment pileups that can get messy around repeats, runtime and memory pressure, and assemblers that were not originally designed for ultra-long reads corrected up to Q40-ish quality [1]. Translation: the reads got better, but the downstream software may still be wearing last season’s assumptions.
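Why are homopolymers so stubborn? Nanopore reads commonly get the identity of a run right but its length wrong. Collapsing each run to one base (“homopolymer compression,” a trick some assemblers use; this sketch is illustrative, not HERRO’s approach) shows that the bases agree and only the count differs, and the count is exactly the hard part to correct.

```python
from itertools import groupby


def runs(seq: str) -> list[tuple[str, int]]:
    """Split a sequence into (base, run length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def compress(seq: str) -> str:
    """Homopolymer compression: collapse each run to a single base."""
    return "".join(base for base, _ in groupby(seq))


truth = "GATTTTTTTC"  # seven Ts
read = "GATTTTTTC"    # six Ts: a typical length slip
print(compress(truth) == compress(read))  # True: identities agree
print(runs(truth)[2], runs(read)[2])      # ('T', 7) ('T', 6)
```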

There is also a nearby comparison worth keeping in your mental junk drawer. Cheng and colleagues introduced hifiasm (ONT), which assembles near-T2T genomes from standard ONT Simplex reads and can reduce compute demands while recovering many T2T chromosomes [4]. HERRO seems to shine on per-base accuracy and haplotype consistency, while hifiasm (ONT) pushes hard on practical contiguity and scale. I think the honest takeaway is not “one tool wins.” It is more like genomics has finally reached the fun part where multiple good tools disagree in productive ways, like scientists but with fewer conference pastries.

Why This Is Cool, Carefully

Complete genomes are not just prettier genomes. They expose structural variation, repeat expansions, duplicated genes, and chromosome-end regions that matter in evolution, disease, and population genetics. The T2T era has already changed what “reference quality” means [2], and the human pangenome effort shows why one reference is not enough for everyone [6]. HERRO fits into that bigger shift: make complete, phased assemblies less artisanal.

And yes, “less artisanal” is a compliment here. Genomics should not require a ceremonial blend of three sequencing technologies, six pipelines, and a spreadsheet named final_FINAL_v7.

References

[1] Stanojević, D., Lin, D., Nurk, S., Florez de Sessions, P. & Šikić, M. “Telomere-to-telomere assembly using HERRO-corrected Nanopore Simplex reads.” Nature 655, 158-165 (2026). DOI: 10.1038/s41586-026-10563-y. PMID: 42045451.
[2] Li, H. & Durbin, R. “Genome assembly in the telomere-to-telomere era.” Nature Reviews Genetics 25, 658-670 (2024). DOI: 10.1038/s41576-024-00718-w. arXiv: 2308.07877.
[3] Rautiainen, M. et al. “Telomere-to-telomere assembly of diploid chromosomes with Verkko.” Nature Biotechnology 41, 1474-1482 (2023). DOI: 10.1038/s41587-023-01662-6. PMCID: PMC10427740.
[4] Cheng, H. et al. “Efficient near-telomere-to-telomere assembly of nanopore simplex reads.” Nature 655, 166-173 (2026). DOI: 10.1038/s41586-026-10105-6.
[5] Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. “Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph.” Nature Methods 21, 967-970 (2024). DOI: 10.1038/s41592-024-02269-8. PMCID: PMC11214949.
[6] Liao, W.-W. et al. “A draft human pangenome reference.” Nature 617, 312-324 (2023). DOI: 10.1038/s41586-023-05896-x.

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.

AIb2.io - AI Research Decoded