A Billion Proteins Walk Into a Mass Spec...

Proteomics has a favorite party trick, and it's been doing it the same way for decades. You feed proteins into a mass spectrometer, smash them apart with collision-induced dissociation (CID), and read the resulting debris like molecular tea leaves. It works. It works really well, actually - so well that scientists can now identify proteins from a single cell (Guo et al., 2025). But CID has a dirty little secret: it's terrible at reading the fine print.

The Fragmentation Problem Nobody Talks About

CID works by slamming peptide ions into neutral gas molecules until they break. Think of it as shaking a piñata - you'll get candy, sure, but you'll also lose anything delicate taped to the outside. That's exactly what happens with post-translational modifications (PTMs), the chemical tags that cells stick on proteins to control their behavior. Phosphorylation? Often gone before the backbone even breaks. Glycosylation? Shattered into sugar fragments that tell you a glycan was there but not where it was attached. These modifications are central to understanding disease, and CID keeps knocking them off before anyone can read them (Wells & McLuckey, 2005).
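To make the phosphorylation problem concrete: under CID, phosphoserine and phosphothreonine frequently eject phosphoric acid (H3PO4, about 98 Da), so the spectrum is dominated by a peak 98/z m/z below the precursor instead of fragments that pinpoint the phosphate. The Python sketch below computes where that peak lands, using standard monoisotopic masses. The peptide "GASPTK" and the function names are hypothetical, chosen only for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added once for the peptide's free termini
PROTON = 1.007276   # mass of each charge-carrying proton
PHOSPHO = 79.96633  # HPO3 added by one phosphorylation
H3PO4 = 97.97690    # neutral loss typical of phospho-Ser/Thr under CID


def precursor_mz(sequence: str, charge: int, n_phospho: int = 0) -> float:
    """m/z of the intact [M + zH]z+ peptide ion, optionally phosphorylated."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_phospho * PHOSPHO
    return (neutral + charge * PROTON) / charge


def neutral_loss_mz(sequence: str, charge: int, n_phospho: int = 1) -> float:
    """m/z of the same ion after it sheds one H3PO4 (the phosphate is gone)."""
    return precursor_mz(sequence, charge, n_phospho) - H3PO4 / charge


if __name__ == "__main__":
    seq, z = "GASPTK", 2  # hypothetical peptide carrying one phosphate
    print(f"precursor m/z:     {precursor_mz(seq, z, n_phospho=1):.4f}")
    print(f"after H3PO4 loss:  {neutral_loss_mz(seq, z):.4f}")
```

For a doubly charged ion the loss shows up about 49 m/z below the precursor; a big peak there with little else around it is the classic sign of CID stripping a phosphate.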

Alternative fragmentation methods exist. Electron-based techniques like electron capture dissociation (ECD), along with photon-based ultraviolet photodissociation (UVPD), cleave the peptide backbone at different bonds than CID does, leaving fragile modifications largely intact (Brodbelt, 2014). UVPD fires a laser at your peptides - actual laser beams, like a Bond villain's lab - and produces richer, more diverse fragment patterns. The catch? These methods have been stuck as boutique techniques, too finicky and too poorly supported by computational tools to challenge CID's dominance in routine workflows.

One Model to Fragment Them All

A team spanning the Rosalind Franklin Institute, University of Oxford, Technical University of Munich, and University of Michigan decided to fix this. Led by Nikita Levin, Cemil Can Saylan, and senior authors Mathias Wilhelm and Shabaz Mohammed, they built a mass spectrometry platform capable of running CID, UVPD, ECD, and electron ionization dissociation (EID) - all automated, all during a single liquid chromatography run. It's like a Swiss Army knife for protein fragmentation, except the knife costs more than your house.

Using multi-enzyme deep proteomics workflows, they generated massive training datasets across all these dissociation methods and fed them into Prosit, a deep learning model that predicts what mass spectra should look like for any given peptide (Gessulat et al., 2019). The twist: instead of training separate models for each fragmentation type (the obvious approach), they trained a single unified model that handles everything. The model learned that CID produces b and y fragment ions while ECD generates c and z ions - without anyone explicitly programming those rules (Levin et al., 2026).
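For a sense of what the model had to pick up on its own: CID mostly breaks the amide bond and yields b (N-terminal) and y (C-terminal) ions, while ECD cleaves the neighboring N-Cα bond and yields c and z ions, offset from b and y by roughly +17.03 and -16.02 Da. The Python sketch below computes both ladders for an unmodified, singly charged peptide, writing ECD's z ions as z-dot (z•) ions, one common convention. It illustrates the fragmentation chemistry only; it is not Prosit's code (Prosit predicts fragment intensities), and the function name fragment_ladders is hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # y ions = C-terminal residue sum + H2O + H+
PROTON = 1.007276   # charge carrier for a singly charged (1+) fragment
NH3 = 17.026549     # c ions carry an extra NH3 relative to b ions
H_ATOM = 1.007825   # z-dot ions = y - NH3 + H


def fragment_ladders(sequence: str) -> dict[str, list[float]]:
    """Singly charged b/y (CID-style) and c/z-dot (ECD-style) fragment m/z values.

    Entry k of every list comes from the same backbone cleavage (after residue
    k + 1), so b[k] pairs with y[k] and c[k] pairs with z[k].
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    ions: dict[str, list[float]] = {"b": [], "y": [], "c": [], "z": []}
    for i in range(1, len(masses)):  # i = residues in the N-terminal piece
        b = sum(masses[:i]) + PROTON
        y = sum(masses[i:]) + WATER + PROTON
        ions["b"].append(b)
        ions["y"].append(y)
        ions["c"].append(b + NH3)           # N-Calpha cleavage, N-terminal side
        ions["z"].append(y - NH3 + H_ATOM)  # N-Calpha cleavage, C-terminal side
    return ions


if __name__ == "__main__":
    for kind, values in fragment_ladders("PEPTIDE").items():
        print(kind, [round(v, 3) for v in values])
```

Because b[k] and y[k] come from the same cleavage, their masses add up to the intact peptide mass plus two protons, which makes a handy sanity check.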

The Numbers That Made Reviewers Nod

The unified Prosit model was plugged into MSBooster, a deep-learning rescoring module inside the widely used FragPipe proteomics pipeline (Yang et al., 2023). The results: protein identifications jumped by more than 10% on average across both data-dependent and data-independent acquisition modes - and across all fragmentation techniques tested. That's not a marginal gain. In a field where a 5% bump makes people write grant applications, 10% is the kind of improvement that changes what experiments are worth running.
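Why would better spectrum prediction add identifications? A rescoring module like MSBooster compares each candidate peptide's predicted spectrum with the measured one and hands that agreement to a downstream rescoring step, so correct matches rise above near-misses. The Python sketch below shows one common agreement score, the normalized spectral contrast angle reported in the original Prosit paper. It assumes both spectra are already aligned to the same list of fragment ions, uses made-up intensities, and is not MSBooster's actual feature code.

```python
import math


def spectral_angle(predicted: list[float], observed: list[float]) -> float:
    """Normalized spectral contrast angle: 1.0 = same shape, 0.0 = no shared peaks.

    Both lists hold intensities for the same fragment ions, in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("spectra must be aligned to the same fragment ions")
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # an empty spectrum matches nothing
    cosine = sum(p * o for p, o in zip(predicted, observed)) / (norm_p * norm_o)
    cosine = max(-1.0, min(1.0, cosine))  # clamp rounding error before acos
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    observed = [0.8, 1.0, 0.3, 0.4]     # hypothetical measured intensities
    candidate_a = [0.9, 1.0, 0.2, 0.4]  # hypothetical prediction, good match
    candidate_b = [0.1, 0.2, 1.0, 0.0]  # hypothetical prediction, poor match
    print(f"candidate A: {spectral_angle(candidate_a, observed):.3f}")
    print(f"candidate B: {spectral_angle(candidate_b, observed):.3f}")
```

The score ignores overall intensity scale and only rewards matching peak shape, which is exactly what a better-trained predictor improves.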

More importantly, the alternative fragmentation methods - particularly electron-based and UVPD approaches - achieved identification rates competitive with CID while delivering superior sequence coverage. The underdog techniques aren't underdogs anymore. They're viable alternatives that happen to preserve the very modifications CID keeps destroying.

Why This Matters Beyond the Lab

This work isn't just a technical flex. It establishes a framework for making advanced fragmentation routine. If you're studying phosphorylation in cancer signaling or glycosylation in immune response, you no longer need to choose between identification power and modification preservation. The deep learning model is publicly available, and because it is built into FragPipe, any lab already running FragPipe can start using it tomorrow.

For anyone trying to map the tangled web of protein interactions and modifications that drive biology - something that tools like mapb2.io help visualize in other contexts - this is the computational equivalent of upgrading from a magnifying glass to a microscope. Same proteins, dramatically more information.

The proteomics community spent years perfecting CID. Turns out they also needed a neural network to make the alternatives worth the trouble. Sometimes progress isn't about inventing a new method. It's about teaching a computer to understand all the methods you already had.

References:

Levin, N., Saylan, C.C., Lapin, J., et al. (2026). Integration of alternative fragmentation techniques into standard LC-MS workflows using a single deep learning model enhances proteome coverage. Nature Methods. DOI: 10.1038/s41592-026-03042-9
Gessulat, S., Schmidt, T., Zolg, D.P., et al. (2019). Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nature Methods, 16, 509-518. DOI: 10.1038/s41592-019-0426-7
Yang, K.L., Yu, F., Teo, G.C., et al. (2023). MSBooster: improving peptide identification rates using deep learning-based features. Nature Communications, 14, 4539. DOI: 10.1038/s41467-023-40129-9
Brodbelt, J.S. (2014). Ultraviolet Photodissociation Mass Spectrometry for Analysis of Biological Molecules. Chemical Reviews, 120(7), 4328-4380. DOI: 10.1021/acs.chemrev.9b00440
Wells, J.M. & McLuckey, S.A. (2005). Collision-induced dissociation (CID) of peptides and proteins. Methods in Enzymology, 402, 148-185. PMID: 16401509
Guo, T., et al. (2025). Mass-spectrometry-based proteomics: from single cells to clinical applications. Nature. DOI: 10.1038/s41586-025-08584-0

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.

AIb2.io - AI Research Decoded
