Cells are weirdly well organized for blobs of chemistry. Your DNA lives in the nucleus. Plenty of proteins need to get in there, do a job, then maybe leave again. They do that with tiny sequence motifs called nuclear localization signals, or NLSs, and nuclear export signals, or NESs. Think of them as molecular boarding passes, except the barcode is made of amino acids and the gate agent is an importin or exportin with no patience for nonsense (Yang et al., 2023).
The problem is that these signals are short, messy, and not especially unique. A lot of proteins contain lookalike motifs that seem like they should send cargo into or out of the nucleus, but actually do nothing. Sequence-only prediction tools have been dealing with that by, well, guessing harder. Which is how you end up with false positives breeding like rabbits in a spreadsheet.
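To put a rough number on "not especially unique", here is a back-of-the-envelope sketch in Python. It assumes every residue is drawn independently and uniformly from the 20 amino acids, which real proteins do not do, so treat it as an illustration rather than an estimate. The stand-in motif is the classical monopartite NLS consensus K-(K/R)-X-(K/R), and the 20,000-protein, 500-residue "proteome" is a made-up round number.

```python
from fractions import Fraction

# Stand-in motif: classical monopartite NLS consensus K-(K/R)-X-(K/R),
# written as the set of residues allowed at each position (None = any residue).
MOTIF = [{"K"}, {"K", "R"}, None, {"K", "R"}]
ALPHABET_SIZE = 20  # assumption: all 20 amino acids equally likely, independently

def match_probability(motif, alphabet_size=ALPHABET_SIZE):
    """Probability that one random window of len(motif) residues matches the motif."""
    p = Fraction(1)
    for allowed in motif:
        if allowed is not None:
            p *= Fraction(len(allowed), alphabet_size)
    return p

def expected_hits(seq_length, motif, alphabet_size=ALPHABET_SIZE):
    """Expected number of chance matches (overlaps allowed) in one sequence."""
    windows = max(seq_length - len(motif) + 1, 0)
    return windows * match_probability(motif, alphabet_size)

p = match_probability(MOTIF)             # (1/20) * (2/20) * (2/20) = 1/2000
per_protein = expected_hits(500, MOTIF)  # 497 windows / 2000, about 0.25
proteome = 20_000 * per_protein          # hypothetical 20,000 proteins of 500 residues
print(f"p = {p}, per protein = {float(per_protein):.3f}, proteome = {float(proteome):.0f}")
```

Even under these simplified assumptions, a four-residue motif turns up by chance thousands of times across a proteome, which is why a scanner that stops at the sequence level piles up false positives.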
That is where SPSignal comes in. Engler, Abriata, and Bologna built a web tool that does not just ask, "Does this sequence kinda look like an NLS or NES?" It also asks, "Is that motif actually exposed in the protein structure, or is it buried like the TV remote in couch cushions?" That extra structural context turns out to matter a lot (Engler et al., 2026).
Sequence alone is not enough, and honestly that makes sense
SPSignal combines classic sequence-based predictors with structural features such as solvent accessibility, intrinsic disorder, and broader 3D context. In plain English: it checks whether the candidate signal is physically available to be recognized by the transport machinery. A motif hidden inside a folded protein is a lot less believable than one sticking out where the cell can actually grab it.
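Here is a minimal Python sketch of the general idea, and only the general idea: it is not SPSignal's pipeline. It finds matches to the classical monopartite NLS consensus K-(K/R)-X-(K/R) with a regular expression, then keeps only those whose mean relative solvent accessibility (RSA) clears a cutoff. The per-residue RSA values are assumed to come from some structure, experimental or predicted, and the 0.25 cutoff is an arbitrary illustrative choice.

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R). The lookahead lets
# overlapping matches (e.g. both KKRK and KRKK inside KKRKK) be reported.
NLS_PATTERN = re.compile(r"(?=(K[KR].[KR]))")

def candidate_nls(sequence):
    """Return (0-based start, motif) for every consensus match in the sequence."""
    return [(m.start(), m.group(1)) for m in NLS_PATTERN.finditer(sequence)]

def exposed_candidates(sequence, rsa, min_mean_rsa=0.25):
    """Keep candidates whose mean relative solvent accessibility clears a cutoff.

    rsa: one value in [0, 1] per residue, assumed to come from a 3D structure.
    min_mean_rsa: arbitrary illustrative cutoff, not a value from SPSignal.
    """
    if len(rsa) != len(sequence):
        raise ValueError("need exactly one RSA value per residue")
    kept = []
    for start, motif in candidate_nls(sequence):
        window = rsa[start:start + len(motif)]
        if sum(window) / len(window) >= min_mean_rsa:
            kept.append((start, motif))
    return kept

# Toy example: two consensus matches, one on an exposed segment, one buried.
seq = "GKKRKGAAAAGKRAR"
rsa = [0.6] * 6 + [0.3] * 5 + [0.05] * 4
print(candidate_nls(seq))            # [(1, 'KKRK'), (11, 'KRAR')]
print(exposed_candidates(seq, rsa))  # [(1, 'KKRK')]
```

Real NLSs also come in bipartite and non-classical forms, and NESs follow a looser hydrophobic spacing pattern, so a single regex like this would miss plenty. The point is only to show where a structural filter slots in after a sequence scan.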
The authors use curated sets of experimentally validated NLS and NES examples, then rank candidates with an interpretable machine-learning approach based on RuleFit. That last part matters. You do not want a black box here cheerfully declaring, "Trust me, bro." You want a model that can show why it thinks one motif is plausible and another is just a sequence coincidence wearing fake glasses. SPSignal also visualizes predicted signals in sequence and 3D structure, which is handy if you are the sort of person who likes your bioinformatics with receipts.
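To show what "interpretable" buys you, here is a toy Python sketch of how a RuleFit-style model scores a candidate. RuleFit turns the paths of a tree ensemble into if-then rules and fits a sparse linear model over them (plus linear terms on the raw features, which this sketch leaves out). The rules, the features (`mean_rsa`, `disorder`, `seq_score`), the weights, and the intercept below are all hypothetical and hand-written; nothing here comes from SPSignal's trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Rule:
    description: str
    condition: Callable[[Dict[str, float]], bool]
    weight: float

# Hypothetical rules, weights, and intercept, hand-written for illustration.
# A real RuleFit model extracts rules from tree-ensemble paths and fits the
# weights with L1-penalized regression, which zeroes out many of them.
RULES = [
    Rule("mean_rsa >= 0.3", lambda f: f["mean_rsa"] >= 0.3, 1.2),
    Rule("disorder >= 0.5 and seq_score >= 0.6",
         lambda f: f["disorder"] >= 0.5 and f["seq_score"] >= 0.6, 0.8),
    Rule("mean_rsa < 0.1", lambda f: f["mean_rsa"] < 0.1, -1.5),
]
INTERCEPT = -0.5

def score(features):
    """Return (score, fired rules): intercept plus the weights of rules that apply."""
    fired = [rule for rule in RULES if rule.condition(features)]
    total = INTERCEPT + sum(rule.weight for rule in fired)
    return total, [(rule.description, rule.weight) for rule in fired]

exposed = {"mean_rsa": 0.45, "disorder": 0.7, "seq_score": 0.65}
buried = {"mean_rsa": 0.05, "disorder": 0.1, "seq_score": 0.9}
for name, feats in [("exposed", exposed), ("buried", buried)]:
    total, reasons = score(feats)
    print(name, round(total, 2), reasons)
```

The buried candidate has the stronger sequence score yet ends up far lower, and the list of fired rules says exactly why. That is the "receipts" part: the explanation is the model itself, not a post-hoc guess.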
In their reported case studies, SPSignal improved accuracy by reducing false positives without losing sensitivity, and the authors tested it on 31 proteins outside the model development sets (Engler et al., 2026). That is the whole sales pitch right there: fewer bogus hits, same ability to catch the real thing.
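Why can you cut false positives without touching sensitivity? Because the two metrics use different denominators. Sensitivity is TP / (TP + FN) and never looks at false positives, while precision is TP / (TP + FP). A small Python sketch with made-up counts, not the paper's numbers:

```python
def precision(tp, fp):
    """Fraction of predicted signals that are real: TP / (TP + FP)."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Fraction of real signals that were found: TP / (TP + FN). FP never appears."""
    return tp / (tp + fn)

# Made-up counts for illustration (not from the paper): 40 real signals in total.
# Both predictors find 36 of them; the filtered one rejects 18 of the baseline's 24 false hits.
counts = {
    "sequence only": {"tp": 36, "fp": 24, "fn": 4},
    "with structure filter": {"tp": 36, "fp": 6, "fn": 4},
}
for name, c in counts.items():
    print(f"{name}: precision={precision(c['tp'], c['fp']):.3f}, "
          f"sensitivity={sensitivity(c['tp'], c['fn']):.3f}")
```

Precision climbs from 0.6 to roughly 0.86 while sensitivity stays at 0.9, which is the pattern the authors describe: fewer false positives, no loss of sensitivity.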
Why this is showing up now
This paper lands in a larger shift in computational biology. Protein localization prediction has been getting better fast, thanks to protein language models and much better structural coverage.
A good example is DeepLoc 2.0, which uses a pretrained protein language model to predict multi-label subcellular localization and sorting signals, while also offering some interpretability through attention maps (Almagro Armenteros et al., 2022). More recently, broader reviews have mapped out how AI methods are taking over this area, including sequence models, hybrid approaches, and structure-aware systems (Shatnawi et al., 2024; Mekhalfi et al., 2024).
And then there is the giant elephant in the lab: modern structure prediction. The AlphaFold Protein Structure Database now provides open access to huge numbers of predicted structures, which makes structure-assisted tools like SPSignal much more practical than they would have been a few years ago (Varadi et al., 2025). Basically, biology got handed a planet-sized pile of 3D protein models, and researchers are finally cashing that check.
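One practical detail makes these predicted models useful beyond their coordinates: AlphaFold DB files store the per-residue confidence score, pLDDT, in the B-factor column. Very low pLDDT (below about 50) often marks regions that are disordered or unstructured in isolation, which is the kind of structural context a structure-assisted predictor can use. The Python sketch below reads pLDDT from the CA atoms of PDB-format text using the format's fixed columns. The tiny "structure" is invented for illustration, and the cutoff of 50 follows AlphaFold's confidence bands rather than anything SPSignal is documented to use.

```python
# Hypothetical three-residue, single-chain model in PDB format.
# AlphaFold DB models carry per-residue pLDDT in the B-factor column.
EXAMPLE_PDB = """\
ATOM      1  CA  MET A   1      11.104  13.207   2.100  1.00 35.20
ATOM      2  CA  LYS A   2      12.500  14.100   3.300  1.00 88.40
ATOM      3  N   ARG A   3      13.000  14.800   4.000  1.00 42.10
ATOM      4  CA  ARG A   3      13.900  15.000   4.500  1.00 42.10
"""

def plddt_per_residue(pdb_text):
    """Map residue number -> pLDDT, read from the CA atom of each residue.

    Assumes a single chain, as in AlphaFold DB monomer models.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed PDB columns (1-based): atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def low_confidence(scores, cutoff=50.0):
    """Sorted residue numbers whose pLDDT falls below the cutoff."""
    return sorted(res for res, plddt in scores.items() if plddt < cutoff)

scores = plddt_per_residue(EXAMPLE_PDB)
print(scores)
print(low_confidence(scores))
```

Reading fixed columns rather than splitting on whitespace matters because PDB fields can run together; a full parser such as Biopython's handles those cases, but the slicing above is enough to show where the confidence values live.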
Why you should care, even if you are not annotating proteins for fun
Protein mislocalization shows up in cancer, neurodegeneration, and infection biology. Nuclear transport itself is deeply tied to disease, and drugs that target export machinery already exist, including XPO1 inhibitors used in cancer treatment (Yang et al., 2023). So better signal prediction is not just a nice cleanup step for databases. It can affect how researchers interpret mutations, design experiments, and prioritize therapeutic targets.
There is also a broader AI angle here. Newer models such as ProtGPS and PUPS are pushing localization prediction toward richer settings, from condensates to single-cell context (Kilgore et al., 2025; Zhang et al., 2025). SPSignal is less flashy than "AI predicts protein behavior in single cells," sure. But flashy is overrated. Sometimes the grown-up move is reducing false positives in a problem biologists have been side-eyeing for years.
And honestly? That is the charm of this paper. It does not try to replace biology with vibes. It just says: maybe if a signal is supposed to be recognized by another molecule, we should check whether that signal is physically visible first. Bold concept. Wild that it works. Right?
References
Engler C, Abriata LA, Bologna NG. SPSignal: a web tool for structure-assisted prediction of nuclear localization and nuclear export signals in proteins. Nucleic Acids Research. 2026. DOI: 10.1093/nar/gkag421. PubMed: 42109171
Almagro Armenteros JJ, Sønderby CK, Sønderby SK, et al. DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Research. 2022;50(W1):W228-W234. DOI: 10.1093/nar/gkac278
Yang Y, Guo L, Chen L, et al. Nuclear transport proteins: structure, function and disease relevance. Signal Transduction and Targeted Therapy. 2023;8:425. DOI: 10.1038/s41392-023-01649-4
Shatnawi M, Alwosaibai K, Alafnan H, et al. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules. 2024;14(4):409. DOI: 10.3390/biom14040409
Mekhalfi ML, Memon D, Brás V, et al. Protein subcellular localization prediction tools. Computational and Structural Biotechnology Journal. 2024. DOI: 10.1016/j.csbj.2024.04.032
Kilgore HR, Chinn AM, Mikhael PG, et al. Protein codes promote selective subcellular compartmentalization. Science. 2025;387(6738):1095-1101. DOI: 10.1126/science.adq2634. PMCID: PMC12034300
Zhang X, Tseo Y, Zhou A, et al. Prediction of protein subcellular localization in single cells. Nature Methods. 2025;22:1265-1275. DOI: 10.1038/s41592-025-02696-1
Varadi M, Anyango S, Deshpande M, et al. AlphaFold Protein Structure Database 2025: a redesigned interface and updated structural coverage. Nucleic Acids Research. 2025. Available at: https://academic.oup.com/nar/article/54/D1/D358/8340156
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.