AIb2.io - AI Research Decoded

Hidden Pockets: How CryptoBank Maps the Secret Doors on "Undruggable" Proteins

Guess what percentage of human proteins have drug-friendly binding pockets that show up in a standard crystal structure. If you said "most of them," congratulations - you're wrong, and that wrong answer has haunted pharmaceutical companies for decades. By a commonly cited estimate, roughly 85% of the human proteome has been labeled "undruggable," meaning there's no obvious groove, cavity, or chemical handshake spot where a small molecule can latch on and do something useful. But what if those pockets exist - they're just hiding?

The Protein Equivalent of a Hidden Room Behind a Bookcase

Cryptic binding sites are exactly what they sound like: pockets on proteins that don't appear in the protein's resting state but pop open when the right molecule comes knocking. Think of it like a door that only materializes when you push on the correct brick in the wall. The protein sits there looking smooth and uncooperative in its "apo" (unbound) form, but introduce a ligand and suddenly - there's a pocket. Surprise.

This isn't theoretical hand-waving. The most celebrated example is KRAS G12C, an oncogenic mutation that drives lung cancers, in a protein that was written off as undruggable for three decades. Then in 2013, researchers found a cryptic switch-II pocket that only opens in a specific protein conformation (Ostrem et al., 2013). That discovery ultimately led to sotorasib (Lumakras), FDA-approved in 2021 as the first-ever direct KRAS inhibitor. One hidden pocket, one completely new class of cancer drugs.

The problem? Finding these pockets is brutally hard. You can't just look at a crystal structure and circle them with a red pen, because they literally aren't there yet.

Enter CryptoBank: Six Million Structural Comparisons Walk Into a Database

A team led by Francesco Gervasio at the University of Geneva decided to solve the data problem head-on. Their new resource, CryptoBank, systematically compared over 6 million pairs of unbound and bound protein structures from the Protein Data Bank, using a machine learning classifier to flag cases where a ligand induced a conformational change that revealed a hidden pocket (Febrer Martinez et al., 2026).
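To make the apo-versus-holo comparison concrete, here is a toy sketch in Python. It is not CryptoBank's classifier, whose features aren't described in this post. It assumes the two structures are already superimposed, reduces each residue to one coordinate, and uses an arbitrary 2.0 Å cutoff. The names pocket_rmsd and looks_cryptic are made up for illustration.

```python
import math

def pocket_rmsd(apo, holo, pocket_residues):
    """RMSD (in the input units) over the chosen residues.

    apo and holo map residue number -> (x, y, z) and are assumed
    to be superimposed already; no alignment is done here.
    """
    squared = 0.0
    for res in pocket_residues:
        squared += sum((a - h) ** 2 for a, h in zip(apo[res], holo[res]))
    return math.sqrt(squared / len(pocket_residues))

def looks_cryptic(apo, holo, pocket_residues, threshold=2.0):
    """Hypothetical heuristic: flag the pair if the ligand-lining
    residues moved more than `threshold` between apo and holo."""
    return pocket_rmsd(apo, holo, pocket_residues) > threshold

# Invented coordinates for three residues lining a ligand site.
apo = {10: (0.0, 0.0, 0.0), 11: (1.0, 0.0, 0.0), 12: (2.0, 0.0, 0.0)}
holo = {10: (0.0, 0.0, 0.0), 11: (1.0, 3.0, 0.0), 12: (2.0, 4.0, 0.0)}

print(round(pocket_rmsd(apo, holo, [10, 11, 12]), 2))
print(looks_cryptic(apo, holo, [10, 11, 12]))
```

A real pipeline would need far more than a displacement cutoff (side chains, pocket volume, crystal contacts), which is presumably why the authors reached for a trained classifier instead.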

The headline number: cryptic pockets showed up in roughly 18% of protein clusters. That's not a rounding error - that's nearly one in five protein families harboring secret compartments that traditional structure-based drug design would completely miss.

Notice how this flips the script on "undruggable." It's not that these proteins lack binding sites. It's that our snapshots were taken at the wrong moment, like photographing a blink and concluding someone has no eyes.

Teaching a Language Model to Read Between the Amino Acids

Here's where it gets clever. The team didn't stop at building a database. They fine-tuned a protein language model - think of it as GPT for amino acid sequences - to predict whether a given protein residue sits at a cryptic site, using nothing but the raw sequence. No 3D structure required.

Protein language models have been tearing through structural biology lately, learning patterns from millions of evolutionary sequences the way large language models absorb text (Elnaggar et al., 2022). CryptoBank's PLM achieved strong precision when query sequences shared more than 20% identity with database entries. The team also tested its predictions on four proteins with low sequence similarity to those entries, then confirmed the pocket openings with molecular dynamics simulations.
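If "20% identity" sounds abstract, here is one way to compute percent identity from a pairwise alignment. This is a sketch under assumptions: identity has several competing definitions, the post doesn't say which one the authors used, and this version simply counts matches over columns where neither sequence has a gap. The function name and the toy sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must already be aligned (same length, '-' for gaps).
    Other definitions divide by the shorter sequence or the full
    alignment length instead; this is only one convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Two made-up aligned fragments: 5 matches over 7 ungapped columns.
identity = percent_identity("MKT-AYIAK", "MRTQAY-AR")
print(f"{identity:.1f}% identity, above 20% cutoff: {identity > 20.0}")
```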

This matters because sequence data is cheap and abundant. Crystal structures are expensive and incomplete. A model that can flag "hey, residues 142-158 probably hide a cryptic pocket" from sequence alone is the kind of scalable tool drug hunters actually need.
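Turning per-residue scores into a statement like "residues 142-158" is a small post-processing step. The sketch below assumes the model emits one probability per residue; the 0.5 threshold, the minimum run length, and the function name flag_segments are illustrative choices, not anything taken from CryptoBank.

```python
def flag_segments(scores, threshold=0.5, min_length=3, first_residue=1):
    """Group consecutive high-scoring residues into (start, end) ranges.

    scores: one hypothetical cryptic-site probability per residue.
    Runs shorter than min_length are dropped as likely noise.
    Residue numbers in the result are inclusive.
    """
    segments = []
    start = None
    for offset, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = offset  # a new run begins here
        elif start is not None:
            if offset - start >= min_length:
                segments.append((first_residue + start, first_residue + offset - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((first_residue + start, first_residue + len(scores) - 1))
    return segments

# Invented scores for residues 140-150.
scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.1, 0.9, 0.9, 0.9]
print(flag_segments(scores, first_residue=140))
```

The minimum run length is there because a single high-scoring residue surrounded by low scores is rarely enough to define a pocket worth chasing.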

Previous efforts like CryptoSite and PocketMiner (Meller et al., 2023) pushed the field forward, but they were limited by training data. CryptoBank's contribution is brute-force elegant: by mining millions of structural alignments, it generates a dataset an order of magnitude larger than what existed before. A recent review in Bioinformatics Advances specifically highlighted training data scarcity as the central bottleneck for ML-based cryptic site prediction (Gašparíková et al., 2025). CryptoBank directly attacks that bottleneck.

The web server at cryptobankdb.com lets anyone query proteins and get cryptic site predictions, which is the kind of open-access move that actually accelerates science rather than just generating citations.

The Catch (Because There's Always a Catch)

Predicting a cryptic pocket exists is step one. Designing a drug that exploits it is step twelve, and steps two through eleven involve confirming the pocket opens under physiological conditions, figuring out what kind of molecule fits, and surviving the gauntlet of medicinal chemistry optimization. ML predictions can also produce false positives - pockets that look real computationally but never open in practice.
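The false-positive worry is what precision measures. As a minimal sketch, assuming you have a set of predicted residues and a set of residues later confirmed to line a real pocket (both invented here), precision and recall fall out of simple set arithmetic:

```python
def precision_recall(predicted, confirmed):
    """Per-residue precision and recall from two collections of residue numbers."""
    predicted, confirmed = set(predicted), set(confirmed)
    true_positives = len(predicted & confirmed)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall

# Hypothetical example: the model flags residues 142-158,
# but follow-up work only supports residues 150-164.
predicted = range(142, 159)
confirmed = range(150, 165)
precision, recall = precision_recall(predicted, confirmed)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Low precision means chemists chase pockets that never open; low recall means real pockets go unnoticed. Which error is costlier depends on how expensive the follow-up experiments are.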

Still, if you're working on a protein that refuses to play nice with conventional drug design, knowing where to look for hidden pockets is worth its weight in GPU hours. And if you're the type who likes to visualize complex relationships between protein families, binding sites, and druggability data, tools like mapb2.io can help you map out those connections spatially before diving into the structural details.

The Bottom Line

CryptoBank suggests that the druggable proteome might be significantly larger than we thought - we just needed better tools to see the doors we were walking past. Between expanded databases, protein language models, and validation through molecular dynamics, the infrastructure for finding and exploiting cryptic sites is maturing fast. The next sotorasib might already be hiding in a pocket nobody has looked at yet.

References

  1. Febrer Martinez, P., Fröhlking, T., Borsatto, A., & Gervasio, F. L. (2026). CryptoBank: A resource for the identification and prediction of cryptic sites in proteins. Science Advances, 12(17), eady6364. DOI: 10.1126/sciadv.ady6364 | PubMed: 42018639

  2. Meller, A., Ward, M., Borowsky, J., et al. (2023). Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Nature Communications, 14, 1177. DOI: 10.1038/s41467-023-36699-3

  3. Gašparíková, D., Chikhale, R., Cole, J., & Pohl, E. (2025). Recent computational advances in the identification of cryptic binding sites for drug discovery. Bioinformatics Advances, 5(1), vbaf156. DOI: 10.1093/bioadv/vbaf156

  4. Bemelmans, S. & Bhatt, J. S. (2024). Computational advances in discovering cryptic pockets for drug discovery. Current Opinion in Structural Biology, 89, 102921. DOI: 10.1016/j.sbi.2024.102921

  5. Elnaggar, A., Heinzinger, M., Dallago, C., et al. (2022). ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 7112-7127. DOI: 10.1109/TPAMI.2021.3095381

  6. Ostrem, J. M., Peters, U., Sos, M. L., Wells, J. A., & Shokat, K. M. (2013). K-Ras(G12C) inhibitors allosterically control GTP affinity and effector interactions. Nature, 503(7477), 548-551. DOI: 10.1038/nature12796

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.