Let us admit, right up front, that "candidate therapeutic targets with Geneformer" sounds like the sort of phrase that makes normal humans back slowly toward the snack table. And yet, friends, behind that gloriously niche title lurks a surprisingly juicy idea: what if you could take a giant AI model, feed it millions of single-cell gene-expression snapshots, and then ask it which genes might be worth nudging to push diseased cells back toward normal? That is the wager behind the new Nature Protocols paper from Zhang, Venkatesh, and Theodoris on using Geneformer for target discovery [1].
The Big Trick: Treat Cells Like Sentences
Geneformer borrows a move from language models. In ordinary language AI, a transformer learns relationships between words by reading absurd amounts of text. In Geneformer, the "words" are genes, and the "sentences" are single-cell transcriptomes - those readouts of which genes are active inside individual cells [2]. Same basic energy, different costume. Instead of predicting what word comes next, the model is pretrained to fill in genes that have been masked out of a cell, which is how it learns which genes belong where in the context of that cell [2].
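How does a cell become a sentence? The protocol's first step is a rank-based encoding [1], and a minimal sketch of that idea might look like the following. Treat it as an illustration under assumptions, not Geneformer's actual tokenizer: the function name `rank_value_encode`, the scaling to a depth of 10,000, the per-gene median values, and the four-gene toy panel are all invented for the example.

```python
import numpy as np


def rank_value_encode(counts, gene_ids, gene_medians, max_len=2048):
    """Turn one cell's raw counts into an ordered list of gene tokens.

    Assumption: genes are ranked by depth-scaled expression divided by a
    per-gene median taken across a large reference corpus, so genes that
    are high everywhere get pushed down the ranking.
    """
    counts = np.asarray(counts, dtype=float)
    medians = np.asarray(gene_medians, dtype=float)
    total = counts.sum()
    if total == 0:
        return []
    # Scale to a fixed depth so deeply and shallowly sequenced cells compare.
    scaled = counts / total * 10_000
    normalized = scaled / medians
    # Only genes detected in this cell become tokens.
    expressed = np.flatnonzero(counts > 0)
    order = expressed[np.argsort(-normalized[expressed], kind="stable")]
    return [gene_ids[i] for i in order[:max_len]]


# Hypothetical toy cell: a housekeeping gene with a huge raw count,
# plus two genes that are rare elsewhere but present here.
gene_ids = ["GAPDH", "TTN", "MYH7", "NPPA"]
counts = [50, 5, 0, 2]
gene_medians = [100.0, 2.0, 1.0, 0.5]  # made-up corpus medians

tokens = rank_value_encode(counts, gene_ids, gene_medians)
print(tokens)  # ['NPPA', 'TTN', 'GAPDH']
```

Notice that the gene with the largest raw count lands last. In this sketch, the ranking rewards genes that are unusually high for this particular cell, which is the kind of signal that distinguishes one cell state from another.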
That matters because disease is rarely one gene wearing a fake mustache. It is usually a network problem. Genes regulate other genes, cell states drift, and biology becomes the kind of spaghetti diagram that makes you question your life choices. Traditional network mapping often needs a lot of task-specific data. Geneformer tries to cheat, legally, by pretraining on a huge general corpus first, then adapting to small, specialized datasets later [2].
The new protocol walks researchers through that pipeline: tokenize raw expression counts into a rank-based encoding, test whether disease-relevant phenotypes already separate in the pretrained embedding space, fine-tune on the task you care about, and then run in silico perturbation. That last part is the fun bit. You virtually "repress" or "activate" a gene and see whether the cell's embedding shifts toward a healthier state [1]. It is basically a rehearsal dinner for wet-lab experiments.
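Here is a toy version of that rehearsal dinner. It assumes repression can be mimicked by deleting a gene from the ranked token list and activation by moving it to the front, and it scores each edit by how much the cosine similarity to a healthy reference embedding changes. The `toy_embed` function is a hypothetical stand-in for the real model, and the gene names and vectors are invented so the example runs anywhere.

```python
import numpy as np


def repress(tokens, gene):
    """Simulate repression by removing the gene from the ranked list."""
    return [g for g in tokens if g != gene]


def activate(tokens, gene):
    """Simulate activation by moving the gene to the top rank."""
    return [gene] + [g for g in tokens if g != gene]


def toy_embed(tokens, gene_vectors):
    """Hypothetical stand-in for a model: rank-weighted mean of gene vectors."""
    weights = 1.0 / np.arange(1, len(tokens) + 1)
    vectors = np.stack([gene_vectors[g] for g in tokens])
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_perturbations(tokens, healthy_embedding, gene_vectors, perturb):
    """Rank genes by how far perturbing them moves the cell toward healthy."""
    baseline = cosine(toy_embed(tokens, gene_vectors), healthy_embedding)
    shifts = {}
    for gene in tokens:
        perturbed = perturb(tokens, gene)
        if not perturbed:
            continue  # nothing left to embed
        after = cosine(toy_embed(perturbed, gene_vectors), healthy_embedding)
        shifts[gene] = after - baseline
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)


# Invented two-dimensional gene vectors and cell states.
gene_vectors = {
    "GENE_A": np.array([1.0, 0.0]),
    "GENE_B": np.array([0.0, 1.0]),
    "GENE_C": np.array([1.0, 1.0]),
}
healthy_embedding = toy_embed(["GENE_A", "GENE_C"], gene_vectors)
diseased_tokens = ["GENE_B", "GENE_A", "GENE_C"]

ranked = rank_perturbations(diseased_tokens, healthy_embedding, gene_vectors, repress)
for gene, shift in ranked:
    print(f"{gene}: {shift:+.3f}")
# GENE_B comes out on top: removing it restores the healthy token order.
```

The output is a ranked list, not a verdict. In a real run the embeddings would come from the fine-tuned model, and the top of the list would be a set of hypotheses to take to the bench.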
Zero-Shot, Fine-Tuning, and Other Spells From the AI Grimoire
The paper is useful because it turns a flashy concept into an actual recipe. Not "here be dragons," but "here are the ingredients, the GPU expectations, and the scoring metrics." The authors describe zero-shot inference, single-task and multitask fine-tuning, and perturbation ranking, all with practical evaluation tools like confusion matrices and macro F1 scores [1].
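For anyone rusty on those two metrics, here is a small sketch of how a confusion matrix and a macro F1 score are computed. The labels and predictions are made up, and the helper names are mine rather than anything from the protocol; in practice a library such as scikit-learn would do this for you.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth], index[guess]] += 1
    return matrix


def macro_f1(matrix):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    tp = np.diag(matrix).astype(float)
    fp = matrix.sum(axis=0) - tp
    fn = matrix.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    # A class with no true or predicted members gets an F1 of 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())


# Made-up example: a lazy classifier that calls every cell healthy.
labels = ["healthy", "diseased"]
y_true = ["healthy"] * 8 + ["diseased"] * 2
y_pred = ["healthy"] * 10

matrix = confusion_matrix(y_true, y_pred, labels)
accuracy = np.trace(matrix) / matrix.sum()
score = macro_f1(matrix)

print(matrix.tolist())          # [[8, 0], [2, 0]]
print(round(float(accuracy), 3))  # 0.8
print(round(score, 3))          # 0.444
```

The lazy classifier scores 80% accuracy while never finding a single diseased cell. Macro F1 drags it down to about 0.44, which is why it is a sensible metric when the interesting phenotype is the rare one.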
In plain English, zero-shot means the model gets asked to do something before you custom-train it for that exact job. Fine-tuning means you take the big pretrained brain and give it a brief, intense cram session on your specific disease or cell type. The hope is that the model already knows enough general biology to make useful judgments from limited new data. Like hiring a detective who has seen every crime drama ever made and now just needs the street address.
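One hedged way to picture the zero-shot check: take embeddings from the frozen pretrained model, color them by phenotype, and ask whether the groups already sit apart. The sketch below uses a silhouette score for that, which is my choice of metric and not necessarily the protocol's. The embeddings are synthetic random numbers standing in for real model output.

```python
import numpy as np


def mean_silhouette(embeddings, labels):
    """Average silhouette score: near 1 means tight, well-separated groups."""
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    distances = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    classes = np.unique(y)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = y == y[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # a class with one member scores 0
        a = distances[i, same].mean()
        b = min(distances[i, y == c].mean() for c in classes if c != y[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return float(scores.mean())


# Synthetic stand-ins for frozen-model embeddings of two phenotypes.
rng = np.random.default_rng(0)
healthy = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
diseased = rng.normal(loc=3.0, scale=0.1, size=(20, 8))
embeddings = np.vstack([healthy, diseased])

true_labels = np.array([0] * 20 + [1] * 20)
scrambled_labels = np.tile([0, 1], 20)  # labels unrelated to the structure

separated = mean_silhouette(embeddings, true_labels)
scrambled = mean_silhouette(embeddings, scrambled_labels)
print(f"true labels: {separated:.2f}, scrambled labels: {scrambled:.2f}")
```

If the real phenotype labels score about as poorly as scrambled ones, the pretrained embedding has not picked up that distinction on its own, and fine-tuning has more work to do.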
And unlike some computational biology methods that seem designed to require a monastery, a grant renewal, and three emotionally exhausted postdocs, this protocol says a user with moderate Python experience can usually run the full pipeline in under two days on a standard GPU workstation [1]. That is not exactly "download app, press magic button," but it is a lot closer than many methods in this neighborhood.
Why People Are Paying Attention
Geneformer already had a strong origin story. The 2023 Nature paper showed that transfer learning on roughly 30 million single-cell transcriptomes could improve predictions across multiple network-biology tasks and identify candidate therapeutic targets in cardiomyopathy [2]. Since then, the ecosystem has grown. A March 2026 follow-up reported that larger Geneformer models and 4-bit quantization preserved performance while slashing compute costs, which is catnip for any lab whose budget is held together by coffee and optimism [3].
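To see why 4-bit quantization is such catnip, here is a generic sketch of the idea: squeeze each weight into one of a handful of integer levels, keep a single scale factor, and compare the storage bill. This is plain absolute-maximum quantization chosen for illustration, not the scheme from the follow-up paper, and the 100-million parameter count is a hypothetical round number.

```python
import numpy as np


def quantize_4bit(weights):
    """Map float weights to integers in [-7, 7] plus one shared scale.

    NumPy has no 4-bit dtype, so the values sit in an int8 container here;
    a real implementation would pack two of them into each byte.
    """
    w = np.asarray(weights, dtype=np.float32)
    scale = float(np.abs(w).max()) / 7.0
    if scale == 0.0:
        scale = 1.0  # all-zero weights: any scale works
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return q.astype(np.float32) * scale


def weight_megabytes(n_params, bits):
    """Storage for the weights alone, ignoring scales and activations."""
    return n_params * bits / 8 / 1e6


rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_4bit(weights)
max_error = float(np.abs(dequantize(q, scale) - weights).max())
print(f"largest rounding error: {max_error:.4f} (half a step is {scale / 2:.4f})")

n_params = 100_000_000  # hypothetical model size
print(weight_megabytes(n_params, 32))  # 400.0
print(weight_megabytes(n_params, 4))   # 50.0
```

An eightfold cut in weight storage is the easy part. Whether predictions survive the rounding is an empirical question, which is what the follow-up paper reports testing [3].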
The broader field is also heating up. CellFM scaled to about 100 million human cells and 800 million parameters, claiming strong performance across annotation, perturbation prediction, and gene-function tasks [4]. A 2025 ACL survey maps out this whole fast-growing zoo of single-cell foundation models, from Geneformer to scGPT and beyond [5].
So yes, the field has momentum. There is code on Hugging Face, public documentation, and open model weights for Geneformer if you want to inspect the machinery yourself [7]. If you need to sketch the gene-network tangle without drawing on a napkin like a caffeinated conspiracy theorist, something like mapb2.io fits naturally into that workflow.
Plot Twist: The Hype Needs a Chaperone
Now for the trumpet of caution. Some recent benchmarking work has been less starry-eyed. A 2025 Genome Biology study found that Geneformer and scGPT did not consistently beat simpler baselines in zero-shot settings, especially outside favorable contexts [6]. Another 2025 benchmark argued that performance depends heavily on task choice, evaluation design, and data leakage control, with classical methods still competitive in plenty of cases [8].
That does not kill the promise. It just means we should stop acting like every large model arrives carrying tablets from the mountain. Geneformer may be genuinely useful, especially for low-data target prioritization, but its best hits still need experimental follow-up. In silico perturbation is not the same as a verified therapy. It is a short list, not a miracle.
Which, frankly, is still a big deal. In drug discovery, getting from "we have no idea" to "here are the five genes worth testing first" can save months or years. That is the real charm of this protocol. It does not promise a robot biologist in a lab coat. It promises a sharper flashlight.
References
[1] Zhang Y, Venkatesh MS, Theodoris CV. Discovery of candidate therapeutic targets with Geneformer. Nature Protocols (2026). DOI: https://doi.org/10.1038/s41596-026-01364-8
[2] Theodoris CV, Xiao L, Chopra A, et al. Transfer learning enables predictions in network biology. Nature 618, 616-624 (2023). DOI: https://doi.org/10.1038/s41586-023-06139-9. PMCID: https://pmc.ncbi.nlm.nih.gov/articles/PMC10949956/
[3] Chen H, Venkatesh MS, Gomez Ortega J, et al. Scaling and quantization of large-scale foundation model enables resource-efficient predictions in network biology. Nature Computational Science (2026). DOI: https://doi.org/10.1038/s43588-026-00972-4
[4] Zeng Y, et al. CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells. Nature Communications (2025). DOI: https://doi.org/10.1038/s41467-025-59926-5
[5] Zhang F, Chen H, Zhu Z, et al. A Survey on Foundation Language Models for Single-cell Biology. ACL 2025, pp. 528-549. DOI: https://doi.org/10.18653/v1/2025.acl-long.26
[6] Kedzierska KZ, Crawford L, Amini AP, Lu AX. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biology (2025). DOI: https://doi.org/10.1186/s13059-025-03574-x
[7] Geneformer model hub and documentation: https://huggingface.co/ctheodoris/Geneformer
[8] Wu J, Ye Q, Wang Y, et al. Biology-driven insights into the power of single-cell foundation models. Genome Biology 26, 334 (2025). DOI: https://doi.org/10.1186/s13059-025-03781-6. PMCID: https://pmc.ncbi.nlm.nih.gov/articles/PMC12492631/
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.