AIb2.io - AI Research Decoded

The Cell Simulation Cabal Is Getting Organized

The title, "'Virtual cells' aim to turn raw data into predictive models of biology," sounds like it was assembled in a grant-writing bunker at 2:13 a.m., so let me translate: scientists want computer models that can predict what cells will do before anyone has to poke the actual cells with tiny lab instruments.

Suspiciously reasonable, right?


Michael Eisenstein's Nature feature (PMID: 42230831) is about one of biology's most tempting dreams: a virtual cell. Not a cute screensaver blob. Not a Tamagotchi with mitochondria. A real computational stand-in that can take raw biological data (gene expression, perturbation screens, protein measurements, spatial context, maybe the whole messy spreadsheet of life) and forecast how a cell responds when you change something.

Follow the data trail and you see the plot immediately. Biology has spent decades collecting molecular measurements like a person saving every receipt since 1998 "just in case." Now machine learning researchers have shown up asking: what if we trained models on all this and made cells predictable?

The Old Plan: Write Down the Rules

Virtual cells are not new. Wikipedia-level background will tell you that a cellular model is a computational model of some aspect of a biological cell, often built for in silico research. Older tools such as VCell are built to model reactions, diffusion, compartments, stochastic events, and all the little chemical soap operas happening inside living systems.

This old-school approach has a certain beautiful honesty. You write equations. You specify mechanisms. You say, "This molecule bumps into that molecule, rates apply, math happens." It is interpretable, which is science-speak for "when it breaks, at least you can point at the smoking crater."
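
To make "rates apply, math happens" concrete, here is a minimal sketch of one mass-action reaction, A + B → C, integrated with a plain forward-Euler loop. It is a toy under stated assumptions (one well-mixed compartment, a single irreversible reaction, hypothetical rate constant, units, and step size); tools like VCell handle far more, including diffusion, compartments, and stochastic events, with proper solvers.

```python
from typing import Tuple

def simulate_binding(a0: float, b0: float, k_on: float,
                     dt: float, steps: int) -> Tuple[float, float, float]:
    """Toy mass-action model of A + B -> C in one well-mixed compartment.

    Assumed rate law: d[C]/dt = k_on * [A] * [B], with [A] and [B] falling at the same rate.
    Forward Euler is simple but needs a small dt to stay accurate.
    """
    a, b, c = a0, b0, 0.0
    for _ in range(steps):
        rate = k_on * a * b  # encounters scale with both concentrations
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
    return a, b, c

# Hypothetical parameters: start at 1.0 and 2.0 concentration units, k_on = 1.0, simulate 1 time unit.
a, b, c = simulate_binding(a0=1.0, b0=2.0, k_on=1.0, dt=0.001, steps=1000)
print(f"A={a:.3f}  B={b:.3f}  C={c:.3f}")
```

Every line of that loop maps to a stated mechanism, which is exactly the interpretability the old plan buys: if the output looks wrong, you can inspect the rate law or the step size.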

The catch? Cells are absurd. A cell is not a clean factory diagram. It is more like a nightclub, a city council meeting, and a plumbing emergency sharing one cytoplasm. Karr and colleagues built a landmark whole-cell model of Mycoplasma genitalium back in 2012, but that organism has one of the smallest genomes among free-living bacteria. Human cells, because apparently we needed the deluxe chaos package, are much harder.

The New Plan: Let the Machines Read the Receipts

The modern virtual-cell push leans on foundation models. These rest on the same general idea as large language models, except that instead of predicting the next word, they learn patterns in cells. A cell becomes something like a sentence. Genes become tokens. Expression levels become the weird punctuation.
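
Here is a rough sketch of how "a cell becomes a sentence" can work, assuming one cell's expression values sit in a plain gene-to-value dictionary. It orders expressed genes from highest to lowest (loosely in the spirit of Geneformer's rank-based encoding) and attaches a per-cell expression bin to each gene token (loosely in the spirit of scGPT's value binning). The function name, bin count, and length cap are illustrative choices, not either model's actual preprocessing, which also involves normalization and vocabulary lookups.

```python
from typing import Dict, List, Tuple

def cell_to_sentence(expression: Dict[str, float], n_bins: int = 5,
                     max_len: int = 2048) -> List[Tuple[str, int]]:
    """Encode one cell as an ordered list of (gene token, expression bin) pairs.

    Illustrative only: gene order follows expression rank, and bins are
    assigned by rank within this cell, so tied values share a bin.
    """
    # Drop unexpressed genes; in this toy encoding a zero gets no token.
    expressed = [(g, v) for g, v in expression.items() if v > 0]
    # Highest expression first; gene name breaks ties so output is deterministic.
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    expressed = expressed[:max_len]
    n = len(expressed)
    tokens: List[Tuple[str, int]] = []
    n_greater = 0  # how many genes in this cell are strictly more expressed
    for i, (gene, value) in enumerate(expressed):
        if i > 0 and value < expressed[i - 1][1]:
            n_greater = i
        # Top-ranked genes land in bin n_bins - 1, the lowest-ranked near bin 0.
        bin_id = n_bins - 1 - (n_greater * n_bins) // n
        tokens.append((gene, bin_id))
    return tokens

print(cell_to_sentence({"GAPDH": 50.0, "ACTB": 40.0, "CD3E": 3.0, "MKI67": 0.0, "IL7R": 3.0}))
```

With the default five bins, that call yields GAPDH, ACTB, CD3E, IL7R in that order with bins 4, 3, 2, 2; the tied CD3E and IL7R share a bin, and the unexpressed MKI67 drops out.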

The scGPT team, for example, trained a generative transformer on more than 33 million cells and applied it to tasks such as cell-type annotation, multi-omic integration, perturbation prediction, and gene-network inference. CellFM went even bigger, using transcriptomics from 100 million human cells. At this point the GPUs are not interns doing math anymore. They are interns doing math while living under the desk and muttering about batch effects.

And yes, the timing is interesting. Virtual-cell hype rises right as single-cell atlases, CRISPR perturbation screens, spatial transcriptomics, and multi-omics datasets all mature enough to become training fuel. Coincidence? I am merely asking questions while standing next to a corkboard covered in red string.

The Part Where Reality Kicks the Door Open

The dream is simple: perturb a gene in the computer, predict the cellular response, and save biologists time, money, and heartbreak. Drug discovery would love this. Disease modeling would love this. Synthetic biology would send it flowers.

But cells are not just bags of gene expression. They live in tissues. They talk to neighbors. They respond to history, location, stress, timing, and whatever molecular nonsense happened five minutes ago. A review in Experimental & Molecular Medicine notes that many single-cell foundation models treat each cell independently, which can ignore spatial organization and cell-cell signaling. That is like predicting a person's behavior from their grocery list while ignoring that they are at a wedding.
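
One crude way to give a model some of that "wedding" context is to append each cell's neighborhood to its own profile. The sketch below assumes you have 2D spatial coordinates and an expression vector per cell, then concatenates the mean expression of the k nearest neighbors. It is a hypothetical toy feature step (brute-force distances, no model of cell-cell signaling), not a method from the review or from any specific spatial foundation model.

```python
import math
from typing import List, Sequence

def neighborhood_features(coords: Sequence[Sequence[float]],
                          expr: Sequence[Sequence[float]],
                          k: int = 2) -> List[List[float]]:
    """Concatenate each cell's own expression with the mean expression of its k nearest neighbors."""
    n = len(coords)
    features = []
    for i in range(n):
        # Brute-force Euclidean distances to every other cell (fine for a toy; use a k-d tree at scale).
        dists = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        neighbors = [j for _, j in dists[:k]]
        n_genes = len(expr[i])
        if neighbors:
            context = [sum(expr[j][g] for j in neighbors) / len(neighbors) for g in range(n_genes)]
        else:
            context = [0.0] * n_genes  # a lone cell has no neighborhood signal
        features.append([float(x) for x in expr[i]] + context)
    return features

# Three hypothetical cells on a line with two genes each; the third sits far from the others.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
expr = [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]]
for row in neighborhood_features(coords, expr, k=1):
    print(row)
```

The isolated third cell still "sees" cell two as its neighbor, which hints at why real spatial methods weigh distance and signaling rather than just counting nearest cells.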

Benchmarks are also waving a little caution flag. A 2025 Genome Biology paper found that zero-shot performance for Geneformer and scGPT can be unreliable, with simpler methods sometimes winning. Translation: the giant model wearing the expensive sunglasses does not automatically beat the practical baseline with sensible shoes.
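
For a flavor of what a "practical baseline" looks like: simple comparators in this kind of benchmark include selecting highly variable genes (HVGs) before downstream analysis. The sketch below ranks genes by raw variance across cells, assuming a small cells-by-genes matrix of already normalized values. Real HVG pipelines (for example in Seurat or Scanpy) usually correct for the mean-variance relationship, so treat this as a simplified stand-in.

```python
from statistics import pvariance
from typing import List, Sequence

def top_variable_genes(expr: Sequence[Sequence[float]], gene_names: Sequence[str],
                       n_top: int) -> List[str]:
    """Return the n_top genes with the highest variance across cells (rows = cells, columns = genes)."""
    variances = [pvariance([cell[g] for cell in expr]) for g in range(len(gene_names))]
    # Highest variance first; gene name breaks ties deterministically.
    order = sorted(range(len(gene_names)), key=lambda g: (-variances[g], gene_names[g]))
    return [gene_names[g] for g in order[:n_top]]

# Hypothetical normalized values for three cells and three genes.
expr = [[1.0, 5.0, 2.0],
        [1.0, 0.0, 2.0],
        [1.0, 10.0, 3.0]]
print(top_variable_genes(expr, ["GENE_A", "GENE_B", "GENE_C"], n_top=2))
```

A real zero-shot comparison would then embed cells using these genes (for example with PCA) and compare clustering or integration quality against the foundation model's embeddings.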

This is why the Virtual Cell Challenge matters. It treats virtual cells less as a press release and more as a testable claim: can a model predict how cells respond to perturbations in contexts it has not seen? The first challenge focuses on predicting single-gene perturbation effects in a held-out cell type. Very "prove it in the parking lot," scientifically speaking.
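
In practice, "prove it in the parking lot" means scoring predicted responses against measured ones in cells the model never trained on. A minimal sketch, assuming you already have mean expression profiles for control cells, for observed perturbed cells, and for a model's prediction: compare predicted and observed shifts away from control with a Pearson correlation, and run the same score on a naive baseline that predicts an average shift. The Virtual Cell Challenge defines its own official metrics; this correlation-of-deltas score and all the numbers below are illustrative.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def delta_correlation(predicted: Sequence[float], observed: Sequence[float],
                      control: Sequence[float]) -> float:
    """Correlate predicted vs. observed shifts away from the control profile, gene by gene."""
    pred_delta = [p - c for p, c in zip(predicted, control)]
    obs_delta = [o - c for o, c in zip(observed, control)]
    return pearson(pred_delta, obs_delta)

# Hypothetical mean expression of four genes in the held-out cell type.
control = [5.0, 5.0, 5.0, 5.0]    # unperturbed cells
observed = [1.0, 5.0, 6.0, 5.5]   # measured after knocking down a gene
model = [2.0, 5.0, 5.5, 5.0]      # a model's prediction for that knockdown
baseline = [4.5, 4.5, 5.5, 5.0]   # naive baseline: control plus an average shift seen in training

print(f"model    r = {delta_correlation(model, observed, control):.3f}")
print(f"baseline r = {delta_correlation(baseline, observed, control):.3f}")
```

Scoring deltas rather than raw profiles matters because raw profiles are dominated by baseline expression, so even "predict the control" can look deceptively good. In this toy the model scores about 0.996 and the baseline about 0.67; on real held-out data, the interesting question is whether that gap survives.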

Why This Is Still Worth Watching

If virtual cells get good, they could change how researchers explore biology. Instead of testing every possible genetic or drug perturbation in wet-lab experiments, scientists could use models to rank promising ideas first. The lab becomes less a random treasure hunt, more a guided heist movie.
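
Ranking ideas first could be as simple as sorting candidates by a model's predicted effect. The sketch assumes a hypothetical `predicted_shift` table mapping each candidate perturbation to predicted expression changes, and ranks candidates by how strongly they are predicted to lower one target gene; candidates with no prediction for that gene count as no change.

```python
from typing import Dict, List

def rank_candidates(predicted_shift: Dict[str, Dict[str, float]],
                    target_gene: str, top_k: int = 3) -> List[str]:
    """Order candidate perturbations by predicted decrease of target_gene (most negative first)."""
    # Missing predictions for the target gene are treated as "no change" (0.0).
    scored = [(shifts.get(target_gene, 0.0), name) for name, shifts in predicted_shift.items()]
    scored.sort()  # ascending shift, then name, so ties are deterministic
    return [name for _, name in scored[:top_k]]

# Hypothetical model output: predicted expression change per candidate perturbation.
predicted_shift = {
    "KO_A": {"TNF": -2.0},
    "KO_B": {"TNF": 0.5},
    "KO_C": {"TNF": -3.1, "IL6": 0.2},
    "drug_D": {"IL6": -1.0},
}
print(rank_candidates(predicted_shift, "TNF", top_k=2))
```

Here the shortlist is KO_C, then KO_A. Sorting by one gene is the simplest possible objective; a real screen would also weigh off-target effects and the model's uncertainty, which this sketch ignores.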

The strongest version of this vision is not just predictive but explanatory. A useful virtual cell should not merely say, "Gene X goes up." It should help explain which pathways, networks, and interactions caused the shift. Otherwise we have built a biological fortune cookie: occasionally right, annoyingly vague.
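
One common first step from "Gene X goes up" toward "which pathway moved" is over-representation analysis: ask whether the genes a model flags as changed overlap a known pathway more than chance would predict. The sketch computes a one-sided hypergeometric p-value with only the standard library; gene sets are plain Python sets, and the names in the example are placeholders. Enrichment like this is association, not mechanism, so it is a starting point for explanation rather than the explanation itself.

```python
from math import comb
from typing import Set

def enrichment_p_value(changed: Set[str], pathway: Set[str], universe: Set[str]) -> float:
    """One-sided hypergeometric test: P(overlap >= observed) if changed genes were drawn at random."""
    N = len(universe)                      # all measured genes
    K = len(pathway & universe)            # pathway genes that were measured
    n = len(changed & universe)            # genes flagged as changed
    k = len(changed & pathway & universe)  # observed overlap
    # Sum P(X = i) for i = k..min(K, n), with X ~ Hypergeometric(N, K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Placeholder gene names: 20 measured genes, a 5-gene "pathway", 4 changed genes.
universe = {f"G{i}" for i in range(20)}
pathway = {f"G{i}" for i in range(5)}
changed = {"G0", "G1", "G2", "G10"}
print(f"p = {enrichment_p_value(changed, pathway, universe):.4f}")
```

With 3 of the 4 changed genes inside the pathway, the p-value is 155/4845, roughly 0.032. In practice you would test many pathways at once and correct for multiple testing.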

There is also a tooling angle. These models produce tangled maps of genes, cells, perturbations, tissues, and inferred pathways. Visual thinking tools like mapb2.io are not going to solve cellular biology, but they are exactly the kind of thing you reach for when the evidence board starts looking like season three of a conspiracy drama.

The Reveal Behind the Reveal

The big secret is that "virtual cell" is really a negotiation between two scientific cultures. Systems biology wants mechanisms you can inspect. Machine learning wants scale and predictive performance. Biology, sitting in the middle with a lab coat and a migraine, wants both.

Eisenstein's Nature piece captures that tension neatly: simulations could transform biomedical research, but reproducing life's complexity without drowning in data remains very much unsolved. The field is not at "download a cell and run it locally." It is closer to "we found the basement door, and something down there is humming."

That hum is worth listening to.

References

  1. Eisenstein, M. "'Virtual cells' aim to turn raw data into predictive models of biology." Nature (2026). DOI: 10.1038/d41586-026-01731-1. PMID: 42230831

  2. Cui, H. et al. "scGPT: toward building a foundation model for single-cell multi-omics using generative AI." Nature Methods 21, 1470-1480 (2024). DOI: 10.1038/s41592-024-02201-0

  3. Wang, X. et al. "CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells." Nature Communications 16, 4679 (2025). DOI: 10.1038/s41467-025-59926-5

  4. Rood, J. E., Hupalowska, A. & Regev, A. "Toward a foundation model of causal cell and tissue biology with a Perturbation Cell and Tissue Atlas." Cell 187, 4520-4545 (2024). DOI: 10.1016/j.cell.2024.07.035

  5. Virtual Cell Challenge Consortium. "Virtual Cell Challenge: Toward a Turing test for the virtual cell." Cell 188, 3370-3374 (2025). DOI: 10.1016/j.cell.2025.06.008

  6. Kedzierska, K. et al. "Zero-shot evaluation reveals limitations of single-cell foundation models." Genome Biology 26, 101 (2025). DOI: 10.1186/s13059-025-03574-x

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.