AIb2.io - AI Research Decoded

Simulation-Based Inference Captures Non-Markovian Effects in Protein Production Kinetics Through Cell Division

A routine Tuesday in a computational biology lab: someone feeds a neural network millions of fake cells dividing, and the network quietly figures out something that decades of equations couldn't solve.

The Problem With Hand-Me-Down Proteins

Cells divide. This is not news. But here's the wrinkle that keeps biophysicists up at night: when a cell splits in two, the daughter cells don't start from scratch. They inherit proteins from mom. So when you shine a light on a cell and measure how much fluorescent protein it has - a standard trick for figuring out which genes are switched on - you're not just seeing what that cell made. You're seeing leftovers from its mother, its grandmother, and possibly its great-grandmother's Sunday dinner.

This makes the math ugly. Standard approaches to modeling gene expression rely on something called the chemical master equation, which assumes the system has no memory - whatever happens next depends only on what's happening right now, not what happened three cell divisions ago. Mathematicians call this the Markov property, and it's the bedrock of most stochastic modeling in biology. Cell division, with its protein hand-me-downs and non-exponential timing, smashes that assumption like a dropped petri dish.
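To make "no memory" concrete, here is a minimal Python sketch (not from the paper) that compares two kinds of waiting time. The gamma-distributed division clock, its shape of 10, and all names below are illustrative assumptions. With an exponential clock, having already waited tells you nothing about how much longer you will wait. With a peaked, division-like clock, it tells you a lot.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_survival(samples, s, t):
    """Estimate P(T > s + t | T > s) from samples of a waiting time T."""
    survived = samples[samples > s]
    return float(np.mean(survived > s + t))

n = 200_000
# Markovian clock: exponential waiting time with mean 1.
exp_times = rng.exponential(1.0, size=n)
# Assumed division-like clock: gamma with mean 1 and a narrow peak (shape 10).
gamma_times = rng.gamma(shape=10.0, scale=0.1, size=n)

s, t = 0.5, 0.5
results = {}
for name, times in [("exponential", exp_times), ("gamma", gamma_times)]:
    fresh = float(np.mean(times > t))          # P(T > t), starting from zero
    aged = conditional_survival(times, s, t)   # P(T > s + t | T > s)
    results[name] = (fresh, aged)
    print(f"{name:12s} fresh={fresh:.3f} aged={aged:.3f}")
```

For the exponential clock the two numbers should agree up to sampling noise. For the gamma clock they should not, which is the kind of memory the standard master equation has no slot for.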

Pedro Pessoa and colleagues at Arizona State University, in collaboration with the University of Liège, decided to tackle this head-on. Their weapon of choice: simulation-based inference, powered by conditional normalizing flows - a class of neural network that learns entire probability distributions rather than just point estimates (Pessoa et al., 2025).

Teaching Neural Networks to Think Like Cells

The core insight is beautifully pragmatic. If you can't write down the math for how protein levels distribute across a population of dividing cells - and you really can't, because the non-Markovian dynamics make the likelihood function intractable - then just simulate a ton of cells dividing and let a neural network learn the patterns.

The team built a stochastic simulator that explicitly models cell division timing, protein partitioning between daughter cells, and gene switching between active and inactive states. They then trained conditional normalizing flows on the simulated data. These networks learn to map simple distributions (think: a boring bell curve) into the complex, multi-modal distributions that real cellular populations produce. Once trained, the network can take real experimental data and work backward to infer the underlying kinetic parameters - production rates, degradation rates, gene switching rates - that generated those observations.
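A stripped-down version of such a simulator might look like the sketch below. It is an illustration under stated assumptions, not the authors' code: the function name simulate_cell, the rate values, the gamma-distributed division times, the symmetric binomial partitioning, and the choice to follow a single lineage starting from an empty, inactive cell are all hypothetical simplifications.

```python
import numpy as np

def simulate_cell(k_on, k_off, k_prod, k_deg, t_end, rng,
                  div_mean=1.0, div_shape=10.0):
    """Follow one lineage up to t_end and return its protein count."""
    t, gene_on, protein = 0.0, False, 0
    next_div = rng.gamma(div_shape, div_mean / div_shape)
    while True:
        # Reaction propensities given the current state.
        r_switch = k_off if gene_on else k_on
        r_prod = k_prod if gene_on else 0.0
        r_deg = k_deg * protein
        total = r_switch + r_prod + r_deg
        t_react = t + rng.exponential(1.0 / total) if total > 0 else np.inf

        if t_end <= min(t_react, next_div):
            return protein                      # snapshot time reached
        if next_div < t_react:
            # Division: each protein goes to the tracked daughter with p = 1/2.
            protein = rng.binomial(protein, 0.5)
            t = next_div
            next_div = t + rng.gamma(div_shape, div_mean / div_shape)
            continue                            # rates changed, so redraw
        # Otherwise a reaction fires; pick one in proportion to its rate.
        t = t_react
        u = rng.random() * total
        if u < r_switch:
            gene_on = not gene_on
        elif u < r_switch + r_prod:
            protein += 1
        else:
            protein = max(protein - 1, 0)

rng = np.random.default_rng(0)
# Hypothetical rates, in units of the mean division time.
params = dict(k_on=0.2, k_off=2.0, k_prod=50.0, k_deg=0.1)
snapshot = np.array([simulate_cell(t_end=5.0, rng=rng, **params)
                     for _ in range(500)])
print("mean protein:", snapshot.mean(),
      "fraction empty:", np.mean(snapshot == 0))
```

The array snapshot plays the role of one simulated flow cytometry readout. Repeating this for many parameter settings would give the kind of training set a conditional flow learns from.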

It's Bayesian inference without ever writing down the likelihood in closed form. Bayes' theorem still does the work; it just gets its likelihood from a network instead of a formula. The simulator is the model. The neural network is the likelihood. If that sounds like cheating, well, it kind of is - the good kind, where you let computers do what computers are good at instead of forcing elegant but wrong analytical solutions.
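Here is a toy sketch of that loop in Python, written for illustration only. Everything in it is an assumption: toy_simulator is a hypothetical stand-in with a single parameter theta, and the "flow" is a single affine layer in log space whose maximum-likelihood fit reduces to least squares. The real work uses deep conditional normalizing flows trained by gradient descent, but the change-of-variables step and the way the learned density feeds into Bayes' theorem have the same shape.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_simulator(theta, rng):
    """Hypothetical simulator: a noisy readout whose scale is set by theta."""
    return theta * rng.lognormal(mean=0.0, sigma=0.4, size=np.shape(theta))

# 1) Simulate training pairs (theta, x), with theta drawn from a flat prior.
theta_train = rng.uniform(0.5, 5.0, size=20_000)
x_train = toy_simulator(theta_train, rng)

# 2) One-layer conditional flow: z = (log x - a - b * log theta) / s, z ~ N(0, 1).
#    For this layer, maximum likelihood for (a, b) is ordinary least squares
#    and s is the standard deviation of the residuals.
design = np.column_stack([np.ones_like(theta_train), np.log(theta_train)])
(a, b), *_ = np.linalg.lstsq(design, np.log(x_train), rcond=None)
s = np.std(np.log(x_train) - design @ np.array([a, b]))

def log_likelihood(x, theta):
    """Change of variables: log p(x | theta) = log N(z) + log |dz/dx|."""
    z = (np.log(x) - a - b * np.log(theta)) / s
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(s) - np.log(x)

# 3) Plug the learned likelihood into Bayes' theorem on a grid (flat prior).
theta_true = 2.0
x_obs = toy_simulator(np.full(50, theta_true), rng)
grid = np.linspace(0.5, 5.0, 451)
log_post = np.array([log_likelihood(x_obs, th).sum() for th in grid])
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])   # normalize to unit area
theta_map = grid[np.argmax(post)]
print("fitted a, b, s:", a, b, s)
print("posterior mode for theta:", theta_map)
```

Because the toy simulator happens to be lognormal, one affine layer is enough here. A distribution with several peaks would need a deeper flow, which is where the neural network earns its keep.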

Yeast, Glycogen, and a Plot Twist

For their case study, the team looked at the glc3 gene in Saccharomyces cerevisiae (baker's yeast, for those who prefer their organisms pronounceable). This gene kicks in during nutrient stress, helping cells stockpile glycogen for lean times. They tracked its activity using GFP fluorescence via flow cytometry - essentially reading out the fluorescence of a very large number of individual cells, one by one, as a snapshot of the population at a single moment.

Here's where it gets interesting. A naive analysis of the fluorescence data - one that ignores protein inheritance - would suggest that many cells are sitting at low expression levels, potentially indicating widespread gene activation at a low hum. But once the team properly accounted for the division history and inherited fluorescent proteins, a different picture emerged: the glc3 gene is actually mostly off under stress. When it does switch on, the activation is brief and transient, like a metabolic fire alarm that rings once and goes silent.

That's a meaningful biological distinction. "Lots of cells making a little protein" and "a few cells briefly making protein, with everyone else coasting on inherited leftovers" can look nearly identical in a raw fluorescence snapshot but imply completely different regulatory strategies. Getting this wrong could send you down the wrong path when designing experiments or therapeutic interventions.
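A quick back-of-the-envelope sketch shows how long the leftovers can linger. It assumes a fully stable protein (no degradation), a gene that stays off, and an even coin flip per molecule at each division. The starting count of 1000 is arbitrary and nothing here is fitted to the yeast data.

```python
import numpy as np

rng = np.random.default_rng(2)

initial_proteins = 1000   # arbitrary count left behind by one past burst
generations = 5
counts = np.full(10_000, initial_proteins)   # 10,000 independent lineages

mean_by_generation = [float(counts.mean())]
for _ in range(generations):
    # Gene is off: the tracked daughter just keeps each molecule with p = 1/2.
    counts = rng.binomial(counts, 0.5)
    mean_by_generation.append(float(counts.mean()))

for g, m in enumerate(mean_by_generation):
    print(f"generation {g}: mean inherited proteins = {m:.1f}")
```

Under these assumptions the average count halves each generation, so a cell five divisions removed from the burst still carries roughly 1/32 of it. In a snapshot, that cell is dim but not dark, even though its gene has been silent the whole time.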

Why This Matters Beyond Yeast

Simulation-based inference isn't new - physicists and cosmologists have been using it to fit models of the universe for years (Cranmer et al., 2020). But its application to non-Markovian biological systems is a frontier worth watching. The approach by Pessoa et al. is flexible enough to handle any system where you can write a simulator but can't write a likelihood - which, honestly, describes most of biology.

Previous work by Jiang et al. used neural networks to approximate non-Markovian gene expression distributions with simpler Markovian stand-ins (Jiang et al., 2021, Nature Communications). Others have applied AI-powered simulation-based inference to spatial-stochastic models of embryogenesis (Ramirez Sierra & Sokolowski, 2024, PLOS Computational Biology). What makes this new work stand out is its direct confrontation with the protein inheritance problem - a source of systematic error that most gene expression studies quietly sweep under the rug.

If you like mapping out how different modeling approaches connect and where this fits in the broader inference landscape, visual tools like mapb2.io can help you diagram the relationships between Markovian, non-Markovian, and simulation-based frameworks in a way that actually makes sense.

The Bigger Picture

The researchers have effectively shown that ignoring cell division history doesn't just add noise to your measurements - it can flip your biological conclusions entirely. And they've handed the field a general-purpose toolkit for dealing with it: simulate what you can, train a neural network on the simulations, and let the network do the inference.

It's the kind of methodological advance that doesn't make headlines but quietly changes how an entire field does its homework. The next time someone measures fluorescence in dividing cells and draws conclusions about gene regulation, they'll need a better answer than "we assumed the math was Markovian" - because now there's a practical alternative that doesn't require that assumption at all.

References

  1. Pessoa, P., Martinez, J.A., Vandenbroucke, V., Delvigne, F., & Pressé, S. (2025). Simulation-based inference captures non-Markovian effects as exemplified in protein production kinetics through cell division. Proceedings of the National Academy of Sciences, 122. DOI: 10.1073/pnas.2517309123. PMID: 41950084

  2. Cranmer, K., Brehmer, J., & Louppe, G. (2020). The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48), 30055-30062. DOI: 10.1073/pnas.1912789117

  3. Jiang, Q., Fu, X., Yan, S., Li, R., Du, W., Cao, Z., Qian, F., & Grima, R. (2021). Neural network aided approximation and parameter inference of non-Markovian models of gene expression. Nature Communications, 12, 2618. DOI: 10.1038/s41467-021-22919-1. PMCID: PMC8113478

  4. Ramirez Sierra, M.A., & Sokolowski, T.R. (2024). AI-powered simulation-based inference of a genuinely spatial-stochastic gene regulation model of early mouse embryogenesis. PLOS Computational Biology, 20(11), e1012473. DOI: 10.1371/journal.pcbi.1012473. PMCID: PMC11614244

  5. Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., & Lakshminarayanan, B. (2021). Normalizing Flows for Probabilistic Modeling and Inference. Journal of Machine Learning Research, 22(57), 1-64. arXiv: 1912.02762

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.