AI Just Figured Out Which of Your 20,000 Genes to Actually Aim a Drug At - and Big Pharma Noticed

Every drug you've ever taken works by hitting one of just 716 molecular targets - out of roughly 20,000 protein-coding genes in your body.

Let that sink in. The entire pharmaceutical industry, with its trillion-dollar valuations and century of accumulated wisdom, has only managed to successfully drug about 3.6% of the human proteome. Suspicious? I think so. But here's where it gets interesting: a new review in Nature Reviews Drug Discovery - published, I should note, on the same day Insilico Medicine dropped a press release about their $2.75 billion Eli Lilly deal - lays out exactly how AI is about to blow the doors off target discovery (Pun et al., 2026).

The Target Problem Nobody Talks About at Parties

Drug discovery has a dirty secret. About 90% of drug candidates that enter clinical trials fail, and the number one reason - accounting for 40-50% of those failures - is lack of clinical efficacy, which often traces back to picking the wrong target in the first place (Sun et al., 2022). That's not a rounding error. That's the pharmaceutical equivalent of building an entire house before checking if you bought the right lot.

Traditionally, finding a drug target meant years of painstaking biology: knockout mice, genome-wide association studies, and a whole lot of educated guessing. Scientists would identify a protein that seemed involved in a disease, spend a decade proving it mattered, then discover in Phase II trials that - whoops - it didn't matter enough. The process could take, and I quote the review, "months to decades." That's not a timeline. That's a hostage situation.

Follow the Data (All 20,000 Genes of It)

The review catalogs how AI is rewriting these rules, and the receipts are impressive. About 4,500 of our genes are considered "druggable" (Finan et al., 2017) - meaning their protein products have the right shape and chemistry to interact with a drug molecule. Subtract the 716 targets already in use, and that leaves roughly 3,784 untouched druggable targets just sitting there, waiting for someone - or something - to connect the dots.
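For anyone who wants to check the math, here is a minimal sketch of the arithmetic behind those figures. The numbers are the ones quoted in this post; the function and variable names are my own, and treating targets and genes as one-to-one is a simplifying assumption.

```python
# Figures as quoted in this post; names are illustrative, not from the review.
PROTEIN_CODING_GENES = 20_000  # "roughly 20,000"
DRUGGABLE_GENES = 4_500        # "about 4,500"
DRUGGED_TARGETS = 716


def target_landscape(total=PROTEIN_CODING_GENES,
                     druggable=DRUGGABLE_GENES,
                     drugged=DRUGGED_TARGETS):
    """Return the headline ratios, assuming one target per gene."""
    return {
        "drugged_share_of_genome": drugged / total,
        "drugged_share_of_druggable": drugged / druggable,
        "untouched_druggable": druggable - drugged,
    }


stats = target_landscape()
print(f"Drugged share of all genes: {stats['drugged_share_of_genome']:.1%}")
print(f"Drugged share of druggable genes: {stats['drugged_share_of_druggable']:.1%}")
print(f"Untouched druggable targets: {stats['untouched_druggable']:,}")
```

Run it and you get 3.6%, 15.9%, and 3,784. Even within the druggable set, fewer than one in six targets has an approved drug.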

Enter machine learning. The paper walks through a growing arsenal of AI tools: graph neural networks that map protein-protein interaction networks like conspiracy boards (red string included), transformer models like Geneformer and scGPT that read single-cell transcriptomics data the way GPT reads the internet, and large language models like BioGPT that have inhaled so much biomedical literature they could probably pass a pharmacology exam.
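To make the conspiracy-board idea concrete, here is a toy sketch of spreading "suspicion" across an interaction network. To be clear, this is not a graph neural network and not a method taken from the review: it is random walk with restart, a much simpler classic that shares the same intuition (proteins close to known disease genes score higher). The five-protein chain and every name in it are made up for illustration.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # avoid dividing by zero for isolated nodes
    transition = adjacency / col_sums    # column-stochastic transition matrix
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()                   # restart distribution over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical chain of interactions: A - B - C - D - E
proteins = ["A", "B", "C", "D", "E"]
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
seeds = [1, 0, 0, 0, 0]  # pretend A is the one known disease gene

scores = random_walk_with_restart(adjacency, seeds)
for name, score in zip(proteins, scores):
    print(f"{name}: {score:.3f}")
```

The scores fall off with distance from the seed, which is the whole trick: the red string does the ranking. Real graph neural networks learn how to pass messages along those edges instead of using a fixed rule like this one.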

Platforms like PandaOmics and TargetPro don't just crunch numbers. They integrate multi-omics data - genomics, transcriptomics, proteomics, epigenomics - to score and rank potential targets with a speed that would make a human bioinformatician weep into their coffee. If you've ever tried to organize complex, interconnected information, you know the pain. (Tools like mapb2.io exist for exactly that kind of visual thinking, though admittedly with fewer billion-parameter models involved.)
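What does "score and rank" look like in its most stripped-down form? The sketch below is a guess at the general shape, not how PandaOmics or TargetPro actually work (this post doesn't describe their internals). The gene labels, evidence values, and weights are all invented for illustration, and the `rank_targets` function is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_targets(evidence: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank genes by a weighted mean of per-layer evidence scores in [0, 1]."""
    total_weight = sum(weights.values())
    scored = []
    for gene, layers in evidence.items():
        # Assumption: a missing omics layer counts as zero evidence.
        score = sum(w * layers.get(layer, 0.0) for layer, w in weights.items())
        scored.append((gene, score / total_weight))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up evidence scores; not real data.
toy_evidence = {
    "GENE_A": {"genomics": 0.9, "transcriptomics": 0.8, "proteomics": 0.7, "epigenomics": 0.6},
    "GENE_B": {"genomics": 0.2, "transcriptomics": 0.9, "proteomics": 0.4},
    "GENE_C": {"genomics": 0.5, "transcriptomics": 0.5, "proteomics": 0.5, "epigenomics": 0.5},
}
equal_weights = {"genomics": 1.0, "transcriptomics": 1.0, "proteomics": 1.0, "epigenomics": 1.0}

ranking = rank_targets(toy_evidence, equal_weights)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Note what happens to GENE_B: one spectacular transcriptomics signal isn't enough when the other layers are weak or missing. That is the argument for integrating multiple data types in the first place.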

Exhibit A: The 18-Month Drug

Here's where the conspiracy theory becomes a success story. The review highlights rentosertib (ISM001-055), an AI-discovered inhibitor of TNIK - a target identified using PandaOmics for idiopathic pulmonary fibrosis. The timeline from project start to preclinical candidate? Roughly 18 months. For context, the industry average is 4-6 years just for target identification and validation.

The Phase IIa trial results, published in Nature Medicine, showed patients on the 60 mg once-daily dose gained a mean of 98.4 mL in forced vital capacity (FVC, a standard measure of lung function) over 12 weeks, versus a mean decline of 20.3 mL on placebo (Zhavoronkov et al., 2025). That's not a subtle signal. That's a lung function improvement you can actually measure with a spirometer and a straight face.
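How big is that gap? A quick sketch of the subtraction, using only the two means quoted above. This is a raw difference of means, not the trial's statistical analysis, and it says nothing about confidence intervals or sample size; the names are my own.

```python
# Mean change in lung function (mL), as quoted in this post.
RENTOSERTIB_60MG_CHANGE_ML = 98.4
PLACEBO_CHANGE_ML = -20.3


def raw_mean_difference(treated_ml: float, placebo_ml: float) -> float:
    """Treated minus placebo, rounded to the precision of the inputs."""
    return round(treated_ml - placebo_ml, 1)


gap_ml = raw_mean_difference(RENTOSERTIB_60MG_CHANGE_ML, PLACEBO_CHANGE_ML)
print(f"Raw difference between arms: {gap_ml} mL")
```

That works out to a 118.7 mL spread between the two arms.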

The AlphaFold Factor

No review of AI in drug targeting would be complete without mentioning AlphaFold, the protein structure prediction model that won its creators a Nobel Prize and gave structural biologists simultaneous feelings of awe and existential dread. AlphaFold3, released in 2024, now predicts complexes involving proteins, DNA, RNA, and small molecules - making previously "undruggable" targets suddenly look a lot more targetable (Abramson et al., 2024).

Combine structure prediction with network-based target identification, and you start to see why Eli Lilly wrote Insilico Medicine a check with a lot of zeros on it.

The Catch (Because There's Always a Catch)

The review is refreshingly honest about limitations. AI models are only as good as their training data, and biological datasets are noisy, incomplete, and riddled with batch effects that would make a statistician lose sleep. Correlation-based approaches can suggest targets that look great computationally but flop in wet-lab validation. And the "black box" nature of deep learning models means that when an AI says "target this kinase," explaining why to a regulatory agency remains, let's say, an evolving conversation.

There's also the uncomfortable truth that a target is only truly validated when a drug based on it gets regulatory approval. Everything before that? Educated optimism.

What Happens Next

The paper envisions AI-driven "closed-loop platforms" where computational target nomination feeds directly into automated experiments, which feed results back into the model. It's the scientific method on fast-forward - hypothesis, test, refine, repeat - except the hypothesis-generating machine never sleeps, never gets bored, and processes more data in an afternoon than a research team could read in a lifetime.
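Here is a toy version of that loop, under heavy assumptions. The "model" is a simple Beta belief about each candidate's hit rate, nomination uses Thompson sampling (my choice, not something this post attributes to the review), and the "automated experiment" is a simulated coin flip with hidden success rates I made up. Every name here is hypothetical.

```python
import random


def closed_loop(candidates, run_experiment, rounds=50, seed=0):
    """Nominate a target, test it, update beliefs, repeat."""
    rng = random.Random(seed)
    # Beta(1, 1) prior per candidate, stored as [hits + 1, misses + 1].
    beliefs = {name: [1, 1] for name in candidates}
    history = []
    for _ in range(rounds):
        # Hypothesis: sample a plausible hit rate per candidate, pick the best.
        nominee = max(candidates, key=lambda name: rng.betavariate(*beliefs[name]))
        # Test: run the (here, simulated) experiment.
        hit = run_experiment(nominee, rng)
        # Refine: feed the result back into the model.
        beliefs[nominee][0 if hit else 1] += 1
        history.append((nominee, hit))
    return beliefs, history


# Made-up hidden hit rates standing in for a wet lab.
hidden_rates = {"TARGET_X": 0.7, "TARGET_Y": 0.3, "TARGET_Z": 0.1}


def simulated_assay(name, rng):
    return rng.random() < hidden_rates[name]


beliefs, history = closed_loop(list(hidden_rates), simulated_assay)
for name, (a, b) in beliefs.items():
    print(f"{name}: tested {a + b - 2} times, estimated hit rate {a / (a + b):.2f}")
```

The point of the design is in the feedback: each result changes which target gets nominated next, so the loop tends to spend its experiments on candidates that keep paying off instead of testing everything equally.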

Whether this means we'll see drugs for the remaining 3,784 druggable targets in our lifetime is anyone's guess. But if you asked me who's connecting the dots between 20,000 genes, millions of bioactivity measurements, and decades of clinical literature faster than any human ever could - well, I think you already know the answer. And they're not even trying to hide it anymore.

References

Pun, F.W., Podolskiy, D., Izumchenko, E., et al. (2026). Target identification and assessment in the era of AI. Nature Reviews Drug Discovery. DOI: 10.1038/s41573-026-01412-8
Zhavoronkov, A., et al. (2025). A generative AI-discovered TNIK inhibitor for idiopathic pulmonary fibrosis: a randomized phase 2a trial. Nature Medicine. DOI: 10.1038/s41591-025-03743-2
Sun, D., Gao, W., Hu, H., & Zhou, S. (2022). Why 90% of clinical drug development fails and how to improve it? Acta Pharmaceutica Sinica B, 12(7), 3049-3062. DOI: 10.1016/j.apsb.2022.02.002
Abramson, J., Adler, J., Dunger, J., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493-500. DOI: 10.1038/s41586-024-07487-w
Finan, C., Gaulton, A., Kruger, F.A., et al. (2017). The druggable genome and support for target identification and validation in drug development. Science Translational Medicine, 9(383). DOI: 10.1126/scitranslmed.aag1166

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.

AIb2.io - AI Research Decoded