50,688 Reactions Later, Chemistry’s AI Still Wants More Receipts

Back in 2018, Ahneman, Estrada, Lin, Dreher, and Doyle showed that machine learning could predict C-N cross-coupling performance from high-throughput data, which felt like handing a chemist a crystal ball - until everyone noticed the crystal ball worked best inside the room where it was trained.

The new JACS paper from Das, Zhang, Tan, Abdelalim, Lu, Mauro, Regan, and Cernak takes that earlier idea and asks the rude but necessary follow-up: what happens when you stop feeding the model beautifully isolated little case studies and instead give it 50,688 systematically varied reactions across palladium, nickel, and copper?

Answer: you learn a lot. Also, the universe immediately files a complaint.

The Problem: Reaction Data Is Usually a Junk Drawer

C-N coupling reactions matter because carbon-nitrogen bonds show up everywhere in medicines, materials, and agrochemicals. The Buchwald-Hartwig amination, the celebrity version of this chemistry, uses palladium catalysts to stitch amines onto aryl halides. It is the kind of reaction that medicinal chemists reach for constantly, like coffee or denial.

But predicting which recipe will work is still painfully hard. Change the metal, ligand, base, solvent, temperature, or substrate, and the reaction may go from “beautiful yield” to “expensive beige sadness.” Machine learning can help, but only if the training data are useful.

That is the catch. Most chemical reaction data come from papers where researchers report the good recipe and quietly leave the graveyard of failed experiments offstage. ML models trained on that literature can become very confident about very incomplete stories, which is basically LinkedIn for molecules.

What This Team Actually Built

The Cernak group built a dataset of more than 50,000 C-N coupling reactions designed for comparison, not just accumulation. They varied reaction components systematically and maximized overlap across palladium, nickel, and copper conditions. That matters because the literature usually treats those metals like separate kingdoms with suspicious border policies.
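
For a feel of what "varied systematically" means in practice, here is a minimal Python sketch of a full-factorial screen. The component names and counts are placeholders invented for illustration, not the paper's actual plate design; the only point is that every ligand meets every metal under the same bases and solvents.

```python
from itertools import product

# Hypothetical component lists; the real screen used far more options.
metals = ["Pd", "Ni", "Cu"]
ligands = ["L1", "L2", "L3", "L4"]
bases = ["base_A", "base_B"]
solvents = ["solvent_A", "solvent_B"]


def build_grid(metals, ligands, bases, solvents):
    """Return one dict per combination of metal, ligand, base, and solvent."""
    return [
        {"metal": m, "ligand": lig, "base": b, "solvent": s}
        for m, lig, b, s in product(metals, ligands, bases, solvents)
    ]


grid = build_grid(metals, ligands, bases, solvents)
print(len(grid))  # 3 * 4 * 2 * 2 = 48
```

Because the grid is a full product, any two metals can be compared on identical ligand, base, and solvent settings, which is exactly the overlap that scattered literature data rarely offers.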

According to coverage in C&EN, the team screened combinations involving dozens of metal catalysts, more than 160 ligands, multiple bases, solvents, and temperatures, then analyzed the outcomes with high-throughput liquid chromatography and mass spectrometry. The dataset is available through the Open Reaction Database, which means other researchers can benchmark models instead of simply waving at proprietary spreadsheets from across a conference room.

The headline result is not just “big dataset big good.” The more useful finding is that some ligands performed broadly across all three metals. That is chemistry gold: if a ligand works with palladium, nickel, and copper, it may help chemists swap away from precious or supply-chain-fragile metals without starting from zero.
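
One way to picture hunting for a general ligand in a table like this is the sketch below. The rows are made-up numbers, and the 50% cutoff is an arbitrary assumption, not the criterion the authors used; the function just asks whether a ligand clears the bar with every metal.

```python
from collections import defaultdict

# Made-up (metal, ligand, yield %) rows for illustration only.
rows = [
    ("Pd", "L1", 82), ("Ni", "L1", 64), ("Cu", "L1", 55),
    ("Pd", "L2", 91), ("Ni", "L2", 12), ("Cu", "L2", 3),
    ("Pd", "L3", 40), ("Ni", "L3", 71), ("Cu", "L3", 8),
]


def general_ligands(rows, metals=("Pd", "Ni", "Cu"), threshold=50):
    """Return ligands whose best yield meets the threshold with every metal."""
    best = defaultdict(dict)
    for metal, ligand, yld in rows:
        # Keep the best yield seen for each (ligand, metal) pair.
        best[ligand][metal] = max(yld, best[ligand].get(metal, 0))
    return sorted(
        ligand
        for ligand, by_metal in best.items()
        if all(by_metal.get(m, 0) >= threshold for m in metals)
    )


print(general_ligands(rows))  # ['L1']
```

Here L2 is a palladium specialist and L3 a nickel specialist, so only L1 counts as general under this toy definition.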

Sure, “general ligand” sounds like a tiny military rank, but in synthesis it can mean fewer dead-end screens and faster routes to useful molecules.

The Sneaky Mechanism Plot Twist

The paper also reports metal-free control reactions. This is where the eyebrow goes up.

Under some nominally similar conditions, reactions appeared to proceed through different mechanisms, including an aryne-based pathway. Arynes are highly reactive aromatic intermediates, the chemical equivalent of a skateboard with no brakes. They are known in organic chemistry, but seeing evidence that they may be more common in these C-N coupling contexts than expected is a useful warning: the reaction you think you are studying may be running a side hustle.
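
A metal-free control earns its keep when you line it up against the catalyzed run. The sketch below flags conditions where the control alone gives appreciable product. The condition keys, the yields, and the 10% cutoff are all invented for illustration, and a flag only says "look closer"; it does not establish an aryne pathway or any other mechanism.

```python
def flag_background(catalyzed, control, min_yield=10.0):
    """Return condition keys where the metal-free control reaches min_yield."""
    return sorted(
        key for key in catalyzed
        if control.get(key, 0.0) >= min_yield
    )


# Keys are hypothetical (base, solvent) pairs; yields are invented percentages.
catalyzed = {("B1", "S1"): 78.0, ("B2", "S1"): 65.0}
control = {("B1", "S1"): 2.0, ("B2", "S1"): 41.0}

print(flag_background(catalyzed, control))  # [('B2', 'S1')]
```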

That matters for machine learning because many models quietly assume one reaction family means one underlying mechanism. This dataset says: adorable. Real chemistry may have multiple mechanisms competing under similar conditions. A model that misses that could still score well on familiar data while failing when asked to generalize.

In other words, 95% accuracy sounds great until the other 5% is where your scale-up batch lives.
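
A common way to check whether a model generalizes rather than memorizes is to hide a whole group from training and test only on it. The sketch below holds out one metal at a time. Grouping by metal is an assumption made here for illustration (mechanism labels are rarely available up front), and the records are invented.

```python
def leave_one_metal_out(records):
    """Yield (held_out_metal, train, test) with one metal hidden from training."""
    metals = sorted({r["metal"] for r in records})
    for held_out in metals:
        train = [r for r in records if r["metal"] != held_out]
        test = [r for r in records if r["metal"] == held_out]
        yield held_out, train, test


# Invented records for illustration only.
records = [
    {"metal": "Pd", "ligand": "L1", "yield": 82},
    {"metal": "Pd", "ligand": "L2", "yield": 91},
    {"metal": "Ni", "ligand": "L1", "yield": 64},
    {"metal": "Cu", "ligand": "L1", "yield": 55},
]

for held_out, train, test in leave_one_metal_out(records):
    print(held_out, len(train), len(test))
```

A model that looks sharp on a random split but falls apart on splits like these has probably learned the room, not the chemistry.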

Why AI Chemists Should Care

Recent reviews on ML-guided reaction condition design keep circling the same pain point: models need better, more diverse, more standardized data. High-throughput experimentation and Bayesian optimization are making progress, including 2025 work showing ML-guided workflows can optimize real pharmaceutical reactions in parallel batches.

This JACS dataset fits that trend, but with a different personality. It is less “let us optimize this one reaction as fast as possible” and more “let us build a map where models can be tested on whether they actually understand the territory.” Tools like mapb2.io are handy for sketching complex decision spaces, and honestly, this chemistry could use a subway map: palladium line here, nickel transfer there, mysterious aryne tunnel underneath.

The practical upside is easy to see. If models trained on this kind of controlled data can predict when nickel or copper will substitute for palladium, drug synthesis could become cheaper, more resilient, and less dependent on precious metals. But the authors and outside experts are careful: 50,688 reactions is huge for a chemistry dataset, yet still tiny compared with what modern AI systems often consume. The GPUs may be hungry, but chemistry makes every snack in a tiny vial.

The Fine Print

This paper does not solve reaction prediction. It makes the problem harder to ignore in a useful way.

It shows that systematic counterfactual data - changing one thing while keeping enough else comparable - can expose patterns hidden by traditional scope tables. It also shows that mechanistic diversity is not a footnote. It is the plot.
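
"Changing one thing while keeping enough else comparable" can be made literal. The sketch below finds pairs of records that differ in exactly one factor. The field names and values are hypothetical, and the all-pairs scan is quadratic, so treat it as a toy rather than something to point at 50,688 rows.

```python
from itertools import combinations

FACTORS = ("metal", "ligand", "base", "solvent")


def one_factor_pairs(records, factors=FACTORS):
    """Yield (changed_factor, a, b) for record pairs differing in one factor."""
    for a, b in combinations(records, 2):
        changed = [f for f in factors if a[f] != b[f]]
        if len(changed) == 1:
            yield changed[0], a, b


# Invented records for illustration only.
records = [
    {"metal": "Pd", "ligand": "L1", "base": "B1", "solvent": "S1", "yield": 82},
    {"metal": "Ni", "ligand": "L1", "base": "B1", "solvent": "S1", "yield": 64},
    {"metal": "Ni", "ligand": "L2", "base": "B2", "solvent": "S1", "yield": 12},
]

for factor, a, b in one_factor_pairs(records):
    print(factor, a["yield"], "->", b["yield"])  # metal 82 -> 64
```

Only the first two records form a clean comparison; the third changes ligand and base at once, so any yield difference there cannot be pinned on a single cause.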

For AI in chemistry, that is the real lesson: better models will need better data, but also more humility about what the labels mean. A reaction yield is not just a number. It is a tiny crime scene with catalysts, solvents, bases, and possibly an aryne slipping out the back door.

References

Das, J.; Zhang, X.; Tan, Y.; Abdelalim, M.; Lu, T.; Mauro, C.; Regan, C. J.; Cernak, T. “A 50,688-Reaction Data Set Reveals General Ligands and Mechanistic Diversity in C-N Couplings.” Journal of the American Chemical Society (2026). DOI: 10.1021/jacs.6c05959. PubMed: 42308135
Ahneman, D. T.; Estrada, J. G.; Lin, S.; Dreher, S. D.; Doyle, A. G. “Predicting reaction performance in C-N cross-coupling using machine learning.” Science 360, 186-190 (2018). DOI: 10.1126/science.aar5169
Sin, J. W. et al. “Highly parallel optimisation of chemical reactions through automation and machine intelligence.” Nature Communications 16, 6464 (2025). DOI: 10.1038/s41467-025-61803-0
Chen, S.-Y. et al. “Machine learning-guided strategies for reaction conditions design and optimization.” Beilstein Journal of Organic Chemistry 20, 2476-2492 (2024). DOI: 10.3762/bjoc.20.212
For background: Buchwald-Hartwig amination and aryne chemistry.

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.

AIb2.io - AI Research Decoded