Training Thermodynamic Computers by Gradient Descent

Backpropagation on digital chips just got a pink slip - or at least, a memo suggesting it start updating its resume. A new paper from Lawrence Berkeley National Laboratory shows that physical systems running on nothing but ambient heat can be trained to do machine learning, using the same gradient descent that powers every neural network you've ever heard of. The kicker? It could use ten million times less energy than the digital version.

Wait, What's a Thermodynamic Computer?

Here's the absurdity of modern computing: we spend obscene amounts of energy forcing silicon to behave deterministically - stamping out thermal noise like it's a cockroach at a dinner party - and then we use that deterministic hardware to simulate randomness for probabilistic AI models. It's like air-conditioning a sauna so you can install a space heater.

Thermodynamic computers flip this on its head. Instead of fighting thermal fluctuations, they use them. The natural jiggling of particles at room temperature becomes the computational engine. Your neural network's stochastic sampling? A thermodynamic computer does that for free, courtesy of physics.
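To make "sampling for free" a little more concrete, here is a minimal sketch of what a digital chip has to do to imitate one thermally jiggling degree of freedom. It assumes a single particle in a quadratic energy well with overdamped Langevin dynamics, a textbook model and not necessarily the hardware in the paper. The function name and every parameter value (k, kT, dt) are made up for illustration.

```python
import math
import random

def langevin_samples(k=2.0, kT=1.0, dt=0.01, n_steps=200_000, burn_in=2_000, seed=0):
    """Euler-Maruyama simulation of dx = -k*x dt + sqrt(2*kT) dW.

    k is the stiffness of the energy well U(x) = k*x^2/2 and kT is the
    thermal energy; both are illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    x = 0.0
    noise_scale = math.sqrt(2.0 * kT * dt)
    samples = []
    for step in range(n_steps):
        # Deterministic pull toward the bottom of the well plus a thermal kick.
        x += -k * x * dt + noise_scale * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            samples.append(x)
    return samples

samples = langevin_samples()
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)

# At equilibrium the Boltzmann distribution for this well has variance kT / k.
print(f"sample variance  = {variance:.3f}")
print(f"Boltzmann target = {1.0 / 2.0:.3f}")
```

Every one of those 200,000 loop iterations costs a digital processor a random number and a few floating-point operations. A physical system at room temperature performs the equivalent update simply by existing.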

Stephen Whitelam at Berkeley Lab's Molecular Foundry just showed how to train one of these things using gradient descent - the same workhorse algorithm behind ChatGPT, Stable Diffusion, and basically every AI model eating through the world's electricity supply right now (Whitelam, PNAS 2026).

The Teacher-Student Trick

The training scheme is clever. First, you train a regular neural network on your task the normal way - nothing exotic. This is the "teacher." Then you set up a thermodynamic computer as the "student" and train it to reproduce the teacher's behavior. Specifically, you use gradient descent to tune the student's physical parameters so that its natural thermal dynamics generate trajectories matching the teacher's activations at a specified time.
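The idea can be sketched in a few lines, with heavy caveats. This is a toy and not the paper's setup: the "teacher" is a fixed tanh function standing in for one activation of a trained network, the "student" is a single simulated noisy degree of freedom with two tunable parameters (a coupling J and a bias b), and the gradient is obtained by tracking how the trajectory's endpoint responds to each parameter. All names and numbers are assumptions made for illustration.

```python
import math
import random

# Illustrative constants: stiffness, thermal energy, time step, steps until t_f.
K, KT, DT, N_STEPS = 1.0, 0.01, 0.02, 50
INPUTS = [-1.0, -0.5, 0.0, 0.5, 1.0]

def teacher(u):
    """Stand-in for one activation of an already-trained digital network."""
    return math.tanh(1.5 * u)

def run_student(J, b, u, rng):
    """Simulate one noisy trajectory; return x(t_f) and its sensitivities to J and b."""
    x = dx_dJ = dx_db = 0.0
    noise = math.sqrt(2.0 * KT * DT)
    for _ in range(N_STEPS):
        x += -(K * x - J * u - b) * DT + noise * rng.gauss(0.0, 1.0)
        # Differentiate the update rule above with respect to each parameter.
        dx_dJ += (-K * dx_dJ + u) * DT
        dx_db += (-K * dx_db + 1.0) * DT
    return x, dx_dJ, dx_db

def loss_and_grad(J, b, rng, n_traj):
    """Mean squared gap between student and teacher at t_f, plus its gradient."""
    loss = gJ = gb = 0.0
    for u in INPUTS:
        target = teacher(u)
        for _ in range(n_traj):
            x, dx_dJ, dx_db = run_student(J, b, u, rng)
            err = x - target
            loss += err * err
            gJ += 2.0 * err * dx_dJ
            gb += 2.0 * err * dx_db
    n = len(INPUTS) * n_traj
    return loss / n, gJ / n, gb / n

rng = random.Random(0)
J, b, lr = 0.0, 0.0, 0.5
initial_loss, _, _ = loss_and_grad(J, b, rng, n_traj=200)
for _ in range(100):
    # Plain gradient descent on the student's physical parameters.
    _, gJ, gb = loss_and_grad(J, b, rng, n_traj=10)
    J -= lr * gJ
    b -= lr * gb
final_loss, _, _ = loss_and_grad(J, b, rng, n_traj=200)
print(f"loss before: {initial_loss:.3f}  after: {final_loss:.3f}  (J={J:.2f}, b={b:.2f})")
```

A student this simple cannot reproduce the tanh curve exactly, and thermal noise sets a floor on the loss, so the error shrinks without reaching zero. The point is only the shape of the loop: simulate noisy trajectories, compare the state at the chosen time with the teacher, and nudge the parameters downhill.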

Once trained, the thermodynamic computer runs the computation on its own, powered by thermal noise. No GPUs grinding away. No megawatt data centers. Just physics doing its thing.

Think of it like training a river to sort packages. You first figure out where the rocks should go (training), then let the water do the work (inference). The training happens digitally, but the execution happens physically, almost for free.

Seven Orders of Magnitude (That's a Lot of Zeros)

The headline number: the thermodynamic implementation is estimated to be 10 million times more energy-efficient than the digital version for an image classification task. To put that in perspective, if running a classifier on a GPU costs you a dollar in electricity, the thermodynamic version costs you a hundred-thousandth of a penny. Or think of it this way - it's the difference between a lightbulb and a power plant.

Now, before you get too excited, some fine print. This estimate compares the physical thermodynamic hardware against a digital simulation of that same system. The actual classification accuracy demonstrated was on a standard benchmark (handwritten digits), not ImageNet or anything that would make GPT-4 nervous. And the hardware to run this at scale? It doesn't exist yet.

But the theoretical advantage is real, and it aligns with other results in the field. Whitelam's companion paper on generative thermodynamic computing estimated an even wilder eleven orders of magnitude advantage for image generation tasks (Whitelam, PRL 2026). That's a hundred billion to one.

The Energy Crisis That Made This Matter

This isn't just a neat physics trick. Global data centers are projected to consume over 1,000 TWh of electricity by end of 2026, putting them in the same league as Japan's total electricity consumption. Training a single frontier AI model can burn through enough energy to power San Francisco for three days. The AI industry spent an estimated $580 billion on data center infrastructure in 2025 alone.

Against that backdrop, even modest improvements in compute efficiency matter. A ten-million-fold improvement would be like discovering cold fusion for matrix multiplication. If you've ever thought about the environmental footprint of tools like combb2.io that run AI image enhancement in the browser, imagine those models running on hardware where the electricity bill rounds to zero.

From Boltzmann's Dream to Actual Hardware

There's a satisfying historical loop here. Boltzmann machines - neural networks based on statistical mechanics - were proposed in the 1980s but were mostly impractical because simulating their stochastic dynamics on digital hardware was painfully slow. Thermodynamic computers are essentially Boltzmann machines built in physical hardware that naturally samples from the right distributions.
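For a sense of what "simulating their stochastic dynamics" involves, here is a minimal sketch of Gibbs sampling on a three-unit Boltzmann machine, checked against the exact Boltzmann probabilities. The weights and biases are made-up values, the temperature is set to 1, and a model this tiny is only an illustration of the standard procedure.

```python
import itertools
import math
import random

# Symmetric couplings with zero diagonal, plus biases (all values made up).
WEIGHTS = [[0.0, 2.0, -1.0],
           [2.0, 0.0, 0.5],
           [-1.0, 0.5, 0.0]]
BIASES = [-0.5, -0.5, 0.2]
N = len(BIASES)

def energy(state):
    """E(s) = -sum_{i<j} W_ij s_i s_j - sum_i b_i s_i for binary units s_i in {0, 1}."""
    pair = sum(WEIGHTS[i][j] * state[i] * state[j]
               for i in range(N) for j in range(i + 1, N))
    return -pair - sum(b * s for b, s in zip(BIASES, state))

def gibbs_sweep(state, rng):
    """Resample each unit in turn, conditioned on the current state of the others."""
    for i in range(N):
        field = BIASES[i] + sum(WEIGHTS[i][j] * state[j] for j in range(N) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-field))
        state[i] = 1 if rng.random() < p_on else 0

# Exact Boltzmann probabilities by brute-force enumeration (only feasible when N is tiny).
states = list(itertools.product([0, 1], repeat=N))
partition = sum(math.exp(-energy(s)) for s in states)
exact = {s: math.exp(-energy(s)) / partition for s in states}

# Estimate the same probabilities by repeated Gibbs sweeps.
rng = random.Random(0)
state = [0] * N
counts = dict.fromkeys(states, 0)
n_sweeps = 20_000
for _ in range(n_sweeps):
    gibbs_sweep(state, rng)
    counts[tuple(state)] += 1
sampled = {s: counts[s] / n_sweeps for s in states}

max_error = max(abs(sampled[s] - exact[s]) for s in states)
print(f"largest gap between sampled and exact probabilities: {max_error:.3f}")
```

Three units need tens of thousands of sequential random updates to get good estimates, and the work grows quickly with network size and coupling strength. That sequential grind is the cost physical sampling hardware is meant to avoid.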

And the hardware is actually coming. Normal Computing has taped out the CN101, billed as the world's first thermodynamic computing chip, backed by a $50M raise led by Samsung (Aifer et al., Nature Communications 2025). Extropic is developing thermodynamic sampling units. The UK's ARIA has committed tens of millions of pounds to the field. This isn't vaporware - it's vaporware-in-progress, which in hardware timelines counts as practically shipping.

The Catch (Because There's Always a Catch)

Thermodynamic computers are still rudimentary. Whitelam's demonstration works on digit classification, not language modeling or protein folding. Scaling from 8 coupled circuits to millions of parameters is a monumental engineering challenge. And the gradient descent training still happens on conventional digital hardware - the thermodynamic part only handles inference.

The previous training approach for these systems used genetic algorithms running across 96 GPUs on a supercomputer (Casert & Whitelam, Nature Communications 2026). Gradient descent is a massive improvement in training efficiency, but it also means thermodynamic computing currently depends on the very digital infrastructure it aims to replace.

Still, every computing revolution started with someone classifying handwritten digits and claiming it would change everything. Sometimes they were right.

References:

Whitelam, S. (2026). Training thermodynamic computers by gradient descent. PNAS. DOI: 10.1073/pnas.2528413123. arXiv: 2509.15324
Whitelam, S. (2026). Generative thermodynamic computing. Physical Review Letters, 136, 037101. DOI: 10.1103/kwyy-1xln. arXiv: 2506.15121
Casert, C. & Whitelam, S. (2026). Nonlinear thermodynamic computing out of equilibrium. Nature Communications, 17, 1189. DOI: 10.1038/s41467-025-67958-0
Aifer, M. et al. (2025). Thermodynamic computing system for AI applications. Nature Communications. DOI: 10.1038/s41467-025-59011-x. arXiv: 2312.04836
Can neuromorphic computing help reduce AI's high energy cost? (2025). PNAS. DOI: 10.1073/pnas.2528654122

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.

AIb2.io - AI Research Decoded