If Ocean's Eleven had been recast with one human, one pigeon, and one rat, the first rule of the heist would be simple: stop being predictable. That, more or less, is the heartbeat of “Adaptive variability in humans, pigeons, and rats”, a new Psychological Review paper arguing that variability is not just noise or sloppiness - sometimes it is the whole strategy (Reynders, Verguts, & Braem, 2026).
A lot of decision research asks how agents learn the best move. This paper asks a sneakier question: what if the best move is to avoid becoming the kind of creature your opponent can predict by Tuesday afternoon?
The authors use a reinforcement learning framework, which is the part of AI where an agent learns by trying stuff, getting rewards, and slowly figuring out what pays off. Your phone's autocomplete is not exactly doing this, but the family resemblance is there: lots of pattern learning, lots of repeated feedback, and occasionally the machine acting like it learned from a textbook nobody proofread.
The Anti-Robot Problem
The setup is deliciously mean. The environment is adversarial, meaning it rewards highly variable behavior. If the agent falls into a pattern, the environment catches on and rewards it less. In other words, consistency - normally the gold star of performance advice - becomes a liability.
Imagine a basketball player who always drives left when the shot clock gets low. That works until defenders notice. Or think of rock-paper-scissors, except the opponent has a spreadsheet, a grudge, and infinite patience. In these settings, unpredictability is not a bug. It is rent money.
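To make the spreadsheet-wielding opponent concrete, here is a minimal sketch of a pattern-catching environment. It is not the task from the paper: the class name `PatternCatchingEnv`, the "what followed the previous action" predictor, and the 0/1 reward rule are all assumptions made up for illustration.

```python
import random
from collections import Counter, defaultdict


class PatternCatchingEnv:
    """Hypothetical adversary: pays 1 only when it fails to predict the choice."""

    def __init__(self):
        # For each previous action, count which action came next.
        self.counts = defaultdict(Counter)
        self.prev = None

    def predict(self):
        """Guess the next action from what has followed the previous one so far."""
        seen = self.counts[self.prev]
        if not seen:
            return None  # no evidence yet, so no prediction
        return seen.most_common(1)[0][0]

    def step(self, action):
        guess = self.predict()
        reward = 0 if action == guess else 1
        self.counts[self.prev][action] += 1  # the adversary keeps learning
        self.prev = action
        return reward


def always_left(last, rng):
    return 0


def alternate(last, rng):
    return 0 if last is None else 1 - last


def coin_flip(last, rng):
    return rng.randrange(2)


def average_reward(policy, n_trials=2000, seed=0):
    """Run one policy against a fresh adversary and return its mean reward."""
    rng = random.Random(seed)
    env = PatternCatchingEnv()
    total, last = 0, None
    for _ in range(n_trials):
        action = policy(last, rng)
        total += env.step(action)
        last = action
    return total / n_trials


for policy in (always_left, alternate, coin_flip):
    print(policy.__name__, average_reward(policy))
```

Against this toy adversary, both the always-left player and the strict alternator are caught within a few trials and earn close to nothing, while the coin flipper keeps collecting on roughly half the trials. A smarter adversary would track longer histories, but the logic is the same.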
The paper tests three candidate mechanisms for producing that variability:
- Use a stochastic generator - basically inject more randomness.
- Increase the learning rate - update beliefs faster when outcomes change.
- Upvalue unchosen actions - give some extra credit to options you did not pick, which keeps them alive as future choices.
All three can, in principle, help an agent stay variable in simulations. But when the authors fit the model to existing behavioral data from humans, pigeons, and rats, a species difference pops out: humans seem to rely on upvaluing unchosen actions, while pigeons and rats do not (Reynders et al., 2026).
That is a neat result. Humans may not just be acting more randomly. We may be doing something more structured, like mentally keeping the roads not taken slightly warm.
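The three candidate mechanisms listed above can be pictured as three knobs on a simple value-learning agent. The sketch below is an assumption-heavy illustration, not the authors' model: it treats the stochastic generator as a softmax temperature, and the names `VariableAgent`, `inverse_temperature`, and `unchosen_bonus`, along with the exact update rule for unchosen actions, are hypothetical.

```python
import math
import random


def softmax(values, inverse_temperature):
    """Turn action values into choice probabilities."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp(inverse_temperature * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


class VariableAgent:
    """Toy agent with one knob per candidate mechanism (all names hypothetical)."""

    def __init__(self, n_actions=2, learning_rate=0.1,
                 inverse_temperature=5.0, unchosen_bonus=0.0, seed=0):
        self.learning_rate = learning_rate              # knob 2: faster updating
        self.inverse_temperature = inverse_temperature  # knob 1: lower = noisier choices
        self.unchosen_bonus = unchosen_bonus            # knob 3: upvalue unchosen actions
        self.q = [0.5] * n_actions
        self.rng = random.Random(seed)

    def choose(self):
        probs = softmax(self.q, self.inverse_temperature)
        return self.rng.choices(range(len(self.q)), weights=probs)[0]

    def learn(self, action, reward):
        for a in range(len(self.q)):
            if a == action:
                # Standard delta rule for the chosen action.
                self.q[a] += self.learning_rate * (reward - self.q[a])
            else:
                # Assumed form: unchosen values drift upward toward 1.
                self.q[a] += self.unchosen_bonus * (1.0 - self.q[a])


agent = VariableAgent(learning_rate=0.2, unchosen_bonus=0.1)
agent.learn(action=0, reward=1.0)
print(agent.q)  # the chosen value rises, and so does the unchosen one
```

Setting `unchosen_bonus` to zero recovers a plain learner that can only become more variable by choosing more noisily or by updating faster. A nonzero bonus keeps alternatives competitive without adding any noise to the choice rule, which is the kind of structured variability the human data seem to favor.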
Your Brain, Keeping Backup Plans on Simmer
Upvaluing unchosen actions sounds technical, but the intuition is friendly enough. Suppose you choose option A and get rewarded. Instead of only learning "A looks good," your brain also nudges options B and C upward a bit, just to keep them plausible. Not because they worked, but because overcommitting is how you become easy to read.
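A small worked example shows why the nudge matters. The update rule and every number here are illustrative assumptions, not values from the paper: three options start at a value of 0.5, the learning rate is \alpha = 0.2, the unchosen bonus is \kappa = 0.1, and choices follow a softmax with inverse temperature \beta = 5. After choosing A and receiving a reward of r = 1:

```latex
\begin{align*}
Q(A) &\leftarrow Q(A) + \alpha\,\bigl(r - Q(A)\bigr) = 0.5 + 0.2\,(1 - 0.5) = 0.6 \\
Q(B),\,Q(C) &\leftarrow Q + \kappa\,\bigl(1 - Q\bigr) = 0.5 + 0.1\,(1 - 0.5) = 0.55 \\
P(A) &= \frac{1}{1 + 2\,e^{-\beta\,(Q(A) - Q(B))}}
      = \frac{1}{1 + 2\,e^{-0.25}} \approx 0.39
\end{align*}
```

Without the bonus, B and C would stay at 0.5, the gap would be 0.1 instead of 0.05, and the same formula gives P(A) = 1 / (1 + 2e^{-0.5}), or about 0.45. One rewarded choice makes the plain learner noticeably more likely to repeat itself, while the upvaluing learner stays closer to the one-in-three starting point.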
That idea lines up with other recent work. Lee, Rouault, and Wyart showed that humans tune both learning and choice variability when uncertainty spikes, which suggests variability can be strategically regulated rather than sprayed around like confetti (Lee, Rouault, & Wyart, 2023). Ben-Artzi and colleagues also found evidence that people update the value of unchosen actions, not just chosen ones - a surprisingly important detail if you want to model actual human behavior instead of the cleaner, more obedient version that lives in textbooks (Ben-Artzi et al., 2023).
In AI, this plugs straight into the old exploration-versus-exploitation dilemma. Do you stick with the thing that seems best, or do you try alternatives because the world may be changing, deceptive, or just plain weird? Reviews in both machine learning and cognitive science keep coming back to the same point: exploration is hard, necessary, and usually more subtle than "add random noise and hope for the best" (Ladosz et al., 2022; Wise et al., 2024).
Why This Matters Outside a Lab With Judgmental Pigeons
If these findings hold up, they matter anywhere adaptability beats routine. Robotics is an obvious one. Real-world reinforcement learning (RL) systems still struggle when environments shift or when being too predictable gets them stuck. A recent survey of robotics RL emphasizes that robust exploration remains one of the practical bottlenecks (Chen et al., 2025). The same logic shows up in recommendation systems, autonomous navigation, and game-playing agents: if you only chase what worked yesterday, tomorrow can embarrass you.
There is also a more human point here. Creativity, strategic play, and flexible problem-solving may depend less on "be random" and more on "don't let unused options die too quickly." That is a much smarter story. It makes variability look less like chaos and more like disciplined indecision - which, to be fair, is also how many of us order dinner.
None of this means humans are magic or pigeons are missing some secret software patch. It means different species may arrive at variable behavior through different internal routes. Same outward wiggle, different machinery underneath.
And honestly, that is the fun part. A human, a pigeon, and a rat can all look unpredictable. Only one of them, apparently, may be quietly keeping the abandoned options in play like a chess player who refuses to close tabs.
References
Reynders, J., Verguts, T., & Braem, S. (2026). Adaptive variability in humans, pigeons, and rats. Psychological Review. DOI: 10.1037/rev0000620. PMID: 42024326
Lee, J. K., Rouault, M., & Wyart, V. (2023). Adaptive tuning of human learning and choice variability to unexpected uncertainty. Science Advances, 9(13), eadd0501. DOI: 10.1126/sciadv.add0501. PMCID: PMC10058239
Ben-Artzi, I., Kessler, Y., Nicenboim, B., & Shahar, N. (2023). Computational mechanisms underlying latent value updating of unchosen actions. Science Advances, 9(42), eadi2704. DOI: 10.1126/sciadv.adi2704. PMCID: PMC10588947
Wise, T., Radulescu, A., Balters, S., et al. (2024). Naturalistic reinforcement learning. Trends in Cognitive Sciences, 28(2), 144-158. DOI: 10.1016/j.tics.2023.08.016. PMCID: PMC10878983
Ladosz, P., Weng, L., Kim, M., & Oh, H. (2022). Exploration in deep reinforcement learning: A survey. arXiv: 2205.00824. DOI: 10.48550/arXiv.2205.00824
Chen, L., Tang, Y., Wang, T., et al. (2025). Deep reinforcement learning for robotics: A survey of real-world successes. Proceedings of the AAAI Conference on Artificial Intelligence, 39(27), 28694-28698. DOI: 10.1609/aaai.v39i27.35095
Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.