AIb2.io - AI Research Decoded

Your Pupils Are Not Neutral: Fake News, Reinforcement Learning, and the Tiny Drama in Your Eyes

Thousands of papers get published every day, launched like confetti by overcaffeinated grad students, so a study has to do something pretty unusual to earn a second look. This one did: it suggests your pupils may quietly expose how your prior beliefs steer what you learn from fake news, which is both scientifically interesting and a little rude of our eyeballs.

Your Eyeballs Have Opinions

The new PNAS paper, "Eye of the beholder," asks a sneaky question: when you encounter news headlines, do you learn from feedback in a neutral way, or do your existing beliefs tilt the whole table before the game even starts? Lozito and colleagues tested this with a mix of fake and real headlines, confidence ratings, a reinforcement learning task, and pupillometry, which is a fancy way of saying "we watched your pupils do weird little honesty leaks" (Lozito et al., 2026).

Participants first judged whether headlines were true or false and rated how confident they felt. Then came the experimental trapdoor: those personalized judgments got turned into a learning task, basically a two-armed bandit setup, the psychology equivalent of a slot machine with trust issues. In some blocks, rewards lined up with what people had previously judged as true. In others, rewards lined up with confidence instead.

That distinction mattered a lot.

People learned better when reward matched their earlier veracity judgments than when reward matched their confidence, especially when the rewarded choices were the low-confidence ones. In plain English, your brain seems happier learning when the reward structure fits the mental filing system it already built. Ask it to optimize around a less familiar category, and it starts driving like someone following GPS directions while distrusting the map.

Reinforcement Learning, But Make It Personal

Reinforcement learning sounds intimidating until you realize it is just "try stuff, get feedback, do more of what works." Rats do it. Humans do it. Apps do it. Your streaming service does it with the relentless energy of a casino host who knows you watched one submarine documentary and now thinks you live for ballast tank content.
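For the curious, here is roughly what "try stuff, get feedback, do more of what works" looks like on a two-armed bandit. This is a minimal textbook sketch, not the model from the paper: the delta-rule update, the softmax choice rule, and every name and number in it (`run_bandit`, `alpha`, `beta`, the reward probabilities) are assumptions picked for illustration.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Pick an arm with probability proportional to exp(beta * value)."""
    weights = [math.exp(beta * q) for q in q_values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for arm, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return arm
    return len(q_values) - 1  # guard against floating-point round-off


def update(value, reward, alpha):
    """Delta rule: move the estimate a fraction alpha toward the reward."""
    return value + alpha * (reward - value)


def run_bandit(reward_probs=(0.8, 0.2), n_trials=200, alpha=0.1, beta=5.0, seed=0):
    """Simulate one learner; all parameter values here are illustrative."""
    rng = random.Random(seed)
    q_values = [0.5] * len(reward_probs)  # start with no preference
    choices = []
    for _ in range(n_trials):
        arm = softmax_choice(q_values, beta, rng)                   # try stuff
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0  # get feedback
        q_values[arm] = update(q_values[arm], reward, alpha)       # do more of what works
        choices.append(arm)
    return q_values, choices


if __name__ == "__main__":
    q, choices = run_bandit()
    print("learned values:", [round(v, 2) for v in q])
    print("share of choices for arm 0:", choices.count(0) / len(choices))
```

The learning rate `alpha` sets how far one outcome moves the estimate, and `beta` sets how strongly the learner favors the arm that currently looks better. A higher `beta` means less exploring.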

What makes this paper interesting is that it bolts reinforcement learning onto belief formation. The authors are not just asking whether people can identify fake news. They are asking whether subjective priors, especially beliefs about truth and confidence, shape how new feedback gets absorbed.

That lines up with a broader wave of misinformation research. A 2024 review argued that confidence is not just a side effect of misinformation but part of the machinery that helps it stick (Rapp & Withall, 2024). Another 2024 study on vaccine misinformation found belief bias was a stronger predictor of falling for falsehoods than simple inability to tell true from false, and higher confidence tracked with stronger bias (Nahon, Ng, & Gawronski, 2024). So yes, confidence can behave less like a wise internal compass and more like that one friend who gives terrible directions with absolute swagger.

The Pupil Plot Twist

Now for the fun part: the pupils.

Pupil dilation is often treated as a window into attention, uncertainty, arousal, and cognitive effort. Not a perfect window, but a useful one. In this study, the pupil data showed signals tied to subjective confidence before decisions were made. That matters because it suggests belief-related processing is not just happening in the final "I choose this one" moment. It is already cooking earlier, like a political argument forming in your head while you are still pretending to listen politely.

The computational modeling adds another layer. When reward matched judged truthfulness, participants seemed to rely on feature-based generalization. When the reward structure stopped matching their prior epistemic structure, they shifted toward a different updating style, more sensitive to outcome valence. Translation: when the world plays by the rules your brain expected, learning looks smooth. When it does not, the brain starts improvising like a contractor discovering the floorboards are haunted.
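To make those two updating styles a bit more concrete, here is a rough sketch. It is not the authors' model, which this post only describes in words. It assumes "feature-based generalization" means learning one value per shared feature (say, "judged true") rather than per headline, and that "sensitive to outcome valence" can be approximated with separate learning rates for good and bad surprises. The function names and parameter values are hypothetical.

```python
from collections import defaultdict


def feature_update(feature_values, feature, reward, alpha=0.2):
    """Update the value of a shared feature, so every item carrying
    that feature inherits what was learned from this one outcome."""
    feature_values[feature] += alpha * (reward - feature_values[feature])
    return feature_values[feature]


def valence_update(value, reward, alpha_pos=0.3, alpha_neg=0.1):
    """Update one item's value with a learning rate that depends on
    whether the outcome was better or worse than expected."""
    delta = reward - value  # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return value + alpha * delta


if __name__ == "__main__":
    # Feature-based learner: one rewarded "judged_true" headline raises the
    # expected value of every other headline that shares the feature.
    feature_values = defaultdict(lambda: 0.5)
    feature_update(feature_values, "judged_true", reward=1.0)
    print("value for any 'judged_true' headline:", feature_values["judged_true"])

    # Valence-sensitive learner: a win moves the estimate more than a loss.
    print("after a win: ", valence_update(0.5, reward=1.0))
    print("after a loss:", valence_update(0.5, reward=0.0))
```

The feature learner is fast when the feature really does predict reward, because one outcome teaches it about many headlines at once. When that stops being true, item-by-item updating weighted by wins and losses is one plausible fallback.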

That also fits adjacent work in reinforcement learning and confidence. Recent neuroscience research suggests confidence is deeply woven into learning systems rather than pasted on afterward like a customer satisfaction survey (Ting et al., 2023; de Lange et al., 2023).

Why This Actually Matters Outside the Lab

This is not just a lab curiosity about blinking and vibes. Online information systems constantly reward behavior. Click this. Ignore that. Trust this source. Share that headline. If prior beliefs help determine what feedback "makes sense," then misinformation is not only a content problem. It is also a learning problem.

That is especially relevant now, when generative AI has made deceptive content cheaper, faster, and weirder. A 2024 Nature Machine Intelligence perspective laid out how large language models can both worsen factuality problems and help fact-check them (Augenstein et al., 2024). New systems such as DEFAME and Veracity are already trying to verify multimodal claims with retrieved evidence instead of vibes alone (Braun et al., 2025; Curtis et al., 2025).

The catch is that even perfect tools still meet imperfect humans. If your internal reward system already prefers information that feels right, then fighting fake news is not just about better detectors. It is about understanding the deeply awkward marriage between belief, confidence, and learning.

Your pupils, apparently, knew that already.

References

Lozito, S., Piga, V., Lo Presti, S., Scuderi, A., Doricchi, F., Silvetti, M., & Lasaponara, S. (2026). Eye of the beholder: Pupillary response reflects how subjective prior beliefs shape reinforcement learning with fake news. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2518776123

Rapp, D. N., & Withall, M. M. (2024). Confidence as a metacognitive contributor to and consequence of misinformation experiences. Current Opinion in Psychology, 55, 101735. https://doi.org/10.1016/j.copsyc.2023.101735

Nahon, L. S., Ng, N. L., & Gawronski, B. (2024). Susceptibility to misinformation about COVID-19 vaccines: A signal detection analysis. Journal of Experimental Social Psychology, 114, 104632. https://doi.org/10.1016/j.jesp.2024.104632

Ting, C.-C., Salem-Garcia, N., Palminteri, S., Engelmann, J. B., & Lebreton, M. (2023). Neural and computational underpinnings of biased confidence in human reinforcement learning. Nature Communications, 14, 6736. https://doi.org/10.1038/s41467-023-42589-5

de Lange, F. P., Heilbron, M., & Meyniel, F. (2023). A characterization of the neural representation of confidence during probabilistic learning. NeuroImage, 268, 119849. https://doi.org/10.1016/j.neuroimage.2022.119849

Augenstein, I., Baldwin, T., Cha, M., Chakraborty, T., Ciampaglia, G. L., DiResta, R., Ferrara, E., Hale, S., Halevy, A., Hovy, E., Ji, H., Menczer, F., Nakov, P., Scheufele, D., Sharma, S., & Zagni, G. (2024). Factuality challenges in the era of large language models and opportunities for fact-checking. Nature Machine Intelligence, 6, 852-863. https://doi.org/10.1038/s42256-024-00881-z

Braun, T., Rothermel, M., Rohrbach, M., & Rohrbach, A. (2025). DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts. ICML 2025. https://openreview.net/forum?id=umT6rMf1Rm&noteId=nxl8Hxw9UX

Curtis, T. L., Puelma Touzel, M., Garneau, W., Gruaz, M., Pinder, M., Wang, L. W., Krishna, S., Cohen, L., Godbout, J.-F., Rabbany, R., & Pelrine, K. (2025). Veracity: An Open-Source AI Fact-Checking System. Proceedings of IJCAI 2025. https://doi.org/10.24963/ijcai.2025/1254

Disclaimer: This blog post is a simplified summary of published research for educational purposes. The accompanying illustration is artistic and does not depict actual model architectures, data, or experimental results. Always refer to the original paper for technical details.