Brain’s Reward Center Tracks Not Just What, But When Rewards Arrive
The study of the brain’s reward system has taken an intriguing turn. Long understood for its role in predicting future rewards, the ventral tegmental area (VTA) of the brain tracks not only the expected value of rewards but also the exact moment they are likely to occur. This remarkable finding suggests a more sophisticated mechanism of reward prediction within our brains than previously thought.
Researchers from the universities of Geneva (UNIGE), Harvard, and McGill employed a machine learning algorithm to explore this phenomenon further. By analyzing the activity of VTA neurons, they revealed that these neurons don’t just predict imminent rewards; they can forecast rewards that may arrive seconds, minutes, or even further into the future. This refined temporal encoding gives the brain enhanced flexibility in decision-making and learning, closely resembling methods used in advanced AI systems.
The VTA, a compact region within our brain, is pivotal for motivation and the reward circuit. As a primary dopamine source, it sends this critical neuromodulator to other brain areas to initiate action in response to positive stimuli. Initially deemed merely a reward center, the VTA had its role significantly reinterpreted in the 1990s, when researchers discovered that it encodes not the reward itself but the prediction of it.
Alexandre Pouget, a full professor in the Department of Basic Neurosciences at the UNIGE Faculty of Medicine, explains, “When experiments show a reward consistently following a light cue, the VTA doesn’t release dopamine at the reward’s moment but does so upon the signal’s appearance. This indicates it encodes the prediction tied to the signal rather than the actual reward.”
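This cue-shift effect is the signature of temporal-difference (TD) learning. The following minimal sketch assumes a simple tabular TD(0) learner with illustrative parameters (the trial length, learning rate, and discount factor are assumptions for the demo, not values from the study); it shows the prediction error migrating from the reward to the cue over repeated trials:

```python
import numpy as np

# Minimal TD(0) simulation of a cue-then-reward trial. After learning,
# the prediction error -- the putative dopamine signal -- occurs at the
# cue, not at the reward itself.

n_steps = 10             # within-trial time steps; cue at t=0, reward at t=9
alpha, gamma = 0.1, 0.95
V = np.zeros(n_steps + 1)        # learned value of each time step

def run_trial(V):
    """One trial of TD(0); returns the prediction error at each step."""
    deltas = np.zeros(n_steps + 1)
    # Cue onset is a transition from the inter-trial baseline (value 0),
    # so the error there equals the newly signalled expectation.
    deltas[0] = gamma * V[0]
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0     # reward on the last step
        deltas[t + 1] = r + gamma * V[t + 1] - V[t]
        V[t] += alpha * deltas[t + 1]
    return deltas

for _ in range(1000):
    deltas = run_trial(V)

print("error at cue onset:  ", round(float(deltas[0]), 3))    # large
print("error at reward time:", round(float(deltas[-1]), 3))   # near zero
```

After training, the error at the moment of reward is close to zero while the cue onset carries the full discounted prediction, mirroring the dopamine recordings Pouget describes.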
A More Refined Function
This process, often termed “reinforcement learning,” relies on reward feedback rather than explicit supervision and plays a critical role in human learning. It also underpins numerous AI algorithms that improve performance through training, such as AlphaGo, the first program to defeat a world champion at the game of Go.
Recent research conducted by Pouget’s team, in collaboration with Harvard’s Naoshige Uchida and McGill’s Paul Masset, unveiled that the VTA’s predictive function is even more complex than assumed.
“Instead of forecasting a weighted average of future rewards, the VTA forecasts their temporal occurrence. Every anticipated gain is represented separately, pinpointing exactly when it’s expected,” elaborates Pouget.
This nuance implies that certain neurons might prioritize imminent rewards over those in the future, embodying the adage “a bird in the hand is worth two in the bush.” Different neurons operate on varying time scales, with some focused on rewards a few seconds away, others on those a minute away, and some on even more distant futures.
This variation gives the brain’s learning system great flexibility, allowing it to optimize for immediate or delayed rewards according to an individual’s goals and priorities.
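To make the “bird in the hand” intuition concrete: under exponential discounting, a neuron’s discount factor determines which of two options looks more valuable. In this illustrative sketch (the rewards, delays, and discount factors are hypothetical), a short-horizon channel prefers a small immediate reward while a long-horizon channel prefers a larger delayed one:

```python
def discounted_value(reward: float, delay: int, gamma: float) -> float:
    """Present value of a single future reward under exponential discounting."""
    return reward * gamma ** delay

# Two hypothetical options: a small reward now vs. a larger reward later.
options = [("1 unit now", 1.0, 0), ("3 units in 20 steps", 3.0, 20)]

for gamma in (0.80, 0.99):               # short- vs. long-horizon "neuron"
    values = {label: discounted_value(r, d, gamma) for label, r, d in options}
    best = max(values, key=values.get)
    summary = ", ".join(f"{k} -> {v:.2f}" for k, v in values.items())
    print(f"gamma={gamma}: {summary}  (prefers: {best})")
```

With gamma = 0.80 the delayed reward is worth almost nothing today, so the immediate option wins; with gamma = 0.99 the larger delayed reward dominates.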
AI and Neuroscience: A Symbiotic Relationship
These revelations are the fruits of interdisciplinary collaboration between neuroscience and artificial intelligence. Alexandre Pouget devised a purely mathematical algorithm that explicitly incorporates the timing of rewards into the learning process.
Meanwhile, Harvard researchers collected extensive neurophysiological recordings of VTA activity in animals receiving rewards. When Pouget’s algorithm was applied to these data, its predictions matched the empirical observations remarkably well.
While the brain often inspires advances in AI and machine learning, these results show how algorithms can, in turn, help uncover our neurophysiological processes.
Multi-timescale Reinforcement Learning in the Brain
To succeed in intricate environments, animals and artificial agents must learn to act in ways that maximize fitness and rewards. Such behavior can be refined through reinforcement learning, a class of algorithms renowned both for training artificial agents and for characterizing the firing of dopaminergic neurons in the midbrain.
Traditionally, reinforcement learning agents discount future rewards exponentially using a single timescale or discount factor. However, the current research explores multiple timescales in biological reinforcement learning. It shows that agents operating across numerous timescales unlock distinct computational advantages.
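As a sketch of what such an agent looks like, the code below runs several TD learners in parallel, one per discount factor, each driven by its own prediction error. The task structure and parameters are illustrative assumptions, not the paper’s experimental setup:

```python
import numpy as np

# Multi-timescale TD learning: one value function per discount factor,
# each updated by its own prediction error -- analogous to a population
# of dopamine "channels" with different discount time constants.

gammas = np.array([0.6, 0.9, 0.99])        # one timescale per channel
n_steps, alpha = 10, 0.1                   # cue at t=0, reward at t=9
V = np.zeros((len(gammas), n_steps + 1))   # one value function per gamma

for trial in range(1000):
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0
        # One TD error per timescale: a vector-valued "dopamine" signal.
        deltas = r + gammas * V[:, t + 1] - V[:, t]
        V[:, t] += alpha * deltas

# At the cue, each channel carries the reward prediction at its own
# horizon: the fast channel has nearly "forgotten" a 9-step-away reward.
print("cue-time value per channel:", np.round(V[:, 0], 3))
```

The single-gamma agent of traditional reinforcement learning is the special case where this vector has length one.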
The study indicates that dopaminergic neurons recorded in mice performing behavioral tasks encode reward prediction error with a variety of discount time constants. This model accounts for the heterogeneity of temporal discounting observed both in cue-evoked transient responses and in the slower fluctuations known as dopamine ramps.
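One way to see what this heterogeneous code buys, sketched under simplified assumptions rather than the authors’ exact analysis: values learned under many discount factors form a Laplace-like transform of the future reward timeline, which can be inverted to read out when rewards will arrive. The horizon, discount factors, and least-squares inversion below are all assumptions for the demo:

```python
import numpy as np

# Each model "neuron" holds V_gamma = sum_t gamma**t * r[t], a Laplace-like
# transform of the future reward timeline r. With many gammas, inverting
# the transform recovers the *timing* of upcoming rewards.

horizon = 12
t = np.arange(horizon)
r_true = np.zeros(horizon)
r_true[3], r_true[9] = 1.0, 0.5             # rewards at two future times

gammas = np.linspace(0.5, 0.99, 30)         # population of discount factors
A = gammas[:, None] ** t[None, :]           # A[i, j] = gamma_i ** j
v = A @ r_true                              # one value per model neuron

# Invert the transform by least squares to recover when rewards occur.
r_hat, *_ = np.linalg.lstsq(A, v, rcond=None)
print("decoded reward times:", np.flatnonzero(r_hat > 0.25))   # -> [3 9]
```

A single discount factor collapses this timeline into one number; the population keeps the timing explicit.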
Significantly, the measured discount factors of individual neurons correlate across tasks, indicating a cell-specific feature. These findings usher in a new paradigm for understanding functional diversity in dopaminergic neurons. They also offer a mechanistic foundation for why humans and animals often exhibit non-exponential discounting, and they pave the way for designing more efficient reinforcement learning algorithms.
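A deliberately simple model hints at why non-exponential discounting could emerge. Under the illustrative assumption that each neuron discounts exponentially but the population’s discount factors are spread uniformly over [0, 1), the population average of gamma**t is exactly 1/(t + 1), a hyperbolic curve of the kind measured behaviorally:

```python
import numpy as np

# A mixture of exponential discounters looks hyperbolic in aggregate:
# for gamma uniform on [0, 1), the expectation of gamma**t is 1/(t + 1).

rng = np.random.default_rng(0)
gammas = rng.uniform(0.0, 1.0, size=100_000)   # one gamma per "neuron"

for t in (1, 5, 20, 40):
    mixture = np.mean(gammas ** t)             # population-level discount
    print(f"t={t:2d}  mixture={mixture:.4f}  hyperbolic 1/(t+1)={1/(t+1):.4f}")
```

No single exponential can produce this heavy-tailed shape; the diversity of time constants does it naturally.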
The study, published in the journal Nature, continues to push the boundaries of neuroscience, demonstrating the profound potential of leveraging AI to advance our understanding of the brain.