Abstract: The orbitofrontal cortex (OFC) and amygdala are thought to participate in reversal learning, a process in which cue-outcome associations are switched. However, current theories disagree on whether OFC directs reversal learning in the amygdala. Here, we show that during reversal of cues' associations with rewarding and aversive outcomes, neurons that respond preferentially to stimuli predicting aversive events update more quickly in amygdala than OFC; meanwhile, OFC neurons that respond preferentially to reward-predicting stimuli update more quickly than those in the amygdala. After learning, however, OFC consistently differentiates between impending reinforcements with a shorter latency than the amygdala. Finally, analysis of local field potentials (LFPs) reveals a disproportionate influence of OFC on amygdala that emerges after learning. We propose that reversal learning is supported by complex interactions between neural circuits spanning the amygdala and OFC, rather than directed by any single structure.
Abstract: Making appropriate choices often requires the ability to learn the value of available options from experience. Parkinson's disease is characterized by a loss of dopamine neurons in the substantia nigra, neurons hypothesized to play a role in reinforcement learning. Although previous studies have shown that Parkinson's patients are impaired in tasks involving learning from feedback, they have not directly tested the widely held hypothesis that dopamine neuron activity specifically encodes the reward prediction error signal used in reinforcement learning models. To test a key prediction of this hypothesis, we fit choice behavior from a dynamic foraging task with reinforcement learning models and show that treatment with dopaminergic drugs alters choice behavior in a manner consistent with the theory. More specifically, we found that dopaminergic drugs selectively modulate learning from positive outcomes. We observed no effect of dopaminergic drugs on learning from negative outcomes. We also found a novel dopamine-dependent effect on decision making that is not accounted for by reinforcement learning models: perseveration in choice, independent of reward history, increases with Parkinson's disease and decreases with dopamine therapy.
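The abstract does not give the exact form of the fitted models. As a minimal sketch of the ideas it names, one could assume a value update with separate learning rates for positive and negative prediction errors, and a softmax choice rule with a perseveration bonus that does not depend on reward history. The function names and the parameters `alpha_pos`, `alpha_neg`, `beta`, and `kappa` are illustrative assumptions, not the authors' notation.

```python
import math


def update_value(q, reward, alpha_pos, alpha_neg):
    """Update one option's value with outcome-dependent learning rates.

    Assumption: a positive prediction error uses alpha_pos and a
    negative one uses alpha_neg (hypothetical parameter names).
    """
    delta = reward - q  # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta


def choice_prob_a(q_a, q_b, last_choice, beta, kappa):
    """Softmax probability of choosing option 'a' over option 'b'.

    kappa is added to whichever option was chosen on the previous
    trial, so it captures perseveration independent of reward history.
    """
    bonus_a = kappa if last_choice == "a" else 0.0
    bonus_b = kappa if last_choice == "b" else 0.0
    z = beta * (q_a - q_b) + (bonus_a - bonus_b)
    return 1.0 / (1.0 + math.exp(-z))
```

Under this sketch, a drug effect that is selective for learning from positive outcomes would appear as a change in `alpha_pos` with `alpha_neg` unchanged, while a change in perseveration would appear in `kappa`.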
Abstract: Although noncholinergic neurons in the basal forebrain are known to contribute to cognition, their response properties in behaving animals are unclear. In this issue of Neuron, Lin and Nicolelis demonstrate that these neurons represent the motivational salience of sensory stimuli and may modulate cortical processing to direct top-down attention.
Abstract: Choosing the most valuable course of action requires knowing the outcomes associated with the available alternatives. The striatum may be important for representing the values of actions. We examined this in monkeys performing an oculomotor choice task. The activity of phasically active neurons (PANs) in the striatum covaried with two classes of information: action-values and chosen-values. Action-value PANs were correlated with value estimates for one of the available actions, and these signals were frequently observed before movement execution. Chosen-value PANs were correlated with the value of the action that had been chosen, and these signals were primarily observed later in the task, immediately before or persistently after movement execution. These populations may serve distinct functions mediated by the striatum: some PANs may participate in choice by encoding the values of the available actions, while other PANs may participate in evaluative updating by encoding the reward value of chosen actions.
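The distinction between the two signals can be illustrated with a small sketch. It assumes that value estimates are stored per action and that the chosen action is updated with a simple delta rule; the dictionary layout, the function names, and the learning rate `alpha` are hypothetical and are not taken from the study.

```python
def action_value(values, action):
    """Value estimate for one specific action, regardless of what is chosen."""
    return values[action]


def chosen_value(values, chosen):
    """Value estimate of whichever action was actually chosen."""
    return values[chosen]


def update_chosen(values, chosen, reward, alpha=0.2):
    """Return a copy of the values with only the chosen action updated.

    Assumption: a delta-rule update with learning rate alpha.
    """
    updated = dict(values)
    updated[chosen] += alpha * (reward - values[chosen])
    return updated
```

In this sketch, an action-value signal for "left" stays the same whether the animal goes left or right, whereas a chosen-value signal follows the selected action and is the quantity compared with the outcome during updating.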
Abstract: Work in behaving primates indicates that midbrain dopamine neurons encode a prediction error, the difference between an obtained reward and the reward expected. Studies of dopamine action potential timing in the alert and anesthetized rat indicate that dopamine neurons respond in tonic and phasic modes, a distinction that has been less well characterized in the primates. We used spike train models to examine the relationship between the tonic and burst modes of activity in dopamine neurons while monkeys were performing a reinforced visuo-saccadic movement task. We studied spiking activity during four task-related intervals; two of these were intervals during which no task-related events occurred, whereas two were periods marked by task-related phasic activity. We found that dopamine neuron spike trains during the intervals when no events occurred were well described as tonic. Action potentials appeared to be independent, to occur at low frequency, and to be almost equally well described by Gaussian and Poisson-like (gamma) processes. Unlike in the rat, interspike intervals as low as 20 ms were often observed during these presumptively tonic epochs. Having identified these periods of presumptively tonic activity, we were able to quantitatively define phasic modulations (both increases and decreases in activity) during the intervals in which task-related events occurred. This analysis revealed that the phasic modulations of these neurons include both bursting, as has been described previously, and pausing. Together bursts and pauses seemed to provide a continuous, although nonlinear, representation of the theoretically defined reward prediction error of reinforcement learning.
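As a rough illustration of describing interspike intervals with a gamma process, the sketch below estimates gamma parameters by the method of moments and evaluates the log-likelihood. The fitting method and function names are assumptions chosen for brevity and are not the spike train models used in the study. With shape equal to 1 the gamma distribution reduces to the exponential interval distribution of a Poisson process.

```python
import math
from statistics import mean, pvariance


def gamma_moments(isis):
    """Method-of-moments estimate of gamma (shape, scale) from ISIs in seconds."""
    m = mean(isis)
    v = pvariance(isis)
    return m * m / v, v / m


def gamma_loglik(isis, shape, scale):
    """Log-likelihood of the ISIs under a gamma(shape, scale) distribution."""
    return sum(
        (shape - 1.0) * math.log(x)
        - x / scale
        - shape * math.log(scale)
        - math.lgamma(shape)
        for x in isis
    )
```

Comparing `gamma_loglik` at the fitted shape with its value at shape 1 gives one simple way to ask how far a tonic epoch departs from Poisson-like firing.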
Abstract: The basal ganglia appear to have a central role in reinforcement learning. Previous experiments, focusing on activity preceding movement execution, support the idea that dorsal striatal neurons bias action selection according to the expected values of actions. However, many phasically active striatal neurons respond at a time too late to initiate or select movements. Given the data suggesting a role for the basal ganglia in reinforcement learning, postmovement activity may therefore reflect evaluative processing important for learning the values of actions. To better understand these postmovement neurons, we determined whether individual striatal neurons encode information about saccade direction, whether a reward had been received, or both. We recorded from phasically active neurons in the caudate nucleus while monkeys performed a probabilistically rewarded delayed saccade task. Many neurons exhibited peak responses after saccade execution (77 of 149) that were often tuned for the direction of the preceding saccade (61 of 77). Of those neurons responding during the reward epoch, one subset showed direction tuning for the immediately preceding saccade (43 of 60), whereas another subset responded differentially on rewarded versus unrewarded trials (35 of 60). We found that there was relatively little overlap of these properties in individual neurons. The encoding of action and outcome was performed by largely separate populations of caudate neurons that were active after movement execution. Thus, striatal neurons active primarily after a movement appear to be segregated into two distinct groups that provide complementary information about the outcomes of actions.
Abstract: We studied the choice behavior of 2 monkeys in a discrete-trial task with reinforcement contingencies similar to those Herrnstein (1961) used when he described the matching law. In each session, the monkeys experienced blocks of discrete trials at different relative-reinforcer frequencies or magnitudes with unsignalled transitions between the blocks. Steady-state data following adjustment to each transition were well characterized by the generalized matching law; response ratios undermatched reinforcer frequency ratios but matched reinforcer magnitude ratios. We modelled response-by-response behavior with linear models that used past reinforcers as well as past choices to predict the monkeys' choices on each trial. We found that more recently obtained reinforcers more strongly influenced choice behavior. Perhaps surprisingly, we also found that the monkeys' actions were influenced by the pattern of their own past choices. It was necessary to incorporate both past reinforcers and past choices in order to accurately capture steady-state behavior as well as the fluctuations during block transitions and the response-by-response patterns of behavior. Our results suggest that simple reinforcement learning models must account for the effects of past choices to accurately characterize behavior in this task, and that models with these properties provide a conceptual tool for studying how both past reinforcers and past choices are integrated by the neural systems that generate behavior.
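Two ideas in this abstract lend themselves to a short sketch: the generalized matching law in its standard power-function form, and a linear model in which weighted past reinforcers and past choices set the log-odds of the next choice. The coding of the histories (+1 for option 1, -1 for option 2, 0 for no reinforcer) and all names below are illustrative assumptions, not the authors' fitted model.

```python
def matching_ratio(r1, r2, sensitivity, bias=1.0):
    """Generalized matching law: B1/B2 = bias * (R1/R2) ** sensitivity.

    sensitivity < 1 corresponds to undermatching, sensitivity == 1
    to strict matching.
    """
    return bias * (r1 / r2) ** sensitivity


def choice_log_odds(reward_hist, choice_hist, reward_w, choice_w):
    """Log-odds of choosing option 1 from weighted trial histories.

    Histories are ordered most recent first (assumed coding: +1 for
    option 1, -1 for option 2, 0 when no reinforcer was delivered).
    """
    reward_term = sum(w * r for w, r in zip(reward_w, reward_hist))
    choice_term = sum(w * c for w, c in zip(choice_w, choice_hist))
    return reward_term + choice_term
```

Reward weights that decay with trial lag would express the finding that more recent reinforcers influence choice more strongly, and nonzero choice weights would express the dependence on past choices.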
Abstract: A crucial step in understanding the function of a neural circuit in visual processing is to know what stimulus features are represented in the spiking activity of the neurons. For neurons with complex, nonlinear response properties, characterization of feature representation requires measurement of their responses to a large ensemble of visual stimuli and an analysis technique that allows identification of relevant features in the stimuli. In the present study, we recorded the responses of complex cells in the primary visual cortex of the cat to spatiotemporal random-bar stimuli and applied spike-triggered correlation analysis of the stimulus ensemble. For each complex cell, we were able to isolate a small number of relevant features from a large number of null features in the random-bar stimuli. Using these features as visual stimuli, we found that each relevant feature excited the neuron effectively in isolation and contributed to the response additively when combined with other features. In contrast, the null features evoked little or no response in isolation and divisively suppressed the responses to relevant features. Thus, for each cortical complex cell, visual inputs can be decomposed into two distinct types of features (relevant and null), and additive and divisive interactions between these features may constitute the basic operations in visual cortical processing.
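The additive and divisive interactions described here can be sketched with a simple functional form. The abstract does not state the equation, so the form below (summed drive from relevant features divided by one plus a scaled sum of null-feature drive) and the constant `k` are assumptions made only for illustration.

```python
def complex_cell_rate(relevant_drives, null_drives, k=1.0):
    """Hypothetical response: additive over relevant features,
    divisively suppressed by null features.

    relevant_drives and null_drives are non-negative feature strengths;
    k (assumed) sets the strength of divisive suppression.
    """
    excitation = sum(relevant_drives)
    suppression = 1.0 + k * sum(null_drives)
    return excitation / suppression
```

With no null features present, the response is the plain sum of the relevant drives. Adding null features leaves a zero response at zero but scales down the response to relevant features.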
Abstract: A crucial step toward understanding visual processing is to obtain a comprehensive description of the relationship between visual stimuli and neuronal responses. Many neurons in the visual cortex exhibit nonlinear responses, making it difficult to characterize their stimulus-response relationships. Here, we recorded the responses of primary visual cortical neurons of the cat to spatiotemporal random-bar stimuli and trained artificial neural networks to predict the response of each neuron. The random initial connections in the networks consistently converged to regular patterns. Analyses of these connection patterns showed that the response of each complex cell to the random-bar stimuli could be well approximated by the sum of a small number of subunits resembling simple cells. The direction selectivity of each complex cell measured with drifting gratings was also well predicted by the combination of these subunits, indicating the generality of the model. These results are consistent with a simple functional model for complex cells and demonstrate the usefulness of the neural network method for revealing the stimulus-response transformations of nonlinear neurons.
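The functional model suggested by this result, a complex cell as the sum of a few simple-cell-like subunits, can be sketched as follows. The half-squaring nonlinearity and the toy two-pixel filters are assumptions for illustration; the abstract specifies only that the subunits resemble simple cells and that their outputs are summed.

```python
def subunit_response(stimulus, weights):
    """Simple-cell-like subunit: linear filter, then half-squaring (assumed)."""
    drive = sum(w * s for w, s in zip(weights, stimulus))
    return max(drive, 0.0) ** 2


def complex_cell_response(stimulus, subunits):
    """Complex-cell response as the sum of its subunit responses."""
    return sum(subunit_response(stimulus, w) for w in subunits)
```

With two subunits of opposite polarity, for example `[[1.0, -1.0], [-1.0, 1.0]]`, the summed response is the same for a stimulus and its contrast-reversed version, which is the kind of polarity invariance usually associated with complex cells.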