Abstract
A large body of evidence has indicated that the phasic responses of midbrain dopamine neurons show a remarkable similarity to a type of teaching signal (temporal difference (TD) error) used in machine learning. However, previous studies failed to observe a key prediction of this algorithm: that when an agent associates a cue and a reward that are separated in time, the timing of dopamine signals should gradually move backward in time from the time of the reward to the time of the cue over multiple trials. Here we demonstrate that such a gradual shift occurs both at the level of dopaminergic cellular activity and dopamine release in the ventral striatum in mice. Our results establish a long-sought link between dopaminergic activity and the TD learning algorithm, providing fundamental insights into how the brain associates cues and rewards that are separated in time.
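The TD-error dynamics summarized above can be illustrated in a few lines. The sketch below is not the authors' Supplementary model code but a minimal, hypothetical TD(0) simulation with a complete-serial-compound (tapped-delay-line) state representation; the number of steps, learning rate and discount factor are illustrative assumptions. A cue starts each trial and a reward arrives a fixed number of steps later; the peak of the TD error begins at the reward time and moves backward to the cue over trials.

```python
import numpy as np

def td_simulation(n_trials=2000, n_steps=10, alpha=0.1, gamma=0.95):
    """TD(0) with a complete-serial-compound state representation.
    The cue is at step 0 and the reward is delivered on entering step
    n_steps; returns the time of the peak TD error on every trial."""
    w = np.zeros(n_steps)  # value estimate for each post-cue time step
    peak_times = []
    for _ in range(n_trials):
        delta = np.zeros(n_steps + 1)
        # Cue onset: the cue itself is unpredicted (pre-cue value is 0).
        delta[0] = gamma * w[0]
        for t in range(1, n_steps + 1):
            r = 1.0 if t == n_steps else 0.0        # reward at the last step
            v_next = w[t] if t < n_steps else 0.0   # post-reward value is 0
            delta[t] = r + gamma * v_next - w[t - 1]  # TD error
            w[t - 1] += alpha * delta[t]              # value update
        peak_times.append(int(np.argmax(delta)))
    return peak_times

peaks = td_simulation()
# peaks[0] is at the reward time (step 10); by the end of training the
# peak TD error has moved back to the cue (step 0).
```

On the first trial all values are zero, so the only nonzero error is at reward delivery; as value propagates backward one step per batch of trials, the error peak drifts earlier, which is the gradual shift the study tests for.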
Data availability
The fluorometry and two-photon imaging data have been deposited in a public repository (https://datadryad.org/stash/dataset/doi:10.5061/dryad.hhmgqnkjw). Source data are provided with this paper.
Code availability
The model code is provided as Supplementary Data. All other analysis code used to obtain the results is available from a public repository (https://github.com/VTA-SNc/Amo2022).
References
Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Rescorla, R. A. & Wagner, A. R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. Class. Cond. II Curr. Res. Theory 2, 64–99 (1972).
Sutton, R. S. & Barto, A. G. A temporal-difference model of classical conditioning. In: Proceedings of the Ninth Annual Conference of the Cognitive Science Society. 355–378 (1987).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).
Hollerman, J. R. & Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309 (1998).
Flagel, S. B. et al. A selective role for dopamine in stimulus–reward learning. Nature 469, 53–57 (2011).
Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).
Day, J. J., Roitman, M. F., Wightman, R. M. & Carelli, R. M. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020–1028 (2007).
Clark, J. J., Collins, A. L., Sanford, C. A. & Phillips, P. E. M. Dopamine encoding of Pavlovian incentive stimuli diminishes with extended training. J. Neurosci. 33, 3526–3532 (2013).
Pan, W.-X., Schmidt, R., Wickens, J. R. & Hyland, B. I. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward–learning network. J. Neurosci. 25, 6235–6242 (2005).
Brown, J., Bullock, D. & Grossberg, S. How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. J. Neurosci. 19, 10502–10511 (1999).
Mollick, J. A. et al. A systems-neuroscience model of phasic dopamine. Psychol. Rev. 127, 972–1021 (2020).
O’Reilly, R. C., Frank, M. J., Hazy, T. E. & Watz, B. PVLV: the primary value and learned value Pavlovian learning algorithm. Behav. Neurosci. 121, 31–49 (2007).
Tan, C. O. & Bullock, D. A local circuit model of learned striatal and dopamine cell responses under probabilistic schedules of reward. J. Neurosci. 28, 10062–10074 (2008).
Maes, E. J. P. et al. Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat. Neurosci. 23, 176–178 (2020).
Roesch, M. R., Calu, D. J. & Schoenbaum, G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624 (2007).
Sutton, R. S., Precup, D. & Singh, S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999).
Mohebi, A. et al. Dissociable dopamine dynamics for learning and motivation. Nature 570, 65–70 (2019).
Kim, H. R. et al. A unified framework for dopamine signals across timescales. Cell 183, 1600–1616 (2020).
Tsutsui-Kimura, I. et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. eLife 9, e62390 (2020).
Li, L., Walsh, T. J. & Littman, M. L. Towards a unified theory of state abstraction for MDPs. In: Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.1229 (2006).
Botvinick, M. M., Niv, Y. & Barto, A. G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280 (2009).
Bromberg-Martin, E. S., Matsumoto, M., Hong, S. & Hikosaka, O. A pallidus-habenula-dopamine pathway signals inferred stimulus values. J. Neurophysiol. 104, 1068–1076 (2010).
Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).
Zhong, W., Li, Y., Feng, Q. & Luo, M. Learning and stress shape the reward response patterns of serotonin neurons. J. Neurosci. 37, 8863–8875 (2017).
Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).
Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat. Methods 17, 1156–1166 (2020).
Kakade, S. & Dayan, P. Dopamine: generalization and bonuses. Neural Netw. 15, 549–559 (2002).
Morrens, J., Aydin, Ç., Janse van Rensburg, A., Esquivelzeta Rabell, J. & Haesler, S. Cue-evoked dopamine promotes conditioned responding during learning. Neuron 106, 142–153 (2020).
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Tian, J. & Uchida, N. Habenula lesions reveal that multiple mechanisms underlie dopamine prediction errors. Neuron 87, 1304–1316 (2015).
Niv, Y., Duff, M. O. & Dayan, P. Dopamine, uncertainty and TD learning. Behav. Brain Funct. 1, 6 (2005).
Schultz, W., Apicella, P. & Ljungberg, T. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13, 900–913 (1993).
Kobayashi, S. & Schultz, W. Reward contexts extend dopamine signals to unrewarded stimuli. Curr. Biol. 24, 56–62 (2014).
Matsumoto, H., Tian, J., Uchida, N. & Watabe-Uchida, M. Midbrain dopamine neurons signal aversion in a reward-context-dependent manner. eLife 5, e17328 (2016).
Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).
Menegas, W. et al. Dopamine neurons projecting to the posterior striatum form an anatomically distinct subclass. eLife 4, e10032 (2015).
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Sutton, R. S. & Barto, A. G. Reinforcement Learning, Second Edition: An Introduction (MIT Press, 2018).
Babayan, B. M., Uchida, N. & Gershman, S. J. Belief state representation in the dopamine system. Nat. Commun. 9, 1891 (2018).
Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).
Lee, R. S., Mattar, M. G., Parker, N. F., Witten, I. B. & Daw, N. D. Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons. eLife 8, e42992 (2019).
Watabe-Uchida, M., Eshel, N. & Uchida, N. Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40, 373–394 (2017).
Kawato, M. & Samejima, K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr. Opin. Neurobiol. 17, 205–212 (2007).
Tian, J. et al. Distributed and mixed information in monosynaptic inputs to dopamine neurons. Neuron 91, 1374–1389 (2016).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Botvinick, M., Wang, J. X., Dabney, W., Miller, K. J. & Kurth-Nelson, Z. Deep reinforcement learning and its neuroscientific implications. Neuron 107, 603–616 (2020).
Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006).
Tong, Q. et al. Synaptic glutamate release by ventromedial hypothalamic neurons is part of the neurocircuitry that prevents hypoglycemia. Cell Metab. 5, 383–393 (2007).
Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133–140 (2010).
Daigle, T. L. et al. A suite of transgenic driver and reporter mouse lines with enhanced brain-cell-type targeting and functionality. Cell 174, 465–480 (2018).
Tsutsui-Kimura, I. et al. Dysfunction of ventrolateral striatal dopamine receptor type 2-expressing medium spiny neurons impairs instrumental motivation. Nat. Commun. 8, 14304 (2017).
Zhang, F. et al. Optogenetic interrogation of neural circuits: technology for probing mammalian brain structures. Nat. Protoc. 5, 439–456 (2010).
Inutsuka, A. et al. The integrative role of orexin/hypocretin neurons in nociceptive perception and analgesic regulation. Sci. Rep. 6, 29480 (2016).
Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).
Dana, H. et al. High-performance calcium sensors for imaging activity in neuronal populations and microcompartments. Nat. Methods 16, 649–657 (2019).
Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nat. Neurosci. 6, 1224–1229 (2003).
Pachitariu, M. et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. Preprint at https://www.biorxiv.org/content/10.1101/061507v2 (2017).
Keemink, S. W. et al. FISSA: a neuropil decontamination toolbox for calcium imaging signals. Sci. Rep. 8, 3493 (2018).
Paxinos, G. & Franklin, K. B. J. Paxinos and Franklin’s the Mouse Brain in Stereotaxic Coordinates (Academic Press, 2019).
Acknowledgements
We thank I. Tsutsui-Kimura, H.-G. Kim and B. Babayan for technical assistance; V. Roser and S. Ikeda for assistance in animal training; H. Matsumoto for sharing data; A. Lowet, M. Bukwich and all laboratory members for discussion. We thank M. Mathis, E. Soucy, V. Murthy, M. Andermann and members of their laboratories for advice on establishing two-photon imaging of deep structures. We thank C. Dulac for sharing reagents and equipment. We thank D. Kim and the GENIE Project, Janelia Farm Research Campus, Howard Hughes Medical Institute, for pGP-CMV-GCaMP6f and pGP-AAV-CAG-FLEX-jGCaMP7f-WPRE plasmids; E. Boyden, Media Lab, Massachusetts Institute of Technology, for AAV5-CAG-FLEX-tdTomato and AAV5-CAG-tdTomato; K. Deisseroth, Stanford University, for pAAV-EF1a-DIO-hChR2(H134R)-EYFP-WPRE; and Y. Li, State Key Laboratory of Membrane Biology, Peking University, for AAV9-hSyn-DA2m. This work was supported by grants from the National Institute of Mental Health (R01MH125162, to M.W.-U.), the National Institutes of Health (U19 NS113201 and NS108740, to N.U.), the Simons Collaboration on Global Brain (to N.U.), the Japan Society for the Promotion of Science, Japan Science and Technology Agency (to R.A.), the Human Frontier Science Program (LT000801/2018, to S.M.), the Harvard Brain Science Initiative (HBI Young Scientist Transitions Award, to S.M.) and Brain Mapping by Integrated Neurotechnologies for Disease Studies (Brain/MINDS) by AMED (JP20dm0207069, to K.F.T.).
Author information
Authors and Affiliations
Contributions
R.A. and M.W.-U. designed experiments and analyzed data. R.A. and S.M. collected data. R.A. and A.Y. made constructs. K.F.T. made transgenic mice. The results were discussed and interpreted by R.A., S.M., N.U. and M.W.-U. R.A., N.U. and M.W.-U. wrote the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks Erin Calipari and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Recording sites for fiber-fluorometry and example of fiber-fluorometry signals.
a, Recording site for each animal is shown in coronal views (Paxinos and Franklin61). b, Example coronal section of a recording site and DA2m (green) expression in the VS. c, Example coronal section of a recording site and GCaMP7f (green) and tdTomato (red) expression in the VS. d, Example coronal section of a recording site and GCaMP7f (green) and tdTomato (red) expression in the VTA. Asterisks indicate fiber tip locations. The other animals (n = 7 for DA2m, n = 9 for GCaMP in VS and n = 4 for GCaMP in VTA) showed similar results, as summarized in (a). Scale bars, 1 mm. e, Raw GCaMP7f (upper panel) and DA2m (GrabDA; lower panel) signals in the VS. f, Comparison of free reward responses between electrophysiology and fiber-fluorometry: electrophysiology of opto-tagged dopamine neurons (top; data from Matsumoto et al., 2016), fiber-fluorometry of GCaMP signals in dopamine axons in the VS (middle) and fiber-fluorometry of DA2m signals in the VS (bottom).
Extended Data Fig. 2 Dopamine release in the ventral striatum during first-time classical conditioning.
a, The time course of dopamine sensor responses to cued water (left) and to free water (right) in an example mouse. Each dot represents the response in a single trial, and the line shows the moving average over 20 trials. b, The dopamine response to the reward-associated cue in the late phase (2–3 s after cue onset) was significantly higher than activity in the early phase (0–1 s after cue onset) during the first 1/3 of the learning phase (t = 5.0, p = 0.22 × 10−2; two-sided t-test). c, Dopamine sensor signal peaks during delay periods (red) overlaid on a heatmap of dopamine sensor signals in cued water trials. n = 7 animals. d, Dopamine sensor response onsets (purple) overlaid on a heatmap of dopamine sensor signals. e, Linear regression of dopamine excitation onset against trial number. f, Each panel shows correlation coefficients between activity peak and trial number for trial-order-shuffled data (n = 500) in each animal (7 animals). The 95th-percentile area is marked in red, and the correlation coefficient of the original data is shown as a blue line. Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points (whiskers). **p < 0.01.
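The trial-order shuffle control described in panel f of Extended Data Fig. 2 can be sketched as follows. This is an illustrative reconstruction, not the authors' code; only the shuffle count (n = 500) comes from the caption, and the function name and inputs are assumptions.

```python
import numpy as np

def shuffle_control(peak_times, n_shuffles=500, seed=0):
    """Empirical null distribution for the correlation between peak time
    and trial number, built by shuffling trial order (500 shuffles, as in
    the figure). Returns the observed coefficient and the null sample."""
    rng = np.random.default_rng(seed)
    peak_times = np.asarray(peak_times, dtype=float)
    trials = np.arange(len(peak_times))
    observed = np.corrcoef(trials, peak_times)[0, 1]
    null = np.array([np.corrcoef(trials, rng.permutation(peak_times))[0, 1]
                     for _ in range(n_shuffles)])
    return observed, null
```

A peak time that drifts systematically earlier over trials yields an observed coefficient outside the central 95% of the shuffled distribution, whereas shuffling destroys any trial-order structure.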
Extended Data Fig. 3 Test for monotonic shift of dopamine activity and activity center of mass during first-time learning.
a, Schematic of dopamine activity peaks for 3 randomly sampled trials (see Methods). For a temporal shift (left), the probability of a monotonic relationship is higher than chance level, whereas this is not the case for an amplitude shift (right). b, Probability of a monotonic shift in the activity peaks of 3 randomly sampled trials. Sampling of 100 sets was repeated 500 times for each animal (see Methods). Each panel shows the result from a single animal (7 animals). Blue, actual data; red, control. The data showed a significantly higher probability (p < 0.05; two-sided t-test, no adjustment for multiple comparisons) of monotonic shift in 6 out of 7 animals, which is significantly higher than chance level (p = 0.78 × 10−2; binomial cumulative function). c, Contour plot of the activity pattern (left) and the time course of the center of mass (0–3 s after cue onset) over training (right) in an example animal. d, Left, time course of the center of mass over training. Right, the average centers of mass during the first 1/3 of the learning period in all animals were significantly later than the halfway time point (1.5 s) (t = −2.8, p = 0.029; two-sided t-test). n = 7 animals. Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points (whiskers). *p < 0.05, ***p < 0.001.
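The three-trial monotonicity test of Extended Data Fig. 3a,b can be sketched roughly as below. This is a hypothetical reconstruction (the exact procedure is in the paper's Methods): draw 3 trials at random, put them in chronological order, and ask whether their peak times are strictly monotonic. A pure amplitude change leaves the probability at chance; a temporal shift raises it.

```python
import numpy as np

def monotonic_shift_probability(peak_times, n_samples=100, seed=0):
    """Fraction of randomly drawn trial triplets whose peak times are
    strictly monotonic when the trials are placed in chronological order."""
    rng = np.random.default_rng(seed)
    peak_times = np.asarray(peak_times, dtype=float)
    hits = 0
    for _ in range(n_samples):
        # Sample 3 distinct trials and sort them by trial number.
        idx = np.sort(rng.choice(len(peak_times), size=3, replace=False))
        a, b, c = peak_times[idx]
        hits += (a < b < c) or (a > b > c)
    return hits / n_samples
```

For a peak that shifts steadily earlier, every sampled triplet is monotonic and the probability is 1; for trial-independent peaks (for example, amplitude-only changes with ties), strict monotonicity occurs at or below chance.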
Extended Data Fig. 4 Temporal shift of activity during reversal learning.
a, Regression coefficients ± 95% confidence intervals between activity peak timing and trial number in each animal under different experimental conditions (n = 2 animals for GCaMP6f, n = 3 animals for DA2m; mean ± 95% confidence intervals). Red circles, significant (p ≤ 0.05; two-sided F-test, no adjustment for multiple comparisons) slopes. b, Average dopamine activity (normalized to the free water response) in response to a reward-predicting cue in the first session of reversal from nothing to reward (left and right top; n = 3 animals with DA sensor). Each line shows the mean over 8 trials of population neural activity across the session (mean ± sem). Right bottom, linear regression of the peak timing of average activity against trial number during reversal from nothing to reward with the dopamine sensor (n = 3 animals; regression coefficient 31.8 ms per trial, F = 22, p = 1.3 × 10−4). c, Probability of a monotonic shift in 3 randomly sampled activity peaks in actual data (blue) and control (red) (7 animals; see Methods). The data showed a significantly higher probability (p < 0.05; two-sided t-test, no adjustment for multiple comparisons) of monotonic shift in 6 out of 7 animals in the nothing-to-reward reversal (p = 0.78 × 10−2; binomial cumulative function) and in 6 out of 7 animals in the airpuff-to-reward reversal (p = 0.78 × 10−2; binomial cumulative function). ***p < 0.001.
Extended Data Fig. 5 Center of mass and time-course of cue responses during reversal learning.
a, Time course of the center of mass of activity (0–3 s after cue onset) over training, and the average center of mass during the first 1/3 of the learning period in all animals. The average centers of mass were significantly later than the halfway time point (1.5 s) (left, nothing to reward, t = −4.8, p = 0.28 × 10−2; right, airpuff to reward, t = −5.2, p = 0.18 × 10−2; two-sided t-test). b, Responses to a reward-predicting odor in all animals (mean ± sem). Early, 0–1 s from odor onset (green); late, 2–3 s from odor onset (magenta). Middle, difference between early and late odor responses (grey, each animal; orange, mean ± sem). Dopamine activity in the late phase was significantly higher than activity in the early phase during the first 1/3 of the learning phase (t = 3.8, p = 0.85 × 10−2; two-sided t-test). Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points (whiskers). **p < 0.01.
Extended Data Fig. 6 Simultaneous fiber fluorometry recording of dopamine neuron activity in the VTA and VS.
a, GCaMP7f was expressed in VTA dopamine neurons, and fibers were targeted to both the VTA and the VS. b, GCaMP signal of dopamine axons in the VS of an example animal (left; mean ± sem, and right top), and the activity peak for each trial (right bottom; grey circles). Activity peaks were fitted by linear regression against trial number; the fitted line is shown in red. c, GCaMP activity of dopamine neurons in the VTA of an example animal, recorded simultaneously with (b) (left; mean ± sem, and right top). Activity peaks (right bottom; grey circles) were fitted by linear regression against trial number; the fitted line is shown in red. d, Comparison of Pearson's correlation coefficients between activity peaks and trial number for VS recordings (p = 8.6 × 10−6, two-sided t-test) and VTA recordings (p = 1.2 × 10−2, two-sided t-test) (p = 5.2 × 10−3, two-sided t-test after Fisher's Z-transformation). Filled circles, reversal from nothing to reward. Open circles, reversal from airpuff to reward. Red circles, significant (p ≤ 0.05, F-test, no adjustment for multiple comparisons). n = 8 sessions (airpuff-to-reward and nothing-to-reward sessions from 4 animals). Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points (whiskers). *p < 0.05, **p < 0.01, ***p < 0.001.
Extended Data Fig. 7 Temporal shift of dopamine inhibitory activity in reversal learning.
a, Lick counts during the delay period (0–3 s after odor onset) in reversal training from reward to airpuff (n = 7 animals). Mean ± sem for each trial. b,b′, GCaMP signals from example animals during reversal. Left, session mean ± sem. The white horizontal lines (top) show session boundaries. c,c′, Dopamine activity troughs (grey circles) and linear regression against trial number ((c) regression coefficient 7.6 ms per trial, p = 5.9 × 10−13; (c′) regression coefficient 7.9 ms per trial, p = 3.8 × 10−3). d, Responses to the airpuff-predicting odor in an example animal (left) and in all animals (right, mean ± sem). Early, 0.75–1.75 s from odor onset (green); the first 0.75 s was excluded to minimize contamination by the remaining positive cue response. Late, 2–3 s from odor onset (magenta). e, Difference between early and late odor responses (grey, each animal; orange, mean ± sem). Dopamine activity in the late phase was significantly higher than activity in the early phase during the first 2–20 trials (t = −3.0, p = 2.3 × 10−3; two-sided t-test). f, Regression coefficients ± 95% confidence intervals between activity peak and trial number in each animal in different experimental conditions (mean ± 95% confidence intervals). Red circles, significant (p ≤ 0.05; F-test, no adjustment for multiple comparisons) slopes. n = 7 animals for each condition. Data from one animal in the reversal from reward to nothing were excluded because of an insufficient number of trials with detected troughs. *p < 0.05.
Extended Data Fig. 8 Comparison of dopamine axon GCaMP signal, control fluorescence signal, and licking.
a, GCaMP signals (top; green), tdTomato signals (middle; red) and lick counts (bottom; blue) recorded simultaneously in the first reversal session from airpuff to reward (mean ± sem). b, Percentage of animals showing anticipatory licking during delay periods versus trial number. Regression coefficient 1.3% per trial, F = 33, p = 1.2 × 10−6, F-test. c, Average lick counts during delay periods versus trial number. Regression coefficient 0.11 licks per trial, F = 35, p = 7.8 × 10−7, F-test. d, First-lick timing versus trial number. Regression coefficient −2.3 ms per trial, F = 0.19, p = 0.66, F-test. e, Relationship between the first lick and GCaMP signals during the delay period in an example animal. Right, comparison between the timing of the GCaMP peak and the timing of the first lick. f, Linear regression coefficients for the timing of the GCaMP peak, tdTomato peak, lick peak and first lick against trial number (t = 8.9, p = 8.8 × 10−4 for GCaMP peak; t = −0.058, p = 0.96 for tdTomato peak; t = 0.038, p = 0.97 for lick peak; and t = 0.98, p = 0.38 for first lick; two-sided t-test). Red circles, significant (p ≤ 0.05, F-test, no adjustment for multiple comparisons). g, Latency between the GCaMP response and the first lick (GCaMP peak to first lick, 427 ± 241 ms; GCaMP response onset to first lick, 989 ± 154 ms; mean ± sem). h, Correlation coefficients between the timing of the GCaMP response peak and the lick peak (t = 1.3, p = 0.27, two-sided t-test) and lick onset (first lick, t = 0.53, p = 0.62; second lick, t = 1.6, p = 0.19; two-sided t-tests). Red circles, significant (p ≤ 0.05, F-test, no adjustment for multiple comparisons). n = 5 animals. Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points (whiskers). **p < 0.01.
Extended Data Fig. 9 Dopamine cue responses during repeated learning in well-trained animals.
a, Probability of a monotonic shift in 3 randomly sampled activity peaks in the actual data (blue) and control (red) (5 animals). The data showed a significantly higher probability (p < 0.05; two-sided t-test, no adjustment for multiple comparisons) of monotonic shift in 3 out of 5 animals (p = 0.18; binomial cumulative function). b, Responses to a reward-predicting odor in all animals (center, mean ± sem). Early, 0–1 s from odor onset (green); late, 2–3 s from odor onset (magenta). Middle, difference between early and late odor responses (grey, each animal; orange, mean ± sem). Right, dopamine activity in the late phase was not significantly higher than activity in the early phase during the first 2–10 trials (t = −1.3, p = 0.25; two-sided t-test). n = 5 animals. c, Left, time course of the center of mass during delay periods (0–3 s after cue onset) over training. Right, the average centers of mass during trials 2–10 in all animals were not significantly later than the halfway time point (1.5 s) (nothing to reward, t = 1.6, p = 0.17; two-sided t-test). n = 5 animals. Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points (whiskers). ***p < 0.001.
Extended Data Fig. 10 Recording sites and dopamine cue responses in deep 2-photon imaging.
a, Recording site for each animal is shown in coronal views (Paxinos and Franklin61). b, Example coronal section of a recording site and GCaMP expression in the VTA. An asterisk indicates the fiber tip location. The other 4 animals showed similar results, as shown in (a). Scale bar, 1 mm. c, Probability of a monotonic shift in 3 randomly sampled activity peaks in actual data (blue) and control (red). Neurons with sufficient detected peaks were used (21/36 neurons for nothing to reward, 16/36 neurons for airpuff to reward; see Methods). Sampling was repeated 500 times for each neuron. The pie chart summarizes the number of neurons with a significantly higher probability of monotonic shift (p < 0.05; two-sided t-test, no adjustment for multiple comparisons; Data > Ctrl; blue), a significantly lower probability of monotonic shift (p < 0.05; two-sided t-test, no adjustment for multiple comparisons; Data < Ctrl; orange) and no significant difference (n.s.; two-sided t-test, no adjustment for multiple comparisons; grey) compared to control (left, nothing to reward, p = 0.039; right, airpuff to reward, p = 0.038; binomial cumulative function). d,e, Responses to a reward-predicting odor in an example neuron (left column of d) and all neurons (left column of e; n = 36, mean ± sem). Early, 0–1 s from odor onset (green); late, 2–3 s from odor onset (magenta). Right column of d and middle column of e, difference between early and late odor responses (mean ± sem for e). Right column of e, dopamine activity in the late phase was significantly higher than activity in the early phase during the first 1/3 of the learning phase for nothing to reward (t = 2.5, p = 0.015; two-sided t-test) and airpuff to reward (t = 3.1, p = 0.35 × 10−2; two-sided t-test).
f, Time course of the center of mass (0–3 s after cue onset) over training, and the average centers of mass during the first 1/3 of the learning phase in all neurons used for the linear fitting analysis (left, nothing to reward, n = 35, t = −1.0, p = 0.30; middle, airpuff to reward, n = 36, t = −2.0, p = 0.048; right, both types pooled, t = −2.2, p = 0.030; two-sided t-test). Box plots show the median (center), the 25th and 75th percentiles (edges) and the most extreme data points not considered outliers (whiskers). *p < 0.05, **p < 0.01.
Supplementary information
Supplementary Information
Supplementary Figs. 1–3.
Supplementary Data 1
Temporal difference model code.
Source data
Source Data Fig. 2
Statistical Source Data.
Source Data Fig. 3
Statistical Source Data.
Source Data Fig. 4
Statistical Source Data.
Source Data Fig. 5
Statistical Source Data.
Source Data Extended Data Fig. 2
Statistical Source Data.
Source Data Extended Data Fig. 3
Statistical Source Data.
Source Data Extended Data Fig. 4
Statistical Source Data.
Source Data Extended Data Fig. 5
Statistical Source Data.
Source Data Extended Data Fig. 6
Statistical Source Data.
Source Data Extended Data Fig. 7
Statistical Source Data.
Source Data Extended Data Fig. 8
Statistical Source Data.
Source Data Extended Data Fig. 9
Statistical Source Data.
Source Data Extended Data Fig. 10
Statistical Source Data.
About this article
Cite this article
Amo, R., Matias, S., Yamanaka, A. et al. A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning. Nat Neurosci 25, 1082–1092 (2022). https://doi.org/10.1038/s41593-022-01109-2