Action initiation shapes mesolimbic dopamine encoding of future rewards

Syed, Emilie C J; Grima, Laura L; Magill, Peter J; Bogacz, Rafal; Brown, Peter; Walton, Mark E

doi:10.1038/nn.4187

Brief Communication
Published: 07 December 2015

Emilie C J Syed^1,2,3, Laura L Grima^3, Peter J Magill^2 (ORCID: orcid.org/0000-0001-7141-7071), Rafal Bogacz^1,2, Peter Brown^1,2,na1 & Mark E Walton^3,na1

Nature Neuroscience volume 19, pages 34–36 (2016)

Abstract

It is widely held that dopamine signaling encodes predictions of future rewards and such predictions are regularly used to drive behavior, but the relationship between these two is poorly defined. We found in rats that nucleus accumbens dopamine following a reward-predicting cue was attenuated unless movement was correctly initiated. Our results indicate that dopamine release in this region is contingent on correct action initiation and not just reward prediction.

**Figure 1: Go/No-Go task in experiment 1.**

**Figure 2: NAcc dopamine signals on Go and No-Go trials in experiment 1.**

**Figure 3: Experiment 2: behavior and dopamine signals.**

Acknowledgements

We would like to thank M. Baudonnat and L. Tankelevitch for assistance with data collection, N. Hollon and N. Kolling for analysis advice, and S. Ng-Evans for technical support. This work was funded by a Wellcome Trust Research Career Development Fellowship (WT090051MA to M.E.W.), the Medical Research Council UK (awards MC_UU_12020/5 and MC_UU_12024/2 to P.J.M., MC_UU_12024/5 to R.B., and MC_UU_12024/1 to P.B.), and a studentship from the Economic and Social Research Council and St John's College Oxford (to L.L.G.).

Author information

Peter Brown and Mark E Walton: These authors jointly directed this work.

Authors and Affiliations

Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
Emilie C J Syed, Rafal Bogacz & Peter Brown
Department of Pharmacology, Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford, UK
Emilie C J Syed, Peter J Magill, Rafal Bogacz & Peter Brown
Department of Experimental Psychology, University of Oxford, Oxford, UK
Emilie C J Syed, Laura L Grima & Mark E Walton

Contributions

P.B. conceived the core study, and P.B., P.J.M. and M.E.W. developed this and, with E.C.J.S., planned the experiments. For experiments 1 and 2, E.C.J.S. performed surgeries, and collected and analyzed the data. For experiment 2, L.L.G. also performed surgery and collected the data. R.B. performed the simulations. M.E.W. supervised the study. M.E.W. and E.C.J.S. prepared the manuscript with input from the other authors.

Corresponding author

Correspondence to Mark E Walton.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Representation of recording sites.

(a) Schematic, along with an example photomicrograph, of the recording locations in the nucleus accumbens core in experiment 1 (n = 7 electrodes in 7 rats). (b) Schematic of the recording locations in the nucleus accumbens core in experiment 2 (n = 9 electrodes in 6 rats). The numbers next to each section indicate distance in mm anterior to bregma. Adapted from the atlas of Paxinos and Watson (2005).

Supplementary Figure 2 Behavioral performance and simulations of the actor-critic model in experiment 1.

(a) Average success rates, (b) holding times from cue onset to head exit (mean ± S.D.), and (c) reward delivery times after cue onset during experiment 1. For all box plots, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points. (d) Box plots of success rates derived from simulations. Note that the model also found the No-Go trials more difficult to perform correctly, as only a single sequence of actions resulted in delivery of the pellet, whereas all other sequences resulted in the light being turned on to signal an error. By contrast, on Go trials there were multiple potential sequences that were not immediately incorrect. (e) Average RPEs (mean ± S.E.M.) recorded in 0.5 s time steps for the different trial types for 7 simulated “rats”. Although the model was trained over a long period (paralleling the extended training received by the rats in experiment 1) to achieve discrepancies in success rates qualitatively similar to those of the real rats, there was nonetheless a positive RPE at cue presentation in all conditions. This occurred because the simulated animal (i) was unable to estimate time precisely from past events and (ii) did not perform at 100% accuracy. In particular, there was a large increase in the RPE at cue onset because the next trial could only begin once the ITI had ended, 5 s after reward delivery, so the simulated animal could not fully predict whether entering and staying in the nose-poke would initiate a trial. The real rats in experiment 1 in fact made nose-poke responses during the ITI (< 4.5 s since previous reward delivery) on ~20% of trials, which could indicate that they also did not estimate time precisely and hence could not fully predict whether staying in the nose-poke would initiate a trial.
The RPE at cue presentation was numerically smaller on No-Go than on Go trials because, just like the real rats, the simulated animals had lower accuracy on No-Go trials; thus, they estimated a lower value for the state associated with the No-Go cue. There was also an increase in the RPE at the time of reward delivery. This occurred because the same action had to be executed multiple times to result in pellet delivery, and the state of the simulated animals did not include any information on their past actions, so a simulated animal could not fully predict whether its action would result in pellet delivery in the next time step.
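As a rough illustration of the reasoning in (e), the following sketch computes a temporal-difference RPE, δ = r + γV(s′) − V(s), at cue onset. It is not the actor-critic model used for the simulations; the function name `td_error` and all state values are hypothetical numbers chosen only to show why an imperfectly predicted cue yields a positive RPE, and why that RPE is smaller when the cue state has a lower value.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Temporal-difference reward prediction error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


# Hypothetical state values (assumptions for illustration, not fitted to data).
# The pre-cue value is below the cue values because the simulated animal cannot
# be sure that holding the nose-poke will start a trial.
V_PRE_CUE = 0.5
V_GO_CUE = 0.8    # higher success rate on Go trials
V_NOGO_CUE = 0.6  # lower success rate on No-Go trials

# No reward is delivered at cue onset, so reward = 0.
rpe_go = td_error(0.0, V_GO_CUE, V_PRE_CUE)
rpe_nogo = td_error(0.0, V_NOGO_CUE, V_PRE_CUE)

print(f"RPE at Go cue:    {rpe_go:.2f}")
print(f"RPE at No-Go cue: {rpe_nogo:.2f}")
```

Both errors are positive because the cue is not fully predicted, and the No-Go error is the smaller of the two because the No-Go cue state carries the lower value.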

Supplementary Figure 3 Dopamine signals in experiment 1 to reward delivery and collection.

Dopamine data were aligned to (a) reward delivery and (b) reward collection from the food magazine. Reward delivery occurred after the second lever press on Go trials and at cue offset on No-Go trials. The food magazine was situated on the wall of the chamber opposite the nose-poke and levers. On average, rats took 1.85 s on Go trials and 2.04 s on No-Go trials to reach the food magazine after reward delivery.

Supplementary Figure 4 Dopamine signals in delayed Go trials.

Upper panels display the unsmoothed average dopamine signals from experiment 1 (mean ± S.E.M.) recorded during ‘valid’ Go trials (post-cue RT < 1.7 s; blue filled line) or ‘delayed’ Go trials (post-cue RT > 1.7 s; dotted cyan line) (NB: delayed Go trials were not included in any other analyses) aligned to either (a) cue onset or (b) time of head exit from nose-poke. Lower panels show average discriminability between the valid and delayed Go trials for each timepoint (shaded area = population of 1000 permuted sessions; line: *, p < 0.05, permutation tests, corrected for multiple comparisons). (c) Box plot of the nose-poke holding times (central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points) for the valid or delayed Go trials (*, p = 0.02, W₇ = 0, Wilcoxon Signed Rank Test).
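The permutation procedure behind the discriminability panels can be sketched as follows. This is a generic label-shuffling test under stated assumptions, not the authors' analysis code: the statistic here is an absolute difference in means at a single timepoint rather than the discriminability measure used in the figure, the correction for multiple comparisons is omitted, and `permutation_p_value` and the example amplitudes are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def abs_mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign trial-type labels at random
        if abs_mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1

    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical dopamine amplitudes at one timepoint (arbitrary units).
valid_go = [12, 14, 11, 15, 13, 16, 14]
delayed_go = [5, 7, 6, 4, 8, 6, 5]
print(permutation_p_value(valid_go, delayed_go))
```

In a full analysis this comparison would be repeated at every timepoint, which is why a correction for multiple comparisons is needed.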

Supplementary Figure 5 Dopamine predicts Wrong Go selections prior to the error being signaled.

Average unsmoothed dopamine release from experiment 1 (mean ± S.E.M.) recorded during successful (filled line) and wrong (dotted line) Go trials (trials where the animal correctly initiated an action but selected the wrong lever) aligned to the time of (a, d) cue onset, (b) head exit from nose-poke or (c) the first lever press (when the error is signaled). Lower panels show average discriminability between the Correct and Wrong Go trials for each timepoint (shaded area = population of 1000 permuted sessions; line: *, p < 0.05 permutation tests, corrected for multiple comparisons). (d-e) Comparison of dopamine release on successful and wrong Go trials as a function of speed of action initiation. (d) Successful (blue lines) or Wrong (red lines) Go trials sorted by “fast” (< 1 s, filled line) or “slow” (> 1 s, dotted line) nose-poke exit latency. (e) Regression weights for trial accuracy and action initiation latency for Go trials (a.u., arbitrary units). On average, rats were both significantly slower to exit the nose poke (p = 0.03, W₇ = 1, Wilcoxon Signed Rank Test) and to move from the nose poke to make a lever press on Wrong trials (p = 0.02, W₇ = 0, Wilcoxon Signed Rank Test). Nonetheless, these differences could not fully explain the different patterns of dopamine release on the Correct and Wrong trials.
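The reported signed-rank statistics can be checked against the exact null distribution. The sketch below assumes that the subscript on W is the number of paired animals (n = 7), that W is the smaller of the two signed-rank sums, that there are no ties or zero differences, and that the test is two-sided; none of this is stated in the legend, and `signed_rank_exact_p` is a hypothetical helper name.

```python
def signed_rank_exact_p(n, w):
    """Exact two-sided p-value for a Wilcoxon signed-rank statistic.

    n: number of non-zero, untied paired differences
    w: the smaller of the positive and negative rank sums
    """
    max_sum = n * (n + 1) // 2
    # counts[s] = number of sign assignments whose positive-rank sum equals s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downwards so that each rank is used at most once.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    lower_tail = sum(counts[: w + 1])
    # Under the null hypothesis all 2**n sign assignments are equally likely.
    return min(1.0, 2 * lower_tail / 2 ** n)


for w in (0, 1):
    print(f"n = 7, W = {w}: p = {signed_rank_exact_p(7, w):.4f}")
```

Under these assumptions W = 0 gives p = 2/128 ≈ 0.016 and W = 1 gives p = 4/128 ≈ 0.031, consistent with the rounded values of 0.02 and 0.03 reported above.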

Supplementary Figure 6 Dopamine on failed No-Go trials decreases when the error is signaled.

Average unsmoothed dopamine signals from experiment 1 (mean ± S.E.M.) recorded during successful (filled line) and failed (dotted line) No-Go trials (trials where the animal exited the nose-poke before the end of the No-Go hold period) aligned to time of (a) cue onset or (b) head exit from nose-poke (when the error is signaled). Lower panels show average discriminability between the correct and failed No-Go trials for each timepoint (shaded area = population of 1000 permuted sessions; line: *, p < 0.05, permutation tests, corrected for multiple comparisons). (c) Box plot of the average coefficients for the slopes of the linear fit of the dopamine signal in a 2.5 s window following cue onset for correct and failed No-Go trials (central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points). The decrease in dopamine following cue onset is significantly greater in failed No-Go trials than in successful No-Go trials (*, p = 0.02, W₇ = 0, Wilcoxon Signed Rank Test).
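The slope measure in (c) can be illustrated with an ordinary least-squares fit over the 2.5 s after cue onset. This is a minimal sketch, not the authors' code: the 10 Hz sampling grid, the synthetic traces and the name `window_slope` are assumptions made for the example.

```python
def window_slope(times, signal, start=0.0, stop=2.5):
    """Least-squares slope of signal against time within [start, stop] seconds."""
    points = [(t, y) for t, y in zip(times, signal) if start <= t <= stop]
    if len(points) < 2:
        raise ValueError("need at least two samples inside the window")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return s_ty / s_tt


# Synthetic traces sampled at an assumed 10 Hz, with time 0 = cue onset.
times = [i / 10 for i in range(-10, 41)]
correct_nogo = [-0.5 * t for t in times]  # shallow decline (made-up)
failed_nogo = [-2.0 * t for t in times]   # steeper decline (made-up)

print("correct No-Go slope:", window_slope(times, correct_nogo))
print("failed No-Go slope: ", window_slope(times, failed_nogo))
```

A more negative slope corresponds to a steeper post-cue decrease; per-animal slopes of this kind are the sort of paired values a signed-rank test would then compare.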

Supplementary Figure 7 Dopamine release scales with success rates on Go trials in experiment 1.

(a) Average dopamine release (mean ± S.E.M.) recorded during successful Go Left and Right trials in experiment 1, arranged in each animal by performance (best / highest success rate = red line, worst / lowest success rate = blue line). Lower panels show average discriminability (black line) between the Best and Worst Go trials for each time point (grey lines = population of 1000 permuted sessions; black bars with *, p < 0.05, permutation tests, corrected for multiple comparisons). Note that the mean head exit latency was the same in the two conditions (Best Go = 0.59 s, Worst Go = 0.59 s). (b) Patterns of dopamine release on Go trials with equivalent nose-poke exit times, divided into trials with fast (< 1 s) or slow (> 1 s) response times to make the first lever press (filled and dashed blue lines, respectively), were qualitatively very similar.

Supplementary Figure 8 Behavioral performance in experiment 2.

(a) Average success rates in experiment 2. Central mark is the median, box edges are the 25th and 75th percentiles, whiskers extend to the most extreme data points not considered outliers (points 1.67 × interquartile range away from the 25th or 75th percentile), and outliers are plotted individually. The two outlier points in the Go Small condition come from the same animal. (b) Holding times from cue onset to head exit (mean ± S.D.) on Correct, Wrong Response (wrong lever pressed) or No Response (no lever pressed within 5 s of cue onset) Go trials, or Correct or Failed (premature exit from nose-poke) No-Go trials (*, both p = 0.03, W₆ = 0, Wilcoxon Signed Rank Test).

Supplementary Figure 9 Comparison between trial types in experiment 2.

Average discriminability between the different trial types in experiment 2 for each time point (grey lines = max and min from population of 1000 permuted sessions; line: *, p < 0.05, permutation tests, corrected for multiple comparisons). (a) Go Large v Go Small, (b) No-Go Large v Go Small, (c) No-Go Small v Go Small, (d) Go Large v No-Go Small, (e) No-Go Large v No-Go Small, (f) Go Large v No-Go Large.

Supplementary Figure 10 Dopamine release on No-Go trials in experiment 2 reflects correct action initiation.

(a, b) Average dopamine release (mean ± S.E.M.) on correct No-Go trials (green lines) compared to incorrect No-Go trials when the animals left the nose-poke prematurely (blue lines). Data are aligned to (a) cue onset or (b) nose-poke exit. Large Reward trials are plotted with filled lines, Small Reward trials with dashed lines. Owing to the smaller proportion of Go trials in each session and the increased success rate on Go Large trials, there were too few Incorrect Go trials to analyze in experiment 2 (< 2%). (c, d) Average No-Go dopamine release (mean ± S.E.M.) on a subset of correctly performed No-Go trials where the rats’ exit from the nose-poke was either “fast” (< 600 ms after cue offset, filled green line) or “slow” (> 1,600 ms after cue offset, dashed green line). Data are aligned to (c) cue offset (which occurred 1 s before reward delivery in experiment 2) or (d) nose-poke exit. Therefore, the fast head exits occurred > 400 ms before reward delivery and the slow ones > 600 ms after reward delivery.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Table 1 (PDF 1346 kb)

Supplementary Methods Checklist (PDF 465 kb)

About this article

Cite this article

Syed, E., Grima, L., Magill, P. et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat Neurosci 19, 34–36 (2016). https://doi.org/10.1038/nn.4187

Received: 01 September 2015
Accepted: 04 November 2015
Published: 07 December 2015
Issue Date: January 2016
DOI: https://doi.org/10.1038/nn.4187
