Selective cortical representation of attended speaker in multi-talker speech perception

Mesgarani, Nima; Chang, Edward F.

doi:10.1038/nature11020

Letter
Published: 18 April 2012

Nature volume 485, pages 233–236 (2012)

Abstract

Humans possess a remarkable ability to attend to a single speaker’s voice in a multi-talker background^1,2,3. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented^4,5. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener’s intended goal.
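The reconstruction idea summarized in the abstract can be illustrated with a minimal sketch. It assumes a linear decoder fitted by ridge regression that maps time-lagged multi-electrode responses to spectrogram frames; it is not the authors' pipeline. All data are synthetic, the function names (`lagged_design`, `fit_decoder`, `simulate`) and every parameter value are hypothetical, and the attentional weighting of the mixture response (0.9 versus 0.1) is built in by assumption rather than being a result.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_elec, n_lags = 8, 32, 5  # hypothetical sizes, chosen for illustration


def lagged_design(responses, n_lags):
    """Stack time-lagged copies of responses (time x electrodes) side by side."""
    n_t, n_e = responses.shape
    X = np.zeros((n_t, n_e * n_lags))
    for lag in range(n_lags):
        # The response at time t + lag helps predict the stimulus at time t.
        X[: n_t - lag, lag * n_e:(lag + 1) * n_e] = responses[lag:]
    return X


def fit_decoder(responses, spectrogram, n_lags, ridge=1.0):
    """Ridge-regularized least-squares map from lagged responses to spectrogram."""
    X = lagged_design(responses, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ spectrogram)


def simulate(spec, W, delay=2, noise=0.5):
    """Toy neural response: delayed linear function of the spectrogram plus noise."""
    drive = spec @ W
    resp = np.zeros_like(drive)
    resp[delay:] = drive[:-delay]
    return resp + noise * rng.standard_normal(drive.shape)


def correlation(a, b):
    """Pearson correlation between two spectrograms, flattened."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]


# Toy tuning of each electrode to each frequency band (assumption).
W = rng.standard_normal((n_freq, n_elec))

# Fit the decoder on single-speaker data only.
spec_train = rng.standard_normal((2000, n_freq))
G = fit_decoder(simulate(spec_train, W), spec_train, n_lags)

# Two-speaker mixture: the response is ASSUMED to weight the attended speaker.
spec_attended = rng.standard_normal((500, n_freq))
spec_ignored = rng.standard_normal((500, n_freq))
resp_mix = simulate(0.9 * spec_attended + 0.1 * spec_ignored, W)

# Reconstruct from the mixture response and compare with each speaker alone.
recon = lagged_design(resp_mix, n_lags) @ G
corr_attended = correlation(recon, spec_attended)
corr_ignored = correlation(recon, spec_ignored)
print(f"attended: {corr_attended:.2f}, ignored: {corr_ignored:.2f}")
```

Comparing the reconstruction's correlation with each speaker's own spectrogram is one simple way to express attentional modulation as a number. In this toy setting the attended speaker scores higher only because the simulation assumes it.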

Figure 1: **Acoustic and neural reconstructed spectrograms for speech from a single speaker or a mixture of speakers.**

Figure 2: **Quantifying the attentional modulation of neural responses.**

Figure 3: **Decoding spoken words and the identity of the attended speaker.**

Figure 4: **Attentional modulation of individual electrode sites.**

References

1. Cherry, E. C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25, 975–979 (1953)
2. Shinn-Cunningham, B. G. Object-based auditory and visual attention. Trends Cogn. Sci. 12, 182–186 (2008)
3. Bregman, A. S. Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, 1994)
4. Kerlin, J., Shahin, A. & Miller, L. Attentional gain control of ongoing cortical speech representations in a “cocktail party”. J. Neurosci. 30, 620–628 (2010)
5. Besle, J. et al. Tuning of the human neocortex to the temporal dynamics of attended events. J. Neurosci. 31, 3176–3185 (2011)
6. Bee, M. & Micheyl, C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J. Comparative Psychol. 122, 235–252 (2008)
7. Shinn-Cunningham, B. G. & Best, V. Selective attention in normal and impaired hearing. Trends Amplif. 12, 283–299 (2008)
8. Scott, S. K., Rosen, S., Beaman, C. P., Davis, J. P. & Wise, R. J. S. The neural processing of masked speech: evidence for different mechanisms in the left and right temporal lobes. J. Acoust. Soc. Am. 125, 1737–1743 (2009)
9. Elhilali, M., Xiang, J., Shamma, S. A. & Simon, J. Z. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol. 7, e1000129 (2009)
10. Chang, E. F. et al. Categorical speech representation in human superior temporal gyrus. Nature Neurosci. 13, 1428–1432 (2010)
11. Crone, N. E., Boatman, D., Gordon, B. & Hao, L. Induced electrocorticographic gamma activity during auditory perception. Clin. Neurophysiol. 112, 565–582 (2001)
12. Steinschneider, M., Fishman, Y. I. & Arezzo, J. C. Spectrotemporal analysis of evoked and induced electroencephalographic responses in primary auditory cortex (A1) of the awake monkey. Cereb. Cortex 18, 610–625 (2008)
13. Scott, S. K. & Johnsrude, I. S. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 26, 100–107 (2003)
14. Hackett, T. A. Information flow in the auditory cortical network. Hear. Res. 271, 133–146 (2011)
15. Bolia, R. S., Nelson, W. T., Ericson, M. A. & Simpson, B. D. A speech corpus for multitalker communications research. J. Acoust. Soc. Am. 107, 1065–1066 (2000)
16. Brungart, D. S. Informational and energetic masking effects in the perception of two simultaneous talkers. J. Acoust. Soc. Am. 109, 1101–1109 (2001)
17. Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex. J. Neurophysiol. 102, 3329–3339 (2009)
18. Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R. & Warland, D. Reading a neural code. Science 252, 1854–1857 (1991)
19. Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012)
20. Garofolo, J. S. et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus (Linguistic Data Consortium, 1993)
21. Rifkin, R., Yeo, G. & Poggio, T. Regularized least-squares classification. Nato Science Series Sub Series III Computer and Systems Sciences 190, 131–154 (2003)
22. Formisano, E., De Martino, F., Bonte, M. & Goebel, R. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science 322, 970–973 (2008)
23. Staeren, N., Renvall, H., De Martino, F., Goebel, R. & Formisano, E. Sound categories are represented as distributed patterns in the human auditory cortex. Curr. Biol. 19, 498–502 (2009)
24. Shamma, S. A., Elhilali, M. & Micheyl, C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34, 114–123 (2010)
25. Darwin, C. J. Auditory grouping. Trends Cogn. Sci. 1, 327–333 (1997)
26. Warren, R. M. Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970)
27. Kidd, G., Jr, Arbogast, T. L., Mason, C. R. & Gallun, F. J. The advantage of knowing where to listen. J. Acoust. Soc. Am. 118, 3804–3815 (2005)
28. Shen, W., Olive, J. & Jones, D. Two protocols comparing human and machine phonetic discrimination performance in conversational speech. INTERSPEECH 1630–1633 (2008)
29. Cooke, M., Hershey, J. R. & Rennie, S. J. Monaural speech separation and recognition challenge. Comput. Speech Lang. 24, 1–15 (2010)

Acknowledgements

The authors would like to thank A. Ren for technical help, and C. Micheyl, S. Shamma and C. Schreiner for critical discussion and reading of the manuscript. E.F.C. was funded by National Institutes of Health grants R00-NS065120, DP2-OD00862, R01-DC012379, and the Ester A. and Joseph Klingenstein Foundation.

Author information

Authors and Affiliations

Departments of Neurological Surgery and Physiology, UCSF Center for Integrative Neuroscience, University of California, San Francisco, 94143, California, USA
Nima Mesgarani & Edward F. Chang

Contributions

N.M. and E.F.C. designed the experiment, collected the data, evaluated results and wrote the manuscript.

Corresponding author

Correspondence to Edward F. Chang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

This file contains Supplementary Figures 1-3. (PDF 1436 kb)

About this article

Cite this article

Mesgarani, N., Chang, E. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236 (2012). https://doi.org/10.1038/nature11020

Received: 30 August 2011
Accepted: 05 March 2012
Published: 18 April 2012
Issue Date: 10 May 2012
DOI: https://doi.org/10.1038/nature11020
