Speaker discrimination in humans and machines: Effects of speaking style variability

Afshan, Amber; Kreiman, Jody; Alwan, Abeer

arXiv:2008.03617 (eess)

[Submitted on 8 Aug 2020]

Abstract: Does speaking style variation affect humans' ability to distinguish individuals from their voices? How do humans compare with automatic systems designed to discriminate between voices? In this paper, we attempt to answer these questions by comparing human and machine speaker discrimination performance for read speech versus casual conversations. Thirty listeners were asked to perform a same-versus-different speaker task. Their performance was compared to a state-of-the-art x-vector/PLDA-based automatic speaker verification system. Results showed that both humans and machines performed better with style-matched stimuli, and human performance was better when listeners were native speakers of American English. Native listeners performed better than machines in the style-matched conditions (EERs of 6.96% versus 14.35% for read speech, and 15.12% versus 19.87% for conversations), but for style-mismatched conditions, there was no significant difference between native listeners and machines. In all conditions, fusing human responses with machine results showed improvements compared to each alone, suggesting that humans and machines have different approaches to speaker discrimination tasks. Differences in the approaches were further confirmed by examining results for individual speakers, which showed that the perception of distinct and confused speakers differed between human listeners and machines.
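The abstract reports results as equal error rates (EERs). As a reading aid, the following is a minimal sketch of how an EER can be estimated from same/different-speaker trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "same speaker", and the toy scores are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER from trial scores (hypothetical helper).

    scores: higher means "more likely the same speaker" (assumed convention).
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    Returns (eer, threshold) at the point where the false-acceptance and
    false-rejection rates are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target = scores[labels == 1]      # same-speaker trials
    nontarget = scores[labels == 0]   # different-speaker trials

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is included.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A trial is accepted as "same speaker" when score >= threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(target < t).mean() for t in thresholds])      # false rejects

    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy data (invented for illustration, not taken from the paper).
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means better discrimination, so the reported 6.96% for native listeners on read speech indicates fewer errors than the machine's 14.35% in the same condition.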

Comments:	Accepted to Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as:	arXiv:2008.03617 [eess.AS]
	(or arXiv:2008.03617v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2008.03617

Submission history

From: Amber Afshan
[v1] Sat, 8 Aug 2020 22:59:46 UTC (1,170 KB)
