DOI: 10.1145/3593013.3594049
Research Article | Open Access

Augmented Datasheets for Speech Datasets and Ethical Decision-Making

Published: 12 June 2023

Abstract

Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along dimensions of language, accent, dialect, variety, and speech impairment—and the intersectionality of speech features with socioeconomic and demographic features. Furthermore, there is often a lack of oversight on the underlying training data—commonly built on massive web-crawling and/or publicly available speech—with regard to the ethics of such data collection. To encourage standardized documentation of such speech data components, we introduce an augmented datasheet for speech datasets, which can be used in addition to “Datasheets for Datasets” [78]. We then exemplify the importance of each question in our augmented datasheet based on in-depth literature reviews of speech data used in domains such as machine learning, linguistics, and health. Finally, we encourage practitioners—ranging from dataset creators to researchers—to use our augmented datasheet to better define the scope, properties, and limits of speech datasets, while also encouraging consideration of data-subject protection and user community empowerment. Ethical dataset creation is not a one-size-fits-all process, but dataset creators can use our augmented datasheet to reflexively consider the social context of related SLT applications and data sources in order to foster more inclusive SLT products downstream.
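As an illustrative sketch only—the field names below are hypothetical and do not reproduce the augmented datasheet's actual questions—the speech-specific dimensions highlighted in the abstract (language, accent/dialect/variety, speech impairment, demographics, collection method, consent, and de-identification) might be recorded as a structured entry like this:

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SpeechDatasheetEntry:
    """Hypothetical record of speech-specific datasheet fields.

    A sketch of the kinds of properties the abstract argues should be
    documented; it is not the paper's actual question list.
    """
    dataset_name: str
    languages: List[str]                    # e.g., ["en-US", "sw-KE"]
    accents_dialects_varieties: List[str]   # varieties represented in the audio
    speech_impairments: List[str]           # e.g., ["dysarthria", "stuttering"]
    speaker_demographics: Dict[str, str]    # age range, gender, region, SES
    collection_method: str                  # crowdsourced, web-crawled, studio, ...
    consent_documented: bool                # were data subjects informed and consenting?
    deidentification: Optional[str]         # how voices/PII were protected, if at all
    known_limitations: List[str] = field(default_factory=list)

# Example use: flag a web-crawled corpus with no documented consent.
entry = SpeechDatasheetEntry(
    dataset_name="example-web-corpus",
    languages=["en-US"],
    accents_dialects_varieties=["unspecified"],
    speech_impairments=[],
    speaker_demographics={"age": "unknown", "gender": "unknown"},
    collection_method="web-crawled",
    consent_documented=False,
    deidentification=None,
)

Fields like these make gaps explicit: an entry with "unspecified" varieties or consent_documented=False signals exactly the kind of oversight deficit the paper's datasheet questions are meant to surface.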

References

[1]
[n. d.]. Enable the profanity filter. Cloud Speech-to-Text documentation, Google Cloud. https://cloud.google.com/speech-to-text/docs/profanity-filter
[2]
Basil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyoti, Sunayana Sitaram, and Vivek Seshadri. 2020. Crowdsourcing speech data for low-resource languages from low-income workers. In Proceedings of the 12th Language Resources and Evaluation Conference. 2819–2826.
[3]
Martine Adda-Decker and Lori Lamel. 2000. The use of lexica in automatic speech recognition. Lexicon Development for Speech and Language Processing (2000), 235–266.
[4]
Devaraja Adiga, Rishabh Kumar, Amrith Krishna, Preethi Jyothi, Ganesh Ramakrishnan, and Pawan Goyal. 2021. Automatic speech recognition in Sanskrit: A new speech corpus and modelling insights. arXiv preprint arXiv:2106.05852 (2021).
[5]
Andrea Agostinelli, Timo I Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, 2023. MusicLM: Generating Music From Text. arXiv preprint arXiv:2301.11325 (2023).
[6]
Ana Aguiar, Mariana Kaiseler, Mariana Cunha, Hugo Meinedo, J Silva, T Abrudan, and PR Almeida. 2014. VOCE Corpus: Ecologically Collected Speech Annotated with Physiological and Psychological Stress Assessments. In Proceedings of the Ninth International Conference on Language Resources. 1568–1574.
[7]
Afroz Ahamad, Ankit Anand, and Pranesh Bhargava. 2020. Accentdb: A database of non-native english accents to assist neural speech recognition. arXiv preprint arXiv:2005.07973 (2020).
[8]
Shafayat Ahmed, Nafis Sadeq, Sudipta Saha Shubha, Md Nahidul Islam, Muhammad Abdullah Adnan, and Mohammad Zuberul Islam. 2020. Preparation of bangla speech corpus from publicly available audio & text. In Proceedings of The 12th language resources and evaluation conference. 6586–6592.
[9]
Ahmed Ali, Stephan Vogel, and Steve Renals. 2017. Speech recognition challenge in the wild: Arabic MGB-3. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 316–322.
[10]
Khalid Almeman, Mark Lee, and Ali Abdulrahman Almiman. 2013. Multi dialect Arabic speech parallel corpora. In 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA). 1–6. https://doi.org/10.1109/ICCSPA.2013.6487288
[11]
Jerone TA Andrews, Dora Zhao, William Thong, Apostolos Modas, Orestis Papakyriakopoulos, Shruti Nagpal, and Alice Xiang. 2023. Ethical considerations for collecting human-centric image datasets. arXiv preprint arXiv:2302.03629 (2023).
[12]
Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M Tyers, and Gregor Weber. 2019. Common voice: A massively-multilingual speech corpus. arXiv preprint arXiv:1912.06670 (2019).
[13]
J.-U. Bang, S. Yun, S.-H. Kim, M.-Y. Choi, M.-K. Lee, Y.-J. Kim, D.-H. Kim, J. Park, Y.-J. Lee, and S.-H. Kim. 2020. KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition. Applied Sciences 10, 19 (2020), 6369.
[14]
Subham Banga, Ujjwal Upadhyay, Piyush Agarwal, Aniket Sharma, and Prerana Mukherjee. 2019. Indian EmoSpeech Command Dataset: A dataset for emotion based speech recognition in the wild. arXiv preprint arXiv:1910.13801 (2019).
[15]
Jon Barker, Shinji Watanabe, Emmanuel Vincent, and Jan Trmal. 2018. The fifth ‘CHiME’ speech separation and recognition challenge: dataset, task and baselines. arXiv preprint arXiv:1803.10609 (2018).
[16]
Yasmine Belkacemi, Eric Buesing, Arpit Goenka, Vinay Gupta, Damian Lewandowski, and Maurice Obeid. 2022. From speech to insights: The value of the human voice. McKinsey & Company (January 2022).
[17]
Emily M Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604.
[18]
Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, and Mitesh M Khapra. 2022. Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. arXiv preprint arXiv:2208.12666 (2022).
[19]
Jayadev Billa. 2021. Leveraging Non-Target Language Resources to Improve ASR Performance in a Target Language. In Interspeech. 2581–2585.
[20]
Jordan J Bird, Elizabeth Wanner, Anikó Ekárt, and Diego R Faria. 2019. Accent classification in human speech biometrics for native and non-native english speakers. In Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments. 554–560.
[21]
Steven Bird. 2020. Decolonising speech and language technology. In Proceedings of the 28th International Conference on Computational Linguistics. 3504–3519.
[22]
Matthew P Black, Abe Kazemzadeh, Joseph Tepperman, and Shrikanth S Narayanan. 2011. Automatically assessing the ABCs: Verification of children’s spoken letter-names and letter-sounds. ACM Transactions on Speech and Language Processing (TSLP) 7, 4 (2011), 1–17.
[23]
José Luis Blanco, Rubén Fernández Pozo, Doroteo T Toledano, F Javier Caminero, and Eduardo López Gonzalo. 2011. Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech. In Interspeech. International Speech Communication Association.
[24]
Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (Technology) is Power: A Critical Survey of “Bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.485
[25]
Paul Boersma and David Weenink. 2023. Praat (Version 6.3.06). http://www.praat.org/
[26]
Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, and Yannick Estève. 2022. A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems. arXiv preprint arXiv:2204.01397 (2022).
[27]
Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, 2022. ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks. arXiv preprint arXiv:2205.01987 (2022).
[28]
Hynek Boril, Abhijeet Sangwan, and John HL Hansen. 2012. Arabic Dialect Identification: ‘Is the Secret in the Silence?’ and Other Observations. In INTERSPEECH. 30–33.
[29]
Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Olivier Teboul, David Grangier, Marco Tagliasacchi, and Neil Zeghidour. 2022. Audiolm: a language modeling approach to audio generation. arXiv preprint arXiv:2209.03143 (2022).
[30]
Soumia Bougrine, Aicha Chorana, Abdallah Lakhdari, and Hadda Cherroun. 2017. Toward a Web-based speech corpus for Algerian dialectal Arabic varieties. In Proceedings of the Third Arabic Natural Language Processing Workshop. 138–146.
[31]
Pierre Bourdieu and Jean-Claude Passeron. 1990. Reproduction in education, society and culture. Vol. 4. Sage.
[32]
Thorsten Brants. 2000. Inter-annotator Agreement for a German Newspaper Corpus. In LREC. Citeseer.
[33]
David Brazil. 1997. The Communicative Value of Intonation in English. Cambridge University Press.
[34]
Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, and Hao Zheng. 2017. Aishell-1: An open-source mandarin speech corpus and a speech recognition baseline. In 2017 20th conference of the oriental chapter of the international coordinating committee on speech databases and speech I/O systems and assessment (O-COCOSDA). IEEE, 1–5.
[35]
Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (Proceedings of Machine Learning Research, Vol. 81), Sorelle A. Friedler and Christo Wilson (Eds.). PMLR, 77–91. https://proceedings.mlr.press/v81/buolamwini18a.html
[36]
Bradley Butcher, Vincent S Huang, Christopher Robinson, Jeremy Reffin, Sema K Sgaier, Grace Charles, and Novi Quadrianto. 2021. Causal datasheet for datasets: An evaluation guide for real-world data analysis and data collection design using Bayesian Networks. Frontiers in Artificial Intelligence 4 (2021), 612551.
[37]
Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, 2020. Google crowdsourced speech corpora and related open-source resources for low-resource languages and dialects: an overview. arXiv preprint arXiv:2010.06778 (2020).
[38]
Mathieu Carrier, Philippe Apparicio, and Anne-Marie Séguin. 2016. Road traffic noise in Montreal and environmental equity: What is the situation for the most vulnerable population groups? Journal of Transport Geography 51 (2016), 1–8.
[39]
Inigo Casanueva, Thomas Hain, and Phil Green. 2016. Improving generalisation to new speakers in spoken dialogue state tracking. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Sheffield, 2726–2730.
[40]
J. A. Casey, R. Morello-Frosch, D. J. Mennitt, K. Fristrup, E. L. Ogburn, and P. James. 2017. Race/Ethnicity, Socioeconomic Status, Residential Segregation, and Spatial Variation in Noise Exposure in the Contiguous United States. Environmental Health Perspectives 125, 7 (2017), 077017.
[41]
R.T. Cauldwell. 2002. Streaming speech: Listening and pronunciation for advanced learners of English. Speechinaction.
[42]
Malgorzata Ćavar, Damir Ćavar, Dov-Ber Kerler, and Anya Quilitzsch. 2016. Generating a Yiddish speech corpus, forced aligner and basic ASR system for the AHEYM project. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). 4688–4693.
[43]
Özlem Çetinoğlu. 2017. A Code-Switching Corpus of Turkish-German Conversations. In Proceedings of the 11th Linguistic Annotation Workshop. Association for Computational Linguistics, Valencia, Spain, 34–40. https://doi.org/10.18653/v1/W17-0804
[44]
Chen-Yu Chen, Wei-Zhong Zheng, Syu-Siang Wang, Yu Tsao, Pei-Chun Li, and Ying-Hui Lai. 2020. Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System. In INTERSPEECH. 4686–4690.
[45]
Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, 2021. Gigaspeech: An evolving, multi-domain asr corpus with 10,000 hours of transcribed audio. arXiv preprint arXiv:2106.06909 (2021).
[46]
Winnie Cheng, Christopher Greaves, and Martin Warren. 2005. The creation of a prosodically transcribed intercultural corpus: The Hong Kong Corpus of Spoken English (prosodic). ICAME Journal 29 (2005), 47–68.
[47]
Piotr Chlebek, Elizabeth Shriberg, Yang Lu, Tomasz Rutowski, Amir Harati, and Ricardo Oliveira. 2020. Comparing speech recognition services for HCI applications in behavioral health. In Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers. 483–487.
[48]
Prafulla Kumar Choubey, Anna Currey, Prashant Mathur, and Georgiana Dinu. 2021. Improving gender translation accuracy with filtered self-training. arXiv preprint arXiv:2104.07695 (2021).
[49]
Elvan Çiftçi, Heysem Kaya, Hüseyin Güleç, and Albert Ali Salah. 2018. The turkish audio-visual bipolar disorder corpus. In 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia). IEEE, 1–6.
[50]
Renee Peje Clapham, Lisette van der Molen, RJJH van Son, M van den Brekel, and Frans JM Hilgers. 2012. NKI-CCRT corpus: Speech intelligibility before and after advanced head and neck cancer treated with concomitant chemoradiotherapy. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). 3350–3355.
[51]
Leigh Clark, Philip Doyle, Diego Garaialde, Emer Gilmartin, Stephan Schlögl, Jens Edlund, Matthew Aylett, João Cabral, Cosmin Munteanu, Justin Edwards, 2019. The state of speech in HCI: Trends, themes and challenges. Interacting with Computers 31, 4 (2019), 349–371.
[52]
Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, and Yossi Matias. 2019. Audio de-identification: A new entity recognition task. arXiv preprint arXiv:1903.07037 (2019).
[53]
Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. 2020. Librimix: An open-source dataset for generalizable speech separation. arXiv preprint arXiv:2005.11262 (2020).
[54]
Marta R Costa-jussà, Roger Creus, Oriol Domingo, Albert Domínguez, Miquel Escobar, Cayetana López, Marina Garcia, and Margarita Geleta. 2020. Mt-adapted datasheets for datasets: template and repository. arXiv preprint arXiv:2005.13156 (2020).
[55]
Evie Coussé and Steven Gillis. 2006. Regional bias in the broad phonetic transcriptions of the Spoken Dutch Corpus. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06).
[56]
Mathias Creutz, Teemu Hirsimäki, Mikko Kurimo, Antti Puurula, Janne Pylkkönen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraçlar, and Andreas Stolcke. 2007. Morph-based speech recognition and modeling of out-of-vocabulary words across languages. ACM Transactions on Speech and Language Processing (TSLP) 5, 1 (2007), 1–29.
[57]
Amit Das, Preethi Jyothi, and Mark Hasegawa-Johnson. 2016. Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka. In INTERSPEECH. 3524–3528.
[58]
Datatang. [n. d.]. 500 hours - Italian conversational speech data by mobile phone. https://www.datatang.ai/datasets/1178?utm_source=PaperwithCode&utm_medium=PaperwithCode&utm_campaign=PaperwithCode&utm_id=PaperwithCode&utm_term=PaperwithCode&utm_content=PaperwithCode
[59]
Laurence Devillers, Ioana Vasilescu, and Lori Lamel. 2002. Annotation and detection of emotion in a task-oriented human-human dialog corpus. In proceedings of ISLE Workshop, Vol. 20. 43.
[60]
Alex DiChristofano, Henry Shuster, Shefali Chandra, and Neal Patwari. 2022. Performance Disparities Between Accents in Automatic Speech Recognition. arXiv preprint arXiv:2208.01157 (2022).
[61]
Rachel Dorn. 2019. Dialect-specific models for automatic speech recognition of african american vernacular english. In Proceedings of the Student Research Workshop Associated with RANLP 2019. 16–20.
[62]
Jiayu Du, Xingyu Na, Xuechen Liu, and Hui Bu. 2018. Aishell-2: Transforming mandarin asr research into industrial scale. arXiv preprint arXiv:1808.10583 (2018).
[63]
Priyank Dubey and Bilal Shah. 2022. Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents. arXiv preprint arXiv:2204.00977 (2022).
[64]
H. T. Edwards. 1997. Applied Phonetics: The sounds of American English. Singular, San Diego, CA.
[65]
Lotte Eijk, Marlou Rasenberg, Flavia Arnese, Mark Blokpoel, Mark Dingemanse, Christian F Doeller, Mirjam Ernestus, Judith Holler, Branka Milivojevic, Asli Özyürek, 2022. The CABB dataset: A multimodal corpus of communicative interactions for behavioural and neural analyses. NeuroImage 264 (2022), 119734.
[66]
Severin Engelmann, Chiara Ullstein, Orestis Papakyriakopoulos, and Jens Grossklags. 2022. What people think AI should infer from faces. In 2022 ACM Conference on Fairness, Accountability, and Transparency. 128–141.
[67]
Nicholas Evans and Stephen C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32, 5 (2009), 429–448. https://doi.org/10.1017/S0140525X0999094X
[68]
Alessandro Fabris, Stefano Messina, Gianmaria Silvello, and Gian Antonio Susto. 2022. Tackling documentation debt: a survey on algorithmic fairness datasets. In Equity and Access in Algorithms, Mechanisms, and Optimization. 1–13.
[69]
Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, and Yan Wang. 2021. ASR-GLUE: A new multi-task benchmark for asr-robust natural language understanding. arXiv preprint arXiv:2108.13048 (2021).
[70]
Siyuan Feng, Olya Kudina, Bence Mark Halpern, and Odette Scharenborg. 2021. Quantifying bias in automatic speech recognition. arXiv preprint arXiv:2103.15122 (2021).
[71]
Tiantian Feng, Rajat Hebbar, Nicholas Mehlman, Xuan Shi, Aditya Kommineni, 2022. A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness. arXiv preprint arXiv:2212.09006 (2022).
[72]
Gianni Fenu, Hicham Lafhouli, and Mirko Marras. 2020. Exploring algorithmic fairness in deep speaker verification. In International Conference on Computational Science and Its Applications. Springer, 77–93.
[73]
Robert W. Frick. 1985. Communicating emotion: The role of prosodic features. Psychological Bulletin 97, 3 (May 1985), 412–429. https://doi.org/10.1037/0033-2909.97.3.412
[74]
Penelope Gardner-Chloros. 2009. Code-switching. Cambridge University Press.
[75]
Simson Garfinkel. 2015. De-identification of Personal Information. US Department of Commerce, National Institute of Standards and Technology.
[76]
Mahault Garnerin, Solange Rossato, and Laurent Besacier. 2019. Gender representation in French broadcast corpora and its impact on ASR performance. In Proceedings of the 1st International Workshop on AI for Smart TV Content Production, Access and Delivery. 3–9.
[77]
R.G. Garside, G. Leech, and A.M. Mcenery. 1997. Corpus Annotation: Linguistic Information from Computer Text Corpora. Routledge.
[78]
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for datasets. Commun. ACM 64, 12 (2021), 86–92.
[79]
Jort F Gemmeke, Daniel PW Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R Channing Moore, Manoj Plakal, and Marvin Ritter. 2017. Audio set: An ontology and human-labeled dataset for audio events. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 776–780.
[80]
Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, and Helen Meng. 2022. Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition. arXiv preprint arXiv:2202.10290 (2022).
[81]
Kallirroi Georgila, Anton Leuski, Volodymyr Yanov, and David Traum. 2020. Evaluation of off-the-shelf speech recognizers across diverse dialogue domains. In Proceedings of the 12th language resources and evaluation conference. 6469–6476.
[82]
James Sneed German, Maria Candea, LeAnn Brown, Timothy Mahrt, and Oriana Reid-Collins. 2022. Gender Spectrum Speech Corpus. https://hdl.handle.net/11403/gender_spectrum_speech/v2.1 ORTOLANG (Open Resources and TOols for LANGuage), www.ortolang.fr.
[83]
K. Gerson and S. Damaske. 2020. The Open. Oxford University Press, Oxford, NY.
[84]
K. Gerson and S. Damaske. 2020. The Science and Art of Interviewing. Oxford University Press, Oxford, NY.
[85]
Daniela Gerz, Pei-Hao Su, Razvan Kusztos, Avishek Mondal, Michał Lis, Eshan Singhal, Nikola Mrkšić, Tsung-Hsien Wen, and Ivan Vulić. 2021. Multilingual and cross-lingual intent detection from spoken data. arXiv preprint arXiv:2104.08524 (2021).
[86]
Simon Gonzalez, James Grama, and Catherine E Travis. 2020. Comparing the performance of forced aligners used in sociophonetic research. Linguistics Vanguard 6, 1 (2020).
[87]
Jan Gorisch, Michael Gref, and Thomas Schmidt. 2020. Using Automatic Speech Recognition in Spoken Corpus Curation. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC), May 11-16, 2020, Palais du Pharo, Marseille, France. European Language Resources Association, 6423–6428.
[88]
Kyle Gorman, Jonathan Howell, and Michael Wagner. 2011. Prosodylab-aligner: A tool for forced alignment of laboratory speech. Canadian Acoustics 39, 3 (2011), 192–193.
[89]
Jordan R. Green, Robert L. MacDonald, Pan-Pan Jiang, Julie Cattiau, Rus Heywood, Richard Cave, Katie Seaver, Marilyn A. Ladewig, Jimmy Tobin, Michael P. Brenner, Philip C. Nelson, and Katrin Tomanek. 2021. Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases. In Proc. Interspeech 2021. 4778–4782. https://doi.org/10.21437/Interspeech.2021-1384
[90]
Roberto Gretter. 2014. Euronews: a multilingual speech corpus for ASR. In LREC. 2635–2638.
[91]
Nina Grønnum. 2009. A Danish phonetically annotated spontaneous speech corpus (DanPASS). Speech Communication 51, 7 (2009), 594–603.
[92]
Anhong Guo, Ece Kamar, Jennifer Wortman Vaughan, Hanna Wallach, and Meredith Ringel Morris. 2020. Toward fairness in AI for people with disabilities: A research roadmap. ACM SIGACCESS Accessibility and Computing 125 (2020), 1–1.
[93]
Vikram Gupta, Rini Sharon, Ramit Sawhney, and Debdoot Mukherjee. 2022. ADIMA: Abuse Detection In Multilingual Audio. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6172–6176.
[94]
Nawar Halabi. 2016. Modern standard Arabic phonetics for speech synthesis. Ph. D. Dissertation. University of Southampton.
[95]
Margot Hanley, Apoorv Khandelwal, Hadar Averbuch-Elor, Noah Snavely, and Helen Nissenbaum. 2020. An ethical highlighter for people-centric dataset creation. arXiv preprint arXiv:2011.13583 (2020).
[96]
Harveenchadha. [n. d.]. Indic-Voice: Largest Open Source speech corpora for Indic languages. https://github.com/harveenchadha/indic-voice
[97]
François Hernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, and Yannick Esteve. 2018. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation. In International conference on speech and computer. Springer, 198–208.
[98]
Jack Hessel, Zhenhai Zhu, Bo Pang, and Radu Soricut. 2020. Beyond instructional videos: Probing for more diverse visual-textual grounding on youtube. arXiv preprint arXiv:2004.14338 (2020).
[99]
Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677 (2018).
[100]
Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need?. In Proceedings of the 2019 CHI conference on human factors in computing systems. 1–16.
[101]
Yi Hu and Philipos C Loizou. 2007. Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication 49, 7-8 (2007), 588–601.
[102]
Amir Hussein, Shinji Watanabe, and Ahmed Ali. 2022. Arabic speech recognition by end-to-end, modular systems and human. Computer Speech & Language 71 (2022), 101272.
[103]
Wiebke Toussaint Hutiri, Lauriane Gorce, and Aaron Yi Ding. 2022. Design Guidelines for Inclusive Speaker Verification Evaluation Datasets. arXiv preprint arXiv:2204.02281 (2022).
[104]
Deeply Inc. 202. Korean Read Speech Corpus. https://github.com/deeplyinc/Korean-Read-Speech-Corpus
[105]
Bahar Irfan, Mehdi Hellou, Alexandre Mazel, and Tony Belpaeme. 2020. Challenges of a real-world HRI study with non-native english speakers: Can personalisation save the day?. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction. 272–274.
[106]
Kazuki Irie, Shankar Kumar, Michael Nirschl, and Hank Liao. 2018. RADMM: Recurrent adaptive mixture model with applications to domain robust language modeling. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6079–6083.
[107]
Joseph Darius Jaafari and Nicole Lewis. 2019. In Court, Where Are Siri and Alexa? The Marshall Project (February 2019).
[108]
Abigail Z. Jacobs and Hanna Wallach. 2021. Measurement and Fairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM. https://doi.org/10.1145/3442188.3445901
[109]
Adam Janin, Don Baron, Jane Edwards, Dan Ellis, David Gelbart, Nelson Morgan, Barbara Peskin, Thilo Pfau, Elizabeth Shriberg, Andreas Stolcke, 2003. The ICSI meeting corpus. In 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), Vol. 1. IEEE, I–I.
[110]
Dinesh Babu Jayagopi, Samira Sheikhi, David Klotz, Johannes Wienke, Jean-Marc Odobez, Sebastian Wrede, Vasil Khalidov, Laurent Nguyen, Britta Wrede, and Daniel Gatica-Perez. 2012. The vernissage corpus: A multimodal human-robot-interaction dataset. Technical Report.
[111]
Pin Ji, Yang Feng, Jia Liu, Zhihong Zhao, and Zhenyu Chen. 2022. ASRTest: automated testing for deep-neural-network-driven speech recognition systems. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 189–201.
[112]
Janne Bondi Johannessen, Kristin Hagen, Joel Priestley, and Lars Nygaard. 2007. An advanced speech corpus for Norwegian. In Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007). 29–36.
[113]
Khia A Johnson, Molly Babel, Ivan Fong, and Nancy Yiu. 2020. SpiCE: A new open-access corpus of conversational bilingual speech in Cantonese and English. In Proceedings of the 12th Language Resources and Evaluation Conference. 4089–4095.
[114]
Taylor Jones, Jessica Rose Kalbfeld, Ryan Hancock, and Robin Clark. 2019. Testifying while black: An experimental study of court reporter accuracy in transcription of African American English. Language 95, 2 (2019), e216–e252. https://doi.org/10.1353/lan.2019.0042
[115]
Chae Kwan Jung. 2021. Designing and building the Korean English Learners’ Spoken Corpus (KELSC). Studies in Foreign Language Education 35, 3 (2021), 209–223.
[116]
Virender Kadyan, Taniya Hasija, and Amitoj Singh. 2022. Prosody features based low resource Punjabi children ASR and T-NT classifier using data augmentation. Multimedia Tools and Applications (2022), 1–22.
[117]
Sayash Kapoor and Arvind Narayanan. 2022. Leakage and the Reproducibility Crisis in ML-based Science. https://doi.org/10.48550/ARXIV.2207.07048
[118]
Sayash Kapoor, Matthew Sun, Mona Wang, Klaudia Jazwinska, and Elizabeth Anne Watkins. 2022. Weaving Privacy and Power: On the Privacy Practices of Labor Organizers in the US Technology Industry. Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (2022), 1–33.
[119]
Penny Karanasou, Chunyang Wu, Mark Gales, and Philip C Woodland. 2017. I-vectors and structured neural networks for rapid adaptation of acoustic models. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, 4 (2017), 818–828.
[120]
Nikolay Karpov, Alexander Denisenko, and Fedor Minkin. 2021. Golos: Russian dataset for speech research. arXiv preprint arXiv:2106.10161 (2021).
[121]
Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, and Aaron van den Oord. 2020. Learning robust and multilingual speech representations. arXiv preprint arXiv:2001.11128 (2020).
[122]
Jodi Kearns. 2014. Librivox: Free public domain audiobooks. Reference Reviews 28, 1 (2014), 7–8.
[123]
Tyler Kendall and Charlie Farrington. 2018. The Corpus of Regional African American Language. Version 6 (2018), 1.
[124]
Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, and Junmo Kim. 2019. Learning Not to Learn: Training Deep Neural Networks With Biased Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[125]
Sunhee Kim, Jooyoung Lee, S.G. Choi, Seunghun Ji, Jeemin Kang, Jongin Kim, Dohee Kim, Boryong Kim, Eungi Cho, Hojeong Kim, Jeongmin Jang, Jun Hyung Kim, Bon Ku, Hyung-Min Park, and Minhwa Chung. 2020. Building a Korean conversational speech database in the emergency medical domain. Phonetics and Speech Sciences 12 (12 2020), 81–90. https://doi.org/10.13064/KSSS.2020.12.4.081
[126]
Andreas Kirkedal, Marija Stepanović, and Barbara Plank. 2020. FT speech: Danish parliament speech corpus. arXiv preprint arXiv:2005.12368 (2020).
[127]
Keith Kirkpatrick. 2020. Natural language misunderstanding. Commun. ACM 63, 11 (2020), 17–18.
[128]
Allison Koenecke, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. 2020. Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences 117, 14 (March 2020), 7684–7689. https://doi.org/10.1073/pnas.1915768117
[129]
Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton Earnshaw, Imran Haque, Sara M Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. 2021. WILDS: A Benchmark of in-the-Wild Distribution Shifts. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 5637–5664.
[130]
Rostislav Kolobov, Olga Okhapkina, Olga Omelchishina, Andrey Platunov, Roman Bedyakin, Vyacheslav Moshkin, Dmitry Menshikov, and Nikolay Mikhaylovskiy. 2021. Mediaspeech: Multilanguage asr benchmark and dataset. arXiv preprint arXiv:2103.16193 (2021).
[131]
Huib Kouwenhoven, Mirjam Ernestus, and Margot Van Mulken. 2018. Register variation by Spanish users of English: The Nijmegen Corpus of Spanish English. Corpus Linguistics and Linguistic Theory 14, 1 (2018), 35–63.
[132]
Roland Kuhn, Fineen Davis, Alain Désilets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, 2020. The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software. In Proceedings of the 28th international conference on computational linguistics. 5866–5878.
[133]
Baybars Kulebi, Carme Armentano-Oller, Carlos Rodríguez-Penagos, and Marta Villegas. 2022. ParlamentParla: A speech corpus of catalan parliamentary sessions. In Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference. 125–130.
[134]
Vinit Kumar, Avinash Kumar, and S Shahnawazuddin. 2022. Creating robust children’s ASR system in zero-resource condition through out-of-domain data augmentation. Circuits, Systems, and Signal Processing 41, 4 (2022), 2205–2220.
[135]
Raja S Kushalnagar, Walter S Lasecki, and Jeffrey P Bigham. 2012. A readability evaluation of real-time crowd captions in the classroom. In Proceedings of the 14th international ACM SIGACCESS conference on Computers and accessibility. 71–78.
[136]
Egor Lakomkin, Sven Magg, Cornelius Weber, and Stefan Wermter. 2019. KT-speech-crawler: Automatic dataset construction for speech recognition from YouTube videos. arXiv preprint arXiv:1903.00216 (2019).
[137]
Swaran Lata and Somnath Chandra Vijay Kumar. 2010. Development of Linguistic Resources and Tools for Providing Multilingual Solutions in Indian Languages—A Report on National Initiative. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
[138]
Alexander LeClair and Collin McMillan. 2019. Recommendations for datasets for source code summarization. arXiv preprint arXiv:1904.02660 (2019).
[139]
Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu Kamdar, Sarah Borys, Ming Liu, and Thomas Huang. 2004. AVICAR: Audio-visual speech corpus in a car environment. In Eighth International Conference on Spoken Language Processing.
[140]
Gilwoo Lee, Zhiwei Deng, Shugao Ma, Takaaki Shiratori, Siddhartha S Srinivasa, and Yaser Sheikh. 2019. Talking With Hands 16.2M: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 763–772.
[141]
Seonwoo Lee, Sunhee Kim, and Minhwa Chung. 2022. Building A Speech Corpus Of Children With Cochlear Implants Via An Enhanced Metadata Structure. In 2022 25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA). IEEE, 1–6.
[142]
Tan Lee, Yuanyuan Liu, Pei-Wen Huang, Jen-Tzung Chien, Wang Kong Lam, Yu Ting Yeung, Thomas KT Law, Kathy YS Lee, Anthony Pak-Hin Kong, and Sam-Po Law. 2016. Automatic speech recognition for acoustical analysis and assessment of cantonese pathological voice and speech. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 6475–6479.
[143]
S. Lemmety. 2000. Review of speech synthesis technology. Ph. D. Dissertation. Helsinki University of Technology.
[144]
Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, and Jinfeng Bai. 2022. TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline. arXiv preprint arXiv:2206.13135 (2022).
[145]
Chak-Fai Li, Francis Keith, William Hartmann, and Matthew Snover. 2022. Combining Unsupervised and Text Augmented Semi-Supervised Learning For Low Resourced Autoregressive Speech Recognition. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6892–6896.
[146]
Jason Li, Ravi Gadde, Boris Ginsburg, and Vitaly Lavrukhin. 2018. Training neural speech recognition systems with synthetic speech augmentation. (2018).
[147]
Jing Li, Binling Wang, Yiming Zhi, Zheng Li, Lin Li, Qingyang Hong, and Dong Wang. 2021. Oriental language recognition (OLR) 2020: Summary and analysis. arXiv preprint arXiv:2107.05365 (2021).
[148]
Yuanchao Li, Catherine Lai, Divesh Lala, Koji Inoue, and Tatsuya Kawahara. 2022. Alzheimer’s Dementia Detection through Spontaneous Dialogue with Proactive Robotic Listeners. In HRI. 875–879.
[149]
Ying Li, Yue Yu, and Pascale Fung. 2012. A Mandarin-English Code-Switching Corpus. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey, 2515–2519.
[150]
Yuan-Fu Liao, Yung-Hsiang Shawn Chang, Yu-Chen Lin, Wu-Hua Hsu, Matus Pleva, and Jozef Juhar. 2020. Formosa speech in the wild corpus for improving taiwanese mandarin speech-enabled human-computer interaction. Journal of Signal Processing Systems 92 (2020), 853–873.
[151]
Shaoshi Ling, Yuzong Liu, Julian Salazar, and Katrin Kirchhoff. 2020. Deep contextualized acoustic representations for semi-supervised speech recognition. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6429–6433.
[152]
R. Lippi-Green. 1997. English with an accent: Language ideology and discrimination in the United States. Routledge, London.
[153]
Chunxi Liu, Michael Picheny, Leda Sarı, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, and Yatharth Saraf. 2022. Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6162–6166.
[154]
Yulan Liu, Charles Fox, Madina Hasan, and Thomas Hain. 2016. The sheffield wargame corpus-day two and day three. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. ISCA, 3833–3837.
[155]
Yi Liu, Pascale Fung, Yongsheng Yang, Denise DiPersio, Meghan Glenn, Stephanie Strassel, and Christopher Cieri. 2010. A Very Large Scale Mandarin Chinese Broadcast Corpus for GALE Project. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10).
[156]
Julio C. Hidalgo Lopez, Shelly Sandeep, MaKayla Wright, Grace M. Wandell, and Anthony B. Law. 2023. Quantifying and Improving the Performance of Speech Recognition Systems on Dysphonic Speech. Otolaryngology–Head and Neck Surgery 168, 5 (Jan. 2023), 1130–1138. https://doi.org/10.1002/ohn.170
[157]
Paula Lopez-Otero, Laura Docío Fernández, Alberto Abad, and Carmen Garcia-Mateo. 2017. Depression Detection Using Automatic Transcriptions of De-Identified Speech. In INTERSPEECH. 3157–3161.
[158]
Hieu-Thi Luong and Hai-Quan Vu. 2016. A non-expert Kaldi recipe for Vietnamese speech recognition system. In Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016). 51–55.
[159]
Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, and Brian MacWhinney. 2020. Alzheimer’s Dementia Recognition through Spontaneous Speech: The ADReSS Challenge. In Proceedings of INTERSPEECH 2020. Shanghai, China. https://arxiv.org/abs/2004.06833
[160]
Dau-Cheng Lyu, Tien-Ping Tan, Eng Siong Chng, and Haizhou Li. 2010. Seame: a mandarin-english code-switching speech corpus in south-east asia. In Eleventh Annual Conference of the International Speech Communication Association.
[161]
Andrew Maas, Quoc V Le, Tyler M O’neil, Oriol Vinyals, Patrick Nguyen, and Andrew Y Ng. 2012. Recurrent neural networks for noise reduction in robust ASR. (2012).
[162]
Joel Mackenzie, Rodger Benham, Matthias Petri, Johanne R Trippas, J Shane Culpepper, and Alistair Moffat. 2020. CC-News-En: A large English news corpus. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3077–3084.
[163]
B. MacWhinney. 2000. The CHILDES Project: Tools for analyzing talk. Lawrence Erlbaum Associates, Mahwah, NJ.
[164]
B. MacWhinney, D. Fromm, M. Forbes, and A. Holland. 2011. AphasiaBank: Methods for studying discourse. Aphasiology 25 (2011).
[165]
Alexandre Magueresse, Vincent Carles, and Evan Heetderks. 2020. Low-Resource Languages: A Review of Past Work and Future Challenges. arXiv preprint arXiv:2006.07264v1 (2020).
[166]
Khyati Mahajan and Samira Shaikh. 2021. On the need for thoughtful data collection for multi-party dialogue: A survey of available corpora and collection methods. In Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue. 338–352.
[167]
Raju Maharjan, Kevin Doherty, Darius Adam Rohani, Per Bækgaard, and Jakob E Bardram. 2022. Experiences of a Speech-enabled Conversational Agent for the Self-report of Well-being among People Living with Affective Disorders: An In-the-Wild Study. ACM Transactions on Interactive Intelligent Systems (TiiS) 12, 2 (2022), 1–29.
[168]
Tristan J Mahr, Visar Berisha, Kan Kawabata, Julie Liss, and Katherine C Hustad. 2021. Performance of forced-alignment algorithms on children’s speech. Journal of Speech, Language, and Hearing Research 64, 6S (2021), 2213–2222.
[169]
Adria Mallol-Ragolta, Nicholas Cummins, and Björn W Schuller. 2020. An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition. In INTERSPEECH. 511–515.
[170]
Nina Markl and Catherine Lai. 2021. Context-sensitive evaluation of automatic speech recognition: considering user experience & language variation. In Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing. 34–40.
[171]
John Markoff. 2019. From Your Mouth to Your Screen, Transcribing Takes the Next Step. New York Times (October 2019).
[172]
Joshua L Martin. 2021. Spoken Corpora Data, Automatic Speech Recognition, and Bias Against African American Language: The case of Habitual ‘Be’. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 284–284.
[173]
Max Planck Institute for Psycholinguistics, The Language Archive. 2022. ELAN (Version 6.4). https://archive.mpi.nl/tla/elan
[174]
Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, and Morgan Sonderegger. 2017. Montreal forced aligner: Trainable text-speech alignment using kaldi. In Interspeech, Vol. 2017. 498–502.
[175]
Gita Mehta and Anne Cutler. 1988. Detection of target phonemes in spontaneous and read speech. Language and Speech 31, 2 (1988), 135–156.
[176]
Paul Meier. 2022. AI Hub [Online]. https://aihub.or.kr/aihubdata/data/view.do?currMenu=116&topMenu=100&aihubDataSe=ty&dataSetSn=118
[177]
Carlos Mena, Michal Borsky, David Erik Mollberg, Smári Freyr Guðmundsson, Staffan Hedström, Ragnar Pálsson, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Jóhanna Vigdís Guðmundsdóttir, Eydís Huld Magnúsdóttir, Ragnheiður Þórhallsdóttir, and Jon Gudnason. 2021. Samrómur Children Icelandic Speech 21.09. Reykjavik University: Language and Voice Lab.
[178]
Helen Meng, PC Ching, Shuk Fong Chan, Yee Fong Wong, and Cheong Chat Chan. 2004. ISIS: An adaptive, trilingual conversational system with interleaving interaction and delegation dialogs. ACM Transactions on Computer-Human Interaction (TOCHI) 11, 3 (2004), 268–299.
[179]
Josh Meyer, David Adelani, Edresson Casanova, Alp Öktem, Daniel Whitenack, Julian Weber, Salomon Kabongo Kabenamualu, Elizabeth Salesky, Iroro Orife, Colin Leong, Perez Ogayo, Chris Chinenye Emezue, Jonathan Mukiibi, Salomey Osei, Apelete Agbolo, Victor Akinode, Bernard Opoku, Olanrewaju Samuel, Jesujoba Alabi, and Shamsuddeen Hassan Muhammad. 2022. BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus. In Interspeech. ISCA. https://arxiv.org/pdf/2207.03546.pdf
[180]
Josh Meyer, Lindy Rauchenstein, Joshua D Eisenberg, and Nicholas Howell. 2020. Artie bias corpus: An open dataset for detecting demographic bias in speech applications. In Proceedings of the 12th language resources and evaluation conference. 6462–6468.
[181]
Milagros Miceli, Tianling Yang, Laurens Naudts, Martin Schuessler, Diana Serbanescu, and Alex Hanna. 2021. Documenting computer vision datasets: an invitation to reflexive data practices. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 161–172.
[182]
Boyd Michailovsky, Martine Mazaudon, Alexis Michaud, Séverine Guillaume, Alexandre François, and Evangelia Adamou. 2014. Documenting and researching endangered languages: the Pangloss Collection. (2014).
[183]
Microsoft. [n. d.]. Training and testing datasets - speech service - azure cognitive services. https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train
[184]
Juliette Millet and Neil Zeghidour. 2019. Learning to detect dysarthria from raw speech. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5831–5835.
[185]
James Milroy and Lesley Milroy. 2012. Authority in Language: Investigating Standard English. Routledge, London, England.
[186]
Shira Mitchell, Eric Potash, Solon Barocas, Alexander D'Amour, and Kristian Lum. 2021. Algorithmic Fairness: Choices, Assumptions, and Definitions. Annual Review of Statistics and Its Application 8, 1 (March 2021), 141–163. https://doi.org/10.1146/annurev-statistics-042720-125902
[187]
Omid Mohamad Nezami, Paria Jamshid Lou, and Mansoureh Karami. 2019. ShEMO: a large-scale validated database for Persian speech emotion detection. Language Resources and Evaluation 53, 1 (2019), 1–16.
[188]
Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Tamás Grósz, Krister Lindén, and Mikko Kurimo. 2022. Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. arXiv preprint arXiv:2203.12906 (2022).
[189]
Nicolás Morales, Javier Tejedor, Javier Garrido, José Colás, and Doroteo T Toledano. 2008. STC-TIMIT: Generation of a single-channel telephone corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08).
[190]
Hamdy Mubarak, Amir Hussein, Shammur Absar Chowdhury, and Ahmed Ali. 2021. QASR: QCRI Aljazeera Speech Resource - A Large Scale Annotated Arabic Speech Corpus. arXiv preprint arXiv:2106.13000 (2021).
[191]
David G. Myers and Morton Ann Gernsbacher. 2021. Captioning for All. Inside Higher Ed (September 2021).
[192]
Karen Nakamura. 2019. My algorithms have determined you’re not human: AI-ML, reverse turing-tests, and the disability experience. In The 21st International ACM SIGACCESS Conference on Computers and Accessibility. 1–2.
[193]
J Neto, Hugo Meinedo, and Márcio Viveiros. 2011. A media monitoring solution. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1813–1816.
[194]
Mauro Nicolao, Michiel Sanders, and Thomas Hain. 2018. Improved acoustic modelling for automatic literacy assessment of children. In Proceedings of Interspeech 2018. ISCA, 1666–1670.
[195]
Mohammad Niknazar, Aditya Vempaty, and Ravi Kokku. 2021. Voice Privacy with Smart Digital Assistants in Educational Settings. In International Conference on Intelligent Tutoring Systems. Springer, 286–290.
[196]
Takeshi Nishida. 2014. Promoting intercultural awareness through native-to-foreign speech accent conversion. In Proceedings of the 5th ACM international conference on Collaboration across boundaries: culture, distance & technology. 83–86.
[197]
Patrick K O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D Shulman, 2021. Spgispeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition. arXiv preprint arXiv:2104.02014 (2021).
[198]
openslr.org. 2022. Openslr.org. http://openslr.org/
[199]
Madhab Pal, Rajib Roy, Soma Khan, Milton Samirakshma Bepari, and Joyanta Basu. 2018. PannoMulloKathan: Voice Enabled Mobile App for Agricultural Commodity Price Dissemination in Bengali Language. In INTERSPEECH. 1491–1492.
[200]
Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 2015. Librispeech: an asr corpus based on public domain audio books. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 5206–5210.
[201]
Orestis Papakyriakopoulos and Alice Xiang. 2023. Considerations for Ethical Speech Recognition Datasets (WSDM ’23). Association for Computing Machinery, New York, NY, USA, 1287–1288. https://doi.org/10.1145/3539597.3575793
[202]
Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, and Daniele Falavigna. 2021. Mixtures of deep neural experts for automated speech scoring. arXiv preprint arXiv:2106.12475 (2021).
[203]
Kyubyong Park, Yo Joong Choe, and Jiyeon Ham. 2019. Jejueo Datasets for Machine Translation and Speech Synthesis. arXiv preprint arXiv:1911.12071 (2019).
[204]
Kyubyong Park and Thomas Mulc. 2019. Css10: A collection of single speaker speech datasets for 10 languages. arXiv preprint arXiv:1903.11269 (2019).
[205]
R. Paul. 1995. Language disorders from infancy through adolescence: Assessment and intervention. Mosby, St. Louis, MO.
[206]
Dawa Pengcuo and Daojie Ben. 2021. Research on the Construction of Multimodal Corpus of Tibetan Teaching. In 1st International Conference on Education: Current Issues and Digital Technologies (ICECIDT 2021). Atlantis Press, 408–412.
[207]
Bharathi Pilar. 2022. Subword Dictionary Learning and Segmentation Techniques for Automatic Speech Recognition in Tamil and Kannada. arXiv preprint arXiv:2207.13331 (2022).
[208]
Mark A Pitt, Keith Johnson, Elizabeth Hume, Scott Kiesling, and William Raymond. 2005. The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication 45, 1 (2005), 89–95.
[209]
Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, and Ronan Collobert. 2020. Mls: A large-scale multilingual dataset for speech research. arXiv preprint arXiv:2012.03411 (2020).
[210]
Mahima Pushkarna, Andrew Zaldivar, and Oddur Kjartansson. 2022. Data cards: Purposeful and transparent dataset documentation for responsible ai. In 2022 ACM Conference on Fairness, Accountability, and Transparency. 1776–1826.
[211]
Akam Qader and Hossein Hassani. 2019. Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset. arXiv preprint arXiv:1911.13087 (2019).
[212]
Stephan Radeck-Arneth, Benjamin Milde, Arvid Lange, Evandro Gouvea, Stefan Radomski, Max Mühlhäuser, and Chris Biemann. 2015. Open Source German Distant Speech Recognition: Corpus and Acoustic Model. In Proceedings Text, Speech and Dialogue (TSD). Pilsen, Czech Republic, 480–488.
[213]
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv preprint arXiv:2212.04356 (2022).
[214]
Nan Bernstein Ratner and Brian MacWhinney. 2018. Fluency Bank: A new resource for fluency research and practice. Journal of fluency disorders 56 (2018), 69–80.
[215]
Chandan KA Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, 2020. The interspeech 2020 deep noise suppression challenge: Datasets, subjective speech quality and testing framework. arXiv preprint arXiv:2001.08662 (2020).
[216]
Microsoft Research. 2022. Neural networks-based speech enhancement: AI to improve audio quality. https://www.microsoft.com/en-us/research/project/nn-speech-enhancement/
[217]
Colleen Richey, Maria A Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, 2018. Voices obscured in complex environmental settings (voices) corpus. arXiv preprint arXiv:1804.05053 (2018).
[218]
Meredith Ringel Morris. 2019. AI and Accessibility: A Discussion of Ethical Considerations. arXiv e-prints (2019), arXiv–1908.
[219]
Christophe Ris and Stephane Dupont. 2001. Assessing local noise level estimation methods: Application to noise robust ASR. Speech Communication 34, 1-2 (2001), 141–158.
[220]
GS Robinson and JG Casali. 2000. Speech communications and signal detection in noise. The noise manual 5 (2000), 567–600.
[221]
Tony Robinson, Jeroen Fransen, David Pye, Jonathan Foote, and Steve Renals. 1995. WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition. In 1995 International Conference on Acoustics, Speech, and Signal Processing, Vol. 1. IEEE, 81–84.
[222]
Negar Rostamzadeh, Diana Mincu, Subhrajit Roy, Andrew Smart, Lauren Wilcox, Mahima Pushkarna, Jessica Schrouff, Razvan Amironesei, Nyalleng Moorosi, and Katherine Heller. 2022. Healthsheet: development of a transparency artifact for health datasets. In 2022 ACM Conference on Fairness, Accountability, and Transparency. 1943–1961.
[223]
Scott Sadowsky. 2022. The Sociolinguistic Speech Corpus of Chilean Spanish (COSCACH): A socially stratified text, audio and video corpus with multiple speech registers. (2022).
[224]
Betul Erdogdu Sakar, M Erdem Isenkul, C Okan Sakar, Ahmet Sertbas, Fikret Gurgen, Sakir Delil, Hulya Apaydin, and Olcay Kursun. 2013. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE Journal of Biomedical and Health Informatics 17, 4 (2013), 828–834.
[225]
Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, and Jason Eisner. 2020. A corpus for large-scale phonetic typology. arXiv preprint arXiv:2005.13962 (2020).
[226]
Ana Lúcia Santos, Michel Généreux, Aida Cardoso, Celina Agostinho, and Silvana Abalada. 2014. A corpus of European Portuguese child and child-directed speech. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association, 1488–1491.
[227]
Patrick Schramowski, Christopher Tauchmann, and Kristian Kersting. 2022. Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content?. In 2022 ACM Conference on Fairness, Accountability, and Transparency. 1350–1361.
[228]
Garima Sharma and Abhinav Dhall. 2021. A survey on automatic multimodal emotion recognition in the wild. In Advances in Data Science: Methodologies and Applications. Springer, 35–64.
[229]
Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, and Slim Ouni. 2021. Machine Learning for Stuttering Identification: Review, Challenges and Future Directions. https://doi.org/10.48550/ARXIV.2107.04057
[230]
Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, and Andreas Stolcke. 2022. Improving fairness in speaker verification via Group-adapted Fusion Network. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 7077–7081.
[231]
David Sherfinski and Avi Asher-Schapiro. 2021. U.S. prisons mull AI to analyze inmate phone calls. Thomson Reuters Foundation News (August 2021).
[232]
Xian Shi, Fan Yu, Yizhou Lu, Yuhao Liang, Qiangze Feng, Daliang Wang, Yanmin Qian, and Lei Xie. 2021. The accented english speech recognition challenge 2020: open datasets, tracks, baselines, results and methods. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6918–6922.
[233]
Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, and Ming Li. 2020. Aishell-3: A multi-speaker mandarin tts corpus and the baselines. arXiv preprint arXiv:2010.11567 (2020).
[234]
Koichi Shinoda and Sadaoki Furui. [n. d.]. Tokyo Institute of Technology Multilingual Speech Corpus - Indonesian (TITML-IDN).
[235]
Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, 2019. Personalizing ASR for dysarthric and accented speech with limited data. arXiv preprint arXiv:1907.13511 (2019).
[236]
Kathleen Siminyu, Kibibi Mohamed Amran, Abdulrahman Ndegwa Karatu, Mnata Resani, Mwimbi Makobo Junior, Rebecca Ryakitimbo, and Britone Mwasaru. 2022. Corpus Development of Kiswahili Speech Recognition Test and Evaluation sets, Preemptively Mitigating Demographic Bias Through Collaboration with Linguists. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages. 13–19.
[237]
Aghilas Sini, Damien Lolive, Gaëlle Vidal, Marie Tahon, and Élisabeth Delais-Roussarie. 2018. Synpaflex-corpus: An expressive french audiobooks corpus dedicated to expressive speech synthesis. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
[238]
David Snyder, Guoguo Chen, and Daniel Povey. 2015. Musan: A music, speech, and noise corpus. arXiv preprint arXiv:1510.08484 (2015).
[239]
Ramya Srinivasan, Emily Denton, Jordan Famularo, Negar Rostamzadeh, Fernando Diaz, and Beth Coleman. 2021. Artsheets for Art Datasets. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).
[240]
Brij Mohan Lal Srivastava, Nathalie Vauquier, Md Sahidullah, Aurélien Bellet, Marc Tommasi, and Emmanuel Vincent. 2020. Evaluating voice conversion-based privacy protection against informed attackers. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2802–2806.
[241]
Adriana Stan, Junichi Yamagishi, Simon King, and Matthew Aylett. 2011. The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate. Speech Communication 53, 3 (2011), 442–450.
[242]
Luke Stark and Jevan Hutson. 2022. Physiognomic Artificial Intelligence. Fordham Intellectual Property, Media and Entertainment Law Journal 32, 4 (2022), 922.
[243]
Robert Stojnic, Ross Taylor, Marcin Kardas, Viktor Kerkez, and Ludovic Viaud. 2022. Papers with Code-The latest in Machine Learning. URL: https://paperswithcode. com (2022).
[244]
Vishal Sunder, Prashant Serai, and Eric Fosler-Lussier. 2022. Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data. arXiv preprint arXiv:2204.05183 (2022).
[245]
Surfingtech. [n. d.]. Free ST American English Corpus. https://openslr.magicdatatech.com/45/
[246]
Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, and Shinji Watanabe. 2021. JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification. arXiv preprint arXiv:2112.09323 (2021).
[247]
Xu Tan, Tao Qin, Frank Soong, and Tie-Yan Liu. 2021. A survey on neural speech synthesis. arXiv preprint arXiv:2106.15561 (2021).
[248]
Rachael Tatman. 2017. Gender and Dialect Bias in YouTube's Automatic Captions. In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. Association for Computational Linguistics. https://doi.org/10.18653/v1/w17-1606
[249]
TEI. [n. d.]. TEI P5: Guidelines for Electronic Text Encoding and Interchange. ([n. d.]).
[250]
Louis ten Bosch. 2000. ASR, dialects, and acoustic/phonological distances. In INTERSPEECH. 1009–1012.
[251]
Daniela Teodorescu, Josie Matalski, Delaney Lothian, Denilson Barbosa, and Carrie Demmans Epp. 2022. Cree Corpus: A Collection of nêhiyawêwin Resources. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 6354–6364.
[252]
Paul Thompson. 2010. Building a specialised audio-visual corpus. The Routledge handbook of corpus linguistics (2010), 93–103.
[253]
Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, and Khe Chai Sim. 2021. On-device personalization of automatic speech recognition models for disordered speech. arXiv preprint arXiv:2106.10259 (2021).
[254]
Peter. Trudgill. 2003. A glossary of sociolinguistics. Oxford University Press, Oxford.
[255]
Rosanna Turrisi, Arianna Braccia, Marco Emanuele, Simone Giulietti, Maura Pugliatti, Mariachiara Sensi, Luciano Fadiga, and Leonardo Badino. 2021. EasyCall corpus: a dysarthric speech dataset. arXiv preprint arXiv:2104.02542 (2021).
[256]
Marvin I. Herzog Uriel Weinreich, William Labov. 1968. Empirical Foundations for a Theory of Language Change. In Directions for Historical Linguistics, Winfred P. Lehmann and Yakov Malkiel (Eds.). Univer’sity of Texas Press, Austin, 95–195.
[257]
Tomáš Valenta, Luboš Šmídl, Jan Švec, and Daniel Soutner. 2014. Inter-annotator agreement on spontaneous Czech language: Limits of automatic speech recognition accuracy. In Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings 17. Springer, 390–397.
[258]
Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald, 2017. CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit. University of Edinburgh. The Centre for Speech Technology Research (CSTR) (2017).
[259]
Boris Villazón-Terrazas, Luis M Vilches-Blázquez, Oscar Corcho, and Asunción Gómez-Pérez. 2011. Methodological guidelines for publishing government linked data. Linking government data (2011), 27–49.
[260]
Petra Wagner, Jonas Beskow, Simon Betz, Jens Edlund, Joakim Gustafson, Gustav Eje Henter, Sébastien Le Maguer, Zofia Malisz, Eva Szekely, Christina Tånnander, 2019. Speech synthesis evaluation—state-of-the-art assessment and suggestion for a novel research program. In Proceedings of the 10th Speech Synthesis Workshop (SSW10).
[261]
Payton Walker, Nathan McClaran, Zihao Zheng, Nitesh Saxena, and Guofei Gu. 2022. BiasHacker: Voice Command Disruption by Exploiting Speaker Biases in Automatic Speech Recognition. In Proceedings of the 15th ACM Conference on Security and Privacy in Wireless and Mobile Networks. 119–124.
[262]
Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, and Emmanuel Dupoux. 2021. Voxpopuli: A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation. arXiv preprint arXiv:2101.00390 (2021).
[263]
Max Weber. 1949. " Objectivity" in social science and social policy. The methodology of the social sciences (1949), 49–112.
[264]
Kellie Webster, Marta Recasens, Vera Axelrod, and Jason Baldridge. 2018. Mind the GAP: A balanced corpus of gendered ambiguous pronouns. Transactions of the Association for Computational Linguistics 6 (2018), 605–617.
[265]
R. S. Weiss. 1995. Learning from Strangers: The Art and Method of Qualitative Interview Studies. Simon & Schuster, New York, NY.
[266]
Eline Westerhout and Paola Monachesi. 2006. A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS). In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). European Language Resources Association (ELRA), Genoa, Italy.
[267]
Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, and Jonathan Le Roux. 2019. Wham!: Extending speech separation to noisy environments. arXiv preprint arXiv:1907.01160 (2019).
[268]
J Allen Williams Jr. 1968. Interviewer role performance: A further note on bias in the information interview. Public Opinion Quarterly 32, 2 (1968), 287–294.
[269]
Johannes Wirth and Rene Peinl. 2022. ASR in German: A Detailed Error Analysis. arXiv preprint arXiv:2204.05617 (2022).
[270]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, 2019. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019).
[271]
Philip C Woodland, Chris J Leggetter, JJ Odell, Valtcho Valtchev, and Steve J Young. 1995. The 1994 HTK large vocabulary speech recognition system. In 1995 international conference on acoustics, speech, and signal processing, Vol. 1. IEEE, 73–76.
[272]
Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency. 2021. Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks. In 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 841–848.
[273]
Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Dan Li, Zhongping Yang, Xiping Wu, and Yi Lin. 2019. ATCSpeech: A multilingual pilot-controller speech corpus from real air traffic control environment. arXiv preprint arXiv:1911.11365 (2019).
[274]
Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, 2022. Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset. arXiv preprint arXiv:2203.16844 (2022).
[275]
Gary Yeung and Abeer Alwan. 2018. On the difficulties of automatic speech recognition for kindergarten-aged children. Interspeech 2018 (2018).
[276]
Gary Yeung and Abeer Alwan. 2019. A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of f0 in Vowel Perception. Interspeech 2019 (2019).
[277]
Su-Youn Yoon, Chong Min Lee, Klaus Zechner, and Keelan Evanini. 2019. Development of Robust Automated Scoring Models Using Adversarial Input for Oral Proficiency Assessment. In INTERSPEECH. 1871–1875.
[278]
Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, and Guanqiong Miao. 2021. The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines. In 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 1117–1123.
[279]
Kyongsik Yun, Joseph Osborne, Madison Lee, Thomas Lu, and Edward Chow. 2018. Automatic speech recognition for launch control center communication using recurrent neural networks with data augmentation and custom language model. In Disruptive Technologies in Information Sciences, Vol. 10652. SPIE, 1065202.
[280]
Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, and Najim Dehak. 2020. That sounds familiar: an analysis of phonetic representations transfer across languages. arXiv preprint arXiv:2005.08118 (2020).
[281]
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J Weiss, Ye Jia, Zhifeng Chen, and Yonghui Wu. 2019. LibriTTS: A corpus derived from LibriSpeech for text-to-speech. arXiv preprint arXiv:1904.02882 (2019).
[282]
Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, Yukai Huang, Ke Li, Daniel Povey, and Yujun Wang. 2021. speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment. In Proc. Interspeech 2021.
[283]
Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, and Björn Schuller. 2020. Hierarchical attention transfer networks for depression assessment from speech. In ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 7159–7163.
[284]
Marc A Zissman, Terry P Gleason, Deborah M Rekart, and Beth L Losiewicz. 1996. Automatic dialect identification of extemporaneous conversational, Latin American Spanish speech. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, Vol. 2. IEEE, 777–780.
[285]
Lindsey Zuloaga. 2021. The latest leap in HireVue’s assessment technology. HireVue (September 2021).
[286]
Juan Zuluaga-Gomez, Karel Veselỳ, Igor Szöke, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, 2022. ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications. arXiv preprint arXiv:2211.04054 (2022).

Published In

FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency
June 2023
1929 pages
ISBN: 9798400701924
DOI: 10.1145/3593013
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 12 June 2023

Author Tags

  • datasets
  • datasheets
  • ethics
  • speech
  • transparency

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FAccT '23
