DOI: 10.1145/3640794.3665880
Extended Abstract

Toward a Third-Kind Voice for Conversational Agents in an Era of Blurring Boundaries Between Machine and Human Sounds

Published: 08 July 2024

Abstract

The voices of widely used conversational agents (CAs) are standardized to be highly intelligible, yet they still sound machine-generated because of their artificial qualities. With advances in deep neural networks, synthesized speech has become nearly indistinguishable from that of a real person. Voice enables users to discern a speaker's identity and significantly shapes user perception, particularly in voice-only interaction. While more natural, human-sounding voices are generally preferred, their use in CAs raises ethical dilemmas, such as eliciting unwanted social responses or obscuring the nature of the speaker. In this evolving landscape, it is necessary to understand voice characteristics across the multiple facets of voice design for CAs. Our study therefore examines the characteristics of both artificial-sounding and human-sounding voices, and we then propose a ‘third kind’ of voice that draws on the characteristics of each voice type. This discussion contributes to the debate on the future direction of voice design in Conversational User Interface research.



    Published In

    CUI '24: Proceedings of the 6th ACM Conference on Conversational User Interfaces
    July 2024
    616 pages
ISBN: 9798400705113
DOI: 10.1145/3640794
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Artificial-sounding voice
    2. Efficiency
    3. Human-sounding voice
    4. Naturalness
    5. Speech synthesis
    6. Transparency
    7. Voice assistant
    8. Voice interaction
    9. Voice user interface
    10. Voice-based conversational agent

    Qualifiers

    • Extended-abstract
    • Research
    • Refereed limited

    Conference

CUI '24: ACM Conversational User Interfaces 2024
July 8 - 10, 2024
Luxembourg, Luxembourg

    Acceptance Rates

    Overall Acceptance Rate 34 of 100 submissions, 34%


