- research-article, July 2024
S3: Speech, Script and Scene driven Head and Eye Animation
ACM Transactions on Graphics (TOG), Volume 43, Issue 4, Article No.: 47, Pages 1–12. https://doi.org/10.1145/3658172
We present S3, a novel approach to generating expressive, animator-centric 3D head and eye animation of characters in conversation. Given speech audio, a Directorial script and a cinematographic 3D scene as input, we automatically output the animated 3D ...
- Article, June 2024
Feature Engineering for Music/Speech Detection in Costa Rica Radio Broadcast
- Juan Angel Acosta-Ceja,
- Marvin Coto-Jiménez,
- Máximo Eduardo Sánchez-Gutiérrez,
- Alma Rocío Sagaceta-Mejía,
- Julián Alberto Fresán-Figueroa
Pattern Recognition, June 2024, Pages 84–95. https://doi.org/10.1007/978-3-031-62836-8_9
Abstract: The exponential growth of audio data in radio broadcasts has generated the need for efficient tools for their manipulation and analysis to develop systems such as audio content classification and enhance user experience. In this study, we explore ...
- research-article, May 2024
Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, May 2024, Article No.: 949, Pages 1–14. https://doi.org/10.1145/3613904.3642817
This paper explores how blind and sighted individuals perceive real and spoofed audio, highlighting differences and similarities between the groups. Through two studies, we find that both groups focus on specific human traits in audio, such as accents, ...
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
- Susan Lin,
- Jeremy Warner,
- J.D. Zamfirescu-Pereira,
- Matthew G Lee,
- Sauhard Jain,
- Shanqing Cai,
- Piyawat Lertvittayakumjorn,
- Michael Xuelin Huang,
- Shumin Zhai,
- Bjoern Hartmann,
- Can Liu
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, May 2024, Article No.: 1043, Pages 1–19. https://doi.org/10.1145/3613904.3642217
Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that ...
- research-article, May 2024
Deepfake Speech Detection: A Spectrogram Analysis
SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, April 2024, Pages 1312–1320. https://doi.org/10.1145/3605098.3635911
Current voice biometric systems have no natural mechanism to defend against deepfake spoofing attacks. Thus, supporting these systems with a deepfake detection solution is necessary. One of the latest approaches to deepfake speech detection is ...
- short-paper, March 2024 (Best Student Paper)
Development of a Socially Cognizant Robotic Campus Guide
- Benjamin Greenberg,
- Daniel Nakhimovich,
- Richard Magnotti,
- Hriday Purohit,
- Sanskar Shah,
- Aniket Satish Kulkarni,
- Uriel Gonzalez-Bravo,
- Noah R. Carver
HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, March 2024, Pages 1229–1232. https://doi.org/10.1145/3610978.3641263
A robotic system to help lost students find their way around a college campus was designed, built, and tested. Socially cognizant design practices, including stakeholder engagement and interdisciplinary team-building, were followed. Users can interact ...
- research-article, January 2024
Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones
- Arvind Pillai,
- Subigya Kumar Nepal,
- Weichen Wang,
- Matthew Nemesure,
- Michael Heinz,
- George Price,
- Damien Lekkas,
- Amanda C. Collins,
- Tess Griffin,
- Benjamin Buck,
- Sarah Masud Preum,
- Trevor Cohen,
- Nicholas C. Jacobson,
- Dror Ben-Zeev,
- Andrew Campbell
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 7, Issue 4, Article No.: 174, Pages 1–38. https://doi.org/10.1145/3631452
Speech-based diaries from mobile phones can capture paralinguistic patterns that help detect mental illness symptoms such as suicidal ideation. However, previous studies have primarily evaluated machine learning models on a single dataset, making their ...
- research-article, March 2024
Analysis of modified palatal surface for better speech in edentulous patients: A clinico-analytical study
- Anuj K. Shukla,
- Saurabh Chaturvedi,
- Abdul Razzaq Ahmed,
- Hoda Lofty Abouzeid,
- Ghazala Suleman,
- Rania A. Sharif,
- Vishwanath Gurumurthy,
- Marco Cicciù,
- Giuseppe Minervini
Technology and Health Care (TAHC), Volume 32, Issue 2, 2024, Pages 1055–1065. https://doi.org/10.3233/THC-230477
BACKGROUND: Phonetics, together with mechanics and aesthetics, is considered a cardinal factor contributing to the success of complete dentures.
OBJECTIVE: The aim of the current study was to evaluate the changes in speech in complete denture patients with and ...
- Article, November 2023
Time Distributed Multiview Representation for Speech Emotion Recognition
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, November 2023, Pages 148–162. https://doi.org/10.1007/978-3-031-49018-7_11
Abstract: In recent years, speech-emotion recognition (SER) techniques have gained importance, mainly in human-computer interaction studies and applications. This research area has different challenges, including developing new and efficient detection ...
- research-article, October 2023
Pantœnna: Mouth pose estimation for AR/VR headsets using low-profile antenna and impedance characteristic sensing
UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, October 2023, Article No.: 83, Pages 1–12. https://doi.org/10.1145/3586183.3606805
Methods for faithfully capturing a user’s holistic pose have immediate uses in AR/VR, ranging from multimodal input to expressive avatars. Although body-tracking has received the most attention, the mouth is also of particular importance, given that it ...
- demonstration, October 2023
LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized Localization and Beamforming
UIST '23 Adjunct: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, October 2023, Article No.: 75, Pages 1–3. https://doi.org/10.1145/3586182.3615789
Speech-to-text capabilities on mobile devices have proven helpful for language translation, note-taking, hearing and speech accessibility, and meeting transcripts. However, their usefulness is constrained by being unable to distinguish between multiple ...
- research-article, October 2023
Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews
- Trang Tran,
- Yufeng Yin,
- Leili Tavabi,
- Joannalyn Delacruz,
- Brian Borsari,
- Joshua D Woolley,
- Stefan Scherer,
- Mohammad Soleymani
ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction, October 2023, Pages 406–415. https://doi.org/10.1145/3577190.3614105
The quality and effectiveness of psychotherapy sessions are highly influenced by the therapists’ ability to meaningfully connect with clients. Automated assessment of therapist empathy provides cost-effective and systematic means of assessing the ...
- research-article, September 2023
Automated Face-To-Face Conversation Detection on a Commodity Smartwatch with Acoustic Sensing
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 7, Issue 3, Article No.: 109, Pages 1–29. https://doi.org/10.1145/3610882
Understanding social interactions is relevant across many domains and applications, including psychology, behavioral sciences, human-computer interaction, and healthcare. In this paper, we present a practical approach for automatically detecting face-to-...
- Article, September 2023
Ternary Data, Triangle Decoding, Three Tasks, a Multitask Learning Speech Translation Model
Artificial Neural Networks and Machine Learning – ICANN 2023, September 2023, Pages 579–590. https://doi.org/10.1007/978-3-031-44213-1_48
Abstract: Direct end-to-end approaches for speech translation (ST) are now competing with the traditional cascade solutions. However, end-to-end models still suffer from the challenge of ST data scarcity. How to effectively utilize the limited ST data or ...
- Article, November 2023
Multimodal Emotion Recognition System Through Three Different Channels (MER-3C)
Advanced Concepts for Intelligent Vision Systems, August 2023, Pages 196–208. https://doi.org/10.1007/978-3-031-45382-3_17
Abstract: The field of machine learning and computer science known as “affective computing” focuses on how to recognize and analyze human emotions. Different modalities can complement or enhance one another. This paper focuses on merging three modalities, ...
- research-article, July 2023
RoboClean: Contextual Language Grounding for Human-Robot Interactions in Specialised Low-Resource Environments
CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces, July 2023, Article No.: 35, Pages 1–11. https://doi.org/10.1145/3571884.3597137
Building effective voice interfaces for instructing service robots in specialised environments is difficult due to workers' local knowledge, such as specific terminology for objects and space, which leaves limited data to train language ...
- research-article, July 2023
Gist and Verbatim: Understanding Speech to Inform New Interfaces for Verbal Text Composition
CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces, July 2023, Article No.: 15, Pages 1–11. https://doi.org/10.1145/3571884.3597134
Recent interest in speech-to-text applications has found speech to be an efficient modality for text input. However, the spontaneity of speech makes direct transcriptions of spoken compositions effortful to edit. While previous works in Human-Computer ...
- research-article, June 2023
Augmented Datasheets for Speech Datasets and Ethical Decision-Making
- Orestis Papakyriakopoulos,
- Anna Seo Gyeong Choi,
- William Thong,
- Dora Zhao,
- Jerone Andrews,
- Rebecca Bourke,
- Alice Xiang,
- Allison Koenecke
FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, June 2023, Pages 881–904. https://doi.org/10.1145/3593013.3594049
Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along dimensions of ...
- demonstration, April 2023
External noise reduction using WhisperMask, a mask-type wearable microphone
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, April 2023, Article No.: 454, Pages 1–5. https://doi.org/10.1145/3544549.3583936
Conversation through voice is one of the basic means of human communication. Online communication systems and voice user interfaces for operating smartphones or smart devices are increasingly important as interfaces that anyone can use daily. However, ...
- extended-abstract, March 2023
Measuring Trust in Children's Speech: Towards Responsible Robot-Supported Information Search
HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, March 2023, Pages 748–750. https://doi.org/10.1145/3568294.3579973
Children use conversational agents, such as Alexa or Siri, to search for information, but also tend to trust these agents, which might influence their information assessment. It is challenging for children to assess the veracity of information retrieved ...