- research-article, July 2024
S3: Speech, Script and Scene driven Head and Eye Animation
ACM Transactions on Graphics (TOG), Volume 43, Issue 4, Article No.: 47, Pages 1–12. https://doi.org/10.1145/3658172
We present S3, a novel approach to generating expressive, animator-centric 3D head and eye animation of characters in conversation. Given speech audio, a Directorial script and a cinematographic 3D scene as input, we automatically output the animated 3D ...
- Article, June 2024
Feature Engineering for Music/Speech Detection in Costa Rica Radio Broadcast
- Juan Angel Acosta-Ceja,
- Marvin Coto-Jiménez,
- Máximo Eduardo Sánchez-Gutiérrez,
- Alma Rocío Sagaceta-Mejía,
- Julián Alberto Fresán-Figueroa
Pattern Recognition, June 2024, Pages 84–95. https://doi.org/10.1007/978-3-031-62836-8_9
Abstract: The exponential growth of audio data in radio broadcasts has generated the need for efficient tools for their manipulation and analysis to develop systems such as audio content classification and enhance user experience. In this study, we explore ...
- research-article, May 2024
Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, May 2024, Article No.: 949, Pages 1–14. https://doi.org/10.1145/3613904.3642817
This paper explores how blind and sighted individuals perceive real and spoofed audio, highlighting differences and similarities between the groups. Through two studies, we find that both groups focus on specific human traits in audio, such as accents, ...
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
- Susan Lin,
- Jeremy Warner,
- J.D. Zamfirescu-Pereira,
- Matthew G Lee,
- Sauhard Jain,
- Shanqing Cai,
- Piyawat Lertvittayakumjorn,
- Michael Xuelin Huang,
- Shumin Zhai,
- Bjoern Hartmann,
- Can Liu
CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, May 2024, Article No.: 1043, Pages 1–19. https://doi.org/10.1145/3613904.3642217
Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that ...
- research-article, May 2024
Deepfake Speech Detection: A Spectrogram Analysis
SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, April 2024, Pages 1312–1320. https://doi.org/10.1145/3605098.3635911
Current voice biometric systems have no natural mechanism to defend against deepfake spoofing attacks. Thus, supporting these systems with a deepfake detection solution is necessary. One of the latest approaches to deepfake speech detection is ...
- short-paper, March 2024 (Best Student Paper)
Development of a Socially Cognizant Robotic Campus Guide
- Benjamin Greenberg,
- Daniel Nakhimovich,
- Richard Magnotti,
- Hriday Purohit,
- Sanskar Shah,
- Aniket Satish Kulkarni,
- Uriel Gonzalez-Bravo,
- Noah R. Carver
HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, March 2024, Pages 1229–1232. https://doi.org/10.1145/3610978.3641263
A robotic system to help lost students find their way around a college campus was designed, built, and tested. Socially cognizant design practices, including stakeholder engagement and interdisciplinary team-building, were followed. Users can interact ...
- research-article, January 2024
Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones
- Arvind Pillai,
- Subigya Kumar Nepal,
- Weichen Wang,
- Matthew Nemesure,
- Michael Heinz,
- George Price,
- Damien Lekkas,
- Amanda C. Collins,
- Tess Griffin,
- Benjamin Buck,
- Sarah Masud Preum,
- Trevor Cohen,
- Nicholas C. Jacobson,
- Dror Ben-Zeev,
- Andrew Campbell
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 7, Issue 4, Article No.: 174, Pages 1–38. https://doi.org/10.1145/3631452
Speech-based diaries from mobile phones can capture paralinguistic patterns that help detect mental illness symptoms such as suicidal ideation. However, previous studies have primarily evaluated machine learning models on a single dataset, making their ...
- research-article, March 2024
Analysis of modified palatal surface for better speech in edentulous patients: A clinico-analytical study
- Anuj K. Shukla,
- Saurabh Chaturvedi,
- Abdul Razzaq Ahmed,
- Hoda Lofty Abouzeid,
- Ghazala Suleman,
- Rania A. Sharif,
- Vishwanath Gurumurthy,
- Marco Cicciù,
- Giuseppe Minervini
Technology and Health Care (TAHC), Volume 32, Issue 2, 2024, Pages 1055–1065. https://doi.org/10.3233/THC-230477
BACKGROUND: Phonetics, together with mechanics and aesthetics, is considered a cardinal factor contributing to the success of complete dentures.
OBJECTIVE: The aim of the current study was to evaluate the changes in speech in complete denture patients with and ...
- Article, November 2023
Time Distributed Multiview Representation for Speech Emotion Recognition
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, November 2023, Pages 148–162. https://doi.org/10.1007/978-3-031-49018-7_11
Abstract: In recent years, speech-emotion recognition (SER) techniques have gained importance, mainly in human-computer interaction studies and applications. This research area has different challenges, including developing new and efficient detection ...
- research-article, October 2023
Pantœnna: Mouth pose estimation for AR/VR headsets using low-profile antenna and impedance characteristic sensing
UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, October 2023, Article No.: 83, Pages 1–12. https://doi.org/10.1145/3586183.3606805
Methods for faithfully capturing a user’s holistic pose have immediate uses in AR/VR, ranging from multimodal input to expressive avatars. Although body-tracking has received the most attention, the mouth is also of particular importance, given that it ...
- demonstration, October 2023
LiveLocalizer: Augmenting Mobile Speech-to-Text with Microphone Arrays, Optimized Localization and Beamforming
UIST '23 Adjunct: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, October 2023, Article No.: 75, Pages 1–3. https://doi.org/10.1145/3586182.3615789
Speech-to-text capabilities on mobile devices have proven helpful for language translation, note-taking, hearing and speech accessibility, and meeting transcripts. However, their usefulness is constrained by being unable to distinguish between multiple ...
- research-article, October 2023
Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews
- Trang Tran,
- Yufeng Yin,
- Leili Tavabi,
- Joannalyn Delacruz,
- Brian Borsari,
- Joshua D Woolley,
- Stefan Scherer,
- Mohammad Soleymani
ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction, October 2023, Pages 406–415. https://doi.org/10.1145/3577190.3614105
The quality and effectiveness of psychotherapy sessions are highly influenced by the therapists’ ability to meaningfully connect with clients. Automated assessment of therapist empathy provides cost-effective and systematic means of assessing the ...
- research-article, September 2023
Automated Face-To-Face Conversation Detection on a Commodity Smartwatch with Acoustic Sensing
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 7, Issue 3, Article No.: 109, Pages 1–29. https://doi.org/10.1145/3610882
Understanding social interactions is relevant across many domains and applications, including psychology, behavioral sciences, human-computer interaction, and healthcare. In this paper, we present a practical approach for automatically detecting face-to-...
- Article, September 2023
Ternary Data, Triangle Decoding, Three Tasks, a Multitask Learning Speech Translation Model
Artificial Neural Networks and Machine Learning – ICANN 2023, September 2023, Pages 579–590. https://doi.org/10.1007/978-3-031-44213-1_48
Abstract: Direct end-to-end approaches for speech translation (ST) are now competing with the traditional cascade solutions. However, end-to-end models still suffer from the challenge of ST data scarcity. How to effectively utilize the limited ST data or ...
- Article, November 2023
Multimodal Emotion Recognition System Through Three Different Channels (MER-3C)
Advanced Concepts for Intelligent Vision Systems, August 2023, Pages 196–208. https://doi.org/10.1007/978-3-031-45382-3_17
Abstract: The field of machine learning and computer science known as “affective computing” focuses on how to recognize and analyze human emotions. Different modalities can complement or enhance one another. This paper focuses on merging three modalities, ...
- research-article, July 2023
RoboClean: Contextual Language Grounding for Human-Robot Interactions in Specialised Low-Resource Environments
CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces, July 2023, Article No.: 35, Pages 1–11. https://doi.org/10.1145/3571884.3597137
Building effective voice interfaces for instructing service robots in specialised environments is difficult due to workers' local knowledge, such as specific terminology for objects and space, which leaves limited data to train language ...
- research-article, July 2023
Gist and Verbatim: Understanding Speech to Inform New Interfaces for Verbal Text Composition
CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces, July 2023, Article No.: 15, Pages 1–11. https://doi.org/10.1145/3571884.3597134
Recent interest in speech-to-text applications has found speech to be an efficient modality for text input. However, the spontaneity of speech makes direct transcriptions of spoken compositions effortful to edit. While previous works in Human-Computer ...
- research-article, June 2023
Augmented Datasheets for Speech Datasets and Ethical Decision-Making
- Orestis Papakyriakopoulos,
- Anna Seo Gyeong Choi,
- William Thong,
- Dora Zhao,
- Jerone Andrews,
- Rebecca Bourke,
- Alice Xiang,
- Allison Koenecke
FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, June 2023, Pages 881–904. https://doi.org/10.1145/3593013.3594049
Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along dimensions of ...
- demonstration, April 2023
External noise reduction using WhisperMask, a mask-type wearable microphone
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, April 2023, Article No.: 454, Pages 1–5. https://doi.org/10.1145/3544549.3583936
Conversation through voice is one of the basic means of human communication. Online communication systems and voice user interfaces for operating smartphones or smart devices are increasingly important as interfaces that anyone can use daily. However, ...
- extended-abstract, March 2023
Measuring Trust in Children's Speech: Towards Responsible Robot-Supported Information Search
HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, March 2023, Pages 748–750. https://doi.org/10.1145/3568294.3579973
Children use conversational agents, such as Alexa or Siri, to search for information, but also tend to trust these agents, which might influence their information assessment. It is challenging for children to assess the veracity of information retrieved ...