- Survey, January 2025, Just Accepted
A Survey on Speech Deepfake Detection
ACM Computing Surveys (CSUR), Just Accepted. https://doi.org/10.1145/3714458
Abstract: The availability of smart devices leads to an exponential increase in multimedia content. However, advancements in deep learning have also enabled the creation of highly sophisticated deepfake content, including speech Deepfakes, which pose a serious ...
- Article, September 2024
Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis
Text, Speech, and Dialogue, Pages 94–104. https://doi.org/10.1007/978-3-031-70566-3_9
Abstract: During the development of a speech synthesizer, we often face a lack of training data. This paper describes how the amount of data used to train a speech synthesizer affects the quality of the final synthetic speech. To answer this question, we ...
- Article, August 2024
RSET: Remapping-Based Sorting Method for Emotion Transfer Speech Synthesis
Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting ...
- Article, July 2024
A Teaching Mode of College English Listening in Intelligent Phonetic Environments
International Journal of e-Collaboration (IJEC-IGI), Volume 20, Issue 1, Pages 1–17. https://doi.org/10.4018/IJeC.347986
Abstract: This paper discusses the integration of cutting-edge technologies, especially artificial intelligence (AI) and speech synthesis, in the UETL environment. By using methods based on artificial intelligence, such as Fuzzy Convolutional Neural Network (FCNN) and ...
- Extended Abstract, July 2024
Toward a Third-Kind Voice for Conversational Agents in an Era of Blurring Boundaries Between Machine and Human Sounds
CUI '24: Proceedings of the 6th ACM Conference on Conversational User Interfaces, Article No. 55, Pages 1–7. https://doi.org/10.1145/3640794.3665880
Abstract: The voice of widely used conversational agents (CAs) is standardized to be highly intelligible, yet it still sounds machine-generated due to its artificial qualities. With advancements in deep neural networks, voice synthesis technology has become ...
- Research Article, May 2024
Optimizing feature fusion for improved zero-shot adaptation in text-to-speech synthesis
EURASIP Journal on Audio, Speech, and Music Processing (EJASMP), Volume 2024, Issue 1. https://doi.org/10.1186/s13636-024-00351-9
Abstract: In the era of advanced text-to-speech (TTS) systems capable of generating high-fidelity, human-like speech by referring to a reference speech, voice cloning (VC), or zero-shot TTS (ZS-TTS), stands out as an important subtask. A primary challenge in ...
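The feature-fusion idea behind zero-shot TTS can be pictured with a minimal sketch: a speaker embedding extracted from the reference utterance is combined with the text-encoder states before decoding. This is a generic illustration under assumed module names and dimensions, not the fusion scheme proposed in the paper above.

```python
import torch
import torch.nn as nn

class FusionTTSEncoder(nn.Module):
    """Toy illustration: fuse a reference-speaker embedding into text-encoder states.

    Hypothetical module, not the paper's architecture. Fusion here is a
    projection, broadcast over time, and a learned gate.
    """

    def __init__(self, text_dim=256, spk_dim=192):
        super().__init__()
        self.spk_proj = nn.Linear(spk_dim, text_dim)   # map speaker embedding to text dim
        self.gate = nn.Sequential(nn.Linear(2 * text_dim, text_dim), nn.Sigmoid())

    def forward(self, text_states, spk_embedding):
        # text_states: (batch, time, text_dim); spk_embedding: (batch, spk_dim)
        spk = self.spk_proj(spk_embedding).unsqueeze(1)            # (batch, 1, text_dim)
        spk = spk.expand(-1, text_states.size(1), -1)              # broadcast over time
        gate = self.gate(torch.cat([text_states, spk], dim=-1))    # learned fusion weight
        return text_states + gate * spk                            # fused states for the decoder

# Usage with random tensors standing in for real encoder outputs
fusion = FusionTTSEncoder()
fused = fusion(torch.randn(2, 50, 256), torch.randn(2, 192))
print(fused.shape)  # torch.Size([2, 50, 256])
```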
- Research Article, April 2024
Though this be hesitant, yet there is method in ’t: Effects of disfluency patterns in neural speech synthesis for cultural heritage presentations
Computer Speech and Language (CSPL), Volume 85, Issue C. https://doi.org/10.1016/j.csl.2023.101585
Abstract: This study presents the results of two perception experiments aimed at evaluating the effect that specific patterns of disfluencies have on people listening to synthetic speech. We consider the particular case of Cultural Heritage presentations ...
Highlights:
- Neural speech synthesis systems model speech phenomena in a natural-sounding way.
- Neural synthesis can work as a tool to investigate human speech behaviours.
- In specific contexts, speech disfluency phenomena foster listeners’ ...
- Research Article, July 2024
Improvement of 2.4kbps LPC10 algorithm based on LSF parameters
SSPS '24: Proceedings of the 2024 6th International Symposium on Signal Processing Systems, Pages 22–28. https://doi.org/10.1145/3665053.3665064
Abstract: Short-wave communication has unstable channel conditions, but it is still widely used in military and diplomatic fields due to its security, high destruction resistance and full coverage characteristics. Therefore, in order to adapt to transmission in ...
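Since the entry above builds on LPC10 and line spectral frequency (LSF) parameters, here is a minimal sketch of converting 10th-order LPC coefficients to LSFs. It follows the textbook construction of the symmetric and antisymmetric polynomials and only illustrates the representation itself, not the paper's improved algorithm; `librosa` is assumed to be available for the LPC analysis.

```python
import numpy as np
import librosa

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral frequencies (radians)."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]   # antisymmetric polynomial Q(z)
    lsf = []
    for poly in (p_poly, q_poly):
        angles = np.angle(np.roots(poly))
        # keep angles strictly inside (0, pi); this drops the trivial roots at z = +/-1
        lsf.extend(angles[(angles > 1e-9) & (angles < np.pi - 1e-9)])
    return np.sort(lsf)

# Example: 10th-order LPC (as in LPC10) on a synthetic 30 ms frame at 8 kHz
sr = 8000
frame = np.sin(2 * np.pi * 300 * np.arange(240) / sr) + 0.01 * np.random.randn(240)
lpc = librosa.lpc(frame, order=10)
print(lpc_to_lsf(lpc))  # 10 monotonically increasing LSFs in (0, pi)
```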
- Research Article, April 2024
Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations
IUI '24: Proceedings of the 29th International Conference on Intelligent User Interfaces, Pages 259–273. https://doi.org/10.1145/3640543.3645165
Abstract: Representations of AI agents in user interfaces and robotics are predominantly White, not only in terms of facial and skin features, but also in the synthetic voices they use. In this paper we explore some unexpected challenges in the representation of ...
- Research Article, January 2024
Choosing only the best voice imitators: Top-K many-to-many voice conversion with StarGAN
Speech Communication (SPCO), Volume 156, Issue C. https://doi.org/10.1016/j.specom.2023.103022
Abstract: Voice conversion systems have become increasingly important as the use of voice technology grows. Deep learning techniques, specifically generative adversarial networks (GANs), have enabled significant progress in the creation of synthetic media, ...
Highlights:
- Top-K improves GAN-based voice conversion systems for better quality & naturalness.
- Top-K improves convergence & training stability of voice conversion systems.
- Effectiveness of Top-K is supported through quantitative and ...
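The Top-K idea referenced in the highlights above, updating the generator only on the samples the critic scores highest, can be sketched in a few lines of PyTorch. This is a generic illustration of Top-K GAN training, not the StarGAN-based voice conversion system itself; the scores below are hypothetical placeholders for real critic outputs.

```python
import torch

def topk_generator_loss(critic_scores, k):
    """Non-saturating generator loss computed only on the top-k scored fakes.

    critic_scores: (batch,) raw critic outputs for generated samples.
    Only the k samples the critic finds most realistic contribute gradients.
    """
    top_scores, _ = torch.topk(critic_scores, k)
    return torch.nn.functional.softplus(-top_scores).mean()

# Usage with stand-in scores (in a real loop these come from critic(generator(x)))
scores = torch.randn(16, requires_grad=True)
loss = topk_generator_loss(scores, k=8)   # k is typically annealed from the batch size downwards
loss.backward()
```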
- Article, November 2023
Curriculum Learning Based Approach for Faster Convergence of TTS Model
Speech and Computer, Pages 208–221. https://doi.org/10.1007/978-3-031-48312-7_17
Abstract: With the advent of deep learning, Text-to-Speech technology has been revolutionized, and current state-of-the-art models are capable of synthesizing almost human-like speech. Recent Text-to-Speech models use a sequence-to-sequence architecture ...
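One common way to realize curriculum learning for sequence-to-sequence TTS is to order training utterances from short to long and widen the admissible pool as training progresses. The length-based schedule below is an assumed, generic example of the technique, not necessarily the curriculum used in the paper.

```python
import random

def curriculum_batches(utterances, batch_size, num_stages):
    """Yield batches whose maximum utterance difficulty grows stage by stage.

    utterances: list of (utt_id, length) pairs; length is any difficulty proxy.
    """
    ordered = sorted(utterances, key=lambda u: u[1])          # easiest (shortest) first
    for stage in range(1, num_stages + 1):
        pool = ordered[: len(ordered) * stage // num_stages]  # widen the pool each stage
        random.shuffle(pool)                                  # shuffle within the stage
        for i in range(0, len(pool), batch_size):
            yield stage, pool[i:i + batch_size]

# Usage with toy data: 100 utterances with random lengths
data = [(f"utt_{i}", random.randint(1, 400)) for i in range(100)]
for stage, batch in curriculum_batches(data, batch_size=16, num_stages=4):
    pass  # a train_step(batch) call would go here
```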
- Demonstration, May 2024
InnoGuideGPT: Integrating conversational interface and command interpretation for navigation robots
- Rahul Sundar,
- Shreyash Gadgil,
- Tankala Satya Sai,
- Sathi Sai Krishna Reddy,
- Gautam B,
- Ishita Mittal,
- Jyotsna Sree Guduguntla,
- Shanmukesh Pujala
AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems, Article No. 57, Pages 1–3. https://doi.org/10.1145/3639856.3639915
Abstract: Integrating natural language understanding, voice command interpretation and natural language generation for realtime inference is a challenging problem. However, developing a proof of concept is now possible in just a few lines of code which was ...
- Research Article, October 2023
Determining spectral stability in vowels: A comparison and assessment of different metrics
Speech Communication (SPCO), Volume 154, Issue C. https://doi.org/10.1016/j.specom.2023.102984
Highlights:
- Different metrics for spectral stability identification in vowels are discussed.
- A new metric is introduced.
- The different metrics are assessed both on synthesized and natural speech.
- Higher-dimensional metrics capture spectral ...
Abstract: This study investigated the performance of several metrics used to evaluate spectral stability in vowels. Four metrics suggested in the literature and a newly developed one were tested and compared to the traditional method of associating the ...
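As a concrete (and deliberately simple) example of the kind of metric the entry above compares, spectral stability within a vowel can be scored by the average frame-to-frame change of a spectral representation such as MFCCs: the smaller the change, the more stable the segment. This is an assumed illustration, not one of the metrics evaluated in the paper; `librosa` is assumed for the feature extraction.

```python
import numpy as np
import librosa

def spectral_instability(y, sr, n_mfcc=13):
    """Mean Euclidean distance between consecutive MFCC frames (lower = more stable)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # shape (n_mfcc, n_frames)
    deltas = np.diff(mfcc, axis=1)                              # frame-to-frame change
    return float(np.linalg.norm(deltas, axis=0).mean())

# Compare a steady vowel-like tone with a gliding one
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
steady = np.sin(2 * np.pi * 220 * t)
glide = np.sin(2 * np.pi * (220 + 400 * t) * t)                 # frequency rises over time
print(spectral_instability(steady, sr), spectral_instability(glide, sr))
```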
- Research Article, June 2023
Battling voice spoofing: a review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures
Artificial Intelligence Review (ARTR), Volume 56, Issue Suppl 1, Pages 513–566. https://doi.org/10.1007/s10462-023-10539-8
Abstract: With the advent of automated speaker verification (ASV) systems comes an equal and opposite development: malicious actors may seek to use voice spoofing attacks to fool those same systems. Various counter measures have been proposed to detect ...
- Research Article, June 2023
SpoTNet: A spoofing-aware Transformer Network for Effective Synthetic Speech Detection
MAD '23: Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation, Pages 10–18. https://doi.org/10.1145/3592572.3592841
Abstract: The prevalence of voice spoofing attacks in today’s digital world has become a critical security concern. Attackers employ various techniques, such as voice conversion (VC) and text-to-speech (TTS), to generate synthetic speech that imitates the victim’...
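For readers unfamiliar with this family of detectors, the sketch below shows the general shape of a transformer-based synthetic speech classifier: log-mel frames are encoded by a transformer encoder, pooled over time, and mapped to a bona fide / spoofed decision. It is a minimal, assumed baseline, not SpoTNet's architecture.

```python
import torch
import torch.nn as nn

class SpoofClassifier(nn.Module):
    """Toy transformer encoder over log-mel frames with a binary (bona fide/spoof) head."""

    def __init__(self, n_mels=80, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 2)

    def forward(self, logmel):
        # logmel: (batch, frames, n_mels)
        h = self.encoder(self.input_proj(logmel))
        return self.head(h.mean(dim=1))       # mean-pool over time, then classify

model = SpoofClassifier()
logits = model(torch.randn(4, 300, 80))       # 4 utterances, 300 frames each
print(logits.shape)                           # torch.Size([4, 2])
```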
- Research Article, June 2023
Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems
Computer Speech and Language (CSPL), Volume 81, Issue C. https://doi.org/10.1016/j.csl.2023.101520
Abstract: End-to-end (e2e) speech synthesis systems have become popular with the recent introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder–decoder-based neural architectures. Even though those sequence-to-...
Highlights:
- Few-shot speaker adaptation using lightweight neural postfilters for text-to-speech.
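The lightweight neural postfilter in the highlight above can be pictured as a small network that nudges the synthesizer's predicted mel-spectrogram toward the target speaker's spectra, trained on a little adaptation data while the TTS model stays frozen. The residual convolutional sketch below is an assumption about what such a postfilter could look like, not the architecture in the paper.

```python
import torch
import torch.nn as nn

class MelPostfilter(nn.Module):
    """Tiny residual conv net that refines a predicted mel-spectrogram toward a target speaker."""

    def __init__(self, n_mels=80, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, n_mels, kernel_size=5, padding=2),
        )

    def forward(self, mel):
        # mel: (batch, n_mels, frames); predict a correction and add it back
        return mel + self.net(mel)

# Adaptation step sketch: only the postfilter's parameters are updated
postfilter = MelPostfilter()
optim = torch.optim.Adam(postfilter.parameters(), lr=1e-4)
predicted, target = torch.randn(8, 80, 200), torch.randn(8, 80, 200)  # stand-in adaptation pairs
loss = nn.functional.l1_loss(postfilter(predicted), target)
loss.backward()
optim.step()
```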
- Article, November 2022
Analysis-By-Synthesis Modeling of Bengali Intonation
Speech and Computer, Pages 533–544. https://doi.org/10.1007/978-3-031-20980-2_45
Abstract: The main concern behind deriving natural sounding synthesized speech lies in the objective mapping of the relation between formal and functional representations of prosody in human speech. Besides stress, rhythm, and duration, intonation is the ...
- Research Article, October 2022
Investigations on speaker adaptation using a continuous vocoder within recurrent neural network based text-to-speech synthesis
Multimedia Tools and Applications (MTAA), Volume 82, Issue 10, Pages 15635–15649. https://doi.org/10.1007/s11042-022-14005-5
Abstract: This paper presents an investigation of speaker adaptation using a continuous vocoder for parametric text-to-speech (TTS) synthesis. For purposes that demand low computational complexity, conventional vocoder-based statistical parametric speech ...
- Research Article, September 2022
A deep learning approaches in text-to-speech system: a systematic review and recent research perspective
Multimedia Tools and Applications (MTAA), Volume 82, Issue 10, Pages 15171–15197. https://doi.org/10.1007/s11042-022-13943-4
Abstract: Text-to-speech systems (TTS) have come a long way in the last decade and are now a popular research topic for creating various human-computer interaction systems. Although a range of speech synthesis models for various languages with several ...
- Brief Report, September 2022
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
- Edresson Casanova,
- Arnaldo Candido Junior,
- Christopher Shulby,
- Frederico Santos de Oliveira,
- João Paulo Teixeira,
- Moacir Antonelli Ponti,
- Sandra Aluísio
Language Resources and Evaluation (SPLRE), Volume 56, Issue 3, Pages 1043–1055. https://doi.org/10.1007/s10579-021-09570-4
Abstract: Speech provides a natural way for human–computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all ...