In this paper, we present an application that recognizes spoken Tamil utterances and speaks out the recognized text in Tamil through our Tamil text-to-speech (TTS) system. Further, we translate the recognized Tamil text to English using Google Translate and play it through our English TTS. Our Tamil speech recognition system, which can recognize about 75,000 words, has been trained on a 150-hour transcribed speech corpus. We have trained a deep neural network for the acoustic model and employed tri-gram language models to build our recognition system. Our Thirukkural TTS system performs unit-selection-based, concatenative speech synthesis using 2.5 hours of Tamil spoken utterances transcribed at the phone level. Our English TTS uses 2.7 hours of phone-transcribed utterances. This is a technology demonstration of a complete web application which, when perfected, could be used to assist Tamil users in learning English by speaking in Tamil into the system. The playback of the recognized text through the Tamil TTS serves to demonstrate the effectiveness of the Tamil ASR to the majority of the conference registrants (who cannot read the recognized Tamil text).
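The abstract above mentions tri-gram language models in the recognizer. As a minimal sketch of how such a model is estimated (this code is not from the paper; the function names and the add-k smoothing choice are illustrative assumptions), one can count trigrams and their bigram histories and compute smoothed conditional probabilities:

```python
from collections import Counter

def train_trigram_lm(sentences):
    # Count trigrams and their bigram histories over padded sentences.
    tri, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(len(toks) - 2):
            tri[tuple(toks[i:i + 3])] += 1
            bi[tuple(toks[i:i + 2])] += 1
    return tri, bi

def trigram_prob(tri, bi, w1, w2, w3, vocab_size, k=1.0):
    # Add-k smoothed conditional probability P(w3 | w1, w2).
    return (tri[(w1, w2, w3)] + k) / (bi[(w1, w2)] + k * vocab_size)

tri, bi = train_trigram_lm(["the cat sat", "the cat ran"])
p = trigram_prob(tri, bi, "the", "cat", "sat", vocab_size=5)
```

In a real ASR decoder these probabilities would be combined with acoustic-model scores over candidate word sequences.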
Intelligent Automation & Soft Computing
Speech processing has emerged as an essential task in modern communication systems. Speech translation maps speech signals in a source language A to a target language B. This work focuses on speech-to-speech translation from our local language, Telugu, to English, and from our national language, Hindi, to English. In this process, features are first extracted and noise is reduced, after which the signal is converted into text form. In this paper, a deep-learning-based modelling technique is employed for speech recognition. After conversion to text, the data is compared with dictionary data, as per the transcriptions, for language identification. Mapping is used to generate the signals for transcription, and the index value of the recognized entry is used to identify the language. Once the language is identified, a phonetic approach is used to generate the corresponding speech signal from the text.
Speech recognition for many languages is becoming increasingly popular, but recognizing speech in Malayalam remains a difficult task. This project aims to build a formal Malayalam speech-to-text converter. The system considers only isolated words from a constrained vocabulary: the word spoken by the speaker is given as input to the system, and the recognized word is presented on the display as output. We use deep learning and feature-extraction techniques for this project. The proposed system uses around 5-10 isolated words for training the machine. Since the system is speaker-dependent, the words are first stored as .wav (waveform audio) files for the training procedure; several samples are stored and trained for each word, and the input audio word is compared with these stored words. Pre-processing includes transforming the speech signal into digitized form. The digital signal is passed through first-order filters for smoothing, which boosts the signal's energy at higher frequencies. MFCC is the systematic technique used for feature extraction: Mel-frequency cepstral coefficients, which model frequencies according to human perceptual sensitivity, are obtained at the end of this phase. Following the pre-processing, syllabification, and feature-extraction procedures, an HMM is used for training and speech identification. The ANN-based speech recognition system is implemented using LSTM, a common form of recurrent neural network.
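The pre-emphasis and MFCC steps described above can be sketched in plain NumPy. This is an illustrative implementation of the standard pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT), not code from the project; the frame length, hop, and filter counts are assumed defaults:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    # First-order pre-emphasis filter: boosts energy at higher frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank, linearly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1)) / (2 * n_mels))
    return log_mel @ dct.T
```

For one second of 16 kHz audio with these defaults, the result is a 98 × 13 matrix of coefficients, one row per 10 ms frame.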
Interspeech 2018, 2018
Proceedings of the IPSL Technical Sessions, 21 (2005) 1-8
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
IRJET, 2022
The term "machine translation" refers to the automatic translation of one natural language into another. The fundamental goal is to bridge the language divide between people, communities, or countries that speak different languages. India has 18 official languages and ten widely used scripts. The majority of Indians, particularly isolated rural populations, do not understand, read, or write English, necessitating an effective language translator. Machine translation systems that convert text from one language to another will help Indians live in a more enlightened society, free of language barriers. We propose an English-to-Hindi machine translation system based on recurrent neural networks (RNNs), long short-term memory (LSTM), and attention mechanisms, since English is a worldwide language and Hindi is the language spoken by the majority of Indians.
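The attention mechanism mentioned in this abstract lets the decoder weight encoder states at each step. As a minimal numerical sketch (illustrative only; the paper does not specify this exact variant, and scaled dot-product attention is one common choice), the core computation is:

```python
import numpy as np

def attention(query, keys, values):
    # Scaled dot-product attention: score each encoder state against the
    # decoder query, softmax the scores, and mix the values accordingly.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ values  # weighted sum of encoder states
    return context, weights
```

In an RNN/LSTM translator, `keys` and `values` would be the encoder hidden states for the English sentence, `query` the current decoder state, and `context` would feed into predicting the next Hindi word.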
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2021
This paper introduces ESPnet-TTS, a new end-to-end text-to-speech (E2E-TTS) toolkit, an open-source extension of the ESPnet speech processing toolkit. ESPnet-TTS includes models such as Tacotron 2, Transformer TTS, and FastSpeech. It also provides recipes in the style recommended by the Kaldi speech recognition toolkit (ASR); these recipes are designed to combine with the ESPnet ASR recipes, which provide high performance. The toolkit additionally ships pre-trained models and samples for all recipes, which users can adopt as a baseline. It supports TTS, STT, and translation features for various Indic languages, with a strong focus on English, Marathi, and Hindi. The paper also shows that neural sequence-to-sequence models achieve state-of-the-art or near-state-of-the-art results on existing databases. We further analyze some of the key design challenges in developing a multilingual business translation system, including processing bilingual business data sets and evaluating multiple translation methods. Evaluation is carried out using tokens, and the results show that our models achieve performance comparable to the latest toolkits on the LJ Speech data. Index Terms: open source, end-to-end, text-to-speech.