Tomoki Toda
2020 – today
- 2024
- [j71]Haruki Yamashita, Takuma Okamoto, Ryoichi Takashima, Yamato Ohtani, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai:
Fast Neural Speech Waveform Generative Models With Fully-Connected Layer-Based Upsampling. IEEE Access 12: 31409-31421 (2024) - [j70]Mohammad Eshghi, Tomoki Toda:
An Investigation of Fundamental Frequency Pattern Prediction for Japanese Electrolaryngeal Speech Enhancement Based on Frame-Wise Phoneme Representations. IEEE Access 12: 50137-50153 (2024) - [j69]Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda:
Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data. IEEE Signal Process. Lett. 31: 2995-2999 (2024) - [j68]Rui Wang, Li Li, Tomoki Toda:
Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1968-1979 (2024) - [j67]Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2777-2789 (2024) - [j66]Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Unequally Spaced Sound Field Interpolation for Rotation-Robust Beamforming. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3185-3199 (2024) - [c339]Takuya Fujimura, Keisuke Imoto, Tomoki Toda:
Discriminative Neighborhood Smoothing for Generative Anomalous Sound Detection. EUSIPCO 2024: 156-160 - [c338]Jiachen Wang, Tomoki Toda:
Unsupervised Training of Neural Network-Based Virtual Microphone Estimator. EUSIPCO 2024: 256-260 - [c337]Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda:
Audio Difference Learning for Audio Captioning. ICASSP 2024: 1456-1460 - [c336]Yamato Ohtani, Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
FIRNet: Fundamental Frequency Controllable Fast Neural Vocoder With Trainable Finite Impulse Response Filter. ICASSP 2024: 10871-10875 - [c335]Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda:
Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders. ICASSP 2024: 10961-10965 - [c334]Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda:
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction. ICASSP 2024: 11066-11070 - [c333]Takuma Okamoto, Yamato Ohtani, Tomoki Toda, Hisashi Kawai:
ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-Based Fast End-to-End Sequence-to-Sequence Text-to-Speech and Voice Conversion. ICASSP 2024: 12456-12460 - [i85]Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda:
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction. CoRR abs/2401.13260 (2024) - [i84]Yusuke Yasuda, Tomoki Toda:
Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment. CoRR abs/2403.06100 (2024) - [i83]Yuka Hashizume, Li Li, Atsushi Miyashita, Tomoki Toda:
Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment. CoRR abs/2404.06682 (2024) - [i82]You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan:
SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan. CoRR abs/2405.05244 (2024) - [i81]Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda:
Multi-speaker Text-to-speech Training with Speaker Anonymized Data. CoRR abs/2405.11767 (2024) - [i80]Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan:
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection. CoRR abs/2406.02438 (2024) - [i79]Jiajun He, Tomoki Toda:
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval. CoRR abs/2406.06201 (2024) - [i78]Bence Mark Halpern, Thomas Tienkamp, Wen-Chin Huang, Lester Phillip Violeta, Teja Rebernik, Sebastiaan A. H. J. de Visscher, Max J. H. Witjes, Martijn Wieling, Defne Abur, Tomoki Toda:
Quantifying the effect of speech pathology on automatic and human speaker verification. CoRR abs/2406.06208 (2024) - [i77]You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, Zhiyao Duan:
SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge. CoRR abs/2408.16132 (2024) - [i76]Wen-Chin Huang, Szu-Wei Fu, Erica Cooper, Ryandhimas E. Zezario, Tomoki Toda, Hsin-Min Wang, Junichi Yamagishi, Yu Tsao:
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction. CoRR abs/2409.07001 (2024) - [i75]Takuya Fujimura, Ibuki Kuroyanagi, Tomoki Toda:
Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions. CoRR abs/2409.09332 (2024) - [i74]Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda:
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions. CoRR abs/2409.19585 (2024) - [i73]Jinyi Mi, Sehun Kim, Tomoki Toda:
Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals. CoRR abs/2409.19614 (2024) - 2023
- [j65]Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai:
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1902-1915 (2023) - [j64]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3717-3729 (2023) - [j63]Chao Xie, Tomoki Toda:
Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3871-3882 (2023) - [c332]Wen-Chin Huang, Tomoki Toda:
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion. APSIPA ASC 2023: 1161-1166 - [c331]Lester Phillip Violeta, Tomoki Toda:
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing. APSIPA ASC 2023: 1862-1867 - [c330]Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains. ASRU 2023: 1-7 - [c329]Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:
Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens. ASRU 2023: 1-7 - [c328]Jiajun He, Zekun Yang, Tomoki Toda:
ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. ASRU 2023: 1-6 - [c327]Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda:
The Singing Voice Conversion Challenge 2023. ASRU 2023: 1-8 - [c326]Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda, Hisashi Kawai:
WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer. ASRU 2023: 1-8 - [c325]Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. ASRU 2023: 1-6 - [c324]Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Sound Field Interpolation with Unsupervised Calibration for Freely Spaced Circular Microphone Array in Rotation-Robust Beamforming. EUSIPCO 2023: 21-25 - [c323]Takuya Fujimura, Tomoki Toda:
Analysis of Noisy-Target Training for DNN-Based Speech Enhancement. ICASSP 2023: 1-5 - [c322]Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda:
Low-Latency Electrolaryngeal Speech Enhancement Based on FastSpeech2-Based Voice Conversion and Self-Supervised Speech Representation. ICASSP 2023: 1-5 - [c321]Atsushi Miyashita, Tomoki Toda:
Representation of Vocal Tract Length Transformation Based on Group Theory. ICASSP 2023: 1-5 - [c320]Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition. ICASSP 2023: 1-5 - [c319]Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda:
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit. ICASSP 2023: 1-5 - [c318]Yusuke Yasuda, Tomoki Toda:
Text-To-Speech Synthesis Based on Latent Variable Conversion Using Diffusion Probabilistic Model and Variational Autoencoder. ICASSP 2023: 1-5 - [c317]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder. ICASSP 2023: 1-5 - [c316]Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda:
Preference-based training framework for automatic speech quality assessment using deep neural network. INTERSPEECH 2023: 546-550 - [c315]Xiaohan Shi, Xingfeng Li, Tomoki Toda:
Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation. INTERSPEECH 2023: 765-769 - [c314]Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion. INTERSPEECH 2023: 2043-2047 - [c313]Yeonjong Choi, Chao Xie, Tomoki Toda:
Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator. INTERSPEECH 2023: 2103-2107 - [c312]Yusuke Yasuda, Tomoki Toda:
Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech Based on Tail Probabilities. INTERSPEECH 2023: 5491-5495 - [c311]Sehun Kim, Kazuya Takeda, Tomoki Toda:
Sequence-to-Sequence Network Training Methods for Automatic Guitar Transcription With Tokenized Outputs. ISMIR 2023: 524-531 - [c310]Jingguang Tian, Desheng Hu, Xiaohan Shi, Jiajun He, Xingfeng Li, Yuan Gao, Tomoki Toda, Xinkang Xu, Xinhui Hu:
Semi-supervised Multimodal Emotion Recognition with Consensus Decision-making and Label Correction. MRAC@MM 2023: 67-73 - [c309]Atsushi Miyashita, Tomoki Toda:
Differentiable Representation of Warping Based on Lie Group Theory. WASPAA 2023: 1-5 - [c308]Rui Wang, Tomoki Toda:
Directional Target Speaker Extraction under Noisy Underdetermined Conditions through Conditional Variational Autoencoder with Global Style Tokens. WASPAA 2023: 1-5 - [i72]Lester Phillip Violeta, Tomoki Toda:
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing. CoRR abs/2306.13953 (2023) - [i71]Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Yusuke Yasuda, Tomoki Toda:
The Singing Voice Conversion Challenge 2023. CoRR abs/2306.14422 (2023) - [i70]Wen-Chin Huang, Tomoki Toda:
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion. CoRR abs/2309.02133 (2023) - [i69]Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda:
AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion. CoRR abs/2309.07598 (2023) - [i68]Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda:
Audio Difference Learning for Audio Captioning. CoRR abs/2309.08141 (2023) - [i67]Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda:
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders. CoRR abs/2309.09627 (2023) - [i66]Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens. CoRR abs/2310.02570 (2023) - [i65]Jiajun He, Zekun Yang, Tomoki Toda:
ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. CoRR abs/2310.05129 (2023) - [i64]Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023. CoRR abs/2310.05203 (2023) - [i63]Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda:
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition. CoRR abs/2311.07093 (2023) - 2022
- [j62]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda:
A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion. IEEE J. Sel. Top. Signal Process. 16(6): 1308-1318 (2022) - [j61]Yusuke Yasuda, Tomoki Toda:
Investigation of Japanese PnG BERT Language Model in Text-to-Speech Synthesis for Pitch Accent Language. IEEE J. Sel. Top. Signal Process. 16(6): 1319-1328 (2022) - [j60]Takuma Okamoto, Keisuke Matsubara, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Neural speech-rate conversion with multispeaker WaveNet vocoder. Speech Commun. 138: 1-12 (2022) - [c307]Sehun Kim, Tomoki Hayashi, Tomoki Toda:
Note-level Automatic Guitar Transcription Using Attention Mechanism. EUSIPCO 2022: 229-233 - [c306]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure. EUSIPCO 2022: 294-298 - [c305]Shuming Luan, Yukoh Wakabayashi, Tomoki Toda:
Modified Sound Field Interpolation Method for Rotation-robust Beamforming with Unequally Spaced Circular Microphone Array. EUSIPCO 2022: 344-348 - [c304]Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda:
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech. ICASSP 2022: 896-900 - [c303]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. ICASSP 2022: 6552-6556 - [c302]Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda:
Towards Identity Preserving Normal to Dysarthric Voice Conversion. ICASSP 2022: 6672-6676 - [c301]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Direct Noisy Speech Modeling for Noisy-To-Noisy Voice Conversion. ICASSP 2022: 6787-6791 - [c300]Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
An Investigation of Streaming Non-Autoregressive sequence-to-sequence Voice Conversion. ICASSP 2022: 6802-6806 - [c299]Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi:
Generalization Ability of MOS Prediction Networks. ICASSP 2022: 8442-8446 - [c298]Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. INTERSPEECH 2022: 41-45 - [c297]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation. INTERSPEECH 2022: 848-852 - [c296]Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The VoiceMOS Challenge 2022. INTERSPEECH 2022: 4536-4540 - [c295]Daiki Yoshioka, Yusuke Yasuda, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
Spoken-Text-Style Transfer with Conditional Variational Autoencoder and Content Word Storage. INTERSPEECH 2022: 4576-4580 - [c294]Yeonjong Choi, Chao Xie, Tomoki Toda:
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions. INTERSPEECH 2022: 4910-4914 - [c293]Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda:
Two-Stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-Sequence Voice Conversion. SLT 2022: 949-954 - [i62]Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi:
The VoiceMOS Challenge 2022. CoRR abs/2203.11389 (2022) - [i61]Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. CoRR abs/2203.15431 (2022) - [i60]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation. CoRR abs/2205.06053 (2022) - [i59]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure. CoRR abs/2206.05929 (2022) - [i58]Yeonjong Choi, Chao Xie, Tomoki Toda:
An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions. CoRR abs/2206.15155 (2022) - [i57]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Tomoki Toda:
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion. CoRR abs/2207.04356 (2022) - [i56]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System. CoRR abs/2207.05913 (2022) - [i55]Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda:
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion. CoRR abs/2210.10314 (2022) - [i54]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder. CoRR abs/2210.15533 (2022) - [i53]Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda:
NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit. CoRR abs/2210.15987 (2022) - [i52]Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda:
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition. CoRR abs/2211.01079 (2022) - [i51]Takuya Fujimura, Tomoki Toda:
Analysis of Noisy-target Training for DNN-based speech enhancement. CoRR abs/2211.01198 (2022) - [i50]Yuka Hashizume, Li Li, Tomoki Toda:
Music Similarity Calculation of Individual Instrumental Sounds Using Metric Learning. CoRR abs/2211.07863 (2022) - [i49]Yusuke Yasuda, Tomoki Toda:
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language. CoRR abs/2212.08321 (2022) - [i48]Yusuke Yasuda, Tomoki Toda:
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder. CoRR abs/2212.08329 (2022) - 2021
- [j59]Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio With a CPU. IEEE Access 9: 94923-94933 (2021) - [j58]Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda:
Many-to-Many Voice Transformer Network. IEEE ACM Trans. Audio Speech Lang. Process. 29: 656-670 (2021) - [j57]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Pretraining Techniques for Sequence-to-Sequence Voice Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 29: 745-755 (2021) - [j56]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. IEEE ACM Trans. Audio Speech Lang. Process. 29: 792-806 (2021) - [j55]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1134-1148 (2021) - [c292]Zhaopeng Qian, Haijun Niu, Li Wang, Kazuhiro Kobayashi, Shaochuan Zhang, Tomoki Toda:
Mandarin Electro-Laryngeal Speech Enhancement based on Statistical Voice Conversion and Manual Tone Control. APSIPA ASC 2021: 546-552 - [c291]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Noisy-to-Noisy Voice Conversion Framework with Denoising Model. APSIPA ASC 2021: 814-820 - [c290]Ding Ma, Wen-Chin Huang, Tomoki Toda:
Investigation of Text-to-Speech-based Synthetic Parallel Data for Sequence-to-Sequence Non-Parallel Voice Conversion. APSIPA ASC 2021: 870-877 - [c289]Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion. APSIPA ASC 2021: 1234-1238 - [c288]Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
Multi-Stream HiFi-GAN with Data-Driven Waveform Decomposition. ASRU 2021: 610-617 - [c287]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS Based Voice Conversion. ASRU 2021: 642-649 - [c286]Ming-Chi Yen, Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Shu-Wei Tsai, Yu Tsao, Tomoki Toda, Jyh-Shing Roger Jang, Hsin-Min Wang:
Mandarin Electrolaryngeal Speech Voice Conversion with Sequence-to-Sequence Modeling. ASRU 2021: 650-657 - [c285]Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao:
HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network. ASRU 2021: 907-913 - [c284]Ibuki Kuroyanagi, Tomoki Hayashi, Yusuke Adachi, Takenori Yoshimura, Kazuya Takeda, Tomoki Toda:
An Ensemble Approach to Anomalous Sound Detection Based on Conformer-Based Autoencoder and Binary Classifier Incorporated with Metric Learning. DCASE 2021: 110-114 - [c283]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Anomalous Sound Detection Using a Binary Classification Model and Class Centroids. EUSIPCO 2021: 1995-1999 - [c282]Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda:
Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder. ICASSP 2021: 5934-5938 - [c281]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Noise Level Limited Sub-Modeling for Diffusion Probabilistic Vocoders. ICASSP 2021: 6029-6033 - [c280]Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda:
Speech Emotion Recognition Based on Listener Adaptive Models. ICASSP 2021: 6274-6278 - [c279]Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
High-Intelligibility Speech Synthesis for Dysarthric Speakers with LPCNet-Based TTS and CycleVAE-Based VC. ICASSP 2021: 7058-7062 - [c278]Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda:
Non-Autoregressive Sequence-To-Sequence Voice Conversion. ICASSP 2021: 7068-7072 - [c277]Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda:
Speech Recognition by Simply Fine-Tuning BERT. ICASSP 2021: 7343-7347 - [c276]Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion. Interspeech 2021: 1329-1333 - [c275]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN: Unified Source-Filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN. Interspeech 2021: 2187-2191 - [c274]Patrick Lumban Tobing, Tomoki Toda:
High-Fidelity and Low-Latency Universal Neural Vocoder Based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling. Interspeech 2021: 2217-2221 - [c273]Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
Relational Data Selection for Data Augmentation of Speaker-Dependent Multi-Band MelGAN Vocoder. Interspeech 2021: 3630-3634 - [c272]Shogo Seki, Haruka Taga, Tomoki Toda:
Singing Fundamental Frequency Contour Generation Using Generalized Command-Response Model and Score-Conditional Variational Autoencoder. MLSP 2021: 1-3 - [c271]Patrick Lumban Tobing, Tomoki Toda:
Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction. SSW 2021: 142-147 - [i47]Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda:
Speech Recognition by Simply Fine-tuning BERT. CoRR abs/2102.00291 (2021) - [i46]Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda:
crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder. CoRR abs/2103.02858 (2021) - [i45]Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
The AS-NU System for the M2VoC Challenge. CoRR abs/2104.03009 (2021) - [i44]Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda:
Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN. CoRR abs/2104.04668 (2021) - [i43]Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda:
Non-autoregressive sequence-to-sequence voice conversion. CoRR abs/2104.06793 (2021) - [i42]Patrick Lumban Tobing, Tomoki Toda:
High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling. CoRR abs/2105.09856 (2021) - [i41]Patrick Lumban Tobing, Tomoki Toda:
Low-Latency Real-Time Non-Parallel Voice Conversion based on Cyclic Variational Autoencoder and Multiband WaveRNN with Data-Driven Linear Prediction. CoRR abs/2105.09858 (2021) - [i40]Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion. CoRR abs/2106.01415 (2021) - [i39]Ibuki Kuroyanagi, Tomoki Hayashi, Kazuya Takeda, Tomoki Toda:
Anomalous Sound Detection Using a Binary Classification Model and Class Centroids. CoRR abs/2106.06151 (2021) - [i38]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS based Voice Conversion. CoRR abs/2107.09477 (2021) - [i37]Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion. CoRR abs/2109.03551 (2021) - [i36]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Noisy-to-Noisy Voice Conversion Framework with Denoising Model. CoRR abs/2109.10608 (2021) - [i35]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations. CoRR abs/2110.06280 (2021) - [i34]Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda:
Towards Identity Preserving Normal to Dysarthric Voice Conversion. CoRR abs/2110.08213 (2021) - [i33]Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda:
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech. CoRR abs/2110.09103 (2021) - [i32]Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao:
HASA-net: A non-intrusive hearing-aid speech assessment network. CoRR abs/2111.05691 (2021) - [i31]Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda:
Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion. CoRR abs/2111.07116 (2021) - 2020
- [j54]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda:
Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression. IEEE Access 8: 62094-62106 (2020) - [j53]Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda:
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model. IEEE ACM Trans. Audio Speech Lang. Process. 28: 715-728 (2020) - [c270]Hikaru Nakatani, Patrick Lumban Tobing, Kazuya Takeda, Tomoki Toda:
Cross-Lingual Voice Conversion using a Cyclic Variational Auto-encoder and a WaveNet Vocoder. APSIPA 2020: 520-526 - [c269]Mohammad Eshghi, Kazuhiro Kobayashi, Kou Tanaka, Hirokazu Kameoka, Tomoki Toda:
Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech. APSIPA 2020: 572-577 - [c268]Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, Tomoki Toda:
Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --. Blizzard Challenge / Voice Conversion Challenge 2020 - [c267]Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda:
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions. Blizzard Challenge / Voice Conversion Challenge 2020 - [c266]Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda:
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS. Blizzard Challenge / Voice Conversion Challenge 2020 - [c265]Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders. Blizzard Challenge / Voice Conversion Challenge 2020 - [c264]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda:
Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN. Blizzard Challenge / Voice Conversion Challenge 2020 - [c263]Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation. DCASE 2020: 100-104 - [c262]Kazuhiro Kobayashi, Tomoki Toda:
Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. EUSIPCO 2020: 396-400 - [c261]Moe Takada, Shogo Seki, Patrick Lumban Tobing, Tomoki Toda:
Semi-Supervised Enhancement and Suppression of Self-Produced Speech Using Correspondence between Air- and Body-Conducted Signals. EUSIPCO 2020: 456-460 - [c260]Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Weakly-Supervised Sound Event Detection with Self-Attention. ICASSP 2020: 66-70 - [c259]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Transformer-Based Text-to-Speech with Weighted Forced Attention. ICASSP 2020: 6729-6733 - [c258]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Efficient Shallow WaveNet Vocoder Using Multiple Samples Output Based on Laplacian Distribution and Linear Prediction. ICASSP 2020: 7204-7208 - [c257]Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan:
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit. ICASSP 2020: 7654-7658 - [c256]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation. INTERSPEECH 2020: 3535-3539 - [c255]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems. INTERSPEECH 2020: 3540-3544 - [c254]Shogo Seki, Moe Takada, Tomoki Toda:
Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder. INTERSPEECH 2020: 4039-4043 - [c253]Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda:
Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment. INTERSPEECH 2020: 4059-4063 - [c252]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining. INTERSPEECH 2020: 4676-4680 - [c251]Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling. INTERSPEECH 2020: 4861-4865 - [e1]Junichi Yamagishi, Zhenhua Ling, Rohan Kumar Das, Simon King, Tomi Kinnunen, Tomoki Toda, Wen-Chin Huang, Xiao Zhou, Xiaohai Tian, Yi Zhao:
Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, October 30, 2020. ISCA 2020 - [i30]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda:
Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression. CoRR abs/2003.11750 (2020) - [i29]Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda:
Many-to-Many Voice Transformer Network. CoRR abs/2005.08445 (2020) - [i28]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation. CoRR abs/2005.08654 (2020) - [i27]Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems. CoRR abs/2005.08659 (2020) - [i26]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network. CoRR abs/2007.05663 (2020) - [i25]Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN: A Non-autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network. CoRR abs/2007.12955 (2020) - [i24]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Pretraining Techniques for Sequence-to-Sequence Voice Conversion. CoRR abs/2008.03088 (2020) - [i23]Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, Tomoki Toda:
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion. CoRR abs/2008.12527 (2020) - [i22]Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda:
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions. CoRR abs/2009.03554 (2020) - [i21]Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda:
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS. CoRR abs/2010.02434 (2020) - [i20]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda:
Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN. CoRR abs/2010.04429 (2020) - [i19]Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders. CoRR abs/2010.04446 (2020) - [i18]Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, Tomoki Toda:
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations. CoRR abs/2010.12231 (2020)
2010 – 2019
- 2019
- [j52]Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda:
Underdetermined Source Separation Based on Generalized Multichannel Variational Autoencoder. IEEE Access 7: 168104-168115 (2019) - [j51]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder. IEEE Access 7: 171114-171125 (2019) - [j50]Karthika Vijayan, Haizhou Li, Tomoki Toda:
Speech-to-Singing Voice Conversion: The Challenges and Strategies for Improving Vocal Conversion Processes. IEEE Signal Process. Mag. 36(1): 95-102 (2019) - [c250]Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda:
Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output. ASRU 2019: 176-183 - [c249]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems. ASRU 2019: 214-221 - [c248]Farzaneh Ahmadi, Kazuhiro Kobayashi, Tomoki Toda:
Development of a Real-time Bionic Voice Generation System based on Statistical Excitation Prediction. ASSETS 2019: 655-657 - [c247]Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion. EUSIPCO 2019: 1-5 - [c246]Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda:
Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation. EUSIPCO 2019: 1-5 - [c245]Tatsuya Komatsu, Tomoki Hayashi, Reishi Kondo, Tomoki Toda, Kazuya Takeda:
Scene-dependent Anomalous Acoustic-event Detection Based on Conditional Wavenet and I-vector. ICASSP 2019: 870-874 - [c244]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Voice Conversion with Cyclic Recurrent Neural Network and Fine-tuned Wavenet Vocoder. ICASSP 2019: 6815-6819 - [c243]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Investigations of Real-time Gaussian Fftnet and Parallel Wavenet Neural Vocoders with Simple Acoustic Features. ICASSP 2019: 7020-7024 - [c242]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. INTERSPEECH 2019: 196-200 - [c241]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. INTERSPEECH 2019: 674-678 - [c240]Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds. INTERSPEECH 2019: 684-688 - [c239]Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion. INTERSPEECH 2019: 709-713 - [c238]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders. INTERSPEECH 2019: 1308-1312 - [c237]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu:
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis. INTERSPEECH 2019: 4430-4434 - [c236]Li Li, Tomoki Toda, Kazuho Morikawa, Kazuhiro Kobayashi, Shoji Makino:
Improving Singing Aid System for Laryngectomees With Statistical Voice Conversion and VAE-SPACE. ISMIR 2019: 784-790 - [c235]Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion. SSW 2019: 57-62 - [c234]Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder. SSW 2019: 63-68 - [c233]Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, Hirokazu Kameoka, Tomoki Toda:
An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement. SSW 2019: 251-256 - [i17]Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion. CoRR abs/1905.00615 (2019) - [i16]Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. CoRR abs/1907.00797 (2019) - [i15]Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder. CoRR abs/1907.08940 (2019) - [i14]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. CoRR abs/1907.10185 (2019) - [i13]Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan:
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit. CoRR abs/1910.10909 (2019) - [i12]Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Héctor Delgado, Andreas Nautsch, Nicholas W. D. Evans, Md. Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sébastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-François Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling:
The ASVspoof 2019 database. CoRR abs/1911.01601 (2019) - [i11]Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining. CoRR abs/1912.06813 (2019)
- 2018
- [j49]Tomoki Hayashi, Masafumi Nishida, Norihide Kitaoka, Tomoki Toda, Kazuya Takeda:
Daily Activity Recognition with Large-Scaled Real-Life Recording Datasets Based on Deep Neural Network Using Multi-Modal Signals. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 101-A(1): 199-210 (2018) - [j48]Shogo Seki, Tomoki Toda, Kazuya Takeda:
Stereophonic Music Separation Based on Non-Negative Tensor Factorization with Cepstral Distance Regularization. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 101-A(7): 1057-1064 (2018) - [j47]Takatomo Kano, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
An end-to-end model for cross-lingual transformation of paralinguistic information. Mach. Transl. 32(4): 353-368 (2018) - [j46]Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura:
Intra-gender statistical singing voice conversion with direct waveform modification using log-spectral differential. Speech Commun. 99: 211-220 (2018) - [c232]Moe Takada, Shogo Seki, Tomoki Toda:
Self-Produced Speech Enhancement and Suppression Method using Air- and Body-Conductive Microphones. APSIPA 2018: 1240-1245 - [c231]Shunya Seiya, Ryuya Ito, Kosuke Okamoto, Ukyo Tanikawa, Shigeki Ohira, Daisuke Deguchi, Tomoki Toda:
Development of "KamiRepo" system with automatic student identification to handle handwritten assignments on LMS. EDUCON 2018: 835-842 - [c230]Koichi Miyazaki, Tomoki Hayashi, Tomoki Toda, Kazuya Takeda:
Connectionist Temporal Classification-based Sound Event Encoder for Converting Sound Events into Onomatopoeic Representations. EUSIPCO 2018: 852-856 - [c229]Kazuhiro Kobayashi, Tomoki Toda:
Electrolaryngeal Speech Enhancement with Statistical Voice Conversion based on CLDNN. EUSIPCO 2018: 2115-2119 - [c228]Tomoki Hayashi, Tatsuya Komatsu, Reishi Kondo, Tomoki Toda, Kazuya Takeda:
Anomalous Sound Event Detection Based on WaveNet. EUSIPCO 2018: 2494-2498 - [c227]Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
An Investigation of Subband Wavenet Vocoder Covering Entire Audible Frequency Range with Limited Acoustic Features. ICASSP 2018: 5654-5658 - [c226]Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
An Investigation of Noise Shaping with Perceptual Weighting for Wavenet-Based Speech Generation. ICASSP 2018: 5664-5668 - [c225]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Multi-Head Decoder for End-to-End Speech Recognition. INTERSPEECH 2018: 801-805 - [c224]Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda:
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder. INTERSPEECH 2018: 1988-1992 - [c223]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis. INTERSPEECH 2018: 2027-2031 - [c222]Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu, Tomoki Toda:
Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features. INTERSPEECH 2018: 2469-2473 - [c221]Farzaneh Ahmadi, Tomoki Toda:
Designing a Pneumatic Bionic Voice Prosthesis - A Statistical Approach for Source Excitation Generation. INTERSPEECH 2018: 3142-3146 - [c220]Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling:
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment. Odyssey 2018: 187-194 - [c219]Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling:
The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods. Odyssey 2018: 195-202 - [c218]Kazuhiro Kobayashi, Tomoki Toda:
sprocket: Open-Source Voice Conversion Software. Odyssey 2018: 203-210 - [c217]Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018. Odyssey 2018: 211-218 - [c216]Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
NU Voice Conversion System for the Voice Conversion Challenge 2018. Odyssey 2018: 219-226 - [c215]Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
An Evaluation of Deep Spectral Mappings and WaveNet Vocoder for Voice Conversion. SLT 2018: 297-303 - [c214]Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Improving FFTNet Vocoder with Noise Shaping and Subband Approaches. SLT 2018: 304-311 - [c213]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for end-to-end ASR. SLT 2018: 426-433 - [i10]Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling:
The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods. CoRR abs/1804.04262 (2018) - [i9]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Multi-Head Decoder for End-to-End Speech Recognition. CoRR abs/1804.08050 (2018) - [i8]Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling:
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment. CoRR abs/1804.08438 (2018) - [i7]Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda:
Collapsed speech segment detection and suppression for WaveNet vocoder. CoRR abs/1804.11055 (2018) - [i6]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices. CoRR abs/1806.06812 (2018) - [i5]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for End-to-End ASR. CoRR abs/1807.10893 (2018) - [i4]Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda:
Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation. CoRR abs/1810.00223 (2018) - [i3]Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion. CoRR abs/1811.11078 (2018)
- 2017
- [j45]Kou Tanaka, Tomoki Toda, Satoshi Nakamura:
A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction. IEICE Trans. Inf. Syst. 100-D(9): 2165-2173 (2017) - [j44]Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Preserving Word-Level Emphasis in Speech-to-Speech Translation. IEEE ACM Trans. Audio Speech Lang. Process. 25(3): 544-556 (2017) - [j43]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Duration-Controlled LSTM for Polyphonic Sound Event Detection. IEEE ACM Trans. Audio Speech Lang. Process. 25(11): 2059-2070 (2017) - [j42]Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Articulatory Controllable Speech Modification Based on Statistical Inversion and Production Mappings. IEEE ACM Trans. Audio Speech Lang. Process. 25(12): 2337-2350 (2017) - [c212]Kazuho Morikawa, Tomoki Toda:
Electrolaryngeal speech modification towards singing aid system for laryngectomees. APSIPA 2017: 610-613 - [c211]Patrick Lumban Tobing, Hirokazu Kameoka, Tomoki Toda:
Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling. APSIPA 2017: 1274-1277 - [c210]Akira Tamamori, Tomoki Hayashi, Tomoki Toda, Kazuya Takeda:
An investigation of recurrent neural network for daily activity recognition using multi-modal signals. APSIPA 2017: 1334-1340 - [c209]Kazutaka Kubo, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An investigation of how to design control parameters for statistical voice timbre control. APSIPA 2017: 1520-1523 - [c208]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
Accurate estimation of f0 and aperiodicity based on periodicity detector residuals and deviations of phase derivatives. APSIPA 2017: 1556-1564 - [c207]Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Subband wavenet with overlapped single-sideband filterbanks. ASRU 2017: 698-704 - [c206]Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
An investigation of multi-speaker training for wavenet vocoder. ASRU 2017: 712-718 - [c205]Shogo Seki, Tomoki Toda, Kazuya Takeda:
Stereophonic music separation based on non-negative tensor factorization with cepstrum regularization. EUSIPCO 2017: 981-985 - [c204]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection. ICASSP 2017: 766-770 - [c203]Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda:
A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals. ICASSP 2017: 4960-4964 - [c202]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation. INTERSPEECH 2017: 424-428 - [c201]Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura:
Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement. INTERSPEECH 2017: 1069-1073 - [c200]Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Speaker-Dependent WaveNet Vocoder. INTERSPEECH 2017: 1118-1122 - [c199]Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda:
Statistical Voice Conversion with WaveNet-Based Waveform Generation. INTERSPEECH 2017: 1138-1142 - [c198]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis. INTERSPEECH 2017: 1358-1362 - [c197]Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino:
Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization. INTERSPEECH 2017: 1998-2002 - [c196]Shogo Seki, Hirokazu Kameoka, Tomoki Toda, Kazuya Takeda:
Missing component restoration for masked speech signals based on time-domain spectrogram factorization. MLSP 2017: 1-6 - [i2]Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino:
A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis. CoRR abs/1702.06724 (2017) - [i1]Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A modulation property of time-frequency derivatives of filtered phase and its application to aperiodicity and fo estimation. CoRR abs/1706.02964 (2017)
- 2016
- [j41]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Enhancing Event-Related Potentials Based on Maximum a Posteriori Estimation with a Spatial Correlation Prior. IEICE Trans. Inf. Syst. 99-D(6): 1437-1446 (2016) - [j40]Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models. IEICE Trans. Inf. Syst. 99-D(10): 2490-2498 (2016) - [j39]Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura:
Improvements of Voice Timbre Control Based on Perceived Age in Singing Voice Conversion. IEICE Trans. Inf. Syst. 99-D(11): 2767-2777 (2016) - [j38]Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics. IEICE Trans. Inf. Syst. 99-D(12): 3132-3139 (2016) - [j37]Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Learning cooperative persuasive dialogue policies using framing. Speech Commun. 84: 83-96 (2016) - [j36]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 24(4): 755-767 (2016) - [j35]Zhizheng Wu, Phillip L. De Leon, Cenk Demiroglu, Ali Khodabakhsh, Simon King, Zhen-Hua Ling, Daisuke Saito, Bryan Stewart, Tomoki Toda, Mirjam Wester, Junichi Yamagishi:
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance. IEEE ACM Trans. Audio Speech Lang. Process. 24(4): 768-783 (2016) - [j34]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Hideki Negoro, Hidemi Iwasaka, Satoshi Nakamura:
Teaching Social Communication Skills Through Human-Agent Interaction. ACM Trans. Interact. Intell. Syst. 6(2): 18:1-18:26 (2016) - [c195]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection. DCASE 2016: 35-39 - [c194]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Removing noise from event-related potentials using a probabilistic generative model with grouped covariance matrices. EMBC 2016: 3728-3731 - [c193]Kou Tanaka, Tomoki Toda, Graham Neubig, Satoshi Nakamura:
Real-time vibration control of an electrolarynx based on statistical F0 contour prediction. EUSIPCO 2016: 1333-1337 - [c192]Soichi Yamane, Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura:
An estimation method of voice timbre evaluation values using feature extraction with Gaussian mixture model based on reference singer. ICASSP 2016: 5265-5269 - [c191]Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura:
Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework. ICASSP 2016: 5665-5669 - [c190]Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura:
Implementation of F0 transformation for statistical singing voice conversion based on direct waveform modification. ICASSP 2016: 5670-5674 - [c189]Yusuke Tajiri, Tomoki Toda, Satoshi Nakamura:
Noise suppression method for body-conducted soft speech enhancement based on external noise monitoring. ICASSP 2016: 5935-5939 - [c188]Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura:
Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model. INTERSPEECH 2016: 953-957 - [c187]Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
The Voice Conversion Challenge 2016. INTERSPEECH 2016: 1632-1636 - [c186]Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda:
The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016. INTERSPEECH 2016: 1667-1671 - [c185]Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework. INTERSPEECH 2016: 2288-2292 - [c184]Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training. INTERSPEECH 2016: 3196-3200 - [c183]Takuya Hiraoka, Graham Neubig, Koichiro Yoshino, Tomoki Toda, Satoshi Nakamura:
Active Learning for Example-Based Dialog Systems. IWSDS 2016: 67-78 - [c182]Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura:
F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential. SLT 2016: 693-700 - [c181]Yusuke Tajiri, Tomoki Toda:
Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring. SSW 2016: 52-58
- 2015
- [j33]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
NOCOA+: Multimodal Computer-Based Training for Social and Communication Skills. IEICE Trans. Inf. Syst. 98-D(8): 1536-1544 (2015) - [j32]Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Semantic Parsing of Ambiguous Input through Paraphrasing and Verification. Trans. Assoc. Comput. Linguistics 3: 571-584 (2015) - [c180]Masahiro Mizukami, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Linguistic Individuality Transformation for Spoken Language. IWSDS 2015: 129-143 - [c179]Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
A Study on Natural Expressive Speech: Automatic Memorable Spoken Quote Detection. IWSDS 2015: 145-152 - [c178]Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Evaluation of a Fully Automatic Cooperative Persuasive Dialogue System. IWSDS 2015: 153-167 - [c177]Takafumi Sasakura, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Unknown Word Detection Based on Event-Related Brain Desynchronization Responses. IWSDS 2015: 169-175 - [c176]Yuiko Tsunomori, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
An Analysis Towards Dialogue-Based Deception Detection. IWSDS 2015: 177-187 - [c175]Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents. ACL (1) 2015: 198-207 - [c174]Akiva Miura, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Improving Pivot Translation by Remembering the Pivot. ACL (2) 2015: 573-577 - [c173]Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino:
Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation. APSIPA 2015: 520-529 - [c172]Sakriani Sakti, Faiz Ilham, Graham Neubig, Tomoki Toda, Ayu Purwarianti, Satoshi Nakamura:
Incremental sentence compression using LSTM recurrent networks. ASRU 2015: 252-258 - [c171]Quoc Truong Do, Michael Heck, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
The NAIST ASR system for the 2015 Multi-Genre Broadcast challenge: On combination of deep learning systems using a rank-score function. ASRU 2015: 654-659 - [c170]Nurul Lubis, Sakriani Sakti, Graham Neubig, Koichiro Yoshino, Tomoki Toda, Satoshi Nakamura:
A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows. ASRU 2015: 777-783 - [c169]Masahiro Mizukami, Hideaki Kizuki, Toshio Nomura, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Adaptive selection from multiple response candidates in example-based dialogue. ASRU 2015: 784-790 - [c168]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction. ASSETS 2015: 435-436 - [c167]Shinnosuke Takamichi, Kazuhiro Kobayashi, Kou Tanaka, Tomoki Toda, Satoshi Nakamura:
The NAIST Text-to-Speech System for the Blizzard Challenge 2015. Blizzard Challenge 2015 - [c166]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
An evaluation of EEG ocular artifact removal with a multi-channel wiener filter based on probabilistic generative model. EMBC 2015: 2775-2778 - [c165]Hayato Maki, Tomoki Toda, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
EEG signal enhancement using multi-channel wiener filter with a spatial correlation prior. ICASSP 2015: 2639-2643 - [c164]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Parameter generation algorithm considering Modulation Spectrum for HMM-based speech synthesis. ICASSP 2015: 4210-4214 - [c163]Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, Simon King:
SAS: A speaker verification spoofing database containing diverse attacks. ICASSP 2015: 4440-4444 - [c162]Andros Tjandra, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR. ICASSP 2015: 4525-4529 - [c161]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion. ICASSP 2015: 4859-4863 - [c160]Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. INTERSPEECH 2015: 299-303 - [c159]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis. INTERSPEECH 2015: 1206-1210 - [c158]Takashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Speed or accuracy? a study in evaluation of simultaneous speech translation. INTERSPEECH 2015: 2267-2271 - [c157]The Tung Nguyen, Graham Neubig, Hiroyuki Shindo, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
A latent variable model for joint pause prediction and dependency parsing. INTERSPEECH 2015: 2719-2723 - [c156]Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Statistical singing voice conversion based on direct waveform modification with global variance. INTERSPEECH 2015: 2754-2758 - [c155]Yusuke Tajiri, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments. INTERSPEECH 2015: 2769-2773 - [c154]Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential. INTERSPEECH 2015: 3350-3354 - [c153]Quoc Truong Do, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs. INTERSPEECH 2015: 3665-3669 - [c152]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Hideki Negoro, Hidemi Iwasaka, Satoshi Nakamura:
Automated Social Skills Trainer. IUI 2015: 17-27 - [c151]Quoc Truong Do, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Improving translation of emphasis with pause prediction in speech-to-speech translation systems. IWSLT 2015 - [c150]Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation (T). ASE 2015: 574-584 - [c149]Hiroyuki Fudaba, Yusuke Oda, Koichi Akabe, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Pseudogen: A Tool to Automatically Generate Pseudo-Code from Source Code. ASE 2015: 824-829 - [c148]Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Ckylark: A More Robust PCFG-LA Parser. HLT-NAACL 2015: 41-45 - [c147]Nurul Lubis, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Construction and analysis of social-affective interaction corpus in English and Indonesian. O-COCOSDA/CASLRE 2015: 202-206 - [c146]Kyoshiro Sugiyama, Masahiro Mizukami, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering. WMT@EMNLP 2015: 442-449
- 2014
- [j31]Kazuhiro Kobayashi, Tomoki Toda, Hironori Doi, Tomoyasu Nakano, Masataka Goto, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Voice Timbre Control Based on Perceived Age in Singing Voice Conversion. IEICE Trans. Inf. Syst. 97-D(6): 1419-1428 (2014) - [j30]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation. IEICE Trans. Inf. Syst. 97-D(6): 1429-1437 (2014) - [j29]Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model. IEICE Trans. Inf. Syst. 97-D(6): 1468-1476 (2014) - [j28]Lasguido Nio, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Utilizing Human-to-Human Conversation Examples for a Multi Domain Chat-Oriented Dialog System. IEICE Trans. Inf. Syst. 97-D(6): 1497-1505 (2014) - [j27]Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis. IEEE J. Sel. Top. Signal Process. 8(2): 239-250 (2014) - [j26]Hironori Doi, Tomoki Toda, Keigo Nakamura, Hiroshi Saruwatari, Kiyohiro Shikano:
Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 22(1): 172-183 (2014) - [c145]Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Optimizing Segmentation Strategies for Simultaneous Speech Translation. ACL (2) 2014: 551-556 - [c144]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Linguistic and Acoustic Features for Automatic Identification of Autism Spectrum Disorders in Children's Narrative. CLPsych@ACL 2014: 88-96 - [c143]Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals. APSIPA 2014: 1-10 - [c142]Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion. APSIPA 2014: 1-4 - [c141]Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
The use of semantic and acoustic features for open-domain TED talk summarization. APSIPA 2014: 1-4 - [c140]Lasguido Nio, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Recursive neural network paraphrase identification for example-based dialog retrieval. APSIPA 2014: 1-4 - [c139]Sakriani Sakti, Yu Odagaki, Takafumi Sasakura, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
An event-related brain potential study on the impact of speech recognition errors. APSIPA 2014: 1-4 - [c138]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modulation spectrum-based post-filter for GMM-based Voice Conversion. APSIPA 2014: 1-4 - [c137]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction. APSIPA 2014: 1-4 - [c136]Sakura Tsuruta, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments. APSIPA 2014: 1-4 - [c135]Riki Yoshida, Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Unnecessary utterance detection for avoiding digressions in discussion. APSIPA 2014: 1-4 - [c134]Koichi Akabe, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Discriminative Language Models as a Tool for Machine Translation Error Analysis. COLING 2014: 1124-1132 - [c133]Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing. COLING 2014: 1706-1717 - [c132]Hoa Trong Vu, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Acquiring a Dictionary of Emotion-Provoking Events. EACL 2014: 128-132 - [c131]Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modified post-filter to recover modulation spectrum for HMM-based speech synthesis. GlobalSIP 2014: 547-551 - [c130]Tomoki Toda:
Augmented speech production based on real-time statistical voice conversion. GlobalSIP 2014: 592-596 - [c129]Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A postfilter to modify the modulation spectrum in HMM-based speech synthesis. ICASSP 2014: 290-294 - [c128]Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Narrow Adaptive Regularization of weights for grapheme-to-phoneme conversion. ICASSP 2014: 2589-2593 - [c127]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement. ICASSP 2014: 4488-4492 - [c126]Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Regression approaches to perceptual age control in singing voice conversion. ICASSP 2014: 7904-7908 - [c125]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation. INTERSPEECH 2014: 31-35 - [c124]Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A hearing impairment simulation method using audiogram-based approximation of auditory characteristics. INTERSPEECH 2014: 490-494 - [c123]Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion. INTERSPEECH 2014: 1263-1267 - [c122]Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus. INTERSPEECH 2014: 1801-1805 - [c121]Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation. INTERSPEECH 2014: 2243-2247 - [c120]Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti:
Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models. INTERSPEECH 2014: 2298-2302 - [c119]Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Statistical singing voice conversion with direct waveform modification based on the spectrum differential. INTERSPEECH 2014: 2514-2518 - [c118]Nurul Lubis, Sakriani Sakti, Graham Neubig, Tomoki Toda, Ayu Purwarianti, Satoshi Nakamura:
Emotion and Its Triggers in Human Spoken Dialogue: Recognition and Analysis. IWSDS 2014: 103-110 - [c117]Takuya Hiraoka, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Construction and Analysis of a Persuasive Dialogue Corpus. IWSDS 2014: 125-138 - [c116]Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Collection of a Simultaneous Translation Corpus for Comparative Analysis. LREC 2014: 670-673 - [c115]Sakriani Sakti, Keigo Kubo, Sho Matsumiya, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Fumihiro Adachi, Ryosuke Isotani:
Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System. LREC 2014: 2639-2643 - [c114]Quoc Truong Do, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Collection and analysis of a Japanese-English emphasized speech corpora. O-COCOSDA 2014: 1-5 - [c113]Fajri Koto, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
Memorable spoken quote corpora of TED public speaking. O-COCOSDA 2014: 1-4 - [c112]Masahiro Mizukami, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Building a free, general-domain paraphrase database for Japanese. O-COCOSDA 2014: 1-4 - [c111]Lasguido Nio, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Conversation dialog corpora from television and movie scripts. O-COCOSDA 2014: 1-4 - [c110]Lasguido Nio, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Improving the robustness of example-based dialog retrieval using recursive neural network paraphrase identification. SLT 2014: 306-311 - [c109]Yuto Hatakoshi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Rule-based Syntactic Preprocessing for Syntax-based Machine Translation. SSST@EMNLP 2014: 34-42
- 2013
- [j25]Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, Keiichiro Oura:
Speech Synthesis Based on Hidden Markov Models. Proc. IEEE 101(5): 1234-1252 (2013) - [c108]Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura, Yuji Matsumoto, Ryosuke Isotani, Yukichi Ikeda:
Towards High-Reliability Speech Translation in the Medical Domain. NLPHealthcare@IJCNLP 2013: 22-29 - [c107]Takuya Hiraoka, Yuki Yamauchi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Dialogue management for leading the conversation in persuasive dialogue systems. ASRU 2013: 114-119 - [c106]Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Inter-Sentence Features and Thresholded Minimum Error Rate Training: NAIST at CLEF 2013 QA4MRE. CLEF (Working Notes) 2013 - [c105]Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
NAIST at the CLEF 2013 QA4MRE Pilot Task. CLEF (Working Notes) 2013 - [c104]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Modality and contextual differences in computer based non-verbal communication training. CogInfoCom 2013: 127-132 - [c103]Hideki Kawahara, Masanori Morise, Tomoki Toda, Ryuichi Nisimura, Toshio Irino:
Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. INTERSPEECH 2013: 34-38 - [c102]Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Improvements to HMM-based speech synthesis based on parameter generation with rich context models. INTERSPEECH 2013: 364-368 - [c101]Kazuhiro Kobayashi, Hironori Doi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
An investigation of acoustic features for singing voice conversion based on perceptual age. INTERSPEECH 2013: 1057-1061 - [c100]Hironori Doi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura:
Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion. INTERSPEECH 2013: 1067-1071 - [c99]Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors. INTERSPEECH 2013: 1946-1950 - [c98]Takatomo Kano, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Generalizing continuous-space translation of paralinguistic information. INTERSPEECH 2013: 2614-2618 - [c97]Masaya Ohgushi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
An empirical comparison of joint optimization techniques for speech translation. INTERSPEECH 2013: 2619-2623 - [c96]Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion. INTERSPEECH 2013: 3067-3071 - [c95]Takuto Moriguchi, Tomoki Toda, Motoaki Sano, Hiroshi Sato, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion. INTERSPEECH 2013: 3072-3076 - [c94]Tomoki Fujita, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Simple, lexicalized choice of translation timing for simultaneous speech translation. INTERSPEECH 2013: 3487-3491 - [c93]Sakriani Sakti, Keigo Kubo, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
The NAIST English speech recognition system for IWSLT 2013. IWSLT (Evaluation Campaign) 2013 - [c92]Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Constructing a speech translation system using simultaneous interpretation data. IWSLT 2013 - [c91]Tatsuo Inukai, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric. SSW 2013: 89-94
- 2012
- [j24]Tomoaki Nakamura, Komei Sugiura, Takayuki Nagai, Naoto Iwahashi, Tomoki Toda, Hiroyuki Okada, Takashi Omori:
Learning Novel Objects for Extended Mobile Manipulation. J. Intell. Robotic Syst. 66(1-2): 187-204 (2012) - [j23]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Commun. 54(1): 134-146 (2012) - [j22]Tomoki Toda, Mikihiro Nakagiri, Kiyohiro Shikano:
Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement. IEEE Trans. Speech Audio Process. 20(9): 2505-2517 (2012) - [c90]Hironori Doi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura:
Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system. APSIPA 2012: 1-6 - [c89]Hiroki Tanaka, Sakriani Sakti, Graham Neubig, Tomoki Toda, Nick Campbell, Satoshi Nakamura:
Non-verbal cognitive skills and autistic conditions: An analysis and training tool. CogInfoCom 2012: 41-46 - [c88]Kenzo Yamamoto, Tomoki Toda, Hironori Doi, Hiroshi Saruwatari, Kiyohiro Shikano:
Statistical approach to voice quality control in esophageal speech enhancement. ICASSP 2012: 4497-4500 - [c87]Tomoki Toda, Takashi Muramatsu, Hideki Banno:
Implementation of Computationally Efficient Real-Time Voice Conversion. INTERSPEECH 2012: 94-97 - [c86]Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, Sakriani Sakti, Satoshi Nakamura:
An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis. INTERSPEECH 2012: 1139-1142 - [c85]Miyuki Itoi, Ryoichi Miyazaki, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind speech extraction for Non-Audible Murmur speech with speaker's movement noise. ISSPIT 2012: 320-325 - [c84]Lasguido Nio, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura:
Developing Non-goal Dialog System Based on Examples of Drama Television. IWSDS 2012: 355-361 - [c83]Graham Neubig, Kevin Duh, Masaya Ogushi, Takatomo Kano, Tetsuo Kiso, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
The NAIST machine translation system for IWSLT2012. IWSLT 2012: 54-60 - [c82]Christian Saam, Christian Mohr, Kevin Kilgour, Michael Heck, Matthias Sperber, Keigo Kubo, Sebastian Stüker, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Alex Waibel:
The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation. IWSLT 2012: 87-90 - [c81]Michael Heck, Keigo Kubo, Matthias Sperber, Sakriani Sakti, Sebastian Stüker, Christian Saam, Kevin Kilgour, Christian Mohr, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Alex Waibel:
The KIT-NAIST (contrastive) English ASR system for IWSLT 2012. IWSLT 2012: 91-95 - [c80]Takatomo Kano, Sakriani Sakti, Shinnosuke Takamichi, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
A method for translation of paralinguistic information. IWSLT 2012: 158-163
- 2011
- [c79]Shunta Ishii, Tomoki Toda, Hiroshi Saruwatari, Sakriani Sakti, Satoshi Nakamura:
Blind noise suppression for Non-Audible Murmur recognition with stereo signal processing. ASRU 2011: 494-499 - [c78]Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. ICASSP 2011: 5136-5139 - [c77]Denis Babani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic model training for non-audible murmur recognition using transformed normal speech data. ICASSP 2011: 5224-5227 - [c76]Nobuhiko Hattori, Tomoki Toda, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation. INTERSPEECH 2011: 2769-2772
- 2010
- [j21]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive Training for Voice Conversion Based on Eigenvoices. IEICE Trans. Inf. Syst. 93-D(6): 1589-1598 (2010) - [j20]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion. IEICE Trans. Inf. Syst. 93-D(7): 1909-1917 (2010) - [j19]Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models. IEICE Trans. Inf. Syst. 93-D(9): 2472-2482 (2010) - [j18]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Improvements of the One-to-Many Eigenvoice Conversion System. IEICE Trans. Inf. Syst. 93-D(9): 2491-2499 (2010) - [j17]Tatsuya Hirahara, Makoto Otani, Shota Shimizu, Tomoki Toda, Keigo Nakamura, Yoshitaka Nakajima, Kiyohiro Shikano:
Silent-speech enhancement using body-conducted vocal-tract resonance signals. Speech Commun. 52(4): 301-313 (2010) - [j16]Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Tomoki Toda:
Improvement to a NAM-captured whisper-to-speech system. Speech Commun. 52(4): 314-326 (2010) - [j15]Yannis Stylianou, Tomoki Toda, Chung-Hsien Wu, Alexander Kain, Olivier Rosec:
Introduction to the Special Section on Voice Transformation. IEEE Trans. Speech Audio Process. 18(5): 909-911 (2010) - [c75]Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Jinfu Ni, Hisashi Kawai, Keiichi Tokuda, Minoru Tsuzaki, Satoshi Nakamura:
NICT Blizzard Challenge 2010 Entry. Blizzard Challenge 2010 - [c74]Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Statistical approach to enhancing esophageal speech based on Gaussian mixture models. ICASSP 2010: 4250-4253 - [c73]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Non-parallel training for many-to-many eigenvoice conversion. ICASSP 2010: 4822-4825 - [c72]Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai:
Improved training of excitation for HMM-based parametric speech synthesis. INTERSPEECH 2010: 809-812 - [c71]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. INTERSPEECH 2010: 1628-1631 - [c70]Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive voice-quality control based on one-to-many eigenvoice conversion. INTERSPEECH 2010: 2158-2161 - [c69]Chie Hayashida, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Linear transformation approaches to many-to-one voice conversion. SSW 2010: 74-79
2000 – 2009
- 2009
- [j14]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Techniques in rapid unsupervised speaker adaptation based on HMM-Sufficient Statistics. Speech Commun. 51(1): 42-57 (2009) - [j13]Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhen-Hua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, Steve Renals:
Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis. IEEE Trans. Speech Audio Process. 17(6): 1208-1230 (2009) - [c68]Ranniery Maia, Tomoki Toda, Shinsuke Sakai, Yoshinori Shiga, Jinfu Ni, Hisashi Kawai, Keiichi Tokuda, Minoru Tsuzaki, Satoshi Nakamura:
The NICT Entry for the Blizzard Challenge 2009: an Enhanced HMM-based Speech Synthesis System with Trajectory Training considering Global Variance and State-Dependent Mixed Excitation. Blizzard Challenge 2009 - [c67]Tomoki Toda, Keigo Nakamura, Hidehiko Sekimoto, Kiyohiro Shikano:
Voice conversion for various types of body transmitted speech. ICASSP 2009: 3601-3604 - [c66]Kai Yu, Tomoki Toda, Milica Gasic, Simon Keizer, François Mairesse, Blaise Thomson, Steve J. Young:
Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis. ICASSP 2009: 3773-3776 - [c65]Daisuke Miyamoto, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic compensation methods for body transmitted speech conversion. ICASSP 2009: 3901-3904 - [c64]Tomoki Toda, Steve J. Young:
Trajectory training considering global variance for HMM-based speech synthesis. ICASSP 2009: 4025-4028 - [c63]Tomoki Toda, Keigo Nakamura, Takayuki Nagai, Tomomi Kaino, Yoshitaka Nakajima, Kiyohiro Shikano:
Technologies for processing body-conducted speech detected with non-audible murmur microphone. INTERSPEECH 2009: 632-635 - [c62]Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Tomoki Toda:
Multimodal HMM-based NAM-to-speech conversion. INTERSPEECH 2009: 656-659 - [c61]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Electrolaryngeal speech enhancement based on statistical voice conversion. INTERSPEECH 2009: 1431-1434 - [c60]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Many-to-many eigenvoice conversion with reference voice. INTERSPEECH 2009: 1623-1626 - [c59]Malorie Charlier, Yamato Ohtani, Tomoki Toda, Alexis Moinet, Thierry Dutoit:
Cross-language voice conversion based on eigenvoices. INTERSPEECH 2009: 1635-1638 - [c58]Ranniery Maia, Tomoki Toda, Keiichi Tokuda, Shinsuke Sakai, Satoshi Nakamura:
A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis. INTERSPEECH 2009: 1783-1786
- 2008
- [j12]Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training. IEICE Trans. Inf. Syst. 91-D(3): 499-507 (2008) - [j11]Goshu Nagino, Makoto Shozakai, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method. IEICE Trans. Inf. Syst. 91-D(3): 607-614 (2008) - [j10]Heiga Zen, Tomoki Toda, Keiichi Tokuda:
The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006. IEICE Trans. Inf. Syst. 91-D(6): 1764-1773 (2008) - [j9]Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Commun. 50(3): 215-227 (2008) - [c57]Ranniery Maia, Jinfu Ni, Shinsuke Sakai, Tomoki Toda, Keiichi Tokuda, Tohru Shimizu, Satoshi Nakamura:
The NICT/ATR speech synthesis system for the Blizzard Challenge 2008. Blizzard Challenge 2008 - [c56]Junichi Yamagishi, Heiga Zen, Yi-Jian Wu, Tomoki Toda, Keiichi Tokuda:
The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge. Blizzard Challenge 2008 - [c55]Tomoki Toda, Keiichi Tokuda:
Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM. ICASSP 2008: 3925-3928 - [c54]Junichi Yamagishi, Takashi Nose, Heiga Zen, Tomoki Toda, Keiichi Tokuda:
Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007. ICASSP 2008: 3957-3960 - [c53]Ranniery Maia, Tomoki Toda, Keiichi Tokuda, Shinichi Sakai, Shun Nakamura:
On the state definition for a trainable excitation model in HMM-based speech synthesis. ICASSP 2008: 3965-3968 - [c52]Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Keiichi Tokuda:
Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching. INTERSPEECH 2008: 1072-1075 - [c51]Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. INTERSPEECH 2008: 1076-1079 - [c50]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An improved one-to-many eigenvoice conversion system. INTERSPEECH 2008: 1080-1083 - [c49]Daisuke Tani, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Maximum a posteriori adaptation for many-to-one eigenvoice conversion. INTERSPEECH 2008: 1461-1463 - [c48]Keigo Nakamura, Tomoki Toda, Yoshitaka Nakajima, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments. INTERSPEECH 2008: 2209-2212 - [c47]Keiichiro Oura, Yoshihiko Nankaku, Tomoki Toda, Keiichi Tokuda, Ranniery Maia, Shinsuke Sakai, Satoshi Nakamura:
Simultaneous Acoustic, Prosodic, and Phrasing Model Training for TTS Conversion Systems. ISCSLP 2008: 1-4
- 2007
- [j8]Heiga Zen, Tomoki Toda, Masaru Nakamura, Keiichi Tokuda:
Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005. IEICE Trans. Inf. Syst. 90-D(1): 325-333 (2007) - [j7]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics. IEICE Trans. Inf. Syst. 90-D(2): 554-561 (2007) - [j6]Tomoki Toda, Keiichi Tokuda:
A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis. IEICE Trans. Inf. Syst. 90-D(5): 816-824 (2007) - [j5]Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory. IEEE Trans. Speech Audio Process. 15(8): 2222-2235 (2007) - [c46]Jinfu Ni, Toshio Hirai, Hisashi Kawai, Tomoki Toda, Keiichi Tokuda, Minoru Tsuzaki, Shinsuke Sakai, Ranniery Maia, Satoshi Nakamura:
ATRECSS - ATR English speech corpus for speech synthesis. Blizzard Challenge 2007 - [c45]Junichi Yamagishi, Heiga Zen, Tomoki Toda, Keiichi Tokuda:
Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007. Blizzard Challenge 2007 - [c44]Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano:
One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices. ICASSP (4) 2007: 1249-1252 - [c43]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection. INTERSPEECH 2007: 262-265 - [c42]Tobias Cincarek, Izumi Shindo, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task. INTERSPEECH 2007: 1469-1472 - [c41]Ranniery Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
A trainable excitation model for HMM-based speech synthesis. INTERSPEECH 2007: 1909-1912 - [c40]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model. INTERSPEECH 2007: 1981-1984 - [c39]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees. INTERSPEECH 2007: 2517-2520 - [c38]Shinsuke Sakai, Jinfu Ni, Ranniery Maia, Keiichi Tokuda, Minoru Tsuzaki, Tomoki Toda, Hisashi Kawai, Satoshi Nakamura:
Communicative speech synthesis with XIMERA: a first step. SSW 2007: 28-33 - [c37]Kumi Ohta, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Regression approaches to voice quality control based on one-to-many eigenvoice conversion. SSW 2007: 101-106 - [c36]Daisuke Tani, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets. SSW 2007: 107-112 - [c35]Junichi Yamagishi, Takao Kobayashi, Steve Renals, Simon King, Heiga Zen, Tomoki Toda, Keiichi Tokuda:
Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. SSW 2007: 125-130 - [c34]Ranniery Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
An excitation model for HMM-based speech synthesis based on residual modeling. SSW 2007: 131-136 - [c33]Yoshihiko Nankaku, Kenichi Nakamura, Tomoki Toda, Keiichi Tokuda:
Spectral conversion based on statistical models including time-sequence matching. SSW 2007: 333-338
- 2006
- [j4]Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models. IEICE Trans. Inf. Syst. 89-D(3): 962-969 (2006) - [j3]Randy Gomez, Akinobu Lee, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models. IEICE Trans. Inf. Syst. 89-D(3): 998-1005 (2006) - [j2]Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki, Kiyohiro Shikano:
An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis. Speech Commun. 48(1): 45-56 (2006) - [c32]Tomoki Toda, Hisashi Kawai, Toshio Hirai, Jinfu Ni, Nobuyuki Nishizawa, Junichi Yamagishi, Minoru Tsuzaki, Keiichi Tokuda, Satoshi Nakamura:
Developing a Test Bed of English Text-to-Speech System XIMERA for the Blizzard Challenge 2006. Blizzard Challenge 2006 - [c31]Heiga Zen, Tomoki Toda, Keiichi Tokuda:
The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006. Blizzard Challenge 2006 - [c30]Kenichi Nakamura, Tomoki Toda, Yoshihiko Nankaku, Keiichi Tokuda:
On the Use of Phonetic Information for Mapping from Articulatory Movements to Vocal Tract Spectrum. ICASSP (1) 2006: 93-96 - [c29]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Improving Rapid Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics. ICASSP (1) 2006: 1001-1004 - [c28]Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training. INTERSPEECH 2006 - [c27]Mikihiro Nakagiri, Tomoki Toda, Hideki Kashioka, Kiyohiro Shikano:
Improving body transmitted unvoiced speech with statistical voice conversion. INTERSPEECH 2006 - [c26]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech. INTERSPEECH 2006 - [c25]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. INTERSPEECH 2006 - [c24]Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano:
Eigenvoice conversion based on Gaussian mixture model. INTERSPEECH 2006 - [c23]Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda:
Voice conversion based on mixtures of factor analyzers. INTERSPEECH 2006 - 2005
- [j1]Kazuki Adachi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Designing Target Cost Function Based on Prosody of Speech Database. IEICE Trans. Inf. Syst. 88-D(3): 519-524 (2005) - [c22]Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter. ICASSP (1) 2005: 9-12 - [c21]Heiga Zen, Tomoki Toda:
An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005. INTERSPEECH 2005: 93-96 - [c20]Tomoki Toda, Kiyohiro Shikano:
NAM-to-speech conversion with Gaussian mixture models. INTERSPEECH 2005: 1957-1960 - [c19]Tomoki Toda, Keiichi Tokuda:
Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. INTERSPEECH 2005: 2801-2804 - 2004
- [c18]Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki:
Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis. ICASSP (1) 2004: 657-660 - [c17]Hisashi Kawai, Tomoki Toda:
An evaluation of automatic phone segmentation for concatenative speech synthesis. ICASSP (1) 2004: 677-680 - [c16]Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Acoustic-to-articulatory inversion mapping with Gaussian mixture model. INTERSPEECH 2004: 1129-1132 - [c15]Kazuki Adachi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification. LREC 2004 - [c14]Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis. SSW 2004: 31-36 - [c13]Hisashi Kawai, Tomoki Toda, Jinfu Ni, Minoru Tsuzaki, Keiichi Tokuda:
XIMERA: a new TTS from ATR based on corpus-based technologies. SSW 2004: 179-184 - 2003
- [c12]Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki, Kiyohiro Shikano:
Segment selection considering local degradation of naturalness in concatenative speech synthesis. ICASSP (1) 2003: 696-699 - [c11]Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki:
Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations. INTERSPEECH 2003: 297-300 - [c10]Tatsuya Shiraishi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Simple designing methods of corpus-based visual speech synthesis. INTERSPEECH 2003: 2241-2244 - [c9]Hiromichi Kawanami, Yohei Iwami, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
GMM-based voice conversion applied to emotional speech synthesis. INTERSPEECH 2003: 2401-2404 - 2002
- [c8]Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki, Kiyohiro Shikano:
Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit. ICASSP 2002: 465-468 - [c7]Mikiko Mashimo, Tomoki Toda, Hiromichi Kawanami, Hideki Kashioka, Kiyohiro Shikano, Nick Campbell:
Evaluation of cross-language voice conversion using bilingual and non-bilingual databases. INTERSPEECH 2002: 293-296 - [c6]Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano:
Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer. INTERSPEECH 2002: 2425-2428 - [c5]Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano:
Designing speech database with prosodic variety for expressive TTS system. LREC 2002 - 2001
- [c4]Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. ICASSP 2001: 841-844 - [c3]Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. INTERSPEECH 2001: 349-352 - [c2]Mikiko Mashimo, Tomoki Toda, Kiyohiro Shikano, Nick Campbell:
Evaluation of cross-language voice conversion based on GMM and STRAIGHT. INTERSPEECH 2001: 361-364 - 2000
- [c1]Tomoki Toda, Jinlin Lu, Hiroshi Saruwatari, Kiyohiro Shikano:
STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. INTERSPEECH 2000: 279-282
last updated on 2024-12-10 21:48 CET by the dblp team
all metadata released as open data under CC0 1.0 license