Michael L. Seltzer
2020 – today
- 2024
- [c82]Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen:
End-to-End Speech Recognition Contextualization with Large Language Models. ICASSP 2024: 12406-12410 - [i33]Irina-Elena Veliche, Zhuangqun Huang, Vineeth Ayyat Kochaniyan, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer:
Towards measuring fairness in speech recognition: Fair-Speech dataset. CoRR abs/2408.12734 (2024) - 2023
- [c81]Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer:
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers. ICASSP 2023: 1-5 - [c80]Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le:
Improving fast-slow Encoder based Transducer with Streaming Deliberation. ICASSP 2023: 1-5 - [c79]Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer:
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities. ICASSP 2023: 1-5 - [c78]Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer:
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding. INTERSPEECH 2023: 1119-1123 - [i32]Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer:
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding. CoRR abs/2307.12134 (2023) - [i31]Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael L. Seltzer:
Augmenting text for spoken language understanding with Large Language Models. CoRR abs/2309.09390 (2023) - [i30]Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen:
End-to-End Speech Recognition Contextualization with Large Language Models. CoRR abs/2309.10917 (2023) - 2022
- [c77]Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer:
Neural-FST Class Language Model for End-to-End Speech Recognition. ICASSP 2022: 6107-6111 - [c76]Jay Mahadeokar, Yangyang Shi, Ke Li, Duc Le, Jiedan Zhu, Vikas Chandra, Ozlem Kalinli, Michael L. Seltzer:
Streaming parallel transducer beam search with fast slow cascaded encoders. INTERSPEECH 2022: 2083-2087 - [c75]Duc Le, Akshat Shrivastava, Paden D. Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer:
Deliberation Model for On-Device Spoken Language Understanding. INTERSPEECH 2022: 3468-3472 - [c74]Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric. INTERSPEECH 2022: 3978-3982 - [i29]Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer:
Neural-FST Class Language Model for End-to-End Speech Recognition. CoRR abs/2201.11867 (2022) - [i28]Jay Mahadeokar, Yangyang Shi, Ke Li, Duc Le, Jiedan Zhu, Vikas Chandra, Ozlem Kalinli, Michael L. Seltzer:
Streaming parallel transducer beam search with fast-slow cascaded encoders. CoRR abs/2203.15773 (2022) - [i27]Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer:
Deliberation Model for On-Device Spoken Language Understanding. CoRR abs/2204.01893 (2022) - [i26]Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer:
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers. CoRR abs/2211.00896 (2022) - [i25]Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer:
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities. CoRR abs/2211.05756 (2022) - 2021
- [c73]Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le:
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer. ICASSP 2021: 7333-7337 - [c72]Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra:
Memory-Efficient Speech Recognition on Smart Devices. ICASSP 2021: 8368-8372 - [c71]Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer:
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion. Interspeech 2021: 1772-1776 - [c70]Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding. Interspeech 2021: 1977-1981 - [c69]Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency. Interspeech 2021: 2042-2046 - [c68]Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer:
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios. Interspeech 2021: 2107-2111 - [c67]Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer:
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition. Interspeech 2021: 4553-4557 - [c66]Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra:
Collaborative Training of Acoustic Encoders for Speech Recognition. Interspeech 2021: 4573-4577 - [c65]Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer:
Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition. SLT 2021: 8-14 - [c64]Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer:
Alignment Restricted Streaming Recurrent Neural Network Transducer. SLT 2021: 52-59 - [c63]Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer:
Deep Shallow Fusion for RNN-T Personalization. SLT 2021: 251-257 - [i24]Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra:
Memory-efficient Speech Recognition on Smart Devices. CoRR abs/2102.11531 (2021) - [i23]Suyoun Kim, Abhinav Arora, Duc Le, Ching-Feng Yeh, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding. CoRR abs/2104.02138 (2021) - [i22]Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency. CoRR abs/2104.02176 (2021) - [i21]Duc Le, Mahaveer Jain, Gil Keren, Suyoun Kim, Yangyang Shi, Jay Mahadeokar, Julian Chan, Yuan Shangguan, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Michael L. Seltzer:
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion. CoRR abs/2104.02194 (2021) - [i20]Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer:
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition. CoRR abs/2104.02207 (2021) - [i19]Jay Mahadeokar, Yangyang Shi, Yuan Shangguan, Chunyang Wu, Alex Xiao, Hang Su, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer:
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios. CoRR abs/2104.02232 (2021) - [i18]Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra:
Collaborative Training of Acoustic Encoders for Speech Recognition. CoRR abs/2106.08960 (2021) - [i17]Suyoun Kim, Duc Le, Weiyi Zheng, Tarun Singh, Abhinav Arora, Xiaoyu Zhai, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric. CoRR abs/2110.05376 (2021) - 2020
- [c62]Duc Le, Thilo Köhler, Christian Fuegen, Michael L. Seltzer:
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR. ICASSP 2020: 6869-6873 - [c61]Yongqiang Wang, Abdelrahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer:
Transformer-Based Acoustic Modeling for Hybrid Speech Recognition. ICASSP 2020: 6874-6878 - [c60]Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, Mahaveer Jain, Michael L. Seltzer:
Aipnet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition. ICASSP 2020: 6979-6983 - [c59]Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer:
Weak-Attention Suppression for Transformer Based Speech Recognition. INTERSPEECH 2020: 4996-5000 - [i16]Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer:
Weak-Attention Suppression For Transformer Based Speech Recognition. CoRR abs/2005.09137 (2020) - [i15]Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, Michael L. Seltzer:
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition. CoRR abs/2010.10759 (2020) - [i14]Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le:
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer. CoRR abs/2010.13878 (2020) - [i13]Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer:
Alignment Restricted Streaming Recurrent Neural Network Transducer. CoRR abs/2011.03072 (2020) - [i12]Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer:
Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition. CoRR abs/2011.07120 (2020) - [i11]Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer:
Deep Shallow Fusion for RNN-T Personalization. CoRR abs/2011.07754 (2020)
2010 – 2019
- 2019
- [j13]Shinji Watanabe, Shoko Araki, Michiel Bacchiani, Reinhold Haeb-Umbach, Michael L. Seltzer:
Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition. IEEE J. Sel. Top. Signal Process. 13(4): 785-786 (2019) - [j12]Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, Mehrez Souden:
Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques. IEEE Signal Process. Mag. 36(6): 111-124 (2019) - [c58]Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer:
From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition. ASRU 2019: 457-464 - [c57]Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder. ICASSP 2019: 6186-6190 - [c56]Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR. INTERSPEECH 2019: 3490-3494 - [i10]Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer:
From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition. CoRR abs/1910.01493 (2019) - [i9]Yongqiang Wang, Abdelrahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, Jay Mahadeokar, Hongzhao Huang, Andros Tjandra, Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer:
Transformer-based Acoustic Modeling for Hybrid Speech Recognition. CoRR abs/1910.09799 (2019) - [i8]Duc Le, Thilo Köhler, Christian Fuegen, Michael L. Seltzer:
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR. CoRR abs/1910.12612 (2019) - [i7]Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer:
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention. CoRR abs/1910.12977 (2019) - [i6]Mahaveer Jain, Kjell Schubert, Jay Mahadeokar, Ching-Feng Yeh, Kaustubh Kalgaonkar, Anuroop Sriram, Christian Fuegen, Michael L. Seltzer:
RNN-T For Latency Controlled ASR With Improved Beam Search. CoRR abs/1911.01629 (2019) - [i5]Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, Mahaveer Jain, Michael L. Seltzer:
AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition. CoRR abs/1911.11935 (2019) - 2018
- [c55]Suyoun Kim, Michael L. Seltzer:
Towards Language-Universal End-to-End Speech Recognition. ICASSP 2018: 4914-4918 - [c54]Zhuo Chen, Takuya Yoshioka, Xiong Xiao, Linyu Li, Michael L. Seltzer, Yifan Gong:
Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation. ICASSP 2018: 5384-5388 - [c53]Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao:
Improved Training for Online End-to-end Speech Recognition Systems. INTERSPEECH 2018: 2913-2917 - [i4]Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
End-to-end contextual speech recognition using class language models and a token passing decoder. CoRR abs/1812.02142 (2018) - 2017
- [j11]Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Michael L. Seltzer, Andreas Stolcke, Dong Yu, Geoffrey Zweig:
Toward Human Parity in Conversational Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 25(12): 2410-2423 (2017) - [c52]Baolin Peng, Michael L. Seltzer, Y. C. Ju, Geoffrey Zweig, Kam-Fai Wong:
May I take your order? A Neural Model for Extracting Structured Information from Conversations. EACL (1) 2017: 450-459 - [c51]Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, Sanjeev Khudanpur:
A study on data augmentation of reverberant speech for robust speech recognition. ICASSP 2017: 5220-5224 - [c50]Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong:
Large-Scale Domain Adaptation via Teacher-Student Learning. INTERSPEECH 2017: 2386-2390 - [p3]Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Michael I. Mandel, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Dong Yu:
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 79-104 - [i3]Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong:
Large-Scale Domain Adaptation via Teacher-Student Learning. CoRR abs/1708.05466 (2017) - [i2]Suyoun Kim, Michael L. Seltzer:
Towards Language-Universal End-to-End Speech Recognition. CoRR abs/1711.02207 (2017) - [i1]Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao:
Improved training for online end-to-end speech recognition systems. CoRR abs/1711.02212 (2017) - 2016
- [c49]Pegah Ghahremani, Jasha Droppo, Michael L. Seltzer:
Linearly augmented deep neural network. ICASSP 2016: 5085-5089 - [c48]Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Michael I. Mandel, Dong Yu:
Deep beamforming networks for multi-channel speech recognition. ICASSP 2016: 5745-5749 - [c47]Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani:
On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. INTERSPEECH 2016: 803-807 - 2015
- [j10]Chao Weng, Dong Yu, Michael L. Seltzer, Jasha Droppo:
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 23(10): 1670-1679 (2015) - [c46]Yu Zhang, Dong Yu, Michael L. Seltzer, Jasha Droppo:
Speech recognition with prediction-adaptation-correction recurrent neural networks. ICASSP 2015: 5004-5008 - [c45]Ritwik Giri, Michael L. Seltzer, Jasha Droppo, Dong Yu:
Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning. ICASSP 2015: 5014-5018 - [c44]Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani:
Exploring how deep neural networks form phonemic categories. INTERSPEECH 2015: 1912-1916 - 2014
- [c43]Hyunson Seo, Hong-Goo Kang, Michael L. Seltzer:
Factored adaptation of speaker and environment using orthogonal subspace transforms. ICASSP 2014: 3251-3255 - [c42]Chao Weng, Dong Yu, Michael L. Seltzer, Jasha Droppo:
Single-channel mixed speech recognition using deep neural networks. ICASSP 2014: 5632-5636 - [c41]Yan Huang, Malcolm Slaney, Michael L. Seltzer, Yifan Gong:
Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks. INTERSPEECH 2014: 845-849 - [c40]Malcolm Slaney, Michael L. Seltzer:
The influence of pitch and noise on the discriminability of filterbank features. INTERSPEECH 2014: 2263-2267 - [c39]Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Christopher J. Rossbach, Jon Currey:
An introduction to computational networks and the computational network toolkit (invited talk). INTERSPEECH 2014 - 2013
- [c38]Samuel Thomas, Michael L. Seltzer, Kenneth Church, Hynek Hermansky:
Deep neural network features and semi-supervised training for low resource speech recognition. ICASSP 2013: 6704-6708 - [c37]Michael L. Seltzer, Jasha Droppo:
Multi-task learning in deep neural networks for improved phoneme recognition. ICASSP 2013: 6965-6969 - [c36]Michael L. Seltzer, Dong Yu, Yongqiang Wang:
An investigation of deep neural networks for noise robust speech recognition. ICASSP 2013: 7398-7402 - [c35]Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael L. Seltzer, Geoffrey Zweig, Xiaodong He, Jason D. Williams, Yifan Gong, Alex Acero:
Recent advances in deep learning for speech research at Microsoft. ICASSP 2013: 8604-8608 - [c34]Dong Yu, Michael L. Seltzer, Jinyu Li, Jui-Ting Huang, Frank Seide:
Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks. ICLR 2013 - 2012
- [c33]Jinyu Li, Michael L. Seltzer, Yifan Gong:
Improvements to VTS feature enhancement. ICASSP 2012: 4677-4680 - [c32]Michael L. Seltzer, Alex Acero:
Factored adaptation using a combination of feature-space and model-space transforms. INTERSPEECH 2012: 1792-1795 - [c31]Jinyu Li, Michael L. Seltzer, Yifan Gong:
Efficient VTS Adaptation Using Jacobian Approximation. INTERSPEECH 2012: 1906-1909 - [p2]Michael L. Seltzer:
Acoustic Model Training for Robust Speech Recognition. Techniques for Noise Robustness in Automatic Speech Recognition 2012: 347-368 - 2011
- [j9]Michael L. Seltzer, Yun-Cheng Ju, Ivan Tashev, Ye-Yi Wang, Dong Yu:
In-Car Media Search. IEEE Signal Process. Mag. 28(4): 50-60 (2011) - [c30]Michael L. Seltzer, Alex Acero:
Factored adaptation for separable compensation of speaker and environmental variability. ASRU 2011: 146-151 - [c29]Flavio P. Ribeiro, Dinei A. F. Florêncio, Cha Zhang, Michael L. Seltzer:
CROWDMOS: An approach for crowdsourcing mean opinion score studies. ICASSP 2011: 2416-2419 - [c28]Xing Fan, Michael L. Seltzer, Jasha Droppo, Henrique S. Malvar, Alex Acero:
Joint encoding of the waveform and speech recognition features using a transform codec. ICASSP 2011: 5148-5151 - [c27]Dong Yu, Michael L. Seltzer:
Improved Bottleneck Features Using Pretrained Deep Neural Networks. INTERSPEECH 2011: 237-240 - [c26]Michael L. Seltzer, Alex Acero:
Separating Speaker and Environmental Variability Using Factored Transforms. INTERSPEECH 2011: 1097-1100 - 2010
- [j8]Ozlem Kalinli, Michael L. Seltzer, Jasha Droppo, Alex Acero:
Noise Adaptive Training for Robust Automatic Speech Recognition. IEEE Trans. Speech Audio Process. 18(8): 1889-1901 (2010) - [c25]Michael L. Seltzer, Alex Acero, Kaustubh Kalgaonkar:
Acoustic model adaptation via Linear Spline Interpolation for robust speech recognition. ICASSP 2010: 4550-4553 - [c24]Michael L. Seltzer, Alex Acero:
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition. INTERSPEECH 2010: 1664-1667 - [c23]Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton:
Binary coding of speech spectrograms using a deep auto-encoder. INTERSPEECH 2010: 1692-1695
2000 – 2009
- 2009
- [c22]Kaustubh Kalgaonkar, Michael L. Seltzer, Alex Acero:
Noise robust model adaptation using linear spline interpolation. ASRU 2009: 199-204 - [c21]Michael L. Seltzer, Lei Zhang:
The data deluge: Challenges and opportunities of unlimited data in statistical signal processing. ICASSP 2009: 3701-3704 - [c20]Ozlem Kalinli, Michael L. Seltzer, Alex Acero:
Noise adaptive training using a vector taylor series approach for noise robust automatic speech recognition. ICASSP 2009: 3825-3828 - [c19]Young-In Song, Ye-Yi Wang, Yun-Cheng Ju, Michael L. Seltzer, Ivan Tashev, Alex Acero:
Voice search of structured media data. ICASSP 2009: 3941-3944 - [c18]Yun-Cheng Ju, Michael L. Seltzer, Ivan Tashev:
Improving perceived accuracy for in-car media search. INTERSPEECH 2009: 979-982 - 2008
- [c17]Ivan Tashev, Jasha Droppo, Michael L. Seltzer, Alex Acero:
Robust design of wideband loudspeaker arrays. ICASSP 2008: 381-384 - [c16]Graham W. Taylor, Michael L. Seltzer, Alex Acero:
Maximum a posteriori ICA: Applying prior knowledge to the separation of acoustic sources. ICASSP 2008: 1821-1824 - [c15]Jasha Droppo, Michael L. Seltzer, Alex Acero, Yu-Hsiang Bosco Chiu:
Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation. INTERSPEECH 2008: 289-292 - 2007
- [j7]Amarnag Subramanya, Michael L. Seltzer, Alejandro Acero:
Automatic Removal of Typed Keystrokes From Speech Signals. IEEE Signal Process. Lett. 14(5): 363-366 (2007) - [j6]Michael L. Seltzer, Alex Acero:
Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition. IEEE Trans. Speech Audio Process. 15(1): 235-245 (2007) - [c14]Michael L. Seltzer, Ivan Tashev, Alex Acero:
Microphone Array Post-Filter using Incremental Bayes Learning to Track the Spatial Distributions of Speech and Noise. ICASSP (1) 2007: 29-32 - [c13]Michael L. Seltzer, Yun-Cheng Ju, Ivan Tashev, Alex Acero:
Robust location understanding in spoken dialog systems using intersections. INTERSPEECH 2007: 2813-2816 - [c12]Ivan Tashev, Michael L. Seltzer, Yun-Cheng Ju, Dong Yu, Alex Acero:
Commute UX: Telephone Dialog System for Location-based Services. SIGdial 2007: 87-94 - 2006
- [j5]Michael L. Seltzer, Richard M. Stern:
Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments. IEEE Trans. Speech Audio Process. 14(6): 2109-2121 (2006) - [c11]Amarnag Subramanya, Michael L. Seltzer, Alex Acero:
Automatic removal of typed keystrokes from speech signals. INTERSPEECH 2006 - 2005
- [c10]Michael L. Seltzer, Alex Acero:
Training Wideband Acoustic Models using Mixed-Bandwidth Training Data via Feature Bandwidth Extension. ICASSP (1) 2005: 921-924 - [c9]Michael L. Seltzer, Alex Acero, Jasha Droppo:
Robust bandwidth extension of noise-corrupted narrowband speech. INTERSPEECH 2005: 1509-1512 - [p1]Bhiksha Raj, Michael L. Seltzer, Manuel Jesus Reyes-Gomez:
Speech Recognizer Based Maximum Likelihood Beamforming. Speech Separation by Humans and Machines 2005: 65-82 - 2004
- [j4]Bhiksha Raj, Michael L. Seltzer, Richard M. Stern:
Reconstruction of missing features for robust speech recognition. Speech Commun. 43(4): 275-296 (2004) - [j3]Michael L. Seltzer, Bhiksha Raj, Richard M. Stern:
A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition. Speech Commun. 43(4): 379-393 (2004) - [j2]Michael L. Seltzer, Bhiksha Raj, Richard M. Stern:
Likelihood-maximizing beamforming for robust hands-free speech recognition. IEEE Trans. Speech Audio Process. 12(5): 489-498 (2004) - [c8]Michael L. Seltzer, Richard M. Stern:
Parameter sharing in subband likelihood-maximizing beamforming for speech recognition using microphone arrays. ICASSP (1) 2004: 881-884 - 2003
- [j1]Michael L. Seltzer, Bhiksha Raj:
Speech-recognizer-based filter optimization for microphone array processing. IEEE Signal Process. Lett. 10(3): 69-71 (2003) - [c7]Michael L. Seltzer, Richard M. Stern:
Subband parameter optimization of microphone arrays for speech recognition in reverberant environments. ICASSP (1) 2003: 408-411 - [c6]Michael L. Seltzer, Jasha Droppo, Alex Acero:
A harmonic-model-based front end for robust speech recognition. INTERSPEECH 2003: 1277-1280 - 2002
- [c5]Michael L. Seltzer, Bhiksha Raj, Richard M. Stern:
Speech recognizer-based microphone array processing for robust hands-free speech recognition. ICASSP 2002: 897-900 - 2001
- [c4]Rita Singh, Michael L. Seltzer, Bhiksha Raj, Richard M. Stern:
Speech in Noisy Environments: robust automatic segmentation, feature extraction, and hypothesis combination. ICASSP 2001: 273-276 - [c3]Michael L. Seltzer, Bhiksha Raj:
Calibration of microphone arrays for improved speech recognition. INTERSPEECH 2001: 1005-1008 - 2000
- [c2]Bhiksha Raj, Michael L. Seltzer, Richard M. Stern:
Reconstruction of damaged spectrographic features for robust speech recognition. INTERSPEECH 2000: 357-360 - [c1]Michael L. Seltzer, Bhiksha Raj, Richard M. Stern:
Classifier-based mask estimation for missing feature methods of robust speech recognition. INTERSPEECH 2000: 538-541
last updated on 2024-10-07 21:21 CEST by the dblp team
all metadata released as open data under CC0 1.0 license