On the Soft Fusion of Probability Mass Functions for Multimodal Speech Processing

Kumar, D.; Vimal, P.; Hegde, Rajesh M.

doi:10.1155/2011/294010

Research Article
Open access
Published: 15 March 2011

D. Kumar¹, P. Vimal¹ & Rajesh M. Hegde¹

EURASIP Journal on Advances in Signal Processing, volume 2011, Article number: 294010 (2011)

Abstract

Multimodal speech processing has been investigated as a way to increase the robustness of unimodal speech processing systems. Hard fusion of acoustic and visual speech is generally used to improve the accuracy of such systems. In this paper, we discuss the significance of two soft belief functions developed for multimodal speech processing. These soft belief functions are formulated on the basis of a confusion matrix of probability mass functions obtained jointly from both acoustic and visual speech features. The first soft belief function (BHT-SB) is formulated for problems in speech processing that resemble binary hypothesis testing (BHT). This approach is extended to problems that resemble multiple hypothesis testing (MHT) to formulate the second belief function (MHT-SB). The two soft belief functions, BHT-SB and MHT-SB, are applied to speaker diarization and audio-visual speech recognition, respectively. Speaker diarization experiments are conducted on meeting speech data collected in a lab environment and on the AMI meeting database. Audio-visual speech recognition experiments are conducted on the GRID audio-visual corpus. Experimental results are obtained for both multimodal speech processing tasks using the BHT-SB and MHT-SB functions. The results indicate reasonable improvements over unimodal speech processing (acoustic speech or visual speech alone).
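
To make the idea of soft fusion from confusion matrices concrete, the following minimal Python sketch combines one hard decision from an acoustic classifier and one from a visual classifier into a probability mass function over the true class. It is not the BHT-SB or MHT-SB formulation from the paper, whose details are not given in this abstract. The function name `soft_fuse`, the example matrices, the uniform prior, and the assumption that the two modalities err independently given the true class are all illustrative assumptions.

```python
import numpy as np


def soft_fuse(conf_audio, conf_video, decision_audio, decision_video, prior=None):
    """Return a PMF over the true class given two hard decisions.

    conf_x[i, j] is an estimate of P(classifier x outputs j | true class i),
    i.e. a row-normalised confusion matrix (hypothetical input format).
    Assumption: the two classifiers err independently given the true class.
    """
    conf_audio = np.asarray(conf_audio, dtype=float)
    conf_video = np.asarray(conf_video, dtype=float)
    n_classes = conf_audio.shape[0]
    if prior is None:
        # Assumed uniform prior over classes.
        prior = np.full(n_classes, 1.0 / n_classes)
    else:
        prior = np.asarray(prior, dtype=float)
    # Bayes rule: prior times the likelihood of each observed decision.
    joint = prior * conf_audio[:, decision_audio] * conf_video[:, decision_video]
    return joint / joint.sum()


# Made-up binary example (e.g. "speaker A" = 0, "speaker B" = 1).
audio_cm = [[0.9, 0.1], [0.2, 0.8]]
video_cm = [[0.7, 0.3], [0.4, 0.6]]

# The acoustic classifier says class 0, the visual classifier says class 1.
belief = soft_fuse(audio_cm, video_cm, decision_audio=0, decision_video=1)
print(belief)
```

Unlike hard fusion, which would have to pick one of the two conflicting decisions, the fused output keeps a graded belief in each hypothesis, weighted by how reliable each modality has been on each class.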

Publisher note

To access the full article, please see PDF.

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology, Kanpur, 208016, India
D. Kumar, P. Vimal & Rajesh M. Hegde

Corresponding author

Correspondence to Rajesh M. Hegde.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kumar, D., Vimal, P. & Hegde, R.M. On the Soft Fusion of Probability Mass Functions for Multimodal Speech Processing. EURASIP J. Adv. Signal Process. 2011, 294010 (2011). https://doi.org/10.1155/2011/294010

Received: 25 July 2010
Revised: 08 February 2011
Accepted: 02 March 2011
Published: 15 March 2011
DOI: https://doi.org/10.1155/2011/294010
