DOI: 10.5555/1732323.1732353
Article

A comprehensive audio-visual corpus for teaching sound Persian phoneme articulation

Published: 11 October 2009

Abstract

Building an audio-visual data corpus is a significant step in audio-visual research, and computer-aided speech therapy and language learning are among the most challenging tasks in computer science. Developing applications for training and rehabilitating people with disabilities, and helping the hearing- and speech-impaired through facial speech synthesis, are among the most helpful, state-of-the-art roles of computer technology in today's human-machine interaction systems. To date, there has been no audio-visual corpus for the Persian language, which makes it difficult or even impossible for researchers to carry out studies in this area. This paper describes the collected Persian audio-visual data corpus, AVA: a comprehensive, systematic collection of both continuous speech and isolated spoken utterances in the Persian language. The goal of this project is to facilitate audio-visual research in the language through this data corpus, which is available upon request.

References

[1]
E. Bailly-Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariethoz, J. Matas, K. Messer, V. Popovici, F. Poree, B. Ruiz, and J. P. Thiran, "The BANCA Database and Evaluation Protocol", in Proceedings of Audio- and Video-Based Biometric Person Authentication, Springer Berlin/Heidelberg, Volume 2688, pp. 625-638, 2003.
[2]
Bench J, Daly N, Dayle J, Lind C, "Choosing talkers for the BKB/A Speechreading Test: a procedure with observations on talker age and gender", British Journal of Audiology, Volume 29, Issue 3, pp. 172-187, Jun 1995.
[3]
C. C. Chibelushi, F. Deravi, J. S. D. Mason, "Survey of audio visual speech databases", Tech. Rep., Department of Electrical and Electronic Engineering, University of Wales, Swansea, UK, 1996.
[4]
Alin G. Chinu and Leon J. M. Rothkrantz. "Building a Data Corpus for Audio-Visual Speech Recognition" AGC, pp. 88-92, April 2007.
[5]
P. Cisar, M. Zelezny, Z. Krnoul, J. Kanis, J. Zelinka, L. Müller "Design and recording of Czech speech corpus for audio-visual continuous speech recognition". Proceedings of the Auditory-Visual Speech Processing International Conference 2005, AVSP2005, p. 1-4, Vancouver Island, 2005.
[6]
Dr Speech, a training software system, http://www.drspeech.com.
[7]
O. Engwall, O. Bälter, A. -M. Öster and H. Kjellström "Designing the human-machine interface of the computer-based speech training system ARTUR based on early user tests". Behaviour and Information Technology. Volume 25, Issue 4, pp. 353-365. 2006.
[8]
T. Ezzat and T. Poggio, "Visual Speech Synthesis by Morphing Visemes", International Journal of Computer Vision, Volume 38, pp. 45-57, 2000.
[9]
Gita Movalleli, "Sara Lip-Reading Test: Construction, Evaluation and Operation on a Group of People with Hearing Disorders", MSc Thesis, Department of Rehabilitation, Tehran University of Medical Sciences, in Persian, 2002.
[10]
R. Goecke, J. B. Millar, A. Zelinsky, and J. Robert-Ribes, "A detailed description of the AVOZES data corpus", in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), pp. 486-491, Salt Lake City, Utah, USA, May 2001.
[11]
R. Goecke, J. B. Millar, "The Audio-Video Australian English Speech Data Corpus AVOZES", Proceedings of the 8th International Conference on Spoken Language Processing, ICSLP2004, Volume III, pp. 2525-2528, 2004.
[12]
M. Grimm and S. Narayanan, "The Vera am Mittag German audio-visual emotional speech database", ICME 2008, IEEE, pp. 865-868, April 2008.
[13]
C. Jahani, "The Glottal Plosive: A Phoneme in Spoken Modern Persian or Not?" In Éva Ágnes Csató, Bo Isaksson, and Carina Jahani. Linguistic Convergence and Areal Diffusion: Case studies from Iranian, Semitic and Turkic. London: RoutledgeCurzon. pp. 79-96. ISBN 0-415-30804- 6, 2005.
[14]
H. Kjellstrom, O. Engwall, "Audiovisual-to-articulatory inversion". Speech Communication, Volume 51, Issue 3, pp. 195-202, 2009.
[15]
Peter Ladefoged, "Vowels and Consonants", 2nd ed., Blackwell Publishers, ISBN 978-1-4051-2458-4, 2004.
[16]
B. Lee, M. Hasegawa-Johnson, C. Goudeseune, S. Kamdar, S. Borys, M. Liu, T. Huang, "AVICAR: Audio-Visual Speech Corpus in a Car Environment", in Proceedings of International Conference on Spoken Language Processing - INTERSPEECH, Jeju Island, Korea, October 4-8, 2004.
[17]
L. Liang, Y. Luo, F. Huang, and A. V. Nefian, "A multi-stream audio-video large-vocabulary Mandarin Chinese speech database", in IEEE International Conference on Multimedia and Expo, Volume 3, pp. 1787-1790, 2004.
[18]
Lynn K. Marassa and Charissa R. Lansing, "Visual Word Recognition in Two Facial Motion Conditions: Full-Face versus Lip plus Mandible", Journal of Speech and Hearing Research, Volume 38, Issue 6, pp. 1387-1394, Dec 1995.
[19]
K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: the extended M2VTS database", in Proceedings of the 2nd International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA '99), pp. 72-77, Washington, DC, USA, March 1999.
[20]
J. B. Millar, M. Wagner, and R. Goecke, "Aspects of Speaking-Face Data Corpus Design Methodology" In Proc. 8th Int. Conf. Spoken Language Processing ICSLP, Volume II, Jeju, Korea, pp. 1157-1160, 2004.
[21]
D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. M. Chu, A. Tyagi, J. R. Casas, J. Turmo, L. Christoforetti, F. Tobia, A. Pnevmatikakis, V. Mylonakis, F. Talantzis, S. Burger, R. Stiefelhagen, K. Bernardin, and C. Rochet, "The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms", Journal of Language Resources and Evaluation, Springer, Volume 41, pp. 389-407, 2008.
[22]
E. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, "Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus" EURASIP Journal on Applied Signal Processing, Volume 2002, pp. 1189-1201, 2002.
[23]
E. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, "CUAVE: a new audio-visual database for multimodal human computer-interface research," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), Volume. 2, pp. 2017-2020, Orlando, Fla, USA, May 2002.
[24]
V. Pera, A. Moura, D. Freitas, "LPFAV2: a New Multi-Modal Database for Developing Speech Recognition Systems for an Assistive Technology Application", SPECOM'2004: 9th Conference Speech and Computer St. Petersburg, Russia September 20-22, 2004.
[25]
S. Pigeon and L. Vandendorpe, "The M2VTS multimodal face database (release 1.00)", in Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA '97), pp. 403-409, London, UK, Springer-Verlag, 1997.
[26]
J. Trojanová, M. Hrúz, P. Campr, and M. Železný, "Design and Recording of Czech Audio-Visual Database with Impaired Conditions for Continuous Speech Recognition", Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), 2008.
[27]
J. C. Wojdeł, P. Wiggers, and L. J. M. Rothkrantz, "An audiovisual corpus for multimodal speech recognition in Dutch language", Proceedings of the International Conference on Spoken Language Processing (ICSLP2002), Denver, CO, USA, September 2002, pp. 1917-1920.
[28]
Yadollah Samareh, Persian phonetics, Markaze nashre daneshgahi pub, Tehran, in Persian, 1998.
[29]
T. Yotsukura, S. Nakamura, and S. Morishima, "Construction of audio-visual speech corpus using motion-capture system and corpus based facial animation", IEICE Transactions on Information and Systems, Volume E88-D, No. 11, pp. 2377-2483, 2005.
[30]
I. Kirschning, T. Toledo, "Language Training for Hearing Impaired Children with CSLU Vocabulary Tutor", Journal: WSEAS Transactions on Information Science and Applications, Issue 1, Vol. 1, pp. 20-25. July 2004.
[31]
International Phonetic Association. "Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet", Cambridge: Cambridge University Press. pp. 124-125. ISBN 978-0521637510, 1999.
[32]
R. Möttönen, J.-L. Olivés, J. Kulju, and M. Sams, "Parameterized Visual Speech Synthesis and Its Evaluation", Proceedings of EUSIPCO 2000, Tampere, Finland, 2000.
[33]
Michael M. Cohen, Jonas Beskow, and Dominic W. Massaro, "Recent Developments in Facial Animation: An Inside View", in Proceedings of Auditory-Visual Speech Perception, pp. 201-206, Terrigal-Sydney, Australia, December 1998.
[34]
L. Bernstein, M. Goldstein, J. Mashie, "Speech training aids for hearing-impaired individuals", Journal of Rehabilitation Research and Development, Volume 25, pp.53-62, 1988.
[35]
G. Zoric and I. S. Pandzic, "Real-time language independent lip synchronization method using a genetic algorithm", Signal Processing, Volume 86, Issue 12, pp. 3644-3656, 2006.

Cited By

  • (2010) "A novel multimedia educational speech therapy system for hearing impaired children", Proceedings of the Advances in Multimedia Information Processing, and 11th Pacific Rim Conference on Multimedia: Part II, 10.5555/1894049.1894117, pp. 705-715. Online publication date: 21-Sep-2010
  • (2010) "The Persian linguistic based audio-visual data corpus, AVA II, considering coarticulation", Proceedings of the 16th International Conference on Advances in Multimedia Modeling, 10.1007/978-3-642-11301-7_30, pp. 284-294. Online publication date: 6-Jan-2010
  • (2009) "Persian Viseme Classification for Developing Visual Speech Training Application", Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, 10.1007/978-3-642-10467-1_104, pp. 1080-1085. Online publication date: 15-Dec-2009


    Published In

    SMC'09: Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
    October 2009
    5232 pages
    ISBN:9781424427932

    Publisher

    IEEE Press


    Author Tags

    1. audio visual data
    2. corpus design
    3. speech therapy

    Qualifiers

    • Article

