Isolated Sign Language Recognition with Grassmann Covariance Matrices

Published: 07 May 2016

Abstract

In this article, we propose a covariance matrix-based representation that exploits long-term dynamics over an isolated sign sequence and naturally fuses information from multimodal sources. To tackle the drawback induced by the commonly used Riemannian metric, we measure the proximity of covariance matrices on the Grassmann manifold. However, the inherent Grassmann metric cannot be applied directly to a covariance matrix. We solve this problem by evaluating and selecting the most significant singular vectors of the covariance matrices of sign sequences. The resulting compact representation is called the Grassmann covariance matrix. Finally, the Grassmann metric is used as a kernel for a support vector machine, which enables the signs to be learned in a discriminative manner. To validate the proposed method, we collect three challenging sign language datasets, on which comprehensive evaluations show that the proposed method outperforms state-of-the-art methods in both accuracy and computational cost.
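
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions, because the abstract does not specify them: the singular vectors are selected simply as the top k by singular value, the Grassmann kernel is taken to be the projection kernel (the squared Frobenius norm of the product of two orthonormal bases), and the per-frame feature vectors are random placeholders. All function names are hypothetical.

```python
import numpy as np


def covariance_descriptor(frames):
    """Covariance of per-frame features; frames has shape (T, d), T frames of d features."""
    return np.cov(frames, rowvar=False)  # shape (d, d)


def grassmann_point(cov, k):
    """Orthonormal basis (d, k) spanned by the k leading singular vectors.

    Assumption: 'most significant' is read as 'largest singular value'.
    """
    u, _, _ = np.linalg.svd(cov)  # singular values come back in descending order
    return u[:, :k]


def projection_kernel(u1, u2):
    """Projection kernel between two subspaces: ||U1^T U2||_F^2 (assumed kernel choice)."""
    return np.linalg.norm(u1.T @ u2, ord="fro") ** 2


def gram_matrix(points):
    """Pairwise kernel values for a list of Grassmann points."""
    n = len(points)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            gram[i, j] = gram[j, i] = projection_kernel(points[i], points[j])
    return gram


# Placeholder data: three "sign sequences" of different lengths, 6 features per frame.
rng = np.random.default_rng(0)
sequences = [rng.normal(size=(t, 6)) for t in (40, 55, 30)]
points = [grassmann_point(covariance_descriptor(s), k=3) for s in sequences]
gram = gram_matrix(points)
```

Because every sequence is reduced to a fixed-size d-by-k basis regardless of its length, sequences of different durations become directly comparable. The resulting Gram matrix could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.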

    Published In

    ACM Transactions on Accessible Computing  Volume 8, Issue 4
    May 2016
    80 pages
    ISSN:1936-7228
    EISSN:1936-7236
    DOI:10.1145/2905046

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 May 2016
    Accepted: 01 February 2016
    Revised: 01 February 2016
    Received: 01 May 2015
    Published in TACCESS Volume 8, Issue 4

    Author Tags

    1. Grassmann manifold
    2. Hearing loss
    3. covariance matrix
    4. sign language

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Microsoft Research Asia and the Natural Science Foundation of China
    • Infotech Oulu
    • Academy of Finland
    • Fidipro Program of Tekes
