DOI: 10.5555/3061053.3061089

Semi-supervised multimodal deep learning for RGB-D object recognition

Published: 09 July 2016


This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNNs) in AI, researchers have tried to apply them to improve the performance of RGB-D object recognition. However, a DCNN requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which prevents DCNNs from rapidly advancing this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework that trains a DCNN effectively from very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which guides the DCNN to learn from the unlabeled RGB-D data by making full use of the complementary cues that the RGB and depth data provide for object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% of the training data labeled, our approach achieves object recognition performance competitive with the state-of-the-art results reported for fully supervised methods.
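To make the co-training idea concrete, the following is a minimal sketch of generic two-view co-training in the style of Blum and Mitchell, not the diversity-preserving algorithm proposed in the paper, whose details the abstract does not give. It assumes nearest-centroid classifiers as stand-ins for the DCNNs, and every name in it (`centroids`, `predict`, `co_train`, the toy feature vectors) is hypothetical. In each round, the classifier for each view pseudo-labels the unlabeled sample it is most confident about, and that sample is added to the labeled set used by both views.

```python
import math

def centroids(points, labels):
    """Mean feature vector per class (a stand-in for training a real model)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest class centroid."""
    dists = sorted((math.dist(c, x), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(rgb_l, depth_l, labels, rgb_u, depth_u, rounds=5, per_round=1):
    """Generic co-training over two views (RGB and depth) of the same samples."""
    rgb_l, depth_l, labels = list(rgb_l), list(depth_l), list(labels)
    pool = list(range(len(rgb_u)))   # indices of still-unlabeled samples
    pseudo = {}                      # index -> pseudo-label assigned so far
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        # Each view nominates the samples it is most confident about.
        for view_l, view_u in ((rgb_l, rgb_u), (depth_l, depth_u)):
            cents = centroids(view_l, labels)
            scored = sorted(((predict(cents, view_u[i]), i) for i in pool),
                            key=lambda t: -t[0][1])
            for (label, _), i in scored[:per_round]:
                picked.setdefault(i, label)
        # Nominated samples join the labeled set of BOTH views.
        for i, label in picked.items():
            rgb_l.append(rgb_u[i])
            depth_l.append(depth_u[i])
            labels.append(label)
            pool.remove(i)
            pseudo[i] = label
    return centroids(rgb_l, labels), centroids(depth_l, labels), pseudo

# Toy data (made up): one labeled sample per class, three unlabeled samples.
rgb_labeled = [[0.0, 0.0], [1.0, 1.0]]
depth_labeled = [[0.0], [1.0]]
known = ["apple", "mug"]
rgb_unlabeled = [[0.1, 0.1], [0.5, 0.6], [0.9, 0.8]]
depth_unlabeled = [[0.4], [0.9], [0.7]]

rgb_model, depth_model, pseudo = co_train(
    rgb_labeled, depth_labeled, known, rgb_unlabeled, depth_unlabeled, rounds=3)
```

In this toy setup the second unlabeled sample is ambiguous in the RGB view but clear in the depth view, so the depth classifier supplies its label, which is the kind of complementary cue the abstract refers to. A real system would replace the centroid models with networks and would need a safeguard against confidently wrong pseudo-labels.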



    Published In

    IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence
    July 2016
    4277 pages


    • Sony Corporation
    • Arizona State University
    • Microsoft
    • Facebook
    • AI Journal


    AAAI Press


