Article

Semi-supervised multimodal deep learning for RGB-D object recognition

Authors:

Yong Rui

IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence

Pages 3345 - 3351

Published: 09 July 2016

Abstract

This paper studies the problem of RGB-D object recognition. Inspired by the success of deep convolutional neural networks (DCNNs) in AI, researchers have tried to apply them to improve the performance of RGB-D object recognition. However, a DCNN typically requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which prevents DCNNs from quickly advancing this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework that trains a DCNN effectively from very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which guides the DCNN to learn from unlabeled RGB-D data by making full use of the complementary cues that RGB and depth data provide for object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves performance competitive with state-of-the-art results reported by fully supervised methods.
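The co-training idea behind the framework can be illustrated with a minimal sketch. This is classic two-view co-training in the style of Blum and Mitchell, not the paper's diversity-preserving algorithm, whose details the abstract does not give. Everything here is an assumption made for illustration: the nearest-centroid classifiers stand in for the DCNNs, the synthetic features stand in for RGB and depth data, and the names `co_train`, `fit_views`, `make_toy_data`, `rounds`, and `per_class` are hypothetical.

```python
import math
import random


def centroids(X, y):
    """Fit a nearest-centroid model: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def fit_views(views, labeled):
    """Train one model per view on the current (seed + pseudo) labels."""
    idx = sorted(labeled)
    y = [labeled[i] for i in idx]
    return [centroids([view[i] for i in idx], y) for view in views]


def co_train(rgb, depth, seed_labels, rounds=5, per_class=2):
    """Classic co-training: each view pseudo-labels its most confident
    unlabeled samples, and both views retrain on the shared, grown pool."""
    views = (rgb, depth)
    labeled = dict(seed_labels)
    for _ in range(rounds):
        models = fit_views(views, labeled)
        unlabeled = [i for i in range(len(rgb)) if i not in labeled]
        if not unlabeled:
            break
        new = {}
        for view, model in zip(views, models):
            preds = [(predict(model, view[i]), i) for i in unlabeled]
            for c in model:
                ranked = sorted(
                    ((conf, i) for (lab, conf), i in preds if lab == c),
                    reverse=True,
                )
                for _conf, i in ranked[:per_class]:
                    # If both views pick sample i, the first view's label is kept.
                    new.setdefault(i, c)
        labeled.update(new)
    return labeled, fit_views(views, labeled)


def make_toy_data(n_per_class=20, seed=0):
    """Synthetic stand-in for RGB and depth features (not real RGB-D data)."""
    rng = random.Random(seed)
    rgb, depth, truth = [], [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            rgb.append([center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)])
            depth.append([center + rng.uniform(-1, 1)])
            truth.append(label)
    return rgb, depth, truth


if __name__ == "__main__":
    rgb, depth, truth = make_toy_data()
    seeds = {0: truth[0], 20: truth[20]}  # one labeled sample per class
    pseudo, _models = co_train(rgb, depth, seeds)
    print(len(pseudo), "samples labeled after co-training")
```

Selecting a fixed number of confident samples per class in each round keeps the pseudo-labeled pool balanced across classes. That is only a crude stand-in for the diversity preservation the abstract describes and should not be read as the authors' method.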

Index Terms

Computing methodologies

Published In

IJCAI'16: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence

July 2016

4277 pages

ISBN:9781577357704

Editor:
Gerhard Brewka
Leipzig University, Germany

Sponsors

Sony Corporation
Arizona State University
Microsoft
Facebook
AI Journal

Publisher

AAAI Press

Cited By

Lv J, Liu K, He S, Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B (2021). Differentiated Learning for Multi-Modal Domain Adaptation. Proceedings of the 29th ACM International Conference on Multimedia, 1322-1330. doi:10.1145/3474085.3475660. Online publication date: 17-Oct-2021
https://dl.acm.org/doi/10.1145/3474085.3475660

Chen D, Wang W, Gao W, Zhou Z (2018). Tri-net for semi-supervised deep learning. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2014-2020. doi:10.5555/3304889.3304939. Online publication date: 13-Jul-2018
https://dl.acm.org/doi/10.5555/3304889.3304939

Wang Z, Li W, Kao Y, Zou D, Wang Q, Ahn M, Hong S (2018). HCR-Net. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 1014-1020. doi:10.5555/3304415.3304559. Online publication date: 13-Jul-2018
https://dl.acm.org/doi/10.5555/3304415.3304559

Keren G, Mousa A, Pietquin O, Zafeiriou S, Schuller B (2018). Deep learning for multisensorial and multimodal interaction. The Handbook of Multimodal-Multisensor Interfaces, 99-128. doi:10.1145/3107990.3107996. Online publication date: 1-Oct-2018
https://dl.acm.org/doi/10.1145/3107990.3107996

Wu S, Ji Q, Wang S, Wong H, Yu Z, Xu Y (2018). Semi-Supervised Image Classification With Self-Paced Cross-Task Networks. IEEE Transactions on Multimedia 20:4, 851-865. doi:10.1109/TMM.2017.2758522. Online publication date: 1-Apr-2018
https://dl.acm.org/doi/10.1109/TMM.2017.2758522

Du Y, Wong Y, Jin W, Wei W, Hu Y, Kankanhalli M, Geng W (2017). Semi-supervised learning for surface EMG-based gesture recognition. Proceedings of the 26th International Joint Conference on Artificial Intelligence, 1624-1630. doi:10.5555/3172077.3172113. Online publication date: 19-Aug-2017
https://dl.acm.org/doi/10.5555/3172077.3172113
