research-article

Multi-feature Fusion VoteNet for 3D Object Detection

Published: 27 January 2022

Abstract

In this article, we propose a Multi-feature Fusion VoteNet (MFFVoteNet) framework to improve 3D object detection performance in cluttered and heavily occluded scenes. Our method takes a point cloud and a synchronized RGB image as inputs and outputs object detection results in 3D space. The detection architecture is built on VoteNet with three key designs. First, we augment the VoteNet input with per-point color information to make different instances in a scene more distinguishable. Next, we integrate an image feature module into VoteNet to provide a strong object class signal that supports more confident detections under occlusion. Finally, we propose a Projection Non-Maximum Suppression (PNMS) method for 3D object detection that eliminates redundant proposals and thus localizes 3D objects more accurately. We evaluate MFFVoteNet on two challenging 3D object detection datasets, ScanNetv2 and SUN RGB-D. Extensive experiments show that our framework effectively improves 3D object detection performance.
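The abstract does not specify how the color augmentation or PNMS are implemented, so the following is only a minimal Python sketch under stated assumptions, not the authors' code. `augment_with_color` appends each point's RGB value (assumed to be 8-bit and rescaled to [0, 1]) to its XYZ coordinates. `projected_nms` is plain greedy non-maximum suppression in which overlap is measured after projecting axis-aligned boxes onto the horizontal x-y plane; this is one assumed reading of "projection", and the box format `(cx, cy, cz, dx, dy, dz)`, the helper `bev_iou`, and the `iou_thresh` default are all hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: a point is (x, y, z), a color is 8-bit (r, g, b),
# and an axis-aligned 3D box is (cx, cy, cz, dx, dy, dz) = center plus full extents.
Point = Tuple[float, float, float]
Color = Tuple[int, int, int]
Box3D = Tuple[float, float, float, float, float, float]


def augment_with_color(points: Sequence[Point], colors: Sequence[Color]) -> List[Tuple[float, ...]]:
    """Append each point's RGB, rescaled from 0..255 to [0, 1] (an assumed
    normalization), to its XYZ coordinates, giving 6 features per point."""
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")
    return [(*p, *(c / 255.0 for c in rgb)) for p, rgb in zip(points, colors)]


def bev_iou(a: Box3D, b: Box3D) -> float:
    """IoU of two axis-aligned boxes after projecting them onto the x-y plane
    (the z center and height are ignored by this projection)."""
    ax0, ax1 = a[0] - a[3] / 2, a[0] + a[3] / 2
    ay0, ay1 = a[1] - a[4] / 2, a[1] + a[4] / 2
    bx0, bx1 = b[0] - b[3] / 2, b[0] + b[3] / 2
    by0, by1 = b[1] - b[4] / 2, b[1] + b[4] / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a[3] * a[4] + b[3] * b[4] - inter
    return inter / union if union > 0 else 0.0


def projected_nms(boxes: Sequence[Box3D], scores: Sequence[float],
                  iou_thresh: float = 0.25) -> List[int]:
    """Greedy NMS: visit proposals from highest to lowest score and keep one only
    if its projected IoU with every already-kept proposal is <= iou_thresh.
    Returns the indices of the kept proposals in descending score order."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(bev_iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

For example, with `boxes = [(0, 0, 0.5, 1, 1, 1), (0.1, 0, 0.5, 1, 1, 1), (3, 3, 0.5, 1, 1, 1)]` and `scores = [0.9, 0.8, 0.7]`, the second box overlaps the first with a projected IoU of about 0.82 and is suppressed, so `projected_nms(boxes, scores)` returns `[0, 2]`. One caveat of this particular projection is that vertically stacked objects, such as a monitor on a desk, overlap in the x-y plane even when they are disjoint in 3D, so a practical variant would need to account for height.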



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
January 2022
517 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3505205

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 January 2022
Accepted: 01 April 2021
Revised: 01 March 2021
Received: 01 December 2020
Published in TOMM Volume 18, Issue 1

Author Tags

  1. images
  2. point cloud
  3. 3D object detection
  4. multi-feature fusion
  5. occlusion

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Key Research and Development Program of China
  • Aeronautical Science Foundation of China
  • National Natural Science Foundation of China
  • Natural Science Foundation of Jiangsu Province

Article Metrics

  • Downloads (last 12 months): 277
  • Downloads (last 6 weeks): 16
Reflects downloads up to 06 Nov 2024.
