PGNet: Progressive Feature Guide Learning Network for Three-dimensional Shape Recognition

Published: 22 July 2021

Abstract

Three-dimensional (3D) shape recognition is an active research topic with potential applications in computer vision. With the recent proliferation of deep learning, various deep learning models have achieved state-of-the-art performance on this task. Among them, multiview-based 3D shape representation has received increasing attention in recent years, and related approaches have significantly improved 3D shape recognition. However, these methods concentrate on feature learning through network design and ignore the correlation among views. In this article, we propose a novel progressive feature guide learning network (PGNet) that focuses on the correlation among multiple views and integrates multiple modalities for 3D shape recognition. In particular, we propose two information fusion schemes, one at the visual level and one at the feature level. The visual fusion scheme operates at the view level and employs a soft-attention model to define the weights of views for visual information fusion. The feature fusion scheme operates on the feature dimensions and employs the quantified feature as a mask to further optimize the feature. Together, these two schemes form PGNet for 3D shape representation. We evaluate the approach on the classic ModelNet40 and ShapeNetCore55 datasets, and the experiments demonstrate its superiority.
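
The abstract does not specify how the two fusion schemes are implemented, so the following is only a minimal sketch of the general ideas, not the authors' method. It assumes that each view is already encoded as a feature vector, that attention scores are dot products with a hypothetical query vector, and that "quantified feature as the mask" can be read as binarizing a guide feature with a sigmoid and a threshold. The names attention_fuse, mask_feature, query, guide, and threshold are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(view_features, query):
    """Soft-attention fusion over views (illustrative assumption).

    Each view is scored by its dot product with `query`, the scores are
    normalized with softmax, and the views are combined as a weighted sum.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in view_features]
    weights = softmax(scores)
    dim = len(view_features[0])
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, view_features))
        for d in range(dim)
    ]
    return fused, weights


def mask_feature(feature, guide, threshold=0.5):
    """Feature-level masking (illustrative assumption).

    The guide feature is squashed with a sigmoid and binarized at
    `threshold`; the resulting 0/1 mask gates each feature dimension.
    """
    mask = [
        1.0 if 1.0 / (1.0 + math.exp(-g)) >= threshold else 0.0
        for g in guide
    ]
    masked = [f * m for f, m in zip(feature, mask)]
    return masked, mask


if __name__ == "__main__":
    # Three toy views with two-dimensional features.
    views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused, weights = attention_fuse(views, query=[1.0, 1.0])
    masked, mask = mask_feature(fused, guide=[2.0, -2.0])
    print(weights, fused, mask, masked)
```

In this toy setup the third view receives the largest attention weight because it has the highest score against the query, and the mask zeroes the second feature dimension because its guide value falls below the threshold after the sigmoid.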

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
August 2021
443 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3476118
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2021
Accepted: 01 December 2020
Revised: 01 December 2020
Received: 01 June 2020
Published in TOMM Volume 17, Issue 3

Author Tags

  1. 3D model retrieval
  2. multi-modal
  3. 3D shape recognition

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • The Natural Science Foundation of Tianjin
