PGNet: Progressive Feature Guide Learning Network for Three-dimensional Shape Recognition

Authors:

An-An Liu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 3

Article No.: 87, Pages 1 - 17

https://doi.org/10.1145/3443708

Published: 22 July 2021

Abstract

Three-dimensional (3D) shape recognition is an active research topic in computer vision with considerable application value. With the recent proliferation of deep learning, various deep models have achieved state-of-the-art performance. Among them, multiview-based 3D shape representations have received increasing attention in recent years, and related approaches have significantly improved 3D shape recognition. However, these methods focus on feature learning through network design and ignore the correlation among views. In this article, we propose a novel progressive feature guide learning network (PGNet) that focuses on the correlation among multiple views and integrates multiple modalities for 3D shape recognition. In particular, we propose two information fusion schemes, one operating at the visual level and one at the feature level. The visual fusion scheme works at the view level and employs a soft-attention model to define the weights of views for visual information fusion. The feature fusion scheme works on the feature dimensions and employs the quantified feature as a mask to further optimize the feature. These two schemes jointly construct PGNet for 3D shape representation. Experiments on the classic ModelNet40 and ShapeNetCore55 datasets demonstrate the performance and superiority of our approach.
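The two fusion schemes named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract does not specify the scoring function, the quantization rule, or any layer sizes. The sketch assumes a single scoring vector for the soft attention (`score_vec`, hypothetical) and a sigmoid gate thresholded at 0.5 as the "quantified" mask (also an assumption); the function names `fuse_views` and `mask_feature` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_feats, score_vec):
    """Soft-attention fusion of per-view feature vectors.

    view_feats: list of V feature vectors, each of length D.
    score_vec:  hypothetical scoring vector of length D (an assumption;
                in a trained network this would be learned).
    Returns the weighted sum of the views and the attention weights.
    """
    # One scalar relevance score per view (dot product with score_vec).
    scores = [sum(f * w for f, w in zip(feat, score_vec)) for feat in view_feats]
    weights = softmax(scores)  # non-negative, sums to 1
    dim = len(view_feats[0])
    fused = [
        sum(weights[v] * view_feats[v][d] for v in range(len(view_feats)))
        for d in range(dim)
    ]
    return fused, weights


def mask_feature(feat, guide, threshold=0.5):
    """Feature-level masking with a quantized guide feature.

    guide is squashed with a sigmoid and binarized at `threshold`
    (assumed quantization rule); the binary mask then keeps or zeroes
    each dimension of feat.
    """
    mask = [1.0 if 1.0 / (1.0 + math.exp(-g)) > threshold else 0.0 for g in guide]
    refined = [f * m for f, m in zip(feat, mask)]
    return refined, mask


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused, weights = fuse_views(views, [1.0, 1.0])
    print("attention weights:", weights)
    print("fused feature:", fused)
    print(mask_feature([2.0, 3.0, 4.0], [1.0, -1.0, 0.0]))
```

In this toy setup the third view receives the largest attention weight because its score is highest, and the mask keeps only the dimensions whose guide value maps above the threshold. A real model would learn both the scoring and the guide features end to end.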

Index Terms

PGNet: Progressive Feature Guide Learning Network for Three-dimensional Shape Recognition
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 3

August 2021

443 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3476118

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2021

Accepted: 01 December 2020

Revised: 01 December 2020

Received: 01 June 2020

Published in TOMM Volume 17, Issue 3

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
The Natural Science Foundation of Tianjin

Cited By

Li Z., Seah H., Guo B., and Yang M. 2024. MLGPnet. Computer Vision and Image Understanding 239:C. DOI: 10.1016/j.cviu.2023.103904. Online publication date: 12-Apr-2024.
https://dl.acm.org/doi/10.1016/j.cviu.2023.103904

Cao J. and Liao S. 2023. PVFAN. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 45:5, 8119-8133. DOI: 10.3233/JIFS-232800. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.3233/JIFS-232800

Chen D., Kong D., Li J., Wang S., and Yin B. 2023. ADOSMNet: a novel visual affordance detection network with object shape mask guided feature encoders. Multimedia Tools and Applications 83:11, 31629-31653. DOI: 10.1007/s11042-023-16898-2. Online publication date: 18-Sep-2023.
https://doi.org/10.1007/s11042-023-16898-2

Zuo K. and Su X. 2022. Three-Dimensional Action Recognition for Basketball Teaching Coupled with Deep Neural Network. Electronics 11:22, 3797. DOI: 10.3390/electronics11223797. Online publication date: 18-Nov-2022.
https://doi.org/10.3390/electronics11223797
