DOI: 10.1145/3664647.3681324
research-article

Part-level Reconstruction for Self-Supervised Category-level 6D Object Pose Estimation with Coarse-to-Fine Correspondence Optimization

Published: 28 October 2024

Abstract

Self-supervised category-level 6D object pose estimation is a fundamental task in computer vision. However, current self-supervised methods face two major challenges. First, existing networks struggle to reconstruct precise object models because shapes within certain categories vary significantly at the part level. Second, they suffer from many-to-one ambiguity in the correspondences between pixels and point clouds. To address these challenges, we propose a novel approach comprising a Part-level Shape Reconstruction (PSR) module and a Coarse-to-Fine Correspondence Optimization (CFCO) module. In the PSR module, we introduce a part-level discrete shape memory that captures finer-grained shape variations across objects, and we use it to perform precise reconstruction. In the CFCO module, we use Hungarian matching to generate one-to-one pseudo labels at both the region and pixel levels, which provide explicit supervision for the corresponding similarity matrices. We evaluate our method on the REAL275 and WILD6D datasets. Extensive experiments show that our self-supervised approach outperforms existing methods and achieves new state-of-the-art results within the self-supervised framework.

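The many-to-one ambiguity arises when each pixel simply takes its highest-similarity point, so several pixels can pick the same point. Hungarian matching instead solves a linear assignment problem that maximizes total similarity under a one-to-one constraint. The pure-Python sketch below implements the Hungarian (Kuhn-Munkres) algorithm with row and column potentials and turns its result into a 0/1 pseudo-label matrix. The function names and the toy similarity matrix are hypothetical; this is an illustration of the technique, not the authors' implementation.

```python
from typing import List


def hungarian_min_cost(cost: List[List[float]]) -> List[int]:
    """Hungarian algorithm with potentials for an n x m cost matrix, n <= m.
    Returns assignment[i] = column matched to row i, minimizing total cost."""
    n, m = len(cost), len(cost[0])
    assert n <= m, "transpose the matrix so that rows <= columns"
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row matched to column j, 0 means unmatched
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the augmenting path so that row i becomes matched.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def one_to_one_pseudo_labels(similarity: List[List[float]]) -> List[List[int]]:
    """0/1 pseudo-label matrix matching min(n, m) row/column pairs one-to-one
    so that the total similarity of the matched pairs is maximal."""
    n, m = len(similarity), len(similarity[0])
    if n > m:
        # Solve on the transpose, then transpose the labels back.
        transposed = [list(col) for col in zip(*similarity)]
        return [list(col) for col in zip(*one_to_one_pseudo_labels(transposed))]
    cost = [[-s for s in row] for row in similarity]  # max similarity = min cost
    assignment = hungarian_min_cost(cost)
    return [[1 if assignment[i] == j else 0 for j in range(m)] for i in range(n)]


def argmax_matches(similarity: List[List[float]]) -> List[int]:
    """Naive matching: each row independently takes its best column."""
    return [max(range(len(row)), key=row.__getitem__) for row in similarity]


# Toy similarities (rows: pixels or regions, columns: points or regions).
S = [[0.9, 0.8, 0.1],
     [0.85, 0.3, 0.2],
     [0.2, 0.1, 0.6]]
print(argmax_matches(S))            # [0, 0, 2]  (rows 0 and 1 collide on column 0)
print(one_to_one_pseudo_labels(S))  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Under one reading of "coarse-to-fine", the same routine would run first on a small region-by-region similarity matrix and then on pixel-by-point matrices restricted to each matched region pair; that decomposition is an assumption here. Since this algorithm is roughly cubic in matrix size, splitting one large matching into several small ones is also much cheaper to compute.
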
    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3d reconstruction
    2. deep learning
    3. multimodal data processing
    4. self-supervised learning
    5. visual-spatial correspondence

    Funding Sources

    • Natural Science Foundation of China
    • Dreams Foundation of Jianghuai Advance Technology Center
    • National Aviation Science Foundation
    • Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 39
    • Downloads (last 12 months): 39
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 28 Dec 2024