
SDE2D: Semantic-Guided Discriminability Enhancement Feature Detector and Descriptor

Published: 23 December 2024

Abstract

Local feature detectors and descriptors serve a range of computer vision tasks, such as image matching, visual localization, and 3D reconstruction. To cope with the extreme rotation and illumination variations found in the real world, most detectors and descriptors pursue as much invariance as possible. However, these methods neglect feature discriminability and perform poorly in indoor scenes, which contain many weakly textured and even repetitively textured regions, so the extracted features must be sufficiently discriminative. We therefore propose a semantic-guided method, called SDE2D, that enhances feature discriminability to improve descriptor performance in indoor scenes. We develop a semantic-guided discriminability enhancement (SDE) loss function that exploits semantic information from indoor scenes. To the best of our knowledge, this is the first in-depth study that applies semantic segmentation to enhance discriminability. In addition, we design a novel framework in which a semantic segmentation network is embedded as a module and provides guidance for training. We also explore the impact of different semantic segmentation models on our method. Experimental results on indoor scene datasets demonstrate that the proposed SDE2D performs well compared with state-of-the-art models.
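The abstract describes the SDE loss only at a high level. As a rough illustration of the general idea, the sketch below shows one plausible way a semantic label map could be used to mine hard pixel pairs (same semantic class, hence likely similar appearance in weakly or repetitively textured regions) and push their descriptors apart. It is a minimal sketch under assumed PyTorch-style inputs; the function name `sde_loss`, the hinge margin, and the random-pair sampling are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a semantic-guided discriminability term.
# NOT the paper's SDE loss; names, margin, and sampling are assumptions.
import torch
import torch.nn.functional as F

def sde_loss(desc, sem_labels, num_samples=256, margin=1.0):
    """desc: (B, C, H, W) dense descriptors; sem_labels: (B, H, W) semantic ids.

    Idea: pixels sharing a semantic class in weakly/repetitively textured
    regions tend to look alike, so their descriptors are the hardest to tell
    apart. Sample random same-class pixel pairs and push their descriptors
    at least `margin` apart (hinge loss).
    """
    b, c, h, w = desc.shape
    desc = F.normalize(desc.view(b, c, -1), dim=1)        # (B, C, H*W), unit-length descriptors
    labels = sem_labels.view(b, -1)                        # (B, H*W)

    idx_a = torch.randint(0, h * w, (b, num_samples), device=desc.device)
    idx_b = torch.randint(0, h * w, (b, num_samples), device=desc.device)

    same_class = labels.gather(1, idx_a) == labels.gather(1, idx_b)
    distinct = idx_a != idx_b
    mask = (same_class & distinct).float()                 # only same-class, different-pixel pairs

    d_a = desc.gather(2, idx_a.unsqueeze(1).expand(-1, c, -1))   # (B, C, N)
    d_b = desc.gather(2, idx_b.unsqueeze(1).expand(-1, c, -1))   # (B, C, N)
    dist = (d_a - d_b).norm(dim=1)                         # (B, N) Euclidean distance

    hinge = F.relu(margin - dist)                          # penalize pairs closer than the margin
    return (hinge * mask).sum() / mask.sum().clamp(min=1.0)
```

In a full training pipeline, a term of this kind would presumably be combined with the usual descriptor matching and keypoint detection losses; the sketch covers only the semantic-guided hard-pair mining idea.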


Published In

IEEE Transactions on Multimedia, Volume 27, 2025

Publisher

IEEE Press
