
SDE2D: Semantic-Guided Discriminability Enhancement Feature Detector and Descriptor

Published: 23 December 2024

Abstract

Local feature detectors and descriptors serve a range of computer vision tasks, such as image matching, visual localization, and 3D reconstruction. To cope with the extreme rotation and illumination variations found in the real world, most detectors and descriptors pursue as much invariance as possible. However, these methods neglect feature discriminability and perform poorly in indoor scenes, which contain many weakly textured and even repetitively textured regions, so the extracted features must be sufficiently discriminative. We therefore propose a semantic-guided method, called SDE2D, that enhances feature discriminability to improve descriptor performance in indoor scenes. We develop a semantic-guided discriminability enhancement (SDE) loss function that exploits semantic information from indoor scenes. To the best of our knowledge, this is the first in-depth study that applies semantic segmentation to enhance discriminability. In addition, we design a novel framework in which a semantic segmentation network is embedded as a module and provides guidance for training. We also explore the impact of different semantic segmentation models on our method. Experimental results on indoor scene datasets demonstrate that the proposed SDE2D performs well compared with state-of-the-art models.
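The abstract describes the SDE loss only at a high level. As a rough illustration of the general idea, the sketch below shows one plausible way a semantic label map could be used to mine hard pixel pairs (same semantic class, hence likely similar appearance in weakly or repetitively textured regions) and push their descriptors apart. It is a minimal sketch under assumed PyTorch-style inputs; the function name `sde_loss`, the hinge margin, and the random-pair sampling are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a semantic-guided discriminability term.
# NOT the paper's SDE loss; names, margin, and sampling are assumptions.
import torch
import torch.nn.functional as F

def sde_loss(desc, sem_labels, num_samples=256, margin=1.0):
    """desc: (B, C, H, W) dense descriptors; sem_labels: (B, H, W) semantic ids.

    Idea: pixels sharing a semantic class in weakly/repetitively textured
    regions tend to look alike, so their descriptors are the hardest to tell
    apart. Sample random same-class pixel pairs and push their descriptors
    at least `margin` apart (hinge loss).
    """
    b, c, h, w = desc.shape
    desc = F.normalize(desc.view(b, c, -1), dim=1)        # (B, C, H*W), unit-length descriptors
    labels = sem_labels.view(b, -1)                        # (B, H*W)

    idx_a = torch.randint(0, h * w, (b, num_samples), device=desc.device)
    idx_b = torch.randint(0, h * w, (b, num_samples), device=desc.device)

    same_class = labels.gather(1, idx_a) == labels.gather(1, idx_b)
    distinct = idx_a != idx_b
    mask = (same_class & distinct).float()                 # only same-class, different-pixel pairs

    d_a = desc.gather(2, idx_a.unsqueeze(1).expand(-1, c, -1))   # (B, C, N)
    d_b = desc.gather(2, idx_b.unsqueeze(1).expand(-1, c, -1))   # (B, C, N)
    dist = (d_a - d_b).norm(dim=1)                         # (B, N) Euclidean distance

    hinge = F.relu(margin - dist)                          # penalize pairs closer than the margin
    return (hinge * mask).sum() / mask.sum().clamp(min=1.0)
```

In a full training pipeline, a term of this kind would presumably be combined with the usual descriptor matching and keypoint detection losses; the sketch covers only the semantic-guided hard-pair mining idea.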


Published In

IEEE Transactions on Multimedia, Volume 27, 2025

Publisher

IEEE Press
