research-article

Semantic Correspondence with Geometric Structure Analysis

Authors:

Yuanfang Guo

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 3

Article No.: 83, Pages 1 - 21

https://doi.org/10.1145/3441576

Published: 22 July 2021

Abstract

This article studies the correspondence problem for semantically similar images, which is challenging because visual and geometric deformations occur jointly. We introduce the Flip-aware Distance Ratio (FDR) method, which approaches the problem from the perspective of geometric structure analysis. First, a distance ratio constraint enforces geometric consistency between images with large visual variations, while a smoothness term tolerates local geometric jitter. For challenging cases with symmetric structures, the method exploits Curl to suppress mismatches. Image correspondence is then formulated as a permutation problem, for which we propose a Gradient Guided Simulated Annealing (GGSA) algorithm that performs robust discrete optimization. Experiments on simulated and real-world datasets containing both visual and geometric deformations indicate that our method significantly improves on the baselines for both visually and semantically similar images.
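
To make the idea of correspondence as a permutation problem concrete, the following is a minimal sketch in Python. It is not the authors' FDR or GGSA: it uses plain simulated annealing with random swap moves, and a simple illustrative cost (the spread of log distance ratios between matched point pairs). The smoothness term, the Curl-based flip handling, and the gradient guidance are all omitted. The function names `pairwise_dist`, `ratio_cost`, and `anneal`, and all parameter values, are assumptions made for illustration.

```python
import math
import random


def pairwise_dist(points):
    """Euclidean distance matrix for a list of 2-D points (points must be distinct)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def ratio_cost(perm, da, db):
    """Illustrative cost: variance of log distance ratios under the matching perm.

    perm[i] is the index in image B matched to point i in image A. The cost is
    (numerically) zero when the matched layout in B is a uniformly scaled copy of A.
    """
    n = len(perm)
    logs = [
        math.log(db[perm[i]][perm[j]] / da[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    mean = sum(logs) / len(logs)
    return sum((v - mean) ** 2 for v in logs) / len(logs)


def anneal(da, db, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Plain simulated annealing over permutations using random swap moves."""
    rng = random.Random(seed)
    n = len(da)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = ratio_cost(perm, da, db)
    best, best_cost = perm[:], cost
    t = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # propose a swap
        new_cost = ratio_cost(perm, da, db)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    # Toy data: B is a permuted copy of A scaled by a factor of 2.
    a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
    b = [(2.0, 0.0), (6.0, 2.0), (0.0, 0.0), (0.0, 4.0)]
    print(anneal(pairwise_dist(a), pairwise_dist(b)))
```

Because the cost depends only on distance ratios, it is unchanged by uniform scaling, rotation, and translation of either point set. Annealing is a stochastic search, so the returned permutation is the best one visited rather than a guaranteed optimum.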

Index Terms

Semantic Correspondence with Geometric Structure Analysis
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Matching
2. Theory of computation
  1. Design and analysis of algorithms
    1. Mathematical optimization
      1. Discrete optimization
        Optimization with randomized search heuristics
        Simulated annealing

Recommendations

MatchDR: Image Correspondence by Leveraging Distance Ratio Constraint
MM '16: Proceedings of the 24th ACM international conference on Multimedia

Image correspondence is to establish the connections between coherent images, which can be quite challenging due to the visual and geometric deformations. This paper proposes a robust image correspondence technique from the perspective of spatial ...
3-D surface reconstruction from stereoscopic image sequences
ICCV '95: Proceedings of the Fifth International Conference on Computer Vision

A stereoscopic scene analysis system for 3-D modeling of objects from stereoscopic image sequences is described. A dense map of 3-D surface points is obtained by image correspondence, object segmentation, interpolation, and triangulation. Emphasis is ...
Tolerance near sets and image correspondence

The principal problem considered in this paper is how to solve the image correspondence problem using a bio-inspired approach. One solution to this problem is to consider tolerance near sets that model human perception in a physical continuum. Near sets ...

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 3

August 2021

443 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3476118

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2021

Accepted: 01 December 2020

Revised: 01 October 2020

Received: 01 March 2020

Published in TOMM Volume 17, Issue 3

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 42
Total Downloads: 143
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 5

Reflects downloads up to 09 Jan 2025

Citations

Cited By

Liang X, Yang E, Deng C, Yang Y. (2024). CrossFormer: Cross-modal Representation Learning via Heterogeneous Graph Transformer. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3688801. Online publication date: 20-Sep-2024.
https://doi.org/10.1145/3688801
Syu J, Lin J, Srivastava G. (2024). Distributed Learning Mechanisms for Anomaly Detection in Privacy-Aware Energy Grid Management Systems. ACM Transactions on Sensor Networks. DOI: 10.1145/3640341. Online publication date: 17-Jan-2024.
https://dl.acm.org/doi/10.1145/3640341
Xia S, Xing T, Wu C, Liu G, Yang J, Li K. (2024). AQMon: A Fine-grained Air Quality Monitoring System Based on UAV Images for Smart Cities. ACM Transactions on Sensor Networks 20:2 (1-20). DOI: 10.1145/3638766. Online publication date: 19-Jan-2024.
https://dl.acm.org/doi/10.1145/3638766
Su H, Li J, Du Z, Zhu L, Lu K, Shen H. (2024). Cross-domain Recommendation via Dual Adversarial Adaptation. ACM Transactions on Information Systems 42:3 (1-26). DOI: 10.1145/3632524. Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1145/3632524
Wang D, Li F, Liu K, Zhang X. (2024). Real-time Cyber-Physical Security Solution Leveraging an Integrated Learning-Based Approach. ACM Transactions on Sensor Networks 20:2 (1-22). DOI: 10.1145/3582009. Online publication date: 9-Jan-2024.
https://dl.acm.org/doi/10.1145/3582009
Long Z, Zhu C, Chen J, Li Z, Ren Y, Liu Y. (2024). Multi-View MERA Subspace Clustering. IEEE Transactions on Multimedia 26 (3102-3112). DOI: 10.1109/TMM.2023.3307239. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TMM.2023.3307239
Hui C, Zhang S, Cui W, Liu S, Jiang F, Zhao D. (2024). Rate-Adaptive Neural Network for Image Compressive Sensing. IEEE Transactions on Multimedia 26 (2515-2530). DOI: 10.1109/TMM.2023.3301213. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TMM.2023.3301213
Zhao B, Li L, Pan H. (2023). Non-Local Means Hole Repair Algorithm Based on Adaptive Block. Applied Sciences 14:1 (159). DOI: 10.3390/app14010159. Online publication date: 24-Dec-2023.
https://doi.org/10.3390/app14010159
Lv C, Zhang D, Geng S, Wu Z, Huang H. (2023). Color Transfer for Images: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-29). DOI: 10.1145/3635152. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3635152
Wang R, Wang F, Su Y, Sun J, Sun F, Li H. (2023). Attention-guided Multi-modality Interaction Network for RGB-D Salient Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20:3 (1-22). DOI: 10.1145/3624747. Online publication date: 23-Oct-2023.
https://dl.acm.org/doi/10.1145/3624747
