research-article

Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods

Authors:

Julien Ah-Pine,

Gabriela Csurka,

Stéphane Clinchant

ACM Transactions on Information Systems (TOIS), Volume 33, Issue 2

Article No.: 9, Pages 1 - 31

https://doi.org/10.1145/2699668

Published: 17 February 2015

Abstract

Multimedia collections are growing in size and diversity faster than ever. Effective multimedia retrieval systems are therefore critical for giving end users scalable access to these collections. We are interested in repositories of image/text multimedia objects, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval (CBMIR). We focus on graph-based methods, which have been shown to achieve state-of-the-art performance. In particular, we examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses both approaches. This framework highlights the core features to consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three real-world datasets. From a practical standpoint, our extensive empirical analyses yield insights and guidelines on the use of graph-based methods for multimodal information fusion in CBMIR.
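The two graph-based scores named in the abstract can be illustrated on a toy collection. The sketch below is not the authors' formulation and does not reproduce their unifying framework; it assumes precomputed, non-negative similarity matrices. The cross-media score uses a simple top-k propagation (visual neighbours of the query, then textual similarity to every document), in the spirit of trans-media pseudo-relevance feedback. The random-walk score uses a random walk with restart (personalized PageRank) over a graph built as a hypothetical linear mix, weighted by `alpha`, of textual and visual similarities. All function names, parameters, and similarity values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cross_media_scores(visual_q: List[float], text_sim: Matrix, k: int) -> List[float]:
    """Cross-media score (simplified sketch, not the paper's exact definition).

    visual_q[j]    : visual similarity between the query image and document j
    text_sim[j][d] : textual similarity between documents j and d
    score(d) = sum over the k visually closest documents j of visual_q[j] * text_sim[j][d]
    """
    n = len(visual_q)
    # Pseudo-relevant set: the k documents most visually similar to the query.
    top_k = sorted(range(n), key=lambda j: visual_q[j], reverse=True)[:k]
    # Propagate visual evidence to every document through textual similarity.
    return [sum(visual_q[j] * text_sim[j][d] for j in top_k) for d in range(n)]


def row_normalize(m: Matrix) -> Matrix:
    """Turn non-negative edge weights into row-stochastic transition probabilities."""
    out = []
    for row in m:
        total = sum(row)
        # A zero row (isolated node) would leak probability mass; this sketch leaves it as zeros.
        out.append([v / total for v in row] if total > 0 else [0.0] * len(row))
    return out


def random_walk_scores(
    text_sim: Matrix,
    visual_sim: Matrix,
    restart: List[float],
    alpha: float = 0.5,
    damping: float = 0.85,
    iters: int = 100,
) -> List[float]:
    """Random walk with restart over a graph mixing both modalities (hypothetical setup)."""
    n = len(restart)
    # Hypothetical fusion: one weighted graph whose edges blend textual and visual similarity.
    mixed = [
        [alpha * text_sim[i][j] + (1 - alpha) * visual_sim[i][j] for j in range(n)]
        for i in range(n)
    ]
    p = row_normalize(mixed)
    total = sum(restart)
    r0 = [v / total for v in restart]  # restart distribution, e.g. initial text-query scores
    x = r0[:]
    # Power iteration: x <- (1 - damping) * r0 + damping * (x P)
    for _ in range(iters):
        x = [
            (1 - damping) * r0[j] + damping * sum(x[i] * p[i][j] for i in range(n))
            for j in range(n)
        ]
    return x


if __name__ == "__main__":
    # Toy collection of three image/text documents (made-up similarity values).
    text_sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.1], [0.0, 0.1, 1.0]]
    visual_sim = [[1.0, 0.2, 0.6], [0.2, 1.0, 0.1], [0.6, 0.1, 1.0]]
    visual_q = [0.9, 0.1, 0.5]

    print(cross_media_scores(visual_q, text_sim, k=1))
    print(random_walk_scores(text_sim, visual_sim, restart=[1.0, 0.0, 0.0]))
```

On this toy data, document 1 has a visual similarity of only 0.1 to the query, yet its cross-media score is about 0.72 because it is textually close to the best visual match (document 0). The random-walk scores sum to 1 and rank documents 0, 1, 2 in that order. Both scores spread relevance along edges of one modality after seeding it from the other, which is the kind of shared structure a unifying graph-based view can make explicit.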

Cited By

X. Liu, Y. He, Y. Cheung, X. Xu, and N. Wang. 2024. Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image–Text Matching. IEEE Transactions on Cybernetics 54, 2, 948-961. Online publication date: February 2024.
https://doi.org/10.1109/TCYB.2022.3179020
J. Yang, C. Lassance, R. Sampaio De Rezende, K. Srinivasan, M. Redi, S. Clinchant, and J. Lin. 2023. AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, H. Chen, W. Duh, H. Huang, M. Kato, J. Mothe, and B. Poblete (Eds.). 2975-2984. Online publication date: 19 July 2023.
https://dl.acm.org/doi/10.1145/3539618.3591903
A. Tosyali and B. Tavakkol. 2021. A node-based index for clustering validation of graph data. Annals of Operations Research 341, 1, 197-221. Online publication date: 8 November 2021.
https://doi.org/10.1007/s10479-021-04376-7

Index Terms

Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Recommendations

Semantic combination of textual and visual information in multimedia retrieval
ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval

The goal of this paper is to introduce a set of techniques we call semantic combination in order to efficiently fuse text and image retrieval systems in the context of multimedia information access. These techniques emerge from the observation that ...
Relevance feature mapping for content-based multimedia information retrieval

This paper presents a novel ranking framework for content-based multimedia information retrieval (CBMIR). The framework introduces relevance features and a new ranking scheme. Each relevance feature measures the relevance of an instance with respect to ...
Semantic indexing of multimedia content using textual and visual information

The challenge in multimedia information retrieval remains in the indexing process, an active search area. There are three fundamental techniques for indexing multimedia content: using textual information, using low-level information and combining ...


Information

Published In

ACM Transactions on Information Systems Volume 33, Issue 2

February 2015

181 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/2737813

Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 February 2015

Accepted: 01 October 2014

Revised: 01 April 2014

Received: 01 March 2013

Published in TOIS Volume 33, Issue 2

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

30 Total Citations
399 Total Downloads

Downloads (last 12 months): 11
Downloads (last 6 weeks): 1

Reflects downloads up to 04 Jan 2025

Citations

Cited By

Liu XHe YCheung YXu XWang N(2024)Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image–Text MatchingIEEE Transactions on Cybernetics10.1109/TCYB.2022.317902054:2(948-961)Online publication date: Feb-2024
https://doi.org/10.1109/TCYB.2022.3179020
Yang JLassance CSampaio De Rezende RSrinivasan KRedi MClinchant SLin JChen HDuh WHuang HKato MMothe JPoblete B(2023)AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content CreationProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591903(2975-2984)Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591903
Tosyali ATavakkol B(2021)A node-based index for clustering validation of graph dataAnnals of Operations Research10.1007/s10479-021-04376-7341:1(197-221)Online publication date: 8-Nov-2021
https://doi.org/10.1007/s10479-021-04376-7
I. Gialampoukidis, A. Moumtzidou, M. Bakratsas, S. Vrochidis, and I. Kompatsiaris. 2021. A Multimodal Tensor-Based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images. In MultiMedia Modeling. 294-306. Online publication date: 22 June 2021.
https://dl.acm.org/doi/10.1007/978-3-030-67835-7_25
M. Cheng, L. Jing, and M. Ng. 2020. Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval. ACM Transactions on Information Systems 38, 3, 1-25. Online publication date: 5 June 2020.
https://dl.acm.org/doi/10.1145/3389547
J. Huang, C. Chen, F. Ye, W. Hu, and Z. Zheng. 2020. Nonuniform Hyper-Network Embedding with Dual Mechanism. ACM Transactions on Information Systems 38, 3, 1-18. Online publication date: 5 May 2020.
https://dl.acm.org/doi/10.1145/3388924
Y. Liu and Y. Wu. 2020. FNED. ACM Transactions on Information Systems 38, 3, 1-33. Online publication date: 5 May 2020.
https://dl.acm.org/doi/10.1145/3386253
R. Jagerman, I. Markov, and M. de Rijke. 2020. Safe Exploration for Optimizing Contextual Bandits. ACM Transactions on Information Systems 38, 3, 1-23. Online publication date: 21 April 2020.
https://dl.acm.org/doi/10.1145/3385670
W. Oliveira, L. Dorini, R. Minetto, and T. Silva. 2020. OutdoorSent. ACM Transactions on Information Systems 38, 3, 1-28. Online publication date: 21 April 2020.
https://dl.acm.org/doi/10.1145/3385186
T. Formal, S. Clinchant, J. Renders, S. Lee, and G. Cho. 2020. Learning to Rank Images with Cross-Modal Graph Convolutions. In Advances in Information Retrieval. 589-604. Online publication date: 14 April 2020.
https://dl.acm.org/doi/10.1007/978-3-030-45439-5_39