Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods

Published: 17 February 2015

Abstract

Multimedia collections are growing in size and diversity more than ever. Effective multimedia retrieval systems are therefore critical for giving end users scalable access to these datasets. We are interested in repositories of image/text multimedia objects, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval (CBMIR). We focus on graph-based methods, which have been shown to provide state-of-the-art performance. We examine two such methods in particular: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses both approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three real-world datasets. From a practical standpoint, our extensive empirical analyses allow us to provide insights and guidelines on the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.

Published In

ACM Transactions on Information Systems, Volume 33, Issue 2
February 2015
181 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2737813

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 17 February 2015
Accepted: 01 October 2014
Revised: 01 April 2014
Received: 01 March 2013
Published in TOIS Volume 33, Issue 2

Author Tags

1. content-based multimedia information retrieval
2. visual reranking
3. cross-media similarity
4. graph-based methods
5. information fusion
6. random walk

Qualifiers

• Research-article
• Research
• Refereed
