DOI: 10.1145/1835449.1835556

Properties of optimally weighted data fusion in CBMIR

Published: 19 July 2010

Abstract

Content-Based Multimedia Information Retrieval (CBMIR) systems that leverage multiple retrieval experts (En) often employ a weighting scheme when combining expert results through data fusion. Typically, however, a query will comprise multiple query images (Im), leading to potentially N × M weights to be assigned. Because of the large number of potential weights, existing approaches impose a hierarchy for data fusion, such as uniformly combining the query image results from a single retrieval expert into a single list and then weighting the results of each expert. In this paper we demonstrate that this approach is sub-optimal and leads to the poor state of CBMIR performance in benchmarking evaluations. We utilize an optimization method known as Coordinate Ascent to discover the optimal set of weights (|En| ⋅ |Im|), which reveals a dramatic difference between known results and the theoretical maximum. We find that imposing common combinatorial hierarchies for data fusion will halve the optimal performance that can be achieved. By examining the optimal weight sets at the topic level, we observe that approximately 15% of the weights (from the set |En| ⋅ |Im|) for any given query are assigned 70%-82% of the total weight mass for that topic. Furthermore, we discover that the ideal distribution of weights follows a log-normal distribution. We find that we can achieve up to 88% of the performance of a fully optimized query using just these 15% of the weights. Our investigation was conducted on TRECVID evaluations 2003 to 2007 inclusive and ImageCLEFPhoto 2007, totalling 181 search topics optimized over a combined collection size of 661,213 images and 1,594 topic images.
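
To make the setup concrete, the following is a minimal sketch of per-pair weighted fusion tuned by coordinate ascent. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the fusion function, the score normalization, or the objective, so the sketch assumes a linear weighted sum of already-normalized scores, average precision as the objective, and a small fixed grid of candidate values per weight. All function names (`fuse`, `average_precision`, `coordinate_ascent`, `top_mass`) and the grid values are hypothetical.

```python
import numpy as np


def fuse(scores, weights):
    """Weighted linear fusion (assumed form, not taken from the paper).

    scores:  shape (n_experts, n_query_images, n_docs), normalized scores.
    weights: shape (n_experts, n_query_images), one weight per pair.
    Returns one fused score per document, shape (n_docs,).
    """
    return np.tensordot(weights, scores, axes=([0, 1], [0, 1]))


def average_precision(fused, relevant):
    """Average precision of the ranking induced by the fused scores."""
    order = np.argsort(-fused, kind="stable")
    hits = relevant[order]
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


def coordinate_ascent(scores, relevant,
                      grid=(0.0, 0.25, 0.5, 1.0, 2.0, 4.0), sweeps=5):
    """Tune one weight at a time, holding all other weights fixed.

    Starts from uniform weights and accepts a candidate value only when
    it strictly improves average precision (grid is an assumption).
    """
    n_experts, n_images, _ = scores.shape
    weights = np.ones((n_experts, n_images))
    best = average_precision(fuse(scores, weights), relevant)
    for _ in range(sweeps):
        improved = False
        for e in range(n_experts):
            for i in range(n_images):
                keep = weights[e, i]
                for candidate in grid:
                    weights[e, i] = candidate
                    ap = average_precision(fuse(scores, weights), relevant)
                    if ap > best:
                        best, keep, improved = ap, candidate, True
                weights[e, i] = keep
        if not improved:
            break  # a full sweep changed nothing: local optimum on the grid
    total = weights.sum()
    return (weights / total if total > 0 else weights), best


def top_mass(weights, fraction=0.15):
    """Share of total weight mass held by the largest `fraction` of weights."""
    flat = np.sort(weights.ravel())[::-1]
    k = max(1, int(round(fraction * flat.size)))
    total = flat.sum()
    return float(flat[:k].sum() / total) if total > 0 else 0.0
```

With `scores` of shape (|En|, |Im|, number of documents) and a boolean `relevant` vector for one topic, `coordinate_ascent(scores, relevant)` returns normalized weights and the best average precision found, and `top_mass(weights)` reports how concentrated those weights are. Because the search is greedy over a grid, it finds a local optimum only; the hierarchical baseline described in the abstract corresponds to forcing every query image of an expert to share one weight.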

    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN: 9781450301534
    DOI: 10.1145/1835449

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. content-based
    2. data fusion
    3. multimedia fusion

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%
