DOI: 10.1145/2484028.2484081

A general evaluation measure for document organization tasks

Published: 28 July 2013

Abstract

A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic document organization problem that establishes priority and relatedness relationships between documents (in other words, a problem of forming and ranking clusters). As far as we know, no analysis has yet been made of the evaluation of these tasks from a global perspective. In this paper we propose two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task, which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy).
In addition to being the first measures that can be applied to any mixture of ranking, clustering and filtering tasks, Reliability and Sensitivity satisfy more formal constraints than previously existing evaluation metrics for each of the subsumed tasks. Besides their formal properties, their most salient feature from an empirical point of view is their strictness: a high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics in all the Document Retrieval, Clustering and Filtering datasets used in our experiments.
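The abstract states that Reliability and Sensitivity are combined with a harmonic mean. As a minimal illustration (not code from the paper), the sketch below computes a van Rijsbergen-style weighted harmonic mean of a Reliability score and a Sensitivity score; the function name, the alpha weight, and the example values are assumptions for illustration only, since the measures themselves are defined in the paper via formal constraints.

```python
# Minimal sketch (assumption, not the paper's code): combining
# Reliability (R) and Sensitivity (S) with a weighted harmonic mean,
# as the abstract describes. R and S are taken as given in [0, 1].

def f_measure(reliability: float, sensitivity: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of Reliability and Sensitivity."""
    if reliability == 0.0 or sensitivity == 0.0:
        return 0.0
    return 1.0 / (alpha / reliability + (1.0 - alpha) / sensitivity)

# Example: a system with R = 0.9 but S = 0.2 is penalized heavily,
# reflecting the "strictness" the abstract mentions.
print(f_measure(0.9, 0.2))  # ~0.33
```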



    Published In

    SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
    July 2013, 1188 pages
    ISBN: 9781450320344
    DOI: 10.1145/2484028


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tag

    1. effectiveness measures

    Qualifiers

    • Research-article

    Conference

    SIGIR '13

    Acceptance Rates

    SIGIR '13 Paper Acceptance Rate 73 of 366 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%
