DOI: 10.1145/502585.502603
Article

Combining multiple classifiers for text categorization

Published: 05 October 2001

Abstract

A major problem facing online information services is how to index and supplement large document collections with respect to a rich set of categories. We focus on the routing of case law summaries to various secondary law volumes in which they should be cited. Given the large number (> 13,000) of closely related categories, this is a challenging task that is unlikely to succumb to a single algorithmic solution. Our fully implemented and recently deployed system shows that a superior classification engine for this task can be constructed from a combination of classifiers. The multi-classifier approach helps us leverage all the relevant textual features and metadata, and appears to generalize to related classification tasks.

Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN: 1581134363
DOI: 10.1145/502585
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document classification
  2. multi-classifier
