DOI: 10.1145/502585.502647
Article

Summarization as feature selection for text categorization

Published: 05 October 2001

Abstract

We address the problem of evaluating the effectiveness of summarization techniques for the task of document categorization. We argue that, for a large class of automatic categorization algorithms, extraction-based summarization can be viewed as a particular form of feature selection performed on the full text of the document. In this context, its impact can be compared with that of state-of-the-art feature selection techniques devised specifically to provide good categorization performance. Such a framework allows a better assessment of the expected performance of a categorizer when the compression rate of the summarizer is known.
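
The central idea of the abstract can be illustrated with a minimal sketch. Under a bag-of-words representation, an extractive summary keeps a subset of the document's sentences, so the terms it contains are a subset of the full-text terms; the summarizer therefore acts as a per-document feature mask. The lead-based summarizer, the naive sentence splitter, and all function names below are illustrative assumptions, not the summarizers or categorizers evaluated in the paper.

```python
import re
from collections import Counter


def sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(text):
    """Lowercased term counts, i.e. the features a bag-of-words categorizer sees."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def lead_summary(text, compression_rate):
    """Hypothetical extractive summarizer: keep the leading fraction of sentences."""
    sents = sentences(text)
    keep = max(1, round(compression_rate * len(sents)))
    return " ".join(sents[:keep])


def selected_features(text, compression_rate):
    """Map each full-text term to True if it survives in the summary."""
    full = bag_of_words(text)
    summary = bag_of_words(lead_summary(text, compression_rate))
    return {term: (term in summary) for term in full}


doc = ("Stocks fell sharply on Monday. Analysts blamed rising interest rates. "
       "The central bank declined to comment. Trading volume was heavy.")

mask = selected_features(doc, 0.5)
kept = sorted(t for t, keep in mask.items() if keep)
dropped = sorted(t for t, keep in mask.items() if not keep)
print("kept:", kept)
print("dropped:", dropped)
```

Lowering `compression_rate` shrinks the set of kept terms, which is what makes it possible to compare a summarizer against a conventional feature selection method that retains the same number of features.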


Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN:1581134363
DOI:10.1145/502585
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
