DOI: 10.3115/980845.980856

Trainable, scalable summarization using robust NLP and machine learning

Published: 10 August 1998

Abstract

We describe a trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources. The system combines these features using a trainable feature combiner learned from summary examples through a machine learning algorithm. We demonstrate system scalability by reporting results on the best combination of summarization features for different document sources. We also present preliminary results from a task-based evaluation on summarization output usability.
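
To make the idea of a trainable feature combiner concrete, here is a minimal sketch of how per-sentence features might be combined into one score by a model learned from summary examples. The abstract does not name the learning algorithm or the exact features, so everything specific here is an assumption for illustration: logistic regression as the learner, the two hypothetical features (a lead-sentence flag and a keyword score), and the toy training data.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_combiner(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights (an assumed learner, not the paper's)
    on (feature_vector, in_summary_label) pairs by stochastic gradient ascent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            err = label - _sigmoid(z)
            bias += lr * err
            weights = [w + lr * err * x for w, x in zip(weights, features)]
    return weights, bias

def score(features, weights, bias):
    """Combined score: estimated probability that a sentence belongs in a summary."""
    return _sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def summarize(sentences, feature_vectors, weights, bias, k=2):
    """Pick the k highest-scoring sentences and return them in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(feature_vectors[i], weights, bias),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]

# Hypothetical features per sentence: [is_lead_sentence, keyword_score].
# Label 1 means the sentence appeared in a reference summary (toy data).
training = [
    ([1.0, 0.9], 1), ([0.0, 0.8], 1), ([1.0, 0.7], 1),
    ([0.0, 0.1], 0), ([1.0, 0.2], 0), ([0.0, 0.3], 0),
]
weights, bias = train_combiner(training)

sentences = ["Sentence A.", "Sentence B.", "Sentence C."]
vectors = [[1.0, 0.9], [0.0, 0.1], [0.0, 0.8]]
summary = summarize(sentences, vectors, weights, bias, k=2)
print(summary)
```

Retraining on summary examples from a different document source would yield different weights, which is one way to read the abstract's claim about finding the best feature combination per source.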

Published In

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1
August 1998
768 pages

Sponsors

  • Government of Canada
  • Université de Montréal

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
