DOI:10.1145/1645953.1646118
research-article

Incident threading for news passages

Published: 02 November 2009

Abstract

With an overwhelming volume of news reports currently available, there is an increasing need for automatic techniques to analyze and present news to a general reader in a meaningful and efficient manner. We explore incident threading as a possible solution to this problem. All text that describes the occurrence of a real-world happening is merged into a news incident, and incidents are organized in a network with dependencies of predefined types.
Earlier attempts at this problem have assumed that a news story covers a single topic. We move beyond that limitation to introduce passage threading, which processes news at the passage level. First we develop a new testbed for this research and extend the evaluation methods to address new granularity issues. Then a three-stage algorithm is described that identifies on-subject passages, groups them into incidents, and establishes links between related incidents. Finally, we observe significant improvement over earlier work when we optimize the harmonic mean of the appropriate evaluation measures. The resulting performance exceeds the level that a calibration study shows is necessary to support a reading comprehension task.
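
The abstract reports optimizing the harmonic mean of the evaluation measures but does not name them here. As a minimal sketch, assuming each measure is a score in [0, 1] (the function names `harmonic_mean` and `best_setting`, and the example setting labels, are hypothetical and not taken from the paper), the combination and selection step might look like this:

```python
from typing import Dict, Iterable, Sequence


def harmonic_mean(values: Iterable[float]) -> float:
    """Return the harmonic mean of non-negative scores.

    The harmonic mean is dominated by the smallest score, so a system
    cannot compensate for one poor measure with one very good measure.
    """
    scores = list(values)
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        # Limit of the harmonic mean as any score goes to zero.
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


def best_setting(results: Dict[str, Sequence[float]]) -> str:
    """Pick the setting whose measures have the highest harmonic mean.

    `results` maps a hypothetical setting label to its measured scores.
    """
    if not results:
        raise ValueError("no settings to compare")
    return max(results, key=lambda name: harmonic_mean(results[name]))


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    candidates = {
        "setting_a": (0.90, 0.30),
        "setting_b": (0.60, 0.55),
    }
    print(best_setting(candidates))
```

With these illustrative numbers, the balanced setting is preferred over the one that does well on a single measure, which is the usual reason for choosing a harmonic rather than an arithmetic mean.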

Published In

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN:9781605585123
DOI:10.1145/1645953
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic news organization
  2. incident threading
  3. information overload
  4. passage threading

Qualifiers

  • Research-article

Conference

CIKM '09

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
