DOI: 10.1145/319950.319956

Extracting significant time varying features from text

Published: 01 November 1999

Abstract

We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adoption of this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. By a subjective evaluation, the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation was also used to measure the quality of the system and it showed some of the weaknesses and the power of our approach.
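The abstract does not say which classical significance test is used, so the following is only a minimal sketch of the general idea, assuming a Pearson chi-square test on a 2x2 contingency table (documents with or without a feature, inside or outside a time window). The function names, the document-count framing, and the 0.001 significance level are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: the test choice (chi-square, 1 degree of freedom) and all
# names below are assumptions made for illustration.

# Standard chi-square critical value for 1 degree of freedom at p = 0.001.
CHI2_CRITICAL_P001 = 10.828


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: documents in the window that contain the feature
    b: documents in the window that do not
    c: documents outside the window that contain the feature
    d: documents outside the window that do not
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin means the table carries no evidence either way.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def is_significant_in_window(with_in, total_in, with_out, total_out,
                             critical=CHI2_CRITICAL_P001):
    """Flag a feature whose rate inside the window is significantly higher."""
    a, b = with_in, total_in - with_in
    c, d = with_out, total_out - with_out
    # Only an increase counts as an "event"; a drop in frequency is ignored.
    if a * d <= b * c:
        return False
    return chi_square_2x2(a, b, c, d) > critical


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 documents in the window mention the
    # feature, against 10 of 900 documents outside it.
    print(is_significant_in_window(20, 100, 10, 900))
```

Filtering a stream this way amounts to running the test for each feature and each candidate window and keeping only the pairs that pass; how windows are chosen and how passing features are grouped into stories is not covered by this sketch.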

Published In

CIKM '99: Proceedings of the eighth international conference on Information and knowledge management
November 1999
564 pages
ISBN:1581131461
DOI:10.1145/319950
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM99
CIKM99: Conference on Information and Knowledge Management
November 2 - 6, 1999
Kansas City, Missouri, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
