DOI:10.1145/2433396.2433431
research-article

Mining the web to predict future events

Published: 04 February 2013

Abstract

We describe and evaluate methods for learning to forecast forthcoming events of interest from a corpus containing 22 years of news stories. We consider the examples of identifying significant increases in the likelihood of disease outbreaks, deaths, and riots in advance of the occurrence of these events in the world. We provide details of methods and studies, including the automated extraction and generalization of sequences of events from news corpora and multiple web resources. We evaluate the predictive power of the approach on real-world events withheld from the system.


Published In

WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining
February 2013
816 pages
ISBN:9781450318693
DOI:10.1145/2433396
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. future prediction
  2. news prediction
  3. web knowledge for future prediction

Qualifiers

  • Research-article

Conference

WSDM 2013

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 42
  • Downloads (last 6 weeks): 4

Download counts reflect activity up to 07 Jan 2025.
