Nowcasting Events from the Social Web with Statistical Learning

Published: 01 September 2012

Abstract

We present a general methodology for inferring the occurrence and magnitude of an event or phenomenon from the large volume of unstructured textual information on the social part of the Web. Using geo-tagged user posts from the microblogging service Twitter as input data, we investigate two case studies. The first is a benchmark problem, in which actual levels of rainfall at a given location and time are inferred from the content of tweets. The second is a real-life task, in which we infer regional Influenza-like Illness rates in an effort to detect an emerging epidemic in a timely manner. Our analysis builds on a statistical learning framework that performs sparse learning via the bootstrapped version of LASSO to select a consistent subset of textual features from a large number of candidates. In both case studies, the selected features are closely related semantically to the target topics, and inference, conducted by regression, achieves significant performance, especially given the short length (approximately one year) of the Twitter data time series.
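The feature selection step named in the abstract, a bootstrapped LASSO, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the strict variant in which a feature is kept only if it receives a nonzero LASSO weight in every bootstrap resample, and the function names (`soft_threshold`, `lasso_cd`, `bolasso_support`), the regularisation value `lam`, and the synthetic data are all hypothetical choices made for illustration. The LASSO itself is solved with plain cyclic coordinate descent so that the example needs only the standard library.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def bolasso_support(X, y, lam, n_boot=20, threshold=1.0, seed=0):
    """Keep features with nonzero weight in at least `threshold` of the resamples.

    threshold=1.0 is the strict intersection of supports; a value below 1.0
    gives a softer consensus rule (an assumption, not taken from the article).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [j for j in range(p) if counts[j] >= threshold * n_boot]


# Synthetic stand-in for a term-frequency matrix: 60 observations, 8 candidate
# features, of which only features 0 and 3 influence the target.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
y = [3.0 * row[0] - 2.0 * row[3] + 0.1 * data_rng.gauss(0, 1) for row in X]

selected = bolasso_support(X, y, lam=0.3)
print("selected feature indices:", selected)
```

In a setting like the one the abstract describes, the columns of `X` would be candidate textual features and `y` the target signal (rainfall or illness rate); a regression model fitted on the selected columns would then produce the inferred values.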

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 4
September 2012
410 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/2337542
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2012
Accepted: 01 September 2011
Revised: 01 August 2011
Received: 01 April 2011
Published in TIST Volume 3, Issue 4

Author Tags

  1. Event detection
  2. LASSO
  3. Twitter
  4. feature selection
  5. social network mining
  6. sparse learning

Qualifiers

  • Research-article
  • Research
  • Refereed
