DOI: 10.1145/2187836.2187940
research-article

Discovering geographical topics in the twitter stream

Published: 16 April 2012

Abstract

Micro-blogging services have become indispensable communication tools for online users for disseminating breaking news, eyewitness accounts, individual expression, and protest groups. Recently, Twitter and other online social networking services such as Foursquare, Gowalla, Facebook and Yelp have started supporting location services in their messages, either explicitly, by letting users choose their places, or implicitly, by enabling geo-tagging, that is, associating messages with latitude and longitude coordinates. This functionality allows researchers to address an exciting set of questions: 1) How is information created and shared across geographical locations? 2) How do spatial and linguistic characteristics of people vary across regions? 3) How can human mobility be modeled? Although many attempts have been made to tackle these problems, previous methods are either too complicated to implement or too simplistic to yield reasonable performance. Discovering topics and identifying users' interests from these geo-tagged messages is a challenging task because of the sheer amount of data and the diversity of language used on these location-sharing services. In this paper we focus on Twitter and present an algorithm that models diversity in tweets based on topical diversity, geographical diversity, and an interest distribution of the user. Furthermore, we take the Markovian nature of a user's location into account. Our model exploits sparse factorial coding of the attributes, thus allowing us to deal with a large and diverse set of covariates efficiently. Our approach is vital for applications such as user profiling, content recommendation and topic tracking. We show high accuracy in location estimation based on our model. Moreover, the algorithm identifies interesting topics based on location and language.
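The abstract's remark about the Markovian nature of a user's location can be made concrete with a toy example. The sketch below is not the paper's model; it is a minimal first-order Markov chain over discrete region labels, assuming each user's geo-tagged messages have already been mapped to regions. The function names (`fit_transitions`, `predict_next`), the region labels, and the additive smoothing constant are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences, smoothing=1.0):
    """Estimate smoothed first-order transition probabilities between regions.

    `sequences` is a list of per-user region sequences (hypothetical input format).
    """
    regions = sorted({r for seq in sequences for r in seq})
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (previous region, next region) pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in regions:
        # Additive smoothing keeps unseen transitions at non-zero probability.
        total = sum(counts[prev].values()) + smoothing * len(regions)
        probs[prev] = {
            nxt: (counts[prev][nxt] + smoothing) / total for nxt in regions
        }
    return probs


def predict_next(probs, current):
    """Return the most probable next region given the current one."""
    row = probs[current]
    # Iterating in sorted order makes ties resolve deterministically.
    return max(sorted(row), key=row.get)


# Toy data: region sequences for three hypothetical users.
sequences = [
    ["home", "work", "home"],
    ["home", "work", "cafe"],
    ["work", "home"],
]
probs = fit_transitions(sequences)
print(predict_next(probs, "home"))
```

In this toy setting the prediction depends only on the most recent region, which is the Markov assumption in its simplest form; the paper combines such location dependence with topical and user-interest components that this sketch omits.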

      Published In

      WWW '12: Proceedings of the 21st international conference on World Wide Web
      April 2012
      1078 pages
      ISBN:9781450312295
      DOI:10.1145/2187836

      Sponsors

      • Univ. de Lyon: Universite de Lyon

      In-Cooperation

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. geolocation
      2. graphical model
      3. language model
      4. latent variable inference
      5. topic models
      6. twitter
      7. user profiling

      Qualifiers

      • Research-article

      Conference

      WWW 2012
      Sponsor:
      • Univ. de Lyon
      WWW 2012: 21st World Wide Web Conference 2012
      April 16 - 20, 2012
      Lyon, France

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
