Article (DOI: 10.5555/2886521.2886652)

Gazetteer-independent toponym resolution using geographic word profiles

Published: 25 January 2015

Abstract

Toponym resolution, or grounding names of places to their actual locations, is an important problem in the analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine-learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.
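
To make the idea of a geo-profile concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract only says "local spatial statistics", so the sketch assumes a Getis-Ord Gi*-style z-score with a Gaussian distance kernel, and it assumes that the "overlap" of profiles in a text span can be approximated by summing them per grid cell. The grid cells, word probabilities, bandwidth, and all function names are invented for illustration.

```python
import math

# Hypothetical toy grid: four cell centres (lat, lon) and, for each word, its
# probability under the language model of each cell. All values are invented.
CELLS = [(30.3, -97.7), (32.8, -96.8), (40.7, -74.0), (42.4, -71.1)]
WORD_PROBS = {
    "longhorns": [0.020, 0.010, 0.001, 0.001],
    "subway": [0.001, 0.001, 0.030, 0.020],
    "the": [0.050, 0.050, 0.050, 0.050],
}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def kernel_weights(cells, bandwidth_km=300.0):
    """Gaussian spatial weights w[i][j]; each cell is its own neighbour."""
    return [[math.exp(-0.5 * (haversine_km(ci, cj) / bandwidth_km) ** 2)
             for cj in cells] for ci in cells]


def geo_profile(values, weights):
    """Assumed Gi*-style z-score of a word's probability in every cell."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std < 1e-12:  # geographically uniform word: no local signal
        return [0.0] * n
    profile = []
    for row in weights:
        sw = sum(row)
        sw2 = sum(w * w for w in row)
        num = sum(w * v for w, v in zip(row, values)) - mean * sw
        den = std * math.sqrt(max(n * sw2 - sw * sw, 0.0) / (n - 1))
        profile.append(num / den if den > 0 else 0.0)
    return profile


def resolve(span, cells=CELLS, word_probs=WORD_PROBS):
    """Sum the profiles of all known words in the span; return the top cell."""
    weights = kernel_weights(cells)
    totals = [0.0] * len(cells)
    for word in span:
        if word in word_probs:  # words without statistics are skipped
            for i, z in enumerate(geo_profile(word_probs[word], weights)):
                totals[i] += z
    best = max(range(len(cells)), key=totals.__getitem__)
    return cells[best]
```

In this toy setup a geographically uniform word such as "the" yields an all-zero profile and so has no influence, while a regionally concentrated word pulls the span toward the cells where it is over-represented. No gazetteer lookup is involved at any point, which is the property the abstract emphasizes.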

Published In

AAAI'15: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
January 2015
4331 pages
ISBN: 0262511290

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Article
