DOI: 10.1145/3423334.3431450
research-article
Open access

From Place2Vec to Multi-Scale Built-Environment Representation: A General-Purpose Distributional Embedding for Urban Data Analysis

Published: 06 November 2020

Abstract

Built environments such as cities, roads, and communities are rich sources of urban data. Many downstream applications, including geographic information retrieval, recommender systems, geographic knowledge graphs, and, more generally, the understanding of urban spaces, require comprehensive analysis of this data [28]. Points of Interest (POIs), one of the most studied aspects of urban data, have been successfully modeled using concepts borrowed from Machine Learning (ML) and Natural Language Processing (NLP). Place2Vec [28] proposes a Word2Vec-like statistical model that represents spatial adjacency in a continuous embedding space. This method successfully models the functional semantics of POIs, as measured by several evaluations based on human assessment. However, although Place2Vec addresses the distributional heterogeneity within a given spatial context through ITDL augmentation, it does not address the spatial heterogeneity among different regions. To solve this problem, we propose a hierarchical, density-based, self-adjusting clustering mechanism. The boundary between relatedness and unrelatedness is learned from the given context: denser areas have tighter bounds, while sparser areas have looser ones. We train our model on both the baseline Yelp hierarchical dataset [28] and our OpenStreetMap dataset. We demonstrate that 1) our model significantly improves performance on 2 of the 3 baseline tasks as well as the stability of training, and 2) our model generalizes well across 112 cities of radically different scales (minimum 1,725 POIs, maximum 2,694,070 POIs), regions (North America, Europe, Asia, Africa), and types (commercial, tourist, industrial, etc.) without adjusting or tuning any hyperparameters. We also demonstrate that our model can be used to discover interesting facts about cities, such as inter-city semantic analogies and intra-city connectivity, which can be useful in urban planning, social computing, and public policy making.
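
The idea that denser areas get tighter context bounds can be illustrated with a small sketch. This is not the authors' algorithm: as a stand-in for the hierarchical, density-based clustering described in the abstract, it assumes a much simpler rule in which each POI's context radius is the distance to its k-th nearest neighbor. The function names, the parameter `k`, and the toy coordinates are all hypothetical. The resulting (center, context) type pairs are the kind of input a Word2Vec-like skip-gram model would consume.

```python
import math


def adaptive_radius(points, k=2):
    """Return, for each point, the distance to its k-th nearest neighbor.

    Assumption for illustration only: the k-NN distance stands in for a
    learned, density-dependent boundary (small in dense areas, large in
    sparse ones).
    """
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def context_pairs(pois, k=2):
    """Build (center_type, context_type) pairs using a per-POI radius."""
    coords = [(x, y) for x, y, _ in pois]
    radii = adaptive_radius(coords, k)
    pairs = []
    for i, (xi, yi, ti) in enumerate(pois):
        for j, (xj, yj, tj) in enumerate(pois):
            if i != j and math.dist((xi, yi), (xj, yj)) <= radii[i]:
                pairs.append((ti, tj))
    return pairs


# Toy data (hypothetical): a dense cluster near the origin and a sparse group.
pois = [
    (0.0, 0.0, "cafe"), (0.1, 0.0, "bar"), (0.0, 0.1, "bakery"),
    (10.0, 10.0, "farm"), (15.0, 10.0, "gas_station"), (10.0, 16.0, "barn"),
]
radii = adaptive_radius([(x, y) for x, y, _ in pois], k=2)
pairs = context_pairs(pois, k=2)
```

With this toy data, every radius in the dense cluster is smaller than every radius in the sparse group, so no single fixed window size has to be tuned for both. A fixed radius, by contrast, would either miss neighbors in the sparse group or merge unrelated POIs in the dense one.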

References

[1] Pierre Baldi. 2012. Autoencoders, Unsupervised Learning, and Deep Architectures. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning (Proceedings of Machine Learning Research), Isabelle Guyon, Gideon Dror, Vincent Lemaire, Graham Taylor, and Daniel Silver (Eds.), Vol. 27. PMLR, Bellevue, Washington, USA, 37–49. http://proceedings.mlr.press/v27/baldi12a.html
[2] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Çaglar Gülçehre, H. Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey R. Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matthew Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. 2018. Relational inductive biases, deep learning, and graph networks. CoRR abs/1806.01261 (2018). arXiv:1806.01261 http://arxiv.org/abs/1806.01261
[3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8 (Aug. 2013), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
[4] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A Neural Probabilistic Language Model. J. Mach. Learn. Res. 3 (March 2003), 1137–1155. http://dl.acm.org/citation.cfm?id=944919.944966
[5] Miguel Á. Carreira-Perpiñán. 2015. A review of mean-shift algorithms for clustering. CoRR abs/1503.00687 (2015). arXiv:1503.00687 http://arxiv.org/abs/1503.00687
[6] Sumit Chopra, Trivikraman Thampy, John Leahy, Andrew Caplin, and Yann LeCun. 2007. Discovering the Hidden Structure of House Prices with a Non-parametric Latent Manifold Model. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07). ACM, New York, NY, USA, 173–182. https://doi.org/10.1145/1281192.1281214
[7] Anne Cocos and Chris Callison-Burch. 2017. The Language of Place: Semantic Value from Geospatial Context. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, Valencia, Spain, 99–104. https://www.aclweb.org/anthology/E17-2016
[8] Shanshan Feng, Gao Cong, Bo An, and Yeow Meng Chee. 2017. POI2Vec: Geographical Latent Representation for Predicting Future Visitors. https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14902
[9] Paul A. Gagniuc. 2017. Markov chains: from theory to implementation and experimentation. John Wiley & Sons.
[10] Krzysztof Janowicz, Martin Raubal, and Werner Kuhn. 2011. The semantics of similarity in geographic information retrieval. J. Spatial Information Science 2 (2011), 29–57.
[11] Jay J. Jiang and David W. Conrath. 1997. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In Proceedings of the 10th Research on Computational Linguistics International Conference. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP), Taipei, Taiwan, 19–33. https://www.aclweb.org/anthology/O97-1002
[12] Andrea Lancichinetti and Santo Fortunato. 2009. Community detection algorithms: A comparative analysis. Physical Review E 80, 5 (Nov 2009). https://doi.org/10.1103/physreve.80.056117
[13] Claudia Leacock and Martin Chodorow. 1998. Combining local context and wordnet similarity for word sense identification.
[14] Omer Levy and Yoav Goldberg. 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, Ann Arbor, Michigan, 171–180. https://doi.org/10.3115/v1/W14-1618
[15] Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding As Implicit Matrix Factorization. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'14). MIT Press, Cambridge, MA, USA, 2177–2185. http://dl.acm.org/citation.cfm?id=2969033.2969070
[16] Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 296–304.
[17] Kang Liu, Song Gao, Peiyuan Qiu, Xiliang Liu, Bo Yan, and Feng Lu. 2017. Road2Vec: Measuring Traffic Interactions in Urban Road System from Massive Travel Routes. International Journal of Geo-Information 6 (10 2017), 321. https://doi.org/10.3390/ijgi6110321
[18] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605.
[19] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of statistical natural language processing. MIT press.
[20] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:cs.CL/1301.3781
[21] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. CoRR abs/1310.4546 (2013). arXiv:1310.4546 http://arxiv.org/abs/1310.4546
[22] Ofir Press and Lior Wolf. 2016. Using the Output Embedding to Improve Language Models. CoRR abs/1608.05859 (2016). arXiv:1608.05859 http://arxiv.org/abs/1608.05859
[23] David Sánchez, Montserrat Batet, and David Isern. 2011. Ontology-Based Information Content Computation. Know.-Based Syst. 24, 2 (March 2011), 297–303. https://doi.org/10.1016/j.knosys.2010.10.001
[24] Nuno Seco, Tony Veale, and Jer Hayes. 2004. An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI'04). IOS Press, NLD, 1089–1090.
[25] Stanislav Sobolevsky, Riccardo Campari, Alexander Belyi, and Carlo Ratti. 2013. A General Optimization Technique for High Quality Community Detection in Complex Networks. CoRR abs/1308.3508 (2013). arXiv:1308.3508 http://arxiv.org/abs/1308.3508
[26] M.P. Wand and M.C. Jones. 1994. Kernel Smoothing. Taylor & Francis. https://books.google.com/books?id=GTOOi5yE008C
[27] Zhibiao Wu and Martha Palmer. 1994. Verbs Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (ACL '94). Association for Computational Linguistics, USA, 133–138. https://doi.org/10.3115/981732.981751
[28] Bo Yan, Krzysztof Janowicz, Gengchen Mai, and Song Gao. 2017. From ITDL to Place2Vec: Reasoning About Place Type Similarity and Relatedness by Learning Embeddings From Augmented Spatial Contexts. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL '17). ACM, New York, NY, USA, Article 35, 10 pages. https://doi.org/10.1145/3139958.3140054
[29] Yao Yao, Xia Li, Xiaoping Liu, Penghua Liu, Zhaotang Liang, Jinbao Zhang, and Ke Mai. 2017. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model. International Journal of Geographical Information Science 31, 4 (2017), 825–848. https://doi.org/10.1080/13658816.2016.1244608
[30] Chao Zhang, Keyang Zhang, Quan Yuan, Haoruo Peng, Yu Zheng, Tim Hanratty, Shaowen Wang, and Jiawei Han. 2017. Regions, Periods, Activities: Uncovering Urban Dynamics via Cross-Modal Representation Learning. In Proceedings of the 26th International Conference on World Wide Web (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 361–370. https://doi.org/10.1145/3038912.3052601
[31] Yating Zhang, Adam Jatowt, and Katsumi Tanaka. 2017. Is Tofu the Cheese of Asia?: Searching for Corresponding Objects across Geographical Areas. https://doi.org/10.1145/3041021.3055132
[32] Shenglin Zhao, Michael Lyu, and Irwin King. 2018. Geo-Teaser: Geo-Temporal Sequential Embedding Rank for POI Recommendation. 57–78. https://doi.org/10.1007/978-981-13-1349-3_4

      Published In

      LocalRec'20: Proceedings of the 4th ACM SIGSPATIAL Workshop on Location-Based Recommendations, Geosocial Networks, and Geoadvertising
      November 2020
      46 pages
      ISBN:9781450381604
      DOI:10.1145/3423334
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 November 2020

      Author Tags

      1. Geo-Semantics
      2. Machine Learning
      3. Points of Interest
      4. Similarity

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIGSPATIAL '20

      Acceptance Rates

      Overall Acceptance Rate 17 of 26 submissions, 65%

      Article Metrics

      • Downloads (Last 12 months): 167
      • Downloads (Last 6 weeks): 27
      Reflects downloads up to 05 Feb 2025
