Urban Multi-Source Spatio-Temporal Data Analysis Aware Knowledge Graph Embedding
Abstract
1. Introduction
- We propose a general framework for multi-source spatio-temporal data analysis aware knowledge graph embedding. Knowledge graph embedding models are used to capture heterogeneous network structure features and semantic features in a low-dimensional space. We then use link prediction and cluster analysis tasks to mine the network structure and semantic knowledge.
- We recognize the importance of knowledge from a practical perspective: different kinds of knowledge have different impacts on travel activities.
- We evaluate the framework using travel data and external knowledge data of research areas in Shanghai. We then analyze the latent network structure and semantics of multi-source spatio-temporal data from the evaluation results, and interpret the practical significance of the data through visualization.
2. Related Works
3. Materials and Methods
3.1. Definition
3.2. Framework Overview
3.3. Methodology
3.3.1. Knowledgeable Multi-Source Spatio-Temporal Data
3.3.2. Knowledge Graph Embedding
3.3.3. Multi-Source Spatio-Temporal Data Analysis
4. Experiments and Results
4.1. Data Description
4.2. Evaluation Metrics
- Average ranking of entities (MeanRank):
- Proportion of top 10 correct entities (Hit@10):
- Silhouette coefficient (SC):
- Calinski–Harabasz index (CHI):
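As a rough illustration of the two link prediction metrics listed above, the sketch below computes MeanRank and Hit@10 from a list of ranks, in both the raw and the filtered setting. The function names, the toy scores, and the score convention (lower is better, as with a translation distance) are assumptions made for this example and are not taken from the article. "Filtered" is used here in the sense common in the knowledge graph embedding literature, where candidates that are themselves known correct answers are skipped before ranking.

```python
from typing import Dict, Iterable, List, Set


def rank_of_target(scores: Dict[str, float], target: str,
                   known_true: Iterable[str] = ()) -> int:
    """Return the 1-based rank of `target` among the candidate entities.

    `scores` maps each candidate entity to a distance-like score
    (lower means more plausible). Candidates listed in `known_true`
    are skipped, which gives the filtered rank; leave it empty for
    the raw rank.
    """
    skip: Set[str] = set(known_true) - {target}
    target_score = scores[target]
    better = sum(1 for entity, score in scores.items()
                 if entity not in skip and score < target_score)
    return better + 1


def mean_rank(ranks: List[int]) -> float:
    """Average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int = 10) -> float:
    """Percentage of test triples whose correct entity ranks in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Toy candidates with hypothetical scores.
toy_scores = {"a": 0.2, "b": 0.5, "c": 0.9, "d": 1.4}
raw_rank = rank_of_target(toy_scores, "c")
filtered_rank = rank_of_target(toy_scores, "c", known_true=["a"])

# Hypothetical ranks collected over four test triples.
example_ranks = [1, 3, 12, 40]
print(mean_rank(example_ranks), hits_at_k(example_ranks, k=10))
```

The filtered rank can only be equal to or better than the raw rank, which is why filtered MeanRank values are lower and filtered Hit@10 values higher in results tables of this kind.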
4.3. Model Parameter Design
- Hyperparameters. The hyperparameters of the framework mainly include the learning rate, embedding dimension k, number of training epochs, batch size B, margin, number of iterations, and number of clusters. In the experiment, we tuned them manually and set the learning rate to 0.001, batch size to 100, embedding dimension to 100, training epochs to 500, number of iterations to 1000, and number of clusters to 5. To select an appropriate embedding dimension, we used the TransE embedding model to compare the impact of different embedding dimensions (50, 80, 100, 150, and 200) on link prediction accuracy. Figure 6 shows the MeanRank and Hit@10 results for the different embedding dimensions on the MobikeKG dataset. The horizontal axis represents the embedding dimension, and the vertical axis represents the value of each evaluation index. In addition, "filter" indicates that the network is evaluated after removing the negative samples on the basis of the original data. Performance is best when the embedding dimension is 150, so we selected the more stable embedding dimension of 150 for the subsequent experiments.
- Training. Table 1 gives a detailed description of the triples and of the training, test, and validation datasets (80%, 10%, and 10% of the total, respectively) for the seven types of travel data in the city knowledge graph.
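The setup described in the two items above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it collects the reported settings in a dictionary, splits a toy list of triples 80/10/10, and evaluates a TransE-style distance with a margin-based ranking loss for one positive and one corrupted triple. The margin value, the use of the L1 norm, the helper names, and the toy entity and relation names are all assumptions, since the text does not state them.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Settings reported in the text. The margin value is not stated there,
# so 1.0 below is only a placeholder assumption.
CONFIG = {
    "learning_rate": 0.001,
    "batch_size": 100,
    "embedding_dim": 150,  # dimension selected after the sweep over 50-200
    "epochs": 500,
    "margin": 1.0,         # assumption, not taken from the article
}


def split_triples(triples: Sequence[Triple], train_frac: float = 0.8,
                  test_frac: float = 0.1, seed: int = 0
                  ) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Shuffle the triples and split them into train, test, and validation."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]  # whatever remains
    return train, test, valid


def transe_score(h: Sequence[float], r: Sequence[float],
                 t: Sequence[float]) -> float:
    """TransE distance ||h + r - t|| with the L1 norm (lower = more plausible)."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def margin_loss(pos_score: float, neg_score: float, margin: float) -> float:
    """Margin-based ranking loss for one positive / corrupted triple pair."""
    return max(0.0, margin + pos_score - neg_score)


# Toy data; the entity and relation names are hypothetical.
toy_triples = [(f"trip_{i}", "starts_in", f"grid_{i % 7}") for i in range(50)]
train, test, valid = split_triples(toy_triples)

pos = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])  # tail equals h + r
neg = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 0.5])  # corrupted tail
loss = margin_loss(pos, neg, CONFIG["margin"])
print(len(train), len(test), len(valid), loss)
```

A fixed seed keeps the split reproducible, which matters when different embedding dimensions are compared on the same test triples.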
4.4. Experiment Results
4.4.1. Analysis from Network Structure Perspective
- Different clustering dimensionality reduction methods. Different dimensionality reduction methods change the clustering effect between entities from different angles. To understand the aggregation state of entities in the network under different dimensionality reduction methods, the entity embedding vectors are reduced in dimension by TSNE, ICA, ISOMAP, LLE, and PCA, and SC and CHI are used to evaluate the results. The experimental results are shown in Figure 8. The radar diagram in Figure 8 shows the clustering evaluation results of the dimensionality reduction methods based on the TransR model. The outer ring represents six dimensionality reduction methods, and the colors represent different datasets. Figure 8 shows that LLE performs most prominently among the six traditional methods, possibly because LLE is better at capturing local features and entity similarity, so that similar entities are generally distributed around a given entity. In addition, to show the clustering effect of the multi-source spatio-temporal data analysis model clearly, we used geometric visualization to understand the network structure based on the MobikeKG dataset. The experimental results are shown in Figure 9. The visualization results in Figure 9 show that the linear dimensionality reduction models ICA and PCA can separate entities of different structural types well. The nonlinear dimensionality reduction models ISOMAP and TSNE are more concerned with overall data characteristics. The locally linear model LLE preserves the manifold structure of the data and uses local linearity to reflect global nonlinearity, which can better distinguish different categories.
- Different KGE methods. Different KGE methods can change the clustering effect between entities. To understand the aggregation state of entities in the network under different KGE methods, the TransE, TransH, TransR, STransE, and ComplEx methods are used to map high-dimensional data to a low-dimensional space. On the basis of the LLE dimensionality reduction method, the results are evaluated with the clustering indices SC and CHI. The experimental results are shown in Figure 10. The heat map in Figure 10 shows the clustering evaluation results of the embedding vectors produced by the different KGE methods after LLE dimensionality reduction. The horizontal and vertical axes describe the different datasets and the different KGE methods, respectively. The darker the color, the better the clustering effect. Figure 10a shows that the TransE model has the greatest influence on the clustering classes among the KGE methods. Figure 10b shows that the embedding models have similar effects on entity similarity. In general, KGE models have less impact on the aggregation state of entities within a class than between classes.
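The two clustering indices used in the comparisons above can be computed as in the following sketch. It follows the standard definitions of the silhouette coefficient and the Calinski–Harabasz index and uses only the Python standard library. The function names and the toy points are assumptions for illustration; the points are assumed to be embedding vectors that have already been reduced to two dimensions, and the labels are assumed to come from a clustering step that is not shown. Both functions expect at least two clusters.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def _dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def _group(points: Sequence[Point],
           labels: Sequence[Hashable]) -> Dict[Hashable, List[Point]]:
    """Collect the points of each cluster label."""
    clusters: Dict[Hashable, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters


def silhouette_coefficient(points: Sequence[Point],
                           labels: Sequence[Hashable]) -> float:
    """Mean of (b - a) / max(a, b) over all points; ranges from -1 to 1."""
    clusters = _group(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(_dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def calinski_harabasz_index(points: Sequence[Point],
                            labels: Sequence[Hashable]) -> float:
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    overall = _centroid(list(points))
    between = within = 0.0
    for members in clusters.values():
        center = _centroid(members)
        between += len(members) * _dist(center, overall) ** 2
        within += sum(_dist(p, center) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated toy clusters in 2D (hypothetical reduced embeddings).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
sc = silhouette_coefficient(toy_points, toy_labels)
chi = calinski_harabasz_index(toy_points, toy_labels)
print(round(sc, 3), round(chi, 1))
```

SC is bounded, so it can be compared across datasets, whereas CHI is unbounded and is best compared only between methods evaluated on the same dataset.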
4.4.2. Analysis from Knowledge Semantic Perspective
- The performance of residents in MobikeOD is more dispersed, and the active urban space is not clear. MobikeAD shows that the Pudong New Area can be divided well, and the concentration of the urban space is relatively high. MobikeGrid shows that first-order and second-order associations can ensure regional clustering. The active urban space is not limited to administrative divisions but is also related to grids.
- MobikePOI shows that the distribution of urban POIs in combination with residents mainly presents a ring-enclosed structure, and that the discovery of resident activities from active urban areas depends on the importance of the POIs in the area. The MobikeStation distribution is similar to the MobikeWeather distribution, which reflects a similar degree of influence on the clustering results, consistent with the previous conclusions.
4.4.3. Perturbation Analysis and Robustness
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Datasets | #Relation | #Entities | #Train | #Validation | #Test |
---|---|---|---|---|---|
MobikeOD | 744 | 3811 | 907,637 | 9262 | 9262 |
MobikeStation | 754 | 4106 | 907,638 | 9263 | 9263 |
MobikeGrid | 746 | 5820 | 991,333 | 10,117 | 10,117 |
MobikeWeather | 746 | 3828 | 908,479 | 9271 | 9271 |
MobikeAD | 747 | 5850 | 913,965 | 9327 | 9327 |
MobikePOI | 815 | 5163 | 923,508 | 9425 | 9425 |
MobikeKG | 828 | 6208 | 1,249,587 | 12,751 | 12,751 |
Methods | MeanRank | Hit@10 | MeanRank(Filter) | Hit@10(Filter) |
---|---|---|---|---|
DeepWalk | 1329.39 | 5.32282 | 1271.12 | 5.87346 |
Node2Vec | 1221.34 | 5.722 | 1156.55 | 6.154 |
TransE | 45.4797 | 44.643 | 44.328 | 47.29 |
TransR | 46.9104 | 45.0551 | 45.7181 | 47.7651 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhao, L.; Deng, H.; Qiu, L.; Li, S.; Hou, Z.; Sun, H.; Chen, Y. Urban Multi-Source Spatio-Temporal Data Analysis Aware Knowledge Graph Embedding. Symmetry 2020, 12, 199. https://doi.org/10.3390/sym12020199