Computer Science, 2017, Vol. 44, Issue (9): 266-271. doi: 10.11896/j.issn.1002-137X.2017.09.050

• Artificial Intelligence •

Chinese Place-name Address Matching Method Based on Bayesian Inference in a Big Data Environment

XU Pu-le, WANG Yang, HUANG Ya-kun, HUANG Shao-fen, ZHAO Chuan-xin, CHEN Fu-long

  1. School of Mathematics and Computer Science, Anhui Normal University, Wuhu 241000
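  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61572036), the Natural Science Foundation of Anhui Province (1708085MF156), and the Major Humanities and Social Sciences Fund Project of Anhui Province (SK2014ZD033).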

Chinese Place-name Address Matching Method Based on Large Data Analysis and Bayesian Decision

XU Pu-le, WANG Yang, HUANG Ya-kun, HUANG Shao-fen, ZHAO Chuan-xin and CHEN Fu-long   

  • Online:2018-11-13 Published:2018-11-13

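Abstract: Traditional Chinese place-name address matching techniques struggle with fast matching of Chinese place-name addresses in the massive, diverse, and heterogeneous geographic information space of smart cities in a big data environment. This paper proposes a matching framework based on Chinese place-name address elements on the Spark computing platform, together with a matching algorithm that applies intelligent decision-making (An Intelligent Decision Matching Algorithm, AIDMA). First, address elements are parsed from two aspects: the rich semantics of Chinese place-name addresses, and the natural separation between Chinese character strings, digits, and letters. A Bayesian inference network that fuses multiple kinds of distance information is constructed, and on this basis a matching decision method based on multi-criteria evaluation is proposed. Then, experiments and analysis are carried out using a database of 514967 desensitized gas account Chinese place-name addresses from Wuhu City and a database of 1770979 Chinese place-name addresses from grid-managed communities (which includes the geospatial information of the grid addresses). The experimental results show that, when processing large-scale Chinese place-name address information, the method effectively improves the matching efficiency for a single address compared with traditional methods, while producing more balanced results on the two metrics of matching rate and precision.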

Keywords: Big data, Spark, Chinese place-name address matching, Bayesian inference
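
The abstract mentions parsing address elements by the natural separation between Chinese strings, digits, and letters. A minimal sketch of that one idea follows. It is not the authors' parser: the function name `split_address`, the regular expression, and the sample address are assumptions made for illustration, and the semantic side of the parsing is not covered.

```python
import re
from typing import List

# Hypothetical tokenizer, not taken from the paper.
# Each alternative matches one maximal run of a single character class:
# CJK Unified Ideographs, ASCII digits, or ASCII letters.
# Anything else (spaces, punctuation, full-width digits) is dropped.
_RUN = re.compile(r"[\u4e00-\u9fff]+|[0-9]+|[A-Za-z]+")


def split_address(address: str) -> List[str]:
    """Split an address wherever the character class changes."""
    return _RUN.findall(address)


if __name__ == "__main__":
    # Illustrative address, invented for this example.
    print(split_address("芜湖市镜湖区北京中路2号A栋301室"))
    # → ['芜湖市镜湖区北京中路', '2', '号', 'A', '栋', '301', '室']
```

Splitting only on character class leaves a long Chinese run such as the city, district, and road names in one piece; separating those would need the semantic analysis that the abstract names as the other half of the parsing step.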

Abstract: Traditional Chinese place-name address matching techniques struggle to match addresses quickly across the massive, diverse, and heterogeneous geographic information found in a big data environment. An intelligent decision matching algorithm (AIDMA) based on the Spark computing framework is proposed. First, address elements are parsed using semantic information and the natural separation between Chinese strings, numbers, and letters. A Bayesian network that combines three kinds of distance is then constructed and used together with multi-criteria decision-making. Experiments are performed on 514957 desensitized gas account records and 1770979 grid address records from Wuhu City, the latter including spatial information. The results show that AIDMA reduces the execution time per record from about 1 min to about 2.2 s compared with traditional algorithms, and that its matching results are more balanced between matching rate and precision. The proposed method has theoretical significance and practical value for building intelligent countries.

Key words: Big data, Spark, Chinese place-name address matching, Bayesian decision
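
The abstract describes a Bayesian network that fuses several kinds of distance into a matching decision, without naming the distances or giving the network structure. The sketch below swaps in a much simpler stand-in, a naive Bayes combination of three distances, only to show how independent distance evidence can be turned into a posterior match probability. Everything specific here is an assumption: the choice of distances (normalized edit distance, character Jaccard distance, house-number mismatch), the `NEAR_THRESHOLD` cut-off, and the likelihood tables are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def distances(a: str, b: str) -> Dict[str, float]:
    """Three illustrative distances in [0, 1] (assumed, not the paper's)."""
    longest = max(len(a), len(b)) or 1
    sa, sb = set(a), set(b)
    union = sa | sb
    same_numbers = re.findall(r"\d+", a) == re.findall(r"\d+", b)
    return {
        "edit": levenshtein(a, b) / longest,
        "jaccard": 1 - len(sa & sb) / len(union) if union else 0.0,
        "number": 0.0 if same_numbers else 1.0,
    }


# Hypothetical likelihoods P(distance is "near" | class).
P_NEAR_GIVEN_MATCH = {"edit": 0.9, "jaccard": 0.85, "number": 0.95}
P_NEAR_GIVEN_NONMATCH = {"edit": 0.2, "jaccard": 0.3, "number": 0.1}
NEAR_THRESHOLD = 0.3  # assumed cut-off between "near" and "far"


def posterior_match(dists: Dict[str, float], prior: float = 0.5) -> float:
    """Naive Bayes posterior P(match | distances), assuming independence."""
    like_match, like_non = prior, 1.0 - prior
    for name, d in dists.items():
        near = d <= NEAR_THRESHOLD
        pm, pn = P_NEAR_GIVEN_MATCH[name], P_NEAR_GIVEN_NONMATCH[name]
        like_match *= pm if near else 1.0 - pm
        like_non *= pn if near else 1.0 - pn
    return like_match / (like_match + like_non)


if __name__ == "__main__":
    # Illustrative address fragments, invented for this example.
    print(posterior_match(distances("北京中路2号", "北京中路2号")))
    print(posterior_match(distances("北京中路2号", "长江路18号")))
```

In practice the likelihood tables would be estimated from labelled address pairs, and a full Bayesian network could model dependencies between the distances that the independence assumption here ignores.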

