Research on Approximate Spatial Keyword Group Queries Based on Differential Privacy and Exclusion Preferences in Road Networks
Abstract
1. Introduction
- (1) Traditional spatial keyword group query studies consider neither user input errors nor users' exclusion preferences. This paper therefore proposes a new query model: the approximate spatial keyword group query in road networks based on differential privacy and exclusion preferences. The model builds on traditional group queries but operates on road networks, which better reflects real travel conditions, and it accounts for input errors and exclusion preferences, which better reflects the varied needs of users.
- (2) Existing indexing techniques cannot handle the proposed query model, so this paper proposes a new index, the IGgram-tree. It retains the advantages of the G-tree [16] and adds an n-gram index [17] and inverted files to handle keyword queries. To handle exclusion preferences, the index incorporates Bloom filters, which process exclusion keyword information efficiently. To improve query efficiency, a filtering algorithm based on the IGgram-tree index is further proposed. It uses the index to perform a first pruning step that retains only the objects in the spatial database that satisfy the keyword constraints and exclusion preferences, thereby reducing the computational overhead of subsequent query processing.
- (3) Traditional spatial keyword group query algorithms do not protect data privacy. To address this, the paper proposes a protection algorithm based on differential privacy, which perturbs the exact query result through an exponential mechanism and thus avoids the privacy leakage that traditional methods may cause.
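To illustrate why a Bloom filter suits exclusion-keyword checks, the following minimal Python sketch shows the data structure in isolation. The class name `BloomFilter`, its parameters, and the sample keywords are illustrative assumptions; they are not taken from the IGgram-tree implementation, whose node layout is not reproduced here.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical keywords stored under one index node.
node_filter = BloomFilter()
for keyword in ["cafe", "bar", "cinema"]:
    node_filter.add(keyword)

print(node_filter.might_contain("bar"))  # a stored keyword is always reported
```

A negative answer is exact, so it can be trusted without inspecting the objects below a node, whereas a positive answer may be a false positive and needs confirmation against the actual keyword sets.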
2. Related Work
3. Problem Descriptions
- (1) The union of the keywords of the objects in S covers all the keywords in q.K+; keyword matching in this paper is approximate.
- (2) No object in S contains any of the exclusion keywords specified by query q.
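The two conditions can be checked with a short Python sketch. It assumes that approximate matching is measured by edit distance with a threshold `max_dist`, and that exclusion keywords are compared exactly; the function names and the sample data are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_feasible(group, required, excluded, max_dist=1):
    """group: list of keyword sets, one per object in S.

    Condition (1): every required keyword approximately matches some
    keyword in the union of the group's keyword sets.
    Condition (2): no object carries an exclusion keyword (exact match,
    which is an assumption of this sketch).
    """
    all_keywords = set().union(*group)
    covers = all(
        any(edit_distance(k, w) <= max_dist for w in all_keywords)
        for k in required
    )
    clean = all(excluded.isdisjoint(keywords) for keywords in group)
    return covers and clean


group = [{"coffee", "wifi"}, {"cinema"}]
print(is_feasible(group, {"cofee", "cinema"}, {"bar"}))   # True
print(is_feasible(group, {"cofee", "cinema"}, {"wifi"}))  # False
```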
4. Query Algorithm Based on the IGgram-Tree Index and Minimum Hash Set
4.1. Filtering Algorithm Based on the IGgram-Tree Index
- (1) […];
- (2) […]; and
- (3) […], if […], then […].
Algorithm 1: Filtering algorithm based on the IGgram-tree index
Input: IGgram-tree index on P, query q(q.l, q.K+, q.K−).
Output: Candidate hash table MH.
begin
1: initialize the priority queue PQ to empty and the hash table MH to empty;
2: enqueue the root node G0 into PQ;
3: while PQ is not empty do
4:   G ← PQ.dequeue(); /* dequeue the head of PQ and assign it to G */
5:   if G is a non-leaf node then
6:     for exclusion keyword ekey in q.K− do
7:       if ekey matches a keyword gkey in G.IGF with approximate distance ≤ 3 then /* Theorem 2 */
8:         replace ekey in q.K− with gkey; /* correct the keyword */
9:       end if
10:      if G.BF and G.UK determine that results containing ekey exist then /* Theorem 1 */
11:        continue;
12:      else
13:        for key in q.K+ do
14:          if key matches a keyword gkey in G.IGF with approximate distance ≤ 3 then
15:            replace key in q.K+ with gkey;
16:            PQ.enqueue(G);
17:          end if
18:        end for
19:      end if
20:    end for
21:  else /* G is a leaf node */
22:    for POI object p in G do
23:      if p.K contains a keyword key in q.K+ and no keyword ekey in q.K− then
24:        compute the distances from p to the boundary points vi of its partition from the node's distance matrix and keep this distance information with p;
25:        MH.add(p);
26:      end if
27:    end for
28:  end if
29: end while
30: return MH; /* MH stores the processed information of each p */
end
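The approximate keyword matching used in Algorithm 1 relies on n-grams. The Python sketch below shows one common way such matching can be organised: an inverted index from n-grams to keywords that produces candidates sharing enough grams with a possibly misspelled query keyword. The padding characters, the parameter `min_shared`, and the vocabulary are assumptions made for illustration; a full implementation would verify each candidate against the distance threshold afterwards.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Set of n-grams of word, padded so that prefixes and suffixes count."""
    padded = f"{'#' * (n - 1)}{word}{'$' * (n - 1)}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_gram_index(vocabulary, n=2):
    """Inverted index: n-gram -> set of keywords containing it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def candidates(query, index, n=2, min_shared=2):
    """Keywords that share at least min_shared n-grams with query."""
    counts = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            counts[word] += 1
    return sorted(w for w, c in counts.items() if c >= min_shared)


index = build_gram_index(["coffee", "cinema", "hotel"])
print(candidates("cofee", index))  # ['coffee']
```

Counting shared grams is only a filter: it discards keywords that cannot be close to the query cheaply, so the more expensive distance computation runs on a short candidate list.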
4.2. Refinement Algorithm Based on Minimum Hash Set
Algorithm 2: Refinement algorithm based on the minimum hash set
Input: Hash table MH, IGgram-tree index on P, query q(q.l, q.K+, q.K−).
Output: Result set Res.
begin
1: initialize the feasible set S to empty, the integer temp to 0, min to +∞, and the set Slist of feasible sets to empty;
2: take the Cartesian product of the POI chains stored in MH for the different keywords and add the resulting feasible sets to Slist;
3: while Slist is not empty do
4:   remove a feasible set from Slist and store it in S; temp ← 0;
5:   for p in S do
6:     temp ← max{temp, roaddist(p, rq)};
7:     S.indist ← max{roaddist(pi, pj) | pi, pj ∈ S, i ≠ j};
8:   end for
9:   S.cost ← S.indist + temp;
10:  if min > S.cost then
11:    Res ← S; /* overwrite Res with the new best feasible set */
12:    min ← S.cost;
13:  end if
14: end while
15: return Res;
end
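The refinement step can be sketched in Python as follows. The sketch assumes that the hash table maps each query keyword to its list of candidate POIs and that a distance function is supplied by the caller; the name `refine`, the toy coordinates, and the one-dimensional distance are illustrative stand-ins for the road network distance.

```python
import math
from itertools import combinations, product


def refine(candidates_by_keyword, query_loc, roaddist):
    """Return the feasible set with the smallest cost and that cost.

    cost(S) = max distance from the query to a member of S
              + max pairwise distance inside S.
    """
    if not candidates_by_keyword:
        return None, math.inf
    best_group, best_cost = None, math.inf
    # One POI per keyword; the Cartesian product enumerates all choices.
    for combo in product(*candidates_by_keyword.values()):
        group = set(combo)  # one POI may cover several keywords
        to_query = max(roaddist(p, query_loc) for p in group)
        diameter = max(
            (roaddist(a, b) for a, b in combinations(group, 2)),
            default=0.0,  # a single-object group has no pairwise distance
        )
        cost = to_query + diameter
        if cost < best_cost:
            best_group, best_cost = group, cost
    return best_group, best_cost


# Toy data: POIs on a line, distance is the absolute difference.
coords = {"q": 0, "p1": 1, "p2": 2, "p3": 10}
dist = lambda a, b: abs(coords[a] - coords[b])
mh = {"t1": ["p1", "p3"], "t2": ["p2", "p3"]}

best, cost = refine(mh, "q", dist)
print(sorted(best), cost)  # ['p1', 'p2'] 3
```

Initialising the best cost to infinity matters: with an initial value of 0, the comparison in the update step would never succeed and no feasible set would be returned.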
Algorithm 3: The roaddist algorithm
Input: IGgram-tree index on P, two POIs p1 and p2 in the road network.
Output: The shortest distance between p1 and p2.
begin
1: locate the leaf node Gij that contains p1 and the leaf node Gst that contains p2;
2: initialize the integer variables mindist ← 0, k ← 0, curdist ← 0, the string top to empty, and the node Gcur ← Gij;
3: if Gij = Gst then
4:   return DijkDist(p1, p2);
5: else
6:   k ← the index of the first identical character of “ij” and “st”;
7:   if k < 0 then
8:     top ← “0”; /* the common parent node of the two points is G0 */
9:   else
10:    top ← SubString(k); /* the common parent of the two points is the non-leaf node whose serial number is their common string; extract the common string */
11:  end if
12:  while Gcur ≠ Gtop do /* traverse up from the leaf node until the common parent is reached and compute the minimum-distance path from p1 to Gtop */
13:    curdist ← the minimum distance from p1 to a bounder of Gcur, computed within Gcur from its distance matrix;
14:    mindist ← mindist + curdist;
15:    Gcur ← Gcur.parent; /* Gcur moves to its parent node */
16:  end while
17:  Gcur ← Gst;
18:  while Gcur ≠ Gtop do /* traverse up from the leaf node until the common parent is reached and compute the minimum-distance path from p2 to Gtop */
19:    curdist ← the minimum distance from p2 to a bounder of Gcur, computed within Gcur from its distance matrix; mindist ← mindist + curdist;
20:    Gcur ← Gcur.parent; /* Gcur moves to its parent node */
21:  end while
22:  curdist ← the minimum distance selected from the distance matrix of the bounders of Gtop’s child nodes; /* connect the two paths to form the complete path from p1 to p2 */
23:  mindist ← mindist + curdist;
24: end if
25: return mindist;
end
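Two building blocks of the roaddist computation can be illustrated in Python: a plain Dijkstra search for the case where both POIs lie in the same leaf node, and a one-level assembly that joins two partial paths through bounders. This is a simplified sketch rather than a transcription of Algorithm 3. The function names, the small graph, and the inter-bounder distance matrix are hypothetical; only the POI-to-bounder distances of p1 and p6 follow the example tables of the paper.

```python
import heapq


def dijk_dist(graph, source, target):
    """Shortest distance in a weighted graph {u: {v: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target unreachable


def assemble(src_to_bounders, bounder_matrix, dst_to_bounders):
    """Join two partial paths through every pair of bounders, keep the best."""
    return min(
        d1 + bounder_matrix[b1][b2] + d2
        for b1, d1 in src_to_bounders.items()
        for b2, d2 in dst_to_bounders.items()
    )


# Same-leaf case on a hypothetical three-vertex graph.
graph = {
    "v1": {"v2": 2, "v3": 5},
    "v2": {"v1": 2, "v3": 1},
    "v3": {"v1": 5, "v2": 1},
}
print(dijk_dist(graph, "v1", "v3"))  # 3

# Cross-leaf case: bounder distances of p1 and p6, made-up matrix.
p1_bounders = {"v4": 2, "v3": 5}
p6_bounders = {"v9": 3, "v11": 3}
matrix = {"v4": {"v9": 6, "v11": 4}, "v3": {"v9": 1, "v11": 7}}
print(assemble(p1_bounders, matrix, p6_bounders))  # 9
```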
5. Differential Privacy-Based Protection Methods
Algorithm 4: Differential privacy preservation algorithm (DPP algorithm)
Input: The precise result set Res(p1, p2, …, pn) and the spatial–textual database P.
Output: Global results after protection SafeRes.
begin
1: initialize SafeRes to empty, r ← Res;
2: for p1 to pn in r do /* iterate through each POI in the exact result set */
3:   pi′ ← random(p ∈ P, p ∉ {p1, …, pn}); /* generate a random POI that is not in Res */
4:   replace pi in r with pi′ to form a new set ri;
5: end for
6: for r and r1~rn do /* apply the exponential mechanism to Res and the generated neighboring result sets and save the output to the safe result set SafeRes */
7:   […];
8: end for
9: SafeRes ← A(f, P, r);
10: return SafeRes;
end
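The two stages of the DPP algorithm, generating neighboring result sets and selecting one output with the exponential mechanism, can be sketched in Python. The sketch uses the standard form of the exponential mechanism, in which a candidate with utility u is chosen with probability proportional to exp(ε·u / (2Δu)). The utility values, the sensitivity of 1, and all function names are assumptions; the scoring function f of the paper is not reproduced here.

```python
import math
import random


def neighbour_sets(result, database, rng):
    """For each position, swap the POI for a random POI outside result."""
    outside = [p for p in database if p not in result]
    sets = []
    for i in range(len(result)):
        replacement = rng.choice(outside)
        sets.append(result[:i] + [replacement] + result[i + 1:])
    return sets


def exponential_mechanism_probs(utilities, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(eps * u / (2 * sens))."""
    top = max(utilities)  # shift by the maximum for numerical stability
    weights = [
        math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities
    ]
    total = sum(weights)
    return [w / total for w in weights]


def dpp_select(candidates, utility, epsilon, sensitivity=1.0, rng=None):
    """Draw one candidate set according to the exponential mechanism."""
    rng = rng or random.Random()
    probs = exponential_mechanism_probs(
        [utility(c) for c in candidates], epsilon, sensitivity
    )
    return rng.choices(candidates, weights=probs, k=1)[0]


rng = random.Random(0)
res = ["p1", "p2"]
database = ["p1", "p2", "p3", "p4", "p5"]
pool = [res] + neighbour_sets(res, database, rng)

# Hypothetical utility: the exact result scores best, neighbours score lower.
utility = lambda s: 0.0 if s == res else -5.0
safe_res = dpp_select(pool, utility, epsilon=1.0, rng=rng)
print(safe_res)
```

A smaller ε flattens the distribution and makes the exact result less likely to be released, which trades accuracy for stronger protection; with ε = 0 every candidate set is equally likely.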
Algorithm 5: Query algorithm based on the IGgram-tree index and minimum hash set (IGHashDP algorithm)
Input: IGgram-tree index on P, query q(q.l, q.K+, q.K−).
Output: Global results after protection SafeRes.
begin
1: Filtering algorithm based on the IGgram-tree index (Algorithm 1);
2: Refinement algorithm based on the minimum hash set (Algorithm 2);
3: Differential privacy preservation algorithm (DPP algorithm) (Algorithm 4);
end
6. Experiment Analysis
7. Conclusions
- Streaming data-based spatial keyword group queries in dynamic environments.
- Research on spatial keyword group query in the big data environment.
- Approximate spatial keyword group queries based on user preferences in road networks.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, L.; Li, S.; Guo, Y.; Hao, X. A Method for k Nearest Neighbor Query of Line Segment in Obstructed Spaces. J. Inf. Process. Syst. 2020, 16, 406–420.
- Yang, R.; Niu, B. Continuous k Nearest Neighbor Queries over Large-Scale Spatial–Textual Data Streams. Int. J. Geo-Inf. 2020, 9, 694.
- Li, S.; Hu, Y.; Hao, X.; Zhang, L.; Hao, Z. Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing. J. Comput. Res. Dev. 2021, 58, 609–623.
- Wang, J.; Xiong, Z.; Han, Q.; Han, X.; Yang, D. Top-k Socially Constrained Spatial Keyword Search in Large SIoT Networks. IEEE Internet Things J. 2022, 9, 9280–9289.
- Pan, X.; Yu, Q.-D.; Ma, A.; Sun, Y.; Wu, L.; Guo, J. Efficient algorithm of top-k spatial keyword search with OR semantics. J. Softw. 2020, 31, 3197–3215.
- Allheeib, N.; Islam, M.S.; Taniar, D.; Shao, Z.; Cheema, M.A. Density-Based Reverse Nearest Neighbourhood Search in Spatial Databases. J. Ambient Intell. Hum. Comput. 2021, 12, 4335–4346.
- Pan, X.; Nie, S.; Hu, H.; Yu, P.S.; Guo, J. Reverse Nearest Neighbor Search in Semantic Trajectories for Location-Based Services. IEEE Trans. Serv. Comput. 2022, 15, 986–999.
- Cao, X.; Cong, G.; Jensen, C.S.; Ooi, B.C. Collective Spatial Keyword Querying. In Proceedings of the 2011 International Conference on Management of Data—SIGMOD’11, Athens, Greece, 12–16 June 2011; ACM Press: Athens, Greece, 2011; p. 373.
- Chan, H.K.-H.; Long, C.; Wong, R.C.-W. On Generalizing Collective Spatial Keyword Queries. IEEE Trans. Knowl. Data Eng. 2018, 30, 1712–1726.
- Xu, H.; Gu, Y.; Sun, Y.; Qi, J.; Yu, G.; Zhang, R. Efficient Processing of Moving Collective Spatial Keyword Queries. VLDB J. 2020, 29, 841–865.
- Zhang, L.; Li, J.; Li, S. Research on Time-Aware Group Query Method with Exclusion Keywords. Int. J. Geo-Inf. 2023, 12, 438.
- Zhang, D.; Chee, Y.M.; Mondal, A.; Tung, A.K.H.; Kitsuregawa, M. Keyword Search in Spatial Databases: Towards Searching by Document. In Proceedings of the 2009 IEEE 25th International Conference on Data Engineering, Shanghai, China, 29 March–2 April 2009; IEEE: Shanghai, China, 2009; pp. 688–699.
- Choi, D.-W.; Pei, J.; Lin, X. Finding the Minimum Spatial Keyword Cover. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 16–20 May 2016; IEEE: Helsinki, Finland, 2016; pp. 685–696.
- Guo, T.; Cao, X.; Cong, G. Efficient Algorithms for Answering the M-Closest Keywords Query. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; ACM: Melbourne, Australia, 2015; pp. 405–418.
- Gao, Y.; Zhao, J.; Zheng, B.; Chen, G. Efficient Collective Spatial Keyword Query Processing on Road Networks. IEEE Trans. Intell. Transport. Syst. 2016, 17, 469–480.
- Zhong, R.; Li, G.; Tan, K.-L.; Zhou, L.; Gong, Z. G-Tree: An Efficient and Scalable Index for Spatial Search on Road Networks. IEEE Trans. Knowl. Data Eng. 2015, 27, 2175–2189.
- Tripathy, A.; Agrawal, A.; Rath, S.K. Classification of Sentiment Reviews Using N-Gram Machine Learning Approach. Expert Syst. Appl. 2016, 57, 117–126.
- Deng, K.; Li, X.; Lu, J.; Zhou, X. Best Keyword Cover Search. IEEE Trans. Knowl. Data Eng. 2015, 27, 61–73.
- Li, J.; Xu, M. A Parametric Approximation Algorithm for Spatial Group Keyword Queries. IDA 2021, 25, 305–319.
- Zhao, S.; Cheng, X.; Su, S.; Shuang, K. Popularity-Aware Collective Keyword Queries in Road Networks. Geoinformatica 2017, 21, 485–518.
- Su, S.; Zhao, S.; Cheng, X.; Bi, R.; Cao, X.; Wang, J. Group-Based Collective Keyword Querying in Road Networks. Inf. Process. Lett. 2017, 118, 83–90.
- Hu, J.; Fan, J.; Li, G.; Chen, S. Top-k Fuzzy Spatial Keyword Search. Chin. J. Comput. 2012, 35, 2237–2246.
- Zhang, S.; Yang, R.; Zhao, Y. Research on Multi-Spatial Keyword Fuzzy Query Algorithm in Spatial Data. In Proceedings of the 2018 International Conference on Big Data Engineering and Technology, Chengdu, China, 25–27 August 2018; ACM: Chengdu, China, 2018; pp. 40–45.
- Yang, S.; Tang, S.; Zhang, X. Privacy-Preserving k Nearest Neighbor Query with Authentication on Road Networks. J. Parallel Distrib. Comput. 2019, 134, 25–36.
- Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. FNT Theor. Comput. Sci. 2013, 9, 211–407.
- Zhu, S.; Wang, L.; Sun, G. A Perturbation Mechanism for Classified Transformation Satisfying Local Differential Privacy. J. Comput. Res. Dev. 2022, 59, 430–439.
- Chen, Z.; Ni, T.; Zhong, H.; Zhang, S.; Cui, J. Differentially Private Double Spectrum Auction With Approximate Social Welfare Maximization. IEEE Trans. Inform. Forensic Secur. 2019, 14, 2805–2818.
- Fan, Z.; Xu, X. APDPk-Means: A New Differential Privacy Clustering Algorithm Based on Arithmetic Progression Privacy Budget Allocation. In Proceedings of the 2019 IEEE 21st International Conference on High Performance Computing and Communications, Zhangjiajie, China, 10–12 August 2019; IEEE: Zhangjiajie, China, 2019; pp. 1737–1742.
- Song, X.; Xu, J.; Zhou, R.; Liu, C.; Zheng, K.; Zhao, P.; Falkner, N. Collective Spatial Keyword Search on Activity Trajectories. Geoinformatica 2020, 24, 61–84.
- Yi, X.; Paulet, R.; Bertino, E.; Varadharajan, V. Practical Approximate k Nearest Neighbor Queries with Location and Query Privacy. IEEE Trans. Knowl. Data Eng. 2016, 28, 1546–1559.
POI | Distance | Keywords |
---|---|---|
p1 | (v1,1) | t1,t6 |
p2 | (v2,1) | t1,t2 |
p3 | (v5,1) | t1,t6 |
p4 | (v7,1) | t1,t3 |
p5 | (v9,0) | t6 |
p6 | (v10,2) | t3 |
p7 | (v13,2) | t5,t6 |
p8 | (v13,1) | t5,t7 |

POI | Bounders and Distances | POI | Bounders and Distances |
---|---|---|---|
p1 | v4:2, v3:5 | p6 | v9:3, v11:3 |
p3 | v5:1, v6:1 | p7 | v13:2 |
p4 | v5:4, v6:4 | p8 | v13:1 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, L.; Li, J.; Li, S. Research on Approximate Spatial Keyword Group Queries Based on Differential Privacy and Exclusion Preferences in Road Networks. ISPRS Int. J. Geo-Inf. 2023, 12, 480. https://doi.org/10.3390/ijgi12120480