skip to main content
10.1145/3658644.3690271acmconferencesArticle/Chapter ViewAbstractPublication PagesccsConference Proceedingsconference-collections
research-article
Open access

DPM: Clustering Sensitive Data through Separation

Published: 09 December 2024 Publication History

Abstract

Clustering is an important tool for data exploration where the goal is to subdivide a data set into disjoint clusters that fit well into the underlying data structure. When dealing with sensitive data, privacy-preserving algorithms aim to approximate the non-private baseline while minimising the leakage of sensitive information. State-of-the-art privacy-preserving clustering algorithms tend to output clusters that are good in terms of the standard metrics, inertia, silhouette score, and clustering accuracy, however, the clustering result strongly deviates from the non-private KMeans baseline.
In this work, we present a privacy-preserving clustering algorithm called DPM that recursively separates a data set into clusters based on a geometrical clustering approach. In addition, DPM estimates most of the data-dependent hyper-parameters in a privacy-preserving way. We prove that DPM preserves Differential Privacy and analyse the utility guarantees of DPM. Finally, we conduct an extensive empirical evaluation for synthetic and real-life data sets. We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm without requiring the number of classes.

References

[1]
Maria-Florina Balcan, Travis Dick, Yingyu Liang, Wenlong Mou, and Hongyang Zhang. 2017. Differentially private clustering in high-dimensional euclidean spaces. In International Conference on Machine Learning. PMLR, 322--331.
[2]
Alisa Chang, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. 2021. Locally private k-means in one round. In ICML. PMLR, 1441--1451.
[3]
Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. 2011. Differentially private empirical risk minimization. JMLR 12, 3 (2011).
[4]
E. Cohen, H. Kaplan, Y. Mansour, U. Stemmer, and E. Tsfadia. 2021. Differentiallyprivate clustering of easy instances. In ICML. PMLR, 2049--2059.
[5]
Wenxin Du, Canyon Foot, Monica Moniot, Andrew Bray, and Adam Groce. 2020. Differentially private confidence intervals. arXiv preprint arXiv:2001.02285 (2020).
[6]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Shai Halevi and Tal Rabin (Eds.). Springer, 265--284.
[7]
Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (2014).
[8]
D. Feldman, C. Xiang, R. Zhu, and D. Rus. 2017. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks. In Proceedings of the 16th ACM/IEEE IPSN.
[9]
B. Ghazi, R. Kumar, and P. Manurangsi. 2020. Differentially private clustering: Tight approximation ratios. Neurips (2020).
[10]
Google. 2021. Google's differential privacy libraries. https://github.com/google/ differential-privacy/
[11]
Alexander Hinneburg and Daniel A Keim. 1999. Optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering. (1999).
[12]
N. Holohan, S. Braghin, Pól M. Aonghusa, and K. Levacher. 2019. Diffprivlib: the IBM differential privacy library. ArXiv e-prints 1907.02444 [cs.CR] (July 2019).
[13]
Zhiyi Huang and Jinyan Liu. 2018. Optimal differentially private algorithms for k-means clustering. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 395--408.
[14]
Roger Iyengar, Joseph P Near, Dawn Song, Om Thakkar, Abhradeep Thakurta, and LunWang. 2019. Towards practical differentially private convex optimization. In IEEE S&P. IEEE, 299--316.
[15]
Matthew Jones, Huy L. Nguyen, and Thy D Nguyen. 2021. Differentially Private Clustering via Maximum Coverage. Proceedings of the AAAI Conference on Artificial Intelligence 35, 13 (2021).
[16]
K. LeFevre, D. DeWitt, and R. Ramakrishnan. 2006. Mondrian multidimensional k-anonymity. In 22nd ICDE'06. IEEE, 25--25.
[17]
Samuel Maddock, Graham Cormode, TianhaoWang, Carsten Maple, and Somesh Jha. 2022. Federated Boosted Decision Trees with Differential Privacy. In Proceedings of the 2022 ACM SIGSAC CCS. ACM.
[18]
Huy L. Nguyen, Anamay Chaturvedi, and Eric Z Xu. 2021. Differentially Private k-Means via Exponential Mechanism and Max Cover. Proceedings of the AAAI Conference on Artificial Intelligence 35, 10 (5 2021), 9101--9108.
[19]
Tianjiao Ni, Minghao Qiao, Zhili Chen, Shun Zhang, and Hong Zhong. 2021. Utility-efficient differentially private K-means clustering based on cluster merging. Neurocomputing 424 (2021), 205--214.
[20]
Kobbi Nissim and Uri Stemmer. 2018. Clustering algorithms for the centralized and local models. In Algorithmic Learning Theory. PMLR, 619--653.
[21]
Kobbi Nissim, Uri Stemmer, and Salil Vadhan. 2016. Locating a small cluster privately. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 413--427.
[22]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825--2830.
[23]
M. Shechner, O. Sheffet, and U. Stemmer. 2020. Private k-means clustering with stability assumptions. In AISTATS. PMLR, 2518--2528.
[24]
Uri Stemmer. 2021. Locally private k-means clustering. Journal of Machine Learning Research 22, 176 (2021), 1--30.
[25]
Uri Stemmer and Haim Kaplan. 2018. Differentially private k-means with constant multiplicative error. Advances in Neural Information Processing Systems 31 (2018).
[26]
Dong Su, Jianneng Cao, Ninghui Li, Elisa Bertino, and Hongxia Jin. 2016. Differentially private k-means clustering. In Proceedings of the sixth ACM conference on data and application security and privacy. 26--37.
[27]
Jun Zhang, Xiaokui Xiao, and Xing Xie. 2016. PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions. CoRR (2016). arXiv:1601.03229
[28]
Yaling Zhang, Na Liu, and Shangping Wang. 2018. A differential privacy protecting K-means clustering algorithm based on contour coefficients. PloS one 13, 11 (2018), e0206832.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CCS '24: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security
December 2024
5188 pages
ISBN:9798400706363
DOI:10.1145/3658644
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 December 2024

Check for updates

Badges

Author Tags

  1. clustering
  2. differential privacy
  3. machine learning
  4. privacy
  5. privacy-preserving
  6. unsupervised

Qualifiers

  • Research-article

Conference

CCS '24
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Upcoming Conference

CCS '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 80
    Total Downloads
  • Downloads (Last 12 months)80
  • Downloads (Last 6 weeks)80
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media