SoK: Privacy-Preserving Data Synthesis

Hu, Yuzheng; Wu, Fan; Li, Qinbin; Long, Yunhui; Garrido, Gonzalo Munilla; Ge, Chang; Ding, Bolin; Forsyth, David; Li, Bo; Song, Dawn

arXiv:2307.02106 (cs)

[Submitted on 5 Jul 2023 (v1), last revised 5 Aug 2023 (this version, v2)]

Abstract: As the prevalence of data analysis grows, safeguarding data privacy has become a paramount concern. Consequently, there has been an upsurge in the development of mechanisms aimed at privacy-preserving data analyses. However, these approaches are task-specific; designing algorithms for new tasks is a cumbersome process. As an alternative, one can create synthetic data that is (ideally) devoid of private information. This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field. Specifically, we put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods. Under the master recipe, we further dissect the statistical methods into choices of modeling and representation, and investigate the DL-based methods by different generative modeling principles. To consolidate our findings, we provide comprehensive reference tables, distill key takeaways, and identify open problems in the existing literature. In doing so, we aim to answer the following questions: What are the design principles behind different PPDS methods? How can we categorize these methods, and what are the advantages and disadvantages associated with each category? Can we provide guidelines for method selection in different real-world scenarios? We proceed to benchmark several prominent DL-based methods on the task of private image synthesis and conclude that DP-MERF is an all-purpose approach. Finally, upon systematizing the work over the past decade, we identify future directions and call for actions from researchers.

Comments:	Accepted at IEEE S&P (Oakland) 2024
Subjects:	Cryptography and Security (cs.CR); Databases (cs.DB); Machine Learning (cs.LG)
Cite as:	arXiv:2307.02106 [cs.CR]
	(or arXiv:2307.02106v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2307.02106

Submission history

From: Fan Wu
[v1] Wed, 5 Jul 2023 08:29:31 UTC (7,519 KB)
[v2] Sat, 5 Aug 2023 06:28:12 UTC (437 KB)
