A Programming Language for Data Privacy with Accuracy Estimations

Published: 08 June 2021

Abstract

Differential privacy offers a formal framework for reasoning about the privacy and accuracy of computations on private data. It also offers a rich set of building blocks for constructing private data analyses. When carefully calibrated, these analyses simultaneously guarantee the privacy of the individuals contributing their data, and the accuracy of the data analysis results, inferring useful properties about the population. The compositional nature of differential privacy has motivated the design and implementation of several programming languages to ease the implementation of differentially private analyses. Even though these programming languages provide support for reasoning about privacy, most of them disregard reasoning about the accuracy of data analyses. To overcome this limitation, we present DPella, a programming framework providing data analysts with support for reasoning about privacy, accuracy, and their trade-offs. The distinguishing feature of DPella is a novel component that statically tracks the accuracy of different data analyses. To provide tight accuracy estimations, this component leverages taint analysis for automatically inferring statistical independence of the different noise quantities added for guaranteeing privacy. We evaluate our approach by implementing several classical queries from the literature and showing how data analysts can calibrate the privacy parameters to meet the accuracy requirements, and vice versa.
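
The trade-off mentioned at the end of the abstract, calibrating the privacy parameter to meet an accuracy requirement and vice versa, can be illustrated for a single noisy query. The sketch below is not DPella's API. It assumes the standard Laplace mechanism, where a query of sensitivity s is released with Laplace noise of scale s/ε, and uses the tail bound Pr[|noise| > α] = exp(-α·ε/s). The function names `laplace_error_bound` and `epsilon_for_error` are hypothetical, chosen only for this illustration.

```python
import math


def laplace_error_bound(epsilon: float, beta: float, sensitivity: float = 1.0) -> float:
    """Return alpha such that |noise| <= alpha with probability >= 1 - beta.

    Assumes Laplace noise with scale sensitivity / epsilon (illustrative helper,
    not part of any library).
    """
    if epsilon <= 0 or sensitivity <= 0 or not 0 < beta < 1:
        raise ValueError("need epsilon > 0, sensitivity > 0, and 0 < beta < 1")
    # Solve exp(-alpha * epsilon / sensitivity) = beta for alpha.
    return (sensitivity / epsilon) * math.log(1.0 / beta)


def epsilon_for_error(alpha: float, beta: float, sensitivity: float = 1.0) -> float:
    """Return the smallest epsilon whose Laplace noise stays within alpha
    with probability >= 1 - beta (the inverse of laplace_error_bound)."""
    if alpha <= 0 or sensitivity <= 0 or not 0 < beta < 1:
        raise ValueError("need alpha > 0, sensitivity > 0, and 0 < beta < 1")
    # Same tail bound, solved for epsilon instead of alpha.
    return (sensitivity / alpha) * math.log(1.0 / beta)


if __name__ == "__main__":
    # Privacy first: fix epsilon, read off the error at 95% confidence.
    alpha = laplace_error_bound(epsilon=0.5, beta=0.05)
    print(f"epsilon=0.5 gives error at most {alpha:.2f} with probability 0.95")

    # Accuracy first: fix the tolerated error, read off the epsilon to spend.
    eps = epsilon_for_error(alpha=10.0, beta=0.05)
    print(f"error at most 10 with probability 0.95 needs epsilon={eps:.3f}")
```

The two functions are inverses of each other for a fixed confidence level, which is the single-query version of the calibration described in the abstract. Bounds for analyses that combine several noisy results need additional reasoning about how the individual errors compose, which this sketch does not attempt.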

Published In

ACM Transactions on Programming Languages and Systems  Volume 43, Issue 2
June 2021
197 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/3470134
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2021
Accepted: 01 February 2021
Revised: 01 February 2021
Received: 01 July 2020
Published in TOPLAS Volume 43, Issue 2

Author Tags

  1. Accuracy
  2. Haskell
  3. concentration bounds
  4. databases
  5. differential privacy

Qualifiers

  • Research-article
  • Research
  • Refereed
