Search | arXiv e-print repository

Private federated discovery of out-of-vocabulary words for Gboard

Authors: Ziteng Sun, Peter Kairouz, Haicheng Sun, Adria Gascon, Ananda Theertha Suresh

Abstract: The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role for improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for G… ▽ More The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role for improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for Gboard, which builds on recent advances in private federated analytics. The system offers local differential privacy (LDP) guarantees for user contributed words. With anonymous aggregation, the final released result would satisfy central differential privacy guarantees with $\varepsilon = 0.315, δ= 10^{-10}$ for OOV discovery in en-US (English in United States). △ Less

Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10764 [pdf, other]

Confidential Federated Computations

Authors: Hubert Eichner, Daniel Ramage, Kallista Bonawitz, Dzmitry Huba, Tiziano Santoro, Brett McLarnon, Timon Van Overveldt, Nova Fallen, Peter Kairouz, Albert Cheu, Katharine Daly, Adria Gascon, Marco Gruteser, Brendan McMahan

Abstract: Federated Learning and Analytics (FLA) have seen widespread adoption by technology platforms for processing sensitive on-device data. However, basic FLA systems have privacy limitations: they do not necessarily require anonymization mechanisms like differential privacy (DP), and provide limited protections against a potentially malicious service provider. Adding DP to a basic FLA system currently… ▽ More Federated Learning and Analytics (FLA) have seen widespread adoption by technology platforms for processing sensitive on-device data. However, basic FLA systems have privacy limitations: they do not necessarily require anonymization mechanisms like differential privacy (DP), and provide limited protections against a potentially malicious service provider. Adding DP to a basic FLA system currently requires either adding excessive noise to each device's updates, or assuming an honest service provider that correctly implements the mechanism and only uses the privatized outputs. Secure multiparty computation (SMPC) -based oblivious aggregations can limit the service provider's access to individual user updates and improve DP tradeoffs, but the tradeoffs are still suboptimal, and they suffer from scalability challenges and susceptibility to Sybil attacks. This paper introduces a novel system architecture that leverages trusted execution environments (TEEs) and open-sourcing to both ensure confidentiality of server-side computations and provide externally verifiable privacy properties, bolstering the robustness and trustworthiness of private federated computations. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2307.13347 [pdf, other]

Federated Heavy Hitter Recovery under Linear Sketching

Authors: Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh

Abstract: Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our… ▽ More Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is $Θ(\sqrt{R})$. We also empirically demonstrate our findings. △ Less

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2305.10867 [pdf, other]

Amplification by Shuffling without Shuffling

Authors: Borja Balle, James Bell, Adrià Gascón

Abstract: Motivated by recent developments in the shuffle model of differential privacy, we propose a new approximate shuffling functionality called Alternating Shuffle, and provide a protocol implementing alternating shuffling in a single-server threat model where the adversary observes all communication. Unlike previous shuffling protocols in this threat model, the per-client communication of our protocol… ▽ More Motivated by recent developments in the shuffle model of differential privacy, we propose a new approximate shuffling functionality called Alternating Shuffle, and provide a protocol implementing alternating shuffling in a single-server threat model where the adversary observes all communication. Unlike previous shuffling protocols in this threat model, the per-client communication of our protocol only grows sub-linearly in the number of clients. Moreover, we study the concrete efficiency of our protocol and show it can improve per-client communication by one or more orders of magnitude with respect to previous (approximate) shuffling protocols. We also show a differential privacy amplification result for alternating shuffling analogous to the one for uniform shuffling, and demonstrate that shuffling-based protocols for secure summation based a construction of Ishai et al. (FOCS'06) remain secure under the Alternating Shuffle. In the process we also develop a protocol for exact shuffling in single-server threat model with amortized logarithmic communication per-client which might be of independent interest. △ Less

Submitted 7 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Journal ref: CCS 2023

arXiv:2301.06167 [pdf]

UN Handbook on Privacy-Preserving Computation Techniques

Authors: David W. Archer, Borja de Balle Pigem, Dan Bogdanov, Mark Craddock, Adria Gascon, Ronald Jansen, Matjaž Jug, Kim Laine, Robert McLellan, Olga Ohrimenko, Mariana Raykova, Andrew Trask, Simon Wardley

Abstract: This paper describes privacy-preserving approaches for the statistical analysis. It describes motivations for privacy-preserving approaches for the statistical analysis of sensitive data, presents examples of use cases where such methods may apply and describes relevant technical capabilities to assure privacy preservation while still allowing analysis of sensitive data. Our focus is on methods th… ▽ More This paper describes privacy-preserving approaches for the statistical analysis. It describes motivations for privacy-preserving approaches for the statistical analysis of sensitive data, presents examples of use cases where such methods may apply and describes relevant technical capabilities to assure privacy preservation while still allowing analysis of sensitive data. Our focus is on methods that enable protecting privacy of data while it is being processed, not only while it is at rest on a system or in transit between systems. The information in this document is intended for use by statisticians and data scientists, data curators and architects, IT specialists, and security and information assurance specialists, so we explicitly avoid cryptographic technical details of the technologies we describe. △ Less

Submitted 15 January, 2023; originally announced January 2023.

Comments: 50 pages

arXiv:2111.02356 [pdf, other]

Towards Sparse Federated Analytics: Location Heatmaps under Distributed Differential Privacy with Secure Aggregation

Authors: Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem, Adrià Gascón, Kallista Bonawitz, Deborah Estrin, Marco Gruteser

Abstract: We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices. It aims to ensure differential privacy before data becomes visible to a service provider while maintaining high data accuracy and minimizing resource consumption on users' devices. To achieve this, we revisit distributed differential privacy based on recent results in secur… ▽ More We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices. It aims to ensure differential privacy before data becomes visible to a service provider while maintaining high data accuracy and minimizing resource consumption on users' devices. To achieve this, we revisit distributed differential privacy based on recent results in secure multiparty computation, and we design a scalable and adaptive distributed differential privacy approach for location analytics. Evaluation on public location datasets shows that this approach successfully generates metropolitan-scale heatmaps from millions of user samples with a worst-case client communication overhead that is significantly smaller than existing state-of-the-art private protocols of similar accuracy. △ Less

Submitted 26 June, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: In PETS'22

arXiv:2109.07461 [pdf, other]

MPC-Friendly Commitments for Publicly Verifiable Covert Security

Authors: Nitin Agrawal, James Bell, Adrià Gascón, Matt J. Kusner

Abstract: We address the problem of efficiently verifying a commitment in a two-party computation. This addresses the scenario where a party P1 commits to a value $x$ to be used in a subsequent secure computation with another party P2 that wants to receive assurance that P1 did not cheat, i.e. that $x$ was indeed the value inputted into the secure computation. Our constructions operate in the publicly verif… ▽ More We address the problem of efficiently verifying a commitment in a two-party computation. This addresses the scenario where a party P1 commits to a value $x$ to be used in a subsequent secure computation with another party P2 that wants to receive assurance that P1 did not cheat, i.e. that $x$ was indeed the value inputted into the secure computation. Our constructions operate in the publicly verifiable covert (PVC) security model, which is a relaxation of the malicious model of MPC appropriate in settings where P1 faces a reputational harm if caught cheating. We introduce the notion of PVC commitment scheme and indexed hash functions to build commitments schemes tailored to the PVC framework, and propose constructions for both arithmetic and Boolean circuits that result in very efficient circuits. From a practical standpoint, our constructions for Boolean circuits are $60\times$ faster to evaluate securely, and use $36\times$ less communication than baseline methods based on hashing. Moreover, we show that our constructions are tight in terms of required non-linear operations, by proving lower bounds on the nonlinear gate count of commitment verification circuits. Finally, we present a technique to amplify the security properties our constructions that allows to efficiently recover malicious guarantees with statistical security. △ Less

Submitted 27 January, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: Appeared at ACM CCS 2021

Journal ref: ACM CCS 2021

arXiv:2004.05574 [pdf, other]

PrivEdge: From Local to Distributed Private Training and Prediction

Authors: Ali Shahin Shamsabadi, Adria Gascon, Hamed Haddadi, Andrea Cavallaro

Abstract: Machine Learning as a Service (MLaaS) operators provide model training and prediction on the cloud. MLaaS applications often rely on centralised collection and aggregation of user data, which could lead to significant privacy concerns when dealing with sensitive personal data. To address this problem, we propose PrivEdge, a technique for privacy-preserving MLaaS that safeguards the privacy of user… ▽ More Machine Learning as a Service (MLaaS) operators provide model training and prediction on the cloud. MLaaS applications often rely on centralised collection and aggregation of user data, which could lead to significant privacy concerns when dealing with sensitive personal data. To address this problem, we propose PrivEdge, a technique for privacy-preserving MLaaS that safeguards the privacy of users who provide their data for training, as well as users who use the prediction service. With PrivEdge, each user independently uses their private data to locally train a one-class reconstructive adversarial network that succinctly represents their training data. As sending the model parameters to the service provider in the clear would reveal private information, PrivEdge secret-shares the parameters among two non-colluding MLaaS providers, to then provide cryptographically private prediction services through secure multi-party computation techniques. We quantify the benefits of PrivEdge and compare its performance with state-of-the-art centralised architectures on three privacy-sensitive image-based tasks: individual identification, writer identification, and handwritten letter recognition. Experimental results show that PrivEdge has high precision and recall in preserving privacy, as well as in distinguishing between private and non-private images. Moreover, we show the robustness of PrivEdge to image compression and biased training data. The source code is available at https://github.com/smartcameras/PrivEdge. △ Less

Submitted 12 April, 2020; originally announced April 2020.

Comments: IEEE Transactions on Information Forensics and Security (TIFS)

arXiv:2002.00817 [pdf, other]

doi 10.1145/3372297.3417242

Private Summation in the Multi-Message Shuffle Model

Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

Abstract: The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives privatized messages from users, but in this case a… ▽ More The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives privatized messages from users, but in this case a secure shuffler is used to transmit messages from users to the collector in a way that hides which messages came from which user. An interesting feature of the shuffle model is that increasing the amount of messages sent by each user can lead to protocols with accuracies comparable to the ones achievable in the central model. In particular, for the problem of privately computing the sum of $n$ bounded real values held by $n$ different users, Cheu et al. showed that $O(\sqrt{n})$ messages per user suffice to achieve $O(1)$ error (the optimal rate in the central model), while Balle et al. (CRYPTO 2019) recently showed that a single message per user leads to $Θ(n^{1/3})$ MSE (mean squared error), a rate strictly in-between what is achievable in the local and central models. This paper introduces two new protocols for summation in the shuffle model with improved accuracy and communication trade-offs. Our first contribution is a recursive construction based on the protocol from Balle et al. mentioned above, providing $\mathrm{poly}(\log \log n)$ error with $O(\log \log n)$ messages per user. The second contribution is a protocol with $O(1)$ error and $O(1)$ messages per user based on a novel analysis of the reduction from secure summation to shuffling introduced by Ishai et al. (FOCS 2006) (the original reduction required $O(\log n)$ messages per user). △ Less

Submitted 19 December, 2022; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: Published at CCS'20

arXiv:1912.04977 [pdf, other]

Advances and Open Problems in Federated Learning

Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson , et al. (34 additional authors not shown)

Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re… ▽ More Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges. △ Less

Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

arXiv:1911.02624 [pdf, other]

Data Generation for Neural Programming by Example

Authors: Judith Clymo, Haik Manukian, Nathanaël Fijalkow, Adrià Gascón, Brooks Paige

Abstract: Programming by example is the problem of synthesizing a program from a small set of input / output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs, which well-characterize a given program and accurately demonstrate… ▽ More Programming by example is the problem of synthesizing a program from a small set of input / output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs, which well-characterize a given program and accurately demonstrate its behavior. Where examples used for testing are generated by the same method as training data then the performance of a model may be partly reliant on this similarity. In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program. We carry out a case study comparing this method to existing synthetic data generation procedures in the literature, and find that data generated using our approach improves both the discriminatory power of example sets and the ability of trained machine learning models to generalize to unfamiliar data. △ Less

Submitted 6 November, 2019; originally announced November 2019.

arXiv:1910.03861 [pdf, other]

Private Protocols for U-Statistics in the Local Model and Beyond

Authors: James Bell, Aurélien Bellet, Adrià Gascón, Tejas Kulkarni

Abstract: In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP). The class of $U$-statistics covers many statistical estimates of interest, including Gini mean difference, Kendall's tau coefficient and Area under the ROC Curve (AUC), as well as empirical ris… ▽ More In this paper, we study the problem of computing $U$-statistics of degree $2$, i.e., quantities that come in the form of averages over pairs of data points, in the local model of differential privacy (LDP). The class of $U$-statistics covers many statistical estimates of interest, including Gini mean difference, Kendall's tau coefficient and Area under the ROC Curve (AUC), as well as empirical risk measures for machine learning problems such as ranking, clustering and metric learning. We first introduce an LDP protocol based on quantizing the data into bins and applying randomized response, which guarantees an $ε$-LDP estimate with a Mean Squared Error (MSE) of $O(1/\sqrt{n}ε)$ under regularity assumptions on the $U$-statistic or the data distribution. We then propose a specialized protocol for AUC based on a novel use of hierarchical histograms that achieves MSE of $O(α^3/nε^2)$ for arbitrary data distribution. We also show that 2-party secure computation allows to design a protocol with MSE of $O(1/nε^2)$, without any assumption on the kernel function or data distribution and with total communication linear in the number of users $n$. Finally, we evaluate the performance of our protocols through experiments on synthetic and real datasets. △ Less

Submitted 2 March, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted to AISTATS 2020

arXiv:1909.11225 [pdf, ps, other]

Improved Summation from Shuffling

Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

Abstract: A protocol by Ishai et al.\ (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed \emph{shuffle model} of differential privacy, as it allows to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical security $2^{-σ}$, the protocol by Ishai et al.\ re… ▽ More A protocol by Ishai et al.\ (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed \emph{shuffle model} of differential privacy, as it allows to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical security $2^{-σ}$, the protocol by Ishai et al.\ requires the number of messages sent by each party to {\em grow} logarithmically with $n$ as $O(\log n + σ)$. In this note we give an improved analysis achieving a dependency of the form $O(1+σ/\log n)$. Conceptually, this addresses the intuitive question left open by Ishai et al.\ of whether the shuffling step in their protocol provides a "hiding in the crowd" amplification effect as $n$ increases. From a practical perspective, our analysis provides explicit constants and shows, for example, that the method of Ishai et al.\ applied to summation of $32$-bit numbers from $n=10^4$ parties sending $12$ messages each provides statistical security $2^{-40}$. △ Less

Submitted 24 September, 2019; originally announced September 2019.

arXiv:1907.03372 [pdf, other]

QUOTIENT: Two-Party Secure Neural Network Training and Prediction

Authors: Nitin Agrawal, Ali Shahin Shamsabadi, Matt J. Kusner, Adrià Gascón

Abstract: Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can be also trained securely. The few prior works on secure DNN training have focused either on designing custom… ▽ More Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can be also trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms, or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state-of-the-art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy. △ Less

Submitted 7 July, 2019; originally announced July 2019.

arXiv:1906.09116 [pdf, ps, other]

Differentially Private Summation with Multi-Message Shuffling

Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

Abstract: In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{ε, δ}(1)$ error and $Θ(ε\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $Ω(1/\sqrt{n})$, and there exist protocols matching this lower bound which require just one bit of communication… ▽ More In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{ε, δ}(1)$ error and $Θ(ε\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $Ω(1/\sqrt{n})$, and there exist protocols matching this lower bound which require just one bit of communication per party. Whether this gap in number of messages is necessary was left open by Cheu et al. In this note we show a protocol with $O(1/ε)$ error and $O(\log(n/δ))$ messages of size $O(\log(n))$ per party. This protocol is based on the work of Ishai et al.\ (FOCS 2006) showing how to implement distributed summation from secure shuffling, and the observation that this allows simulating the Laplace mechanism in the shuffle model. △ Less

Submitted 21 August, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

arXiv:1903.02837 [pdf, other]

The Privacy Blanket of the Shuffle Model

Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

Abstract: This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for… ▽ More This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for differential privacy. The shuffle model is the core idea in the Encode, Shuffle, Analyze (ESA) model introduced by Bittau et al. (SOPS 2017). Recent work by Cheu et al. (EUROCRYPT 2019) analyzes the differential privacy properties of the shuffle model and shows that in some cases shuffled protocols provide strictly better accuracy than local protocols. Additionally, Erlingsson et al. (SODA 2019) provide a privacy amplification bound quantifying the level of curator differential privacy achieved by the shuffle model in terms of the local differential privacy of the randomizer used by each user. In this context, we make three contributions. First, we provide an optimal single message protocol for summation of real numbers in the shuffle model. Our protocol is very simple and has better accuracy and communication than the protocols for this same problem proposed by Cheu et al. Optimality of this protocol follows from our second contribution, a new lower bound for the accuracy of private protocols for summation of real numbers in the shuffle model. The third contribution is a new amplification bound for analyzing the privacy of protocols in the shuffle model in terms of the privacy provided by the corresponding local randomizer. Our amplification bound generalizes the results by Erlingsson et al. to a wider range of parameters, and provides a whole family of methods to analyze privacy amplification in the shuffle model. △ Less

Submitted 2 June, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

arXiv:1806.03461 [pdf, other]

TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service

Authors: Amartya Sanyal, Matt J. Kusner, Adrià Gascón, Varun Kanade

Abstract: Machine learning methods are widely used for a variety of prediction problems. \emph{Prediction as a service} is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services, unless measures to keep client data private (even from the service provider) a… ▽ More Machine learning methods are widely used for a variety of prediction problems. \emph{Prediction as a service} is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services, unless measures to keep client data private (even from the service provider) are designed. Equally important is to minimize the amount of computation and communication required between client and server. Fully homomorphic encryption offers a possible way out, whereby clients may encrypt their data, and on which the server may perform arithmetic computations. The main drawback of using fully homomorphic encryption is the amount of time required to evaluate large machine learning models on encrypted data. We combine ideas from the machine learning literature, particularly work on binarization and sparsification of neural networks, together with algorithmic tools to speed-up and parallelize computation using encrypted data. △ Less

Submitted 9 June, 2018; originally announced June 2018.

Comments: Accepted at International Conference in Machine Learning (ICML), 2018

arXiv:1806.03281 [pdf, other]

Blind Justice: Fairness with Encrypted Sensitive Attributes

Authors: Niki Kilbertus, Adrià Gascón, Matt J. Kusner, Michael Veale, Krishna P. Gummadi, Adrian Weller

Abstract: Recent work has explored how to train machine learning models which do not discriminate against any subgroup of the population as determined by sensitive attributes such as gender or race. To avoid disparate treatment, sensitive attributes should not be considered. On the other hand, in order to avoid disparate impact, sensitive attributes must be examined, e.g., in order to learn a fair model, or… ▽ More Recent work has explored how to train machine learning models which do not discriminate against any subgroup of the population as determined by sensitive attributes such as gender or race. To avoid disparate treatment, sensitive attributes should not be considered. On the other hand, in order to avoid disparate impact, sensitive attributes must be examined, e.g., in order to learn a fair model, or to check if a given model is fair. We introduce methods from secure multi-party computation which allow us to avoid both. By encrypting sensitive attributes, we show how an outcome-based fair model may be learned, checked, or have its outputs verified and held to account, without users revealing their sensitive attributes. △ Less

Submitted 8 June, 2018; originally announced June 2018.

Comments: published at ICML 2018

Journal ref: Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2630-2639, 2018

arXiv:1805.12482 [pdf, ps, other]

How to Simulate It in Isabelle: Towards Formal Proof for Secure Multi-Party Computation

Authors: David Butler, David Aspinall, Adria Gascon

Abstract: In cryptography, secure Multi-Party Computation (MPC) protocols allow participants to compute a function jointly while keeping their inputs private. Recent breakthroughs are bringing MPC into practice, solving fundamental challenges for secure distributed computation. Just as with classic protocols for encryption and key exchange, precise guarantees are needed for MPC designs and implementations;… ▽ More In cryptography, secure Multi-Party Computation (MPC) protocols allow participants to compute a function jointly while keeping their inputs private. Recent breakthroughs are bringing MPC into practice, solving fundamental challenges for secure distributed computation. Just as with classic protocols for encryption and key exchange, precise guarantees are needed for MPC designs and implementations; any flaw will give attackers a chance to break privacy or correctness. In this paper we present the first (as far as we know) formalisation of some MPC security proofs. These proofs provide probabilistic guarantees in the computational model of security, but have a different character to machine proofs and proof tools implemented so far --- MPC proofs use a \emph{simulation} approach, in which security is established by showing indistinguishability between execution traces in the actual protocol execution and an ideal world where security is guaranteed by definition. We show that existing machinery for reasoning about probabilistic programs adapted to this setting, paving the way to precisely check a new class of cryptography arguments. We implement our proofs using the CryptHOL framework inside Isabelle/HOL. △ Less

Submitted 31 May, 2018; originally announced May 2018.

arXiv:1802.05490 [pdf, ps, other]

Grammar-based Compression of Unranked Trees

Authors: Adrià Gascón, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh, Kurt Sieber

Abstract: We introduce forest straight-line programs (FSLPs) as a compressed representation of unranked ordered node-labelled trees. FSLPs are based on the operations of forest algebra and generalize tree straight-line programs. We compare the succinctness of FSLPs with two other compression schemes for unranked trees: top dags and tree straight-line programs of first-child/next sibling encodings. Efficient… ▽ More We introduce forest straight-line programs (FSLPs) as a compressed representation of unranked ordered node-labelled trees. FSLPs are based on the operations of forest algebra and generalize tree straight-line programs. We compare the succinctness of FSLPs with two other compression schemes for unranked trees: top dags and tree straight-line programs of first-child/next sibling encodings. Efficient translations between these formalisms are provided. Finally, we show that equality of unranked trees in the setting where certain symbols are associative or commutative can be tested in polynomial time. This generalizes previous results for testing isomorphism of compressed unordered ranked trees. △ Less

Submitted 15 February, 2018; originally announced February 2018.

Comments: Extended version of a paper at CSR 2018

MSC Class: 68P30; 68Q42 ACM Class: E.4

arXiv:1407.5392 [pdf, ps, other]

doi 10.4204/EPTCS.157.5

Synthesis of a simple self-stabilizing system

Authors: Adrià Gascón, Ashish Tiwari

Abstract: With the increasing importance of distributed systems as a computing paradigm, a systematic approach to their design is needed. Although the area of formal verification has made enormous advances towards this goal, the resulting functionalities are limited to detecting problems in a particular design. By means of a classical example, we illustrate a simple template-based approach to computer-ai… ▽ More With the increasing importance of distributed systems as a computing paradigm, a systematic approach to their design is needed. Although the area of formal verification has made enormous advances towards this goal, the resulting functionalities are limited to detecting problems in a particular design. By means of a classical example, we illustrate a simple template-based approach to computer-aided design of distributed systems based on leveraging the well-known technique of bounded model checking to the synthesis setting. △ Less

Submitted 21 July, 2014; originally announced July 2014.

Comments: In Proceedings SYNT 2014, arXiv:1407.4937

ACM Class: F.3.1; C.2.4

Journal ref: EPTCS 157, 2014, pp. 5-16

arXiv:1003.1632 [pdf, ps, other]

Unification and Matching on Compressed Terms

Authors: Adrià Gascón, Guillem Godoy, Manfred Schmidt-Schauß

Abstract: Term unification plays an important role in many areas of computer science, especially in those related to logic. The universal mechanism of grammar-based compression for terms, in particular the so-called Singleton Tree Grammars (STG), have recently drawn considerable attention. Using STGs, terms of exponential size and height can be represented in linear space. Furthermore, the term representa… ▽ More Term unification plays an important role in many areas of computer science, especially in those related to logic. The universal mechanism of grammar-based compression for terms, in particular the so-called Singleton Tree Grammars (STG), have recently drawn considerable attention. Using STGs, terms of exponential size and height can be represented in linear space. Furthermore, the term representation by directed acyclic graphs (dags) can be efficiently simulated. The present paper is the result of an investigation on term unification and matching when the terms given as input are represented using different compression mechanisms for terms such as dags and Singleton Tree Grammars. We describe a polynomial time algorithm for context matching with dags, when the number of different context variables is fixed for the problem. For the same problem, NP-completeness is obtained when the terms are represented using the more general formalism of Singleton Tree Grammars. For first-order unification and matching polynomial time algorithms are presented, each of them improving previous results for those problems. △ Less

Submitted 8 March, 2010; originally announced March 2010.

Comments: This paper is posted at the Computing Research Repository (CoRR) as part of the process of submission to the journal ACM Transactions on Computational Logic (TOCL).

ACM Class: F.4.1; F.4.2

Showing 1–22 of 22 results for author: Gascón, A