
Showing 1–44 of 44 results for author: Balle, B

Searching in archive cs.
  1. arXiv:2406.08918

    cs.CR cs.AI cs.LG math.ST stat.ML

    Beyond the Calibration Point: Mechanism Comparison in Differential Privacy

    Authors: Georgios Kaissis, Stefan Kolek, Borja Balle, Jamie Hayes, Daniel Rueckert

    Abstract: In differentially private (DP) machine learning, the privacy guarantees of DP mechanisms are often reported and compared on the basis of a single $(\varepsilon, \delta)$-pair. This practice overlooks that DP guarantees can vary substantially even between mechanisms sharing a given $(\varepsilon, \delta)$, and potentially introduces privacy vulnerabilities which can remain undetected. This motivates the need…

    Submitted 10 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  2. arXiv:2405.05175

    cs.CR cs.CL cs.LG

    Air Gap: Protecting Privacy-Conscious Conversational Agents

    Authors: Eugene Bagdasaryan, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

    Abstract: The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into re…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2404.16244

    cs.CY

    The Ethics of Advanced AI Assistants

    Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, et al. (32 additional authors not shown)

    Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2402.06137

    cs.LG cs.CR

    On the Privacy of Selection Mechanisms with Gaussian Noise

    Authors: Jonathan Lebensold, Doina Precup, Borja Balle

    Abstract: Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand,…

    Submitted 21 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: AISTATS 2024

  5. arXiv:2308.10888

    cs.LG cs.CV cs.CY

    Unlocking Accuracy and Fairness in Differentially Private Image Classification

    Authors: Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

    Abstract: Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are al…

    Submitted 21 August, 2023; originally announced August 2023.

  6. arXiv:2306.00135

    cs.FL

    Optimal Approximate Minimization of One-Letter Weighted Finite Automata

    Authors: Clara Lacroce, Borja Balle, Prakash Panangaden, Guillaume Rabusseau

    Abstract: In this paper, we study the approximate minimization problem of weighted finite automata (WFAs): to compute the best possible approximation of a WFA given a bound on the number of states. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. We solve the optimal spec…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:2102.06860

  7. arXiv:2305.10867

    cs.CR

    Amplification by Shuffling without Shuffling

    Authors: Borja Balle, James Bell, Adrià Gascón

    Abstract: Motivated by recent developments in the shuffle model of differential privacy, we propose a new approximate shuffling functionality called Alternating Shuffle, and provide a protocol implementing alternating shuffling in a single-server threat model where the adversary observes all communication. Unlike previous shuffling protocols in this threat model, the per-client communication of our protocol…

    Submitted 7 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Journal ref: CCS 2023

  8. arXiv:2302.13861

    cs.LG cs.CR cs.CV stat.ML

    Differentially Private Diffusion Models Generate Useful Synthetic Images

    Authors: Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

    Abstract: The ability to generate privacy-preserving synthetic versions of sensitive image datasets could unlock numerous ML applications currently constrained by data availability. Due to their astonishing image generation quality, diffusion models are a prime candidate for generating high-quality synthetic data. However, recent studies have found that, by default, the outputs of some diffusion models do n…

    Submitted 27 February, 2023; originally announced February 2023.

  9. arXiv:2302.07956

    cs.LG cs.CR

    Tight Auditing of Differentially Private Machine Learning

    Authors: Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

    Abstract: Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implaus…

    Submitted 15 February, 2023; originally announced February 2023.

  10. arXiv:2302.07225

    cs.CR cs.LG

    Bounding Training Data Reconstruction in DP-SGD

    Authors: Jamie Hayes, Saeed Mahloujifar, Borja Balle

    Abstract: Differentially private training offers a protection which is usually interpreted as a guarantee against membership inference attacks. By proxy, this guarantee extends to other threats like reconstruction attacks attempting to extract complete training examples. Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against tr…

    Submitted 30 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: New experiments and comparison with related work

  11. arXiv:2301.13188

    cs.CR cs.CV cs.LG

    Extracting Training Data from Diffusion Models

    Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

    Abstract: Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the…

    Submitted 30 January, 2023; originally announced January 2023.

  12. arXiv:2204.13650

    cs.LG cs.CR cs.CV stat.ML

    Unlocking High-Accuracy Differentially Private Image Classification through Scale

    Authors: Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

    Abstract: Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found th…

    Submitted 16 June, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

  13. arXiv:2201.04845

    cs.CR cs.LG

    Reconstructing Training Data with Informed Adversaries

    Authors: Borja Balle, Giovanni Cherubin, Jamie Hayes

    Abstract: Given access to a machine learning model, can an adversary reconstruct the model's training data? This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show it is feasible to reconstruct the remaining data point in this stringent threat model. For convex models (e.g. logistic regressio…

    Submitted 25 April, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: Published at "2022 IEEE Symposium on Security and Privacy (SP)"

  14. arXiv:2201.02265

    cs.LG

    Learning to be adversarially robust and differentially private

    Authors: Jamie Hayes, Borja Balle, M. Pawan Kumar

    Abstract: We study the difficulties in learning that arise from robust and differentially private optimization. We first study convergence of gradient descent based adversarial training with differential privacy, taking a simple binary classification task on linearly separable data as an illustrative example. We compare the gap between adversarial and nominal risk in both private and non-private settings, s…

    Submitted 6 January, 2022; originally announced January 2022.

    Comments: Preliminary work appeared at PPML 2021

  15. arXiv:2112.04359

    cs.CL cs.AI cs.CY

    Ethical and social risks of harm from Language Models

    Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

    Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…

    Submitted 8 December, 2021; originally announced December 2021.

  16. arXiv:2102.08093   

    stat.ML cs.LG

    A Law of Robustness for Weight-bounded Neural Networks

    Authors: Hisham Husain, Borja Balle

    Abstract: Robustness of deep neural networks against adversarial perturbations is a pressing concern motivated by recent findings showing the pervasive nature of such vulnerabilities. One method of characterizing the robustness of a neural network model is through its Lipschitz constant, which forms a robustness certificate. A natural question to ask is, for a fixed model class (such as neural networks) and…

    Submitted 12 March, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: The main result does not resolve the conjecture as claimed. However the proof technique can be used to obtain a weaker result. The manuscript will be updated at a later date

  17. arXiv:2102.06860

    cs.FL

    Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata

    Authors: Borja Balle, Clara Lacroce, Prakash Panangaden, Doina Precup, Guillaume Rabusseau

    Abstract: We address the approximate minimization problem for weighted finite automata (WFAs) with weights in $\mathbb{R}$, over a one-letter alphabet: to compute the best possible approximation of a WFA given a bound on the number of states. This work is grounded in Adamyan-Arov-Krein Approximation theory, a remarkable collection of results on the approximation of Hankel operators. In addition to its intri…

    Submitted 17 May, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Full version of ICALP2021 paper, authors are listed in alphabetical order

  18. arXiv:2009.09052

    cs.LG cs.CR stat.ML

    Private Reinforcement Learning with PAC and Regret Guarantees

    Authors: Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Zhiwei Steven Wu

    Abstract: Motivated by high-stakes decision-making domains like personalized medicine where user information is inherently sensitive, we design privacy preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives t…

    Submitted 18 September, 2020; originally announced September 2020.

  19. arXiv:2007.06605

    cs.LG cs.CR stat.ML

    Privacy Amplification via Random Check-Ins

    Authors: Borja Balle, Peter Kairouz, H. Brendan McMahan, Om Thakkar, Abhradeep Thakurta

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling and privacy amplification by shuffling, permit adding less noise in DP-SGD than naïve schemes do. A key assumption in both these approaches is that the elements in the data set can be u…

    Submitted 30 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Updated proof for $(\varepsilon_0, \delta_0)$-DP local randomizers

  20. Private Summation in the Multi-Message Shuffle Model

    Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

    Abstract: The shuffle model of differential privacy (Erlingsson et al. SODA 2019; Cheu et al. EUROCRYPT 2019) and its close relative encode-shuffle-analyze (Bittau et al. SOSP 2017) provide a fertile middle ground between the well-known local and central models. Similarly to the local model, the shuffle model assumes an untrusted data collector who receives privatized messages from users, but in this case a…

    Submitted 19 December, 2022; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: Published at CCS'20

  21. arXiv:1910.08902

    cs.LG cs.CL cs.CR stat.ML

    Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations

    Authors: Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, Tom Diethe

    Abstract: Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy preserving text perturbation using the notion of dx-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to v…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted at WSDM 2020

  22. arXiv:1910.05876

    cs.LG stat.ML

    Actor Critic with Differentially Private Critic

    Authors: Jonathan Lebensold, William Hamilton, Borja Balle, Doina Precup

    Abstract: Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our appro…

    Submitted 13 October, 2019; originally announced October 2019.

    Comments: 6 Pages, Presented at the Privacy in Machine Learning Workshop, NeurIPS 2019

  23. arXiv:1909.11225

    cs.CR

    Improved Summation from Shuffling

    Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

    Abstract: A protocol by Ishai et al. (FOCS 2006) showing how to implement distributed $n$-party summation from secure shuffling has regained relevance in the context of the recently proposed shuffle model of differential privacy, as it allows one to attain the accuracy levels of the curator model at a moderate communication cost. To achieve statistical security $2^{-\sigma}$, the protocol by Ishai et al. re…

    Submitted 24 September, 2019; originally announced September 2019.

  24. arXiv:1906.09116

    cs.CR stat.ML

    Differentially Private Summation with Multi-Message Shuffling

    Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

    Abstract: In recent work, Cheu et al. (Eurocrypt 2019) proposed a protocol for $n$-party real summation in the shuffle model of differential privacy with $O_{\varepsilon, \delta}(1)$ error and $\Theta(\varepsilon\sqrt{n})$ one-bit messages per party. In contrast, every local model protocol for real summation must incur error $\Omega(1/\sqrt{n})$, and there exist protocols matching this lower bound which require just one bit of communication…

    Submitted 21 August, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

  25. arXiv:1905.12264

    cs.LG cs.CR math.PR stat.ML

    Privacy Amplification by Mixing and Diffusion Mechanisms

    Authors: Borja Balle, Gilles Barthe, Marco Gaboardi, Joseph Geumlek

    Abstract: A fundamental result in differential privacy states that the privacy guarantees of a mechanism are preserved by any post-processing of its output. In this paper we investigate under what conditions stochastic post-processing can amplify the privacy of a mechanism. By interpreting post-processing as the application of a Markov operator, we first give a series of amplification results in terms of un…

    Submitted 27 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

  26. arXiv:1905.11190

    cs.LG cs.AI cs.LO stat.ML

    Model-Agnostic Counterfactual Explanations for Consequential Decisions

    Authors: Amir-Hossein Karimi, Gilles Barthe, Borja Balle, Isabel Valera

    Abstract: Predictive models are being increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, seve…

    Submitted 28 February, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

  27. arXiv:1905.10862

    stat.ML cs.LG

    Automatic Discovery of Privacy-Utility Pareto Fronts

    Authors: Brendan Avent, Javier Gonzalez, Tom Diethe, Andrei Paleyes, Borja Balle

    Abstract: Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while maintaining acceptable…

    Submitted 21 July, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Proceedings on Privacy Enhancing Technologies 2020

  28. arXiv:1905.09982

    cs.LG stat.ML

    Hypothesis Testing Interpretations and Renyi Differential Privacy

    Authors: Borja Balle, Gilles Barthe, Marco Gaboardi, Justin Hsu, Tetsuya Sato

    Abstract: Differential privacy is a de facto standard in data privacy, with applications in the public and private sectors. A way to explain differential privacy, which is particularly appealing to statisticians and social scientists, is by means of its statistical hypothesis testing interpretation. Informally, one cannot effectively test whether a specific individual has contributed her data by observing the…

    Submitted 8 October, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

    Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2496-2506, 2020

  29. arXiv:1903.11112

    cs.LG cs.CL stat.ML

    Privacy-preserving Active Learning on Sensitive Data for User Intent Classification

    Authors: Oluwaseyi Feyisetan, Thomas Drake, Borja Balle, Tom Diethe

    Abstract: Active learning holds promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling. This presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy preserving active learning with quantifiable guarantees. W…

    Submitted 26 March, 2019; originally announced March 2019.

    Comments: To appear at PAL: Privacy-Enhancing Artificial Intelligence and Language Technologies as part of the AAAI Spring Symposium Series (AAAI-SSS 2019)

  30. arXiv:1903.05202

    stat.ML cs.LG

    Continual Learning in Practice

    Authors: Tom Diethe, Tom Borchert, Eno Thereska, Borja Balle, Neil Lawrence

    Abstract: This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive M…

    Submitted 18 March, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: Presented at the NeurIPS 2018 workshop on Continual Learning https://sites.google.com/view/continual2018/home

  31. arXiv:1903.02837

    cs.LG cs.CR stat.ML

    The Privacy Blanket of the Shuffle Model

    Authors: Borja Balle, James Bell, Adria Gascon, Kobbi Nissim

    Abstract: This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can track back an input to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for…

    Submitted 2 June, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

  32. arXiv:1810.07468

    stat.ML cs.LG

    Hierarchical Methods of Moments

    Authors: Matteo Ruffini, Guillaume Rabusseau, Borja Balle

    Abstract: Spectral methods of moments provide a powerful tool for learning the parameters of latent variable models. Despite their theoretical appeal, the applicability of these methods to real data is still limited due to a lack of robustness to model misspecification. In this paper we present a hierarchical approach to methods of moments to circumvent such limitations. Our method is based on replacing the…

    Submitted 17 October, 2018; originally announced October 2018.

    Comments: NIPS 2017

  33. arXiv:1808.00087

    cs.LG cs.CR stat.ML

    Subsampled Rényi Differential Privacy and Analytical Moments Accountant

    Authors: Yu-Xiang Wang, Borja Balle, Shiva Kasiviswanathan

    Abstract: We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms that: (1) subsample the dataset, and then (2) apply a randomized mechanism M to the subsample,…

    Submitted 4 December, 2018; v1 submitted 31 July, 2018; originally announced August 2018.

  34. arXiv:1807.01647

    cs.LG cs.CR stat.ML

    Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences

    Authors: Borja Balle, Gilles Barthe, Marco Gaboardi

    Abstract: Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called "privacy amplification by subsampling" principle, which ensures that a differentially private mechanism run on a random subsample of a population provides stronger privacy guarantees than when run on the entire population. Several instances of this principle…

    Submitted 23 November, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

    Comments: To appear in NeurIPS 2018

  35. arXiv:1805.06530

    cs.LG stat.ML

    Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising

    Authors: Borja Balle, Yu-Xiang Wang

    Abstract: The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations. Our analysis reveals that the variance formula for the original mechanism is far from tight in the high privacy regime ($\varepsilon \to 0$) and it cannot be…

    Submitted 7 June, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: To appear at the 35th International Conference on Machine Learning (ICML), 2018

  36. Singular value automata and approximate minimization

    Authors: Borja Balle, Prakash Panangaden, Doina Precup

    Abstract: The present paper uses spectral theory of linear operators to construct approximately minimal realizations of weighted languages. Our new contributions are: (i) a new algorithm for the SVD decomposition of infinite Hankel matrices based on their representation in terms of weighted automata, (ii) a new canonical form for weighted automata arising from the SVD of its corresponding Hankel matrix and…

    Submitted 27 May, 2019; v1 submitted 16 November, 2017; originally announced November 2017.

    Journal ref: Math. Struct. Comp. Sci. 29 (2019) 1444-1478

  37. arXiv:1702.08017

    cs.FL

    Bisimulation Metrics for Weighted Automata

    Authors: Borja Balle, Pascale Gourdeau, Prakash Panangaden

    Abstract: We develop a new bisimulation (pseudo)metric for weighted finite automata (WFA) that generalizes Boreale's linear bisimulation relation. Our metrics are induced by seminorms on the state space of WFA. Our development is based on spectral properties of sets of linear operators. In particular, the joint spectral radius of the transition matrices of WFA plays a central role. We also study continuity…

    Submitted 14 May, 2017; v1 submitted 26 February, 2017; originally announced February 2017.

  38. arXiv:1610.07883

    cs.LG cs.FL

    Generalization Bounds for Weighted Automata

    Authors: Borja Balle, Mehryar Mohri

    Abstract: This paper studies the problem of learning weighted automata from a finite labeled training sample. We consider several general families of weighted automata defined in terms of three different measures: the norm of an automaton's weights, the norm of the function computed by an automaton, or the norm of the corresponding Hankel matrix. We present new data-dependent generalization guarantees for l…

    Submitted 25 October, 2016; originally announced October 2016.

  39. arXiv:1603.02010

    cs.LG stat.ML

    Differentially Private Policy Evaluation

    Authors: Borja Balle, Maziar Gomrokchi, Doina Precup

    Abstract: We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy. We establish two approaches for achieving differential privacy, provide a theoretical analysis of the privacy and utility of the two algorithms, and show promising results on simple empirical examples.

    Submitted 7 March, 2016; originally announced March 2016.

  40. arXiv:1511.01442

    cs.LG cs.FL

    Low-Rank Approximation of Weighted Tree Automata

    Authors: Guillaume Rabusseau, Borja Balle, Shay B. Cohen

    Abstract: We describe a technique to minimize weighted tree automata (WTA), a powerful formalism that subsumes probabilistic context-free grammars (PCFGs) and latent-variable PCFGs. Our method relies on a singular value decomposition of the underlying Hankel matrix defined by the WTA. Our main theoretical result is an efficient algorithm for computing the SVD of an infinite Hankel matrix implicitly represe…

    Submitted 24 December, 2015; v1 submitted 4 November, 2015; originally announced November 2015.

    Comments: To appear in AISTATS 2016

  41. arXiv:1504.06840

    math.PR cs.DM math.CO

    Diameter and Stationary Distribution of Random $r$-out Digraphs

    Authors: Louigi Addario-Berry, Borja Balle, Guillem Perarnau

    Abstract: Let $D(n,r)$ be a random $r$-out regular directed multigraph on the set of vertices $\{1,\ldots,n\}$. In this work, we establish that for every $r \ge 2$, there exists $\eta_r>0$ such that $\text{diam}(D(n,r))=(1+\eta_r+o(1))\log_r{n}$. Our techniques also allow us to bound some extremal quantities related to the stationary distribution of a simple random walk on $D(n,r)$. In particular, we determine th…

    Submitted 26 April, 2015; originally announced April 2015.

    Comments: 31 pages

  42. arXiv:1501.06841

    cs.FL

    A Canonical Form for Weighted Automata and Applications to Approximate Minimization

    Authors: Borja Balle, Prakash Panangaden, Doina Precup

    Abstract: We study the problem of constructing approximations to a weighted automaton. Weighted finite automata (WFA) are closely related to the theory of rational series. A rational series is a function from strings to real numbers that can be computed by a finite WFA. Among others, this includes probability distributions generated by hidden Markov models and probabilistic automata. The relationship betwee…

    Submitted 24 April, 2015; v1 submitted 27 January, 2015; originally announced January 2015.

  43. arXiv:1311.6830

    cs.FL cs.DM

    Ergodicity of Random Walks on Random DFA

    Authors: Borja Balle

    Abstract: Given a DFA we consider the random walk that starts at the initial state and at each time step moves to a new state by taking a random transition from the current state. This paper shows that for typical DFA this random walk induces an ergodic Markov chain. The notion of typical DFA is formalized by showing that ergodicity holds with high probability when a DFA is sampled uniformly at random from…

    Submitted 26 November, 2013; originally announced November 2013.

  44. arXiv:1206.6393

    cs.LG stat.ML

    Local Loss Optimization in Operator Models: A New Insight into Spectral Learning

    Authors: Borja Balle, Ariadna Quattoni, Xavier Carreras

    Abstract: This paper revisits the spectral method for learning latent variable models defined in terms of observable operators. We give a new perspective on the method, showing that operators can be recovered by minimizing a loss defined on a finite subset of the domain. A non-convex optimization similar to the spectral method is derived. We also propose a regularized convex relaxation of this optimization…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
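Entry 4 above describes Report Noisy Max as adding noise to a sequence of low-sensitivity queries and reporting only the identity of the query whose noisy answer is largest, and notes that pure DP is easy to obtain with Laplace noise. The following is a minimal Python sketch of that Laplace-noise variant for illustration only; it is not code from any of the listed papers. The helper names `laplace_noise` and `report_noisy_max`, the default sensitivity of 1, and the noise scale 2·Δ/ε (a commonly used conservative calibration for ε-DP with sensitivity-Δ queries) are assumptions of this sketch.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def report_noisy_max(
    answers: Sequence[float],
    epsilon: float,
    sensitivity: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index of the query whose Laplace-noised answer is largest.

    Hypothetical helper for illustration. The noise scale 2 * sensitivity / epsilon
    is an assumed conservative calibration for pure epsilon-DP.
    """
    if not answers:
        raise ValueError("need at least one query answer")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng if rng is not None else random.Random()
    scale = 2.0 * sensitivity / epsilon
    # Perturb every query answer independently with Laplace noise.
    noisy = [a + laplace_noise(scale, rng) for a in answers]
    # Only the index of the winning query is released, never the noisy values.
    return max(range(len(noisy)), key=lambda i: noisy[i])


if __name__ == "__main__":
    # Three sensitivity-1 counting queries; with epsilon = 1 the noise is
    # comparable to small gaps, so the reported index can vary with the seed.
    print(report_noisy_max([12, 40, 15], epsilon=1.0, rng=random.Random(0)))
```

Releasing only the argmax, rather than the noisy answers themselves, is what keeps the privacy cost to a single ε. Entry 4 studies the Gaussian-noise versions of Report Noisy Max and Above Threshold, whose analysis this sketch does not attempt.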