Search | arXiv e-print repository

The PetShop Dataset -- Finding Causes of Performance Issues across Microservices

Authors: Michaela Hardt, William R. Orchard, Patrick Blöbaum, Shiva Kasiviswanathan, Elke Kirschbaum

Abstract: Identifying root causes for unexpected or undesirable behavior in complex systems is a prevalent challenge. This issue becomes especially crucial in modern cloud applications that employ numerous microservices. Although the machine learning and systems research communities have proposed various techniques to tackle this problem, there is currently a lack of standardized datasets for quantitative b… ▽ More Identifying root causes for unexpected or undesirable behavior in complex systems is a prevalent challenge. This issue becomes especially crucial in modern cloud applications that employ numerous microservices. Although the machine learning and systems research communities have proposed various techniques to tackle this problem, there is currently a lack of standardized datasets for quantitative benchmarking. Consequently, research groups are compelled to create their own datasets for experimentation. This paper introduces a dataset specifically designed for evaluating root cause analyses in microservice-based applications. The dataset encompasses latency, requests, and availability metrics emitted in 5-minute intervals from a distributed application. In addition to normal operation metrics, the dataset includes 68 injected performance issues, which increase latency and reduce availability throughout the system. We showcase how this dataset can be used to evaluate the accuracy of a variety of methods spanning different causal and non-causal characterisations of the root cause analysis problem. We hope the new dataset, available at https://github.com/amazon-science/petshop-root-cause-analysis/ enables further development of techniques in this important area. △ Less

Submitted 8 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 22 pages, 6 figures, 10 tables, for associated git repo see https://github.com/amazon-science/petshop-root-cause-analysis/, to be published in Proceedings of Machine Learning Research vol 236, 2024, 3rd Conference on Causal Learning and Reasoning

ACM Class: E.0

arXiv:2306.06721 [pdf, other]

Differentially Private Conditional Independence Testing

Authors: Iden Kalemaj, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Abstract: Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under th… ▽ More Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Candès et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous. △ Less

Submitted 22 March, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

arXiv:2304.10613 [pdf, other]

Debiasing Conditional Stochastic Optimization

Authors: Lie He, Shiva Prasad Kasiviswanathan

Abstract: In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity for convergence. We introduce a general stochastic… ▽ More In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity for convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than the existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems. △ Less

Submitted 3 December, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2302.00860 [pdf, other]

Interventional and Counterfactual Inference with Diffusion Models

Authors: Patrick Chao, Patrick Blöbaum, Shiva Prasad Kasiviswanathan

Abstract: We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings. These encodings enable us to direct… ▽ More We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings. These encodings enable us to directly sample under interventions and perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach. △ Less

Submitted 6 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

arXiv:2301.05182 [pdf, other]

Thompson Sampling with Diffusion Generative Prior

Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

Abstract: In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with… ▽ More In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the potential of the proposed approach. △ Less

Submitted 30 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

arXiv:2212.07383 [pdf, other]

Sequential Kernelized Independence Testing

Authors: Aleksandr Podkopaev, Patrick Blöbaum, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Abstract: Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), he… ▽ More Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data. △ Less

Submitted 19 July, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: To appear at ICML 2023

arXiv:2206.04091 [pdf, other]

Uplifting Bandits

Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton

Abstract: We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them. After each action, the agent observes the realizations of all the variables. This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers, such as clicks. We propose U… ▽ More We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them. After each action, the agent observes the realizations of all the variables. This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers, such as clicks. We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline. We study multiple variants of the problem, including when the baseline and affected variables are unknown, and prove sublinear regret bounds for all of these. We also provide lower bounds that justify the necessity of our modeling assumptions. Experiments on synthetic and real-world datasets show the benefit of methods that estimate the uplifts over policies that do not use this structure. △ Less

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2107.03022 [pdf, other]

Reconstructing Test Labels from Noisy Loss Functions

Authors: Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier

Abstract: Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. In a recent line of research, label inference was introduced as the problem of reconstructing the ground truth labels of this private dataset from just the (possibly perturbed) cross-entropy loss function values evaluated at chosen prediction vectors (without any other access to the… ▽ More Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. In a recent line of research, label inference was introduced as the problem of reconstructing the ground truth labels of this private dataset from just the (possibly perturbed) cross-entropy loss function values evaluated at chosen prediction vectors (without any other access to the hidden dataset). In this paper, we formally study the necessary and sufficient conditions under which label inference is possible from \emph{any} (noisy) loss function value. Using tools from analytical number theory, we show that a broad class of commonly used loss functions, including general Bregman divergence-based losses and multiclass cross-entropy with common activation functions like sigmoid and softmax, it is possible to design label inference attacks that succeed even for arbitrary noise levels and using only a single query from the adversary. We formally study the computational complexity of label inference and show that while in general, designing adversarial prediction vectors for these attacks is co-NP-hard, once we have these vectors, the attacks can also be carried out through a lightweight augmentation to any neural network model, making them look benign and hard to detect. The observations in this paper provide a deeper understanding of the vulnerabilities inherent in modern machine learning and could be used for designing future trustworthy ML. △ Less

Submitted 30 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted at NeurIPS 2021 Workshop on Privacy in Machine Learning (PriML)

arXiv:2106.03028 [pdf, other]

Collaborative Causal Discovery with Atomic Interventions

Authors: Raghavendra Addanki, Shiva Prasad Kasiviswanathan

Abstract: We introduce a new Collaborative Causal Discovery problem, through which we model a common scenario in which we have multiple independent entities each with their own causal graph, and the goal is to simultaneously learn all these causal graphs. We study this problem without the causal sufficiency assumption, using Maximal Ancestral Graphs (MAG) to model the causal graphs, and assuming that we hav… ▽ More We introduce a new Collaborative Causal Discovery problem, through which we model a common scenario in which we have multiple independent entities each with their own causal graph, and the goal is to simultaneously learn all these causal graphs. We study this problem without the causal sufficiency assumption, using Maximal Ancestral Graphs (MAG) to model the causal graphs, and assuming that we have the ability to actively perform independent single vertex (or atomic) interventions on the entities. If the $M$ underlying (unknown) causal graphs of the entities satisfy a natural notion of clustering, we give algorithms that leverage this property and recovers all the causal graphs using roughly logarithmic in $M$ number of atomic interventions per entity. These are significantly fewer than $n$ atomic interventions per entity required to learn each causal graph separately, where $n$ is the number of observable nodes in the causal graph. We complement our results with a lower bound and discuss various extensions of our collaborative setting. △ Less

Submitted 6 June, 2021; originally announced June 2021.

arXiv:2105.08266 [pdf, other]

Label Inference Attacks from Log-loss Scores

Authors: Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier

Abstract: Log-loss (also known as cross-entropy loss) metric is ubiquitously used across machine learning applications to assess the performance of classification algorithms. In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s), without any other access to the dataset. Surprisingly, we show that for any finite number of label classes, it… ▽ More Log-loss (also known as cross-entropy loss) metric is ubiquitously used across machine learning applications to assess the performance of classification algorithms. In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s), without any other access to the dataset. Surprisingly, we show that for any finite number of label classes, it is possible to accurately infer the labels of the dataset from the reported log-loss score of a single carefully constructed prediction vector if we allow arbitrary precision arithmetic. Additionally, we present label inference algorithms (attacks) that succeed even under addition of noise to the log-loss scores and under limited precision arithmetic. All our algorithms rely on ideas from number theory and combinatorics and require no model training. We run experimental simulations on some real datasets to demonstrate the ease of running these attacks in practice. △ Less

Submitted 11 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: Accepted at ICML 2021

arXiv:2005.11736 [pdf, other]

Efficient Intervention Design for Causal Discovery with Latents

Authors: Raghavendra Addanki, Shiva Prasad Kasiviswanathan, Andrew McGregor, Cameron Musco

Abstract: We consider recovering a causal graph in presence of latent variables, where we seek to minimize the cost of interventions used in the recovery process. We consider two intervention cost models: (1) a linear cost model where the cost of an intervention on a subset of variables has a linear form, and (2) an identity cost model where the cost of an intervention is the same, regardless of what variab… ▽ More We consider recovering a causal graph in presence of latent variables, where we seek to minimize the cost of interventions used in the recovery process. We consider two intervention cost models: (1) a linear cost model where the cost of an intervention on a subset of variables has a linear form, and (2) an identity cost model where the cost of an intervention is the same, regardless of what variables it is on, i.e., the goal is just to minimize the number of interventions. Under the linear cost model, we give an algorithm to identify the ancestral relations of the underlying causal graph, achieving within a $2$-factor of the optimal intervention cost. This approximation factor can be improved to $1+ε$ for any $ε> 0$ under some mild restrictions. Under the identity cost model, we bound the number of interventions needed to recover the entire causal graph, including the latent variables, using a parameterization of the causal graph through a special type of colliders. In particular, we introduce the notion of $p$-colliders, that are colliders between pair of nodes arising from a specific type of conditioning in the causal graph, and provide an upper bound on the number of interventions as a function of the maximum number of $p$-colliders between any two nodes in the causal graph. △ Less

Submitted 12 July, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

Comments: International Conference on Machine Learning 2020

arXiv:1904.05510 [pdf, ps, other]

Restricted Isometry Property under High Correlations

Authors: Shiva Prasad Kasiviswanathan, Mark Rudelson

Abstract: Matrices satisfying the Restricted Isometry Property (RIP) play an important role in the areas of compressed sensing and statistical learning. RIP matrices with optimal parameters are mainly obtained via probabilistic arguments, as explicit constructions seem hard. It is therefore interesting to ask whether a fixed matrix can be incorporated into a construction of restricted isometries. In this pa… ▽ More Matrices satisfying the Restricted Isometry Property (RIP) play an important role in the areas of compressed sensing and statistical learning. RIP matrices with optimal parameters are mainly obtained via probabilistic arguments, as explicit constructions seem hard. It is therefore interesting to ask whether a fixed matrix can be incorporated into a construction of restricted isometries. In this paper, we construct a new broad ensemble of random matrices with dependent entries that satisfy the restricted isometry property. Our construction starts with a fixed (deterministic) matrix $X$ satisfying some simple stable rank condition, and we show that the matrix $XR$, where $R$ is a random matrix drawn from various popular probabilistic models (including, subgaussian, sparse, low-randomness, satisfying convex concentration property), satisfies the RIP with high probability. These theorems have various applications in signal recovery, random matrix theory, dimensionality reduction, etc. Additionally, motivated by an application for understanding the effectiveness of word vector embeddings popular in natural language processing and machine learning applications, we investigate the RIP of the matrix $XR^{(l)}$ where $R^{(l)}$ is formed by taking all possible (disregarding order) $l$-way entrywise products of the columns of a random matrix $R$. △ Less

Submitted 31 October, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: 30 pages, fixed minor typos

arXiv:1808.00087 [pdf, other]

Subsampled Rényi Differential Privacy and Analytical Moments Accountant

Authors: Yu-Xiang Wang, Borja Balle, Shiva Kasiviswanathan

Abstract: We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms that: (1) subsample the dataset, and then (2) applies a randomized mechanism M to the subsample,… ▽ More We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms that: (1) subsample the dataset, and then (2) applies a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability parameter. Our results generalize the moments accounting technique, developed by Abadi et al. (2016) for the Gaussian mechanism, to any subsampled RDP mechanism. △ Less

Submitted 4 December, 2018; v1 submitted 31 July, 2018; originally announced August 2018.

arXiv:1710.07850 [pdf, other]

Deep Neural Network Approximation using Tensor Sketching

Authors: Shiva Prasad Kasiviswanathan, Nina Narodytska, Hongxia Jin

Abstract: Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: Given a target network architecture can we design a smaller network architecture that approximates the operation of the target network? The qu… ▽ More Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: Given a target network architecture can we design a smaller network architecture that approximates the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever increasing storage and memory requirements of these networks pose a problem in resource constrained environments. In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network. △ Less

Submitted 21 October, 2017; originally announced October 2017.

Comments: 19 pages

arXiv:1709.06662 [pdf, other]

Verifying Properties of Binarized Deep Neural Networks

Authors: Nina Narodytska, Shiva Prasad Kasiviswanathan, Leonid Ryzhyk, Mooly Sagiv, Toby Walsh

Abstract: Understanding properties of deep neural networks is an important challenge in deep learning. In this paper, we take a step in this direction by proposing a rigorous way of verifying properties of a popular class of neural networks, Binarized Neural Networks, using the well-developed means of Boolean satisfiability. Our main contribution is a construction that creates a representation of a binarize… ▽ More Understanding properties of deep neural networks is an important challenge in deep learning. In this paper, we take a step in this direction by proposing a rigorous way of verifying properties of a popular class of neural networks, Binarized Neural Networks, using the well-developed means of Boolean satisfiability. Our main contribution is a construction that creates a representation of a binarized neural network as a Boolean formula. Our encoding is the first exact Boolean representation of a deep neural network. Using this encoding, we leverage the power of modern SAT solvers along with a proposed counterexample-guided search procedure to verify various properties of these networks. A particular focus will be on the critical property of robustness to adversarial perturbations. For this property, our experimental results demonstrate that our approach scales to medium-size deep neural networks used in image classification tasks. To the best of our knowledge, this is the first work on verifying properties of deep neural networks using an exact Boolean encoding of the network. △ Less

Submitted 31 May, 2018; v1 submitted 19 September, 2017; originally announced September 2017.

Comments: 10 pages

arXiv:1707.08092 [pdf, other]

Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression

Authors: Shiva Prasad Kasiviswanathan, Mark Rudelson

Abstract: High-dimensional settings, where the data dimension ($d$) far exceeds the number of observations ($n$), are common in many statistical and machine learning applications. Methods based on $\ell_1$-relaxation, such as Lasso, are very popular for sparse recovery in these settings. Restricted Eigenvalue (RE) condition is among the weakest, and hence the most general, condition in literature imposed on… ▽ More High-dimensional settings, where the data dimension ($d$) far exceeds the number of observations ($n$), are common in many statistical and machine learning applications. Methods based on $\ell_1$-relaxation, such as Lasso, are very popular for sparse recovery in these settings. Restricted Eigenvalue (RE) condition is among the weakest, and hence the most general, condition in literature imposed on the Gram matrix that guarantees nice statistical properties for the Lasso estimator. It is natural to ask: what families of matrices satisfy the RE condition? Following a line of work in this area, we construct a new broad ensemble of dependent random design matrices that have an explicit RE bound. Our construction starts with a fixed (deterministic) matrix $X \in \mathbb{R}^{n \times d}$ satisfying a simple stable rank condition, and we show that a matrix drawn from the distribution $X Φ^\top Φ$, where $Φ\in \mathbb{R}^{m \times d}$ is a subgaussian random matrix, with high probability, satisfies the RE condition. This construction allows incorporating a fixed matrix that has an easily {\em verifiable} condition into the design process, and allows for generation of {\em compressed} design matrices that have a lower storage requirement than a standard design matrix. We give two applications of this construction to sparse linear regression problems, including one to a compressed sparse regression setting where the regression algorithm only has access to a compressed representation of a fixed design matrix $X$. △ Less

Submitted 17 February, 2018; v1 submitted 25 July, 2017; originally announced July 2017.

Comments: 27 pages, Updated paper with stronger results, Corrected Applications

arXiv:1701.01093 [pdf, ps, other]

Private Incremental Regression

Authors: Shiva Prasad Kasiviswanathan, Kobbi Nissim, Hongxia Jin

Abstract: Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private… ▽ More Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While from the results of [Bassily et al. 2014] this bound is tight in the worst-case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. △ Less

Submitted 4 January, 2017; originally announced January 2017.

Comments: To appear in PODS 2017

arXiv:1612.06299 [pdf, other]

Simple Black-Box Adversarial Perturbations for Deep Networks

Authors: Nina Narodytska, Shiva Prasad Kasiviswanathan

Abstract: Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected… ▽ More Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior leading to undesired consequences and could pose a security risk when these systems are deployed in the real world. In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black-box) and only assume that the output of the network can be observed on the probed inputs. Our first attack is based on a simple idea of adding perturbation to a randomly selected single pixel or a small set of them. We then improve the effectiveness of this attack by carefully constructing a small set of pixels to perturb by using the idea of greedy local-search. Our proposed attacks also naturally extend to a stronger notion of misclassification. Our extensive experimental results illustrate that even these elementary attacks can reveal a deep neural network's vulnerabilities. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks. △ Less

Submitted 19 December, 2016; originally announced December 2016.

Comments: 19 Pages

arXiv:1504.05880 [pdf, other]

Spectral Norm of Random Kernel Matrices with Applications to Privacy

Authors: Shiva Prasad Kasiviswanathan, Mark Rudelson

Abstract: Kernel methods are an extremely popular set of techniques used for many important machine learning and data analysis applications. In addition to having good practical performances, these methods are supported by a well-developed theory. Kernel methods use an implicit mapping of the input data into a high dimensional feature space defined by a kernel function, i.e., a function returning the inner… ▽ More Kernel methods are an extremely popular set of techniques used for many important machine learning and data analysis applications. In addition to having good practical performances, these methods are supported by a well-developed theory. Kernel methods use an implicit mapping of the input data into a high dimensional feature space defined by a kernel function, i.e., a function returning the inner product between the images of two data points in the feature space. Central to any kernel method is the kernel matrix, which is built by evaluating the kernel function on a given sample dataset. In this paper, we initiate the study of non-asymptotic spectral theory of random kernel matrices. These are n x n random matrices whose (i,j)th entry is obtained by evaluating the kernel function on $x_i$ and $x_j$, where $x_1,...,x_n$ are a set of n independent random high-dimensional vectors. Our main contribution is to obtain tight upper bounds on the spectral norm (largest eigenvalue) of random kernel matrices constructed by commonly used kernel functions based on polynomials and Gaussian radial basis. As an application of these results, we provide lower bounds on the distortion needed for releasing the coefficients of kernel ridge regression under attribute privacy, a general privacy notion which captures a large class of privacy definitions. Kernel ridge regression is standard method for performing non-parametric regression that regularly outperforms traditional regression approaches in various domains. Our privacy distortion lower bounds are the first for any kernel technique, and our analysis assumes realistic scenarios for the input, unlike all previous lower bounds for other release problems which only hold under very restrictive input settings. △ Less

Submitted 22 April, 2015; originally announced April 2015.

Comments: 16 pages, 1 Figure

ACM Class: F.2.1

arXiv:1210.2381 [pdf, ps, other]

The Power of Linear Reconstruction Attacks

Authors: Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Smith

Abstract: We consider the power of linear reconstruction attacks in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before (Dinur and Nissim PODS'03, Dwork, McSherry and Talwar STOC'07, Kasiviswanathan, Rudelson, Smith and Ullman STOC'10, De TCC'12, Muthukrishnan and Nikolov STOC'12) but have so far bee… ▽ More We consider the power of linear reconstruction attacks in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before (Dinur and Nissim PODS'03, Dwork, McSherry and Talwar STOC'07, Kasiviswanathan, Rudelson, Smith and Ullman STOC'10, De TCC'12, Muthukrishnan and Nikolov STOC'12) but have so far been applied only in settings with releases that are obviously linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). We show one can mount linear reconstruction attacks based on any release that gives: a) the fraction of records that satisfy a given non-degenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al., STOC'10) as well as more complex outputs like the error rate of classifiers such as decision trees; b) any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either a) or b) above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes. △ Less

Submitted 8 October, 2012; originally announced October 2012.

Comments: 30 pages, to appear in ACM-SIAM Symposium on Discrete Algorithms (SODA 2013)

arXiv:1206.6384 [pdf]

Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

Authors: Haim Avron, Satyen Kale, Shiva Kasiviswanathan, Vikas Sindhwani

Abstract: We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical… ▽ More We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings. Empirical comparisons confirm that our approach is highly competitive with several recently proposed state-of-the-art solvers for such problems. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1010.3605 [pdf, other]

The rigidity transition in random graphs

Authors: Shiva Prasad Kasiviswanathan, Cristopher Moore, Louis Theran

Abstract: As we add rigid bars between points in the plane, at what point is there a giant (linear-sized) rigid component, which can be rotated and translated, but which has no internal flexibility? If the points are generic, this depends only on the combinatorics of the graph formed by the bars. We show that if this graph is an Erdos-Renyi random graph G(n,c/n), then there exists a sharp threshold for a gi… ▽ More As we add rigid bars between points in the plane, at what point is there a giant (linear-sized) rigid component, which can be rotated and translated, but which has no internal flexibility? If the points are generic, this depends only on the combinatorics of the graph formed by the bars. We show that if this graph is an Erdos-Renyi random graph G(n,c/n), then there exists a sharp threshold for a giant rigid component to emerge. For c < c_2, w.h.p. all rigid components span one, two, or three vertices, and when c > c_2, w.h.p. there is a giant rigid component. The constant c_2 \approx 3.588 is the threshold for 2-orientability, discovered independently by Fernholz and Ramachandran and Cain, Sanders, and Wormald in SODA'07. We also give quantitative bounds on the size of the giant rigid component when it emerges, proving that it spans a (1-o(1))-fraction of the vertices in the (3+2)-core. Informally, the (3+2)-core is maximal induced subgraph obtained by starting from the 3-core and then inductively adding vertices with 2 neighbors in the graph obtained so far. △ Less

Submitted 26 November, 2010; v1 submitted 18 October, 2010; originally announced October 2010.

Comments: To appear in SODA'11. Added proofs omitted from the proceedings version

arXiv:0906.1953 [pdf, ps, other]

An Exponential Time 2-Approximation Algorithm for Bandwidth

Authors: Martin Fürer, Serge Gaspers, Shiva Prasad Kasiviswanathan

Abstract: The bandwidth of a graph G on n vertices is the minimum b such that the vertices of G can be labeled from 1 to n such that the labels of every pair of adjacent vertices differ by at most b. In this paper, we present a 2-approximation algorithm for the bandwidth problem that takes worst-case O(1.9797^n) time and uses polynomial space. This improves both the previous best 2- and 3-approximation al… ▽ More The bandwidth of a graph G on n vertices is the minimum b such that the vertices of G can be labeled from 1 to n such that the labels of every pair of adjacent vertices differ by at most b. In this paper, we present a 2-approximation algorithm for the bandwidth problem that takes worst-case O(1.9797^n) time and uses polynomial space. This improves both the previous best 2- and 3-approximation algorithms of Cygan et al. which have an O(3^n) and O(2^n) worst-case time bounds, respectively. Our algorithm is based on constructing bucket decompositions of the input graph. A bucket decomposition partitions the vertex set of a graph into ordered sets (called buckets) of (almost) equal sizes such that all edges are either incident to vertices in the same bucket or to vertices in two consecutive buckets. The idea is to find the smallest bucket size for which there exists a bucket decomposition. The algorithm uses a simple divide-and-conquer strategy along with dynamic programming to achieve this improved time bound. △ Less

Submitted 29 April, 2012; v1 submitted 10 June, 2009; originally announced June 2009.

ACM Class: F.2.2; G.2.2

arXiv:0806.2287 [pdf, other]

doi 10.1017/S0963548314000339

Approximately Counting Embeddings into Random Graphs

Authors: Martin Furer, Shiva Prasad Kasiviswanathan

Abstract: Let H be a graph, and let C_H(G) be the number of (subgraph isomorphic) copies of H contained in a graph G. We investigate the fundamental problem of estimating C_H(G). Previous results cover only a few specific instances of this general problem, for example, the case when H has degree at most one (monomer-dimer problem). In this paper, we present the first general subcase of the subgraph isomorph… ▽ More Let H be a graph, and let C_H(G) be the number of (subgraph isomorphic) copies of H contained in a graph G. We investigate the fundamental problem of estimating C_H(G). Previous results cover only a few specific instances of this general problem, for example, the case when H has degree at most one (monomer-dimer problem). In this paper, we present the first general subcase of the subgraph isomorphism counting problem which is almost always efficiently approximable. The results rely on a new graph decomposition technique. Informally, the decomposition is a labeling of the vertices such that every edge is between vertices with different labels and for every vertex all neighbors with a higher label have identical labels. The labeling implicitly generates a sequence of bipartite graphs which permits us to break the problem of counting embeddings of large subgraphs into that of counting embeddings of small subgraphs. Using this method, we present a simple randomized algorithm for the counting problem. For all decomposable graphs H and all graphs G, the algorithm is an unbiased estimator. Furthermore, for all graphs H having a decomposition where each of the bipartite graphs generated is small and almost all graphs G, the algorithm is a fully polynomial randomized approximation scheme. We show that the graph classes of H for which we obtain a fully polynomial randomized approximation scheme for almost all G includes graphs of degree at most two, bounded-degree forests, bounded-length grid graphs, subdivision of bounded-degree graphs, and major subclasses of outerplanar graphs, series-parallel graphs and planar graphs, whereas unbounded-length grid graphs are excluded. △ Less

Submitted 21 June, 2013; v1 submitted 13 June, 2008; originally announced June 2008.

Comments: Earlier version appeared in Random 2008. Fixed an typo in Definition 3.1

Journal ref: Combinator. Probab. Comp. 23 (2014) 1028-1056

arXiv:0803.3946 [pdf, ps, other]

doi 10.29012/jpc.v6i1.634

On the `Semantics' of Differential Privacy: A Bayesian Formulation

Authors: Shiva Prasad Kasiviswanathan, Adam Smith

Abstract: Differential privacy is a definition of "privacy'" for algorithms that analyze and publish information about statistical databases. It is often claimed that differential privacy provides guarantees against adversaries with arbitrary side information. In this paper, we provide a precise formulation of these guarantees in terms of the inferences drawn by a Bayesian adversary. We show that this formu… ▽ More Differential privacy is a definition of "privacy'" for algorithms that analyze and publish information about statistical databases. It is often claimed that differential privacy provides guarantees against adversaries with arbitrary side information. In this paper, we provide a precise formulation of these guarantees in terms of the inferences drawn by a Bayesian adversary. We show that this formulation is satisfied by both "vanilla" differential privacy as well as a relaxation known as (epsilon,delta)-differential privacy. Our formulation follows the ideas originally due to Dwork and McSherry [Dwork 2006]. This paper is, to our knowledge, the first place such a formulation appears explicitly. The analysis of the relaxed definition is new to this paper, and provides some concrete guidance for setting parameters when using (epsilon,delta)-differential privacy. △ Less

Submitted 22 January, 2023; v1 submitted 27 March, 2008; originally announced March 2008.

Comments: Older version of this paper was titled: "A Note on Differential Privacy: Defining Resistance to Arbitrary Side Information"

Journal ref: Journal of Privacy and Confidentiality, 6 (1), 2014

arXiv:0803.0924 [pdf, other]

What Can We Learn Privately?

Authors: Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, Adam Smith

Abstract: Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy different… ▽ More Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We demonstrate that, ignoring computational constraints, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the concept class. Therefore, almost anything learnable is learnable privately: specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples. We also present a computationally efficient private PAC learner for the class of parity functions. Local (or randomized response) algorithms are a practical class of private algorithms that have received extensive investigation. We provide a precise characterization of local private learning algorithms. We show that a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model. Finally, we present a separation between the power of interactive and noninteractive local learning algorithms. △ Less

Submitted 18 February, 2010; v1 submitted 6 March, 2008; originally announced March 2008.

Comments: 35 pages, 2 figures

Journal ref: SIAM Journal of Computing 40(3) (2011) 793-826

arXiv:0803.0032 [pdf, other]

Composition Attacks and Auxiliary Information in Data Privacy

Authors: Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, Adam Smith

Abstract: Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason abo… ▽ More Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason about privacy in the face of rich, realistic sources of auxiliary information. Specifically, we investigate the effectiveness of current anonymization schemes in preserving privacy when multiple organizations independently release anonymized data about overlapping populations. 1. We investigate composition attacks, in which an adversary uses independent anonymized releases to breach privacy. We explain why recently proposed models of limited auxiliary information fail to capture composition attacks. Our experiments demonstrate that even a simple instance of a composition attack can breach privacy in practice, for a large class of currently proposed techniques. The class includes k-anonymity and several recent variants. 2. On a more positive note, certain randomization-based notions of privacy (such as differential privacy) provably resist composition attacks and, in fact, the use of arbitrary side information. This resistance enables stand-alone design of anonymization schemes, without the need for explicitly keeping track of other releases. We provide a precise formulation of this property, and prove that an important class of relaxations of differential privacy also satisfy the property. This significantly enlarges the class of protocols known to enable modular design. △ Less

Submitted 31 March, 2008; v1 submitted 29 February, 2008; originally announced March 2008.

arXiv:cs/0605029 [pdf, ps, other]

Spanners for Geometric Intersection Graphs

Authors: Martin Furer, Shiva Prasad Kasiviswanathan

Abstract: Efficient algorithms are presented for constructing spanners in geometric intersection graphs. For a unit ball graph in R^k, a (1+ε)-spanner is obtained using efficient partitioning of the space into hypercubes and solving bichromatic closest pair problems. The spanner construction has almost equivalent complexity to the construction of Euclidean minimum spanning trees. The results are extended… ▽ More Efficient algorithms are presented for constructing spanners in geometric intersection graphs. For a unit ball graph in R^k, a (1+ε)-spanner is obtained using efficient partitioning of the space into hypercubes and solving bichromatic closest pair problems. The spanner construction has almost equivalent complexity to the construction of Euclidean minimum spanning trees. The results are extended to arbitrary ball graphs with a sub-quadratic running time. For unit ball graphs, the spanners have a small separator decomposition which can be used to obtain efficient algorithms for approximating proximity problems like diameter and distance queries. The results on compressed quadtrees, geometric graph separators, and diameter approximation might be of independent interest. △ Less

Submitted 7 May, 2006; originally announced May 2006.

Comments: 16 pages, 5 figures, Latex

ACM Class: F.2.2

Journal ref: Journal of Computational Geometry 3(1) (2012) 31-64

Showing 1–28 of 28 results for author: Kasiviswanathan, S