Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations

O Feyisetan, B Balle, T Drake, T Diethe - … on web search and data mining, 2020 - dl.acm.org
Proceedings of the 13th International Conference on Web Search and Data Mining, 2020
Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy-preserving text perturbation using the notion of d_χ-privacy, which was designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to the vector representations of words in a high-dimensional space as defined by word embedding models. We present a privacy proof that satisfies d_χ-privacy, where the privacy parameter ε provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how ε can be selected by analyzing plausible deniability statistics, backed up by large-scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against baseline models and utility experiments on 3 datasets to demonstrate the tradeoff between privacy and utility for varying values of ε on different task types. Our results demonstrate practical utility (< 2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.
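
The perturbation described in the abstract (add calibrated noise to a word's embedding, then work with the result) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the abstract: the noise is drawn with density proportional to exp(-ε‖z‖) by combining a uniform random direction with a Gamma-distributed magnitude, the noisy vector is mapped back to the nearest vocabulary word under Euclidean distance, and the names `sample_noise`, `perturb_word`, `vocab`, and `embeddings` together with the toy 3-dimensional vectors are hypothetical stand-ins for a real embedding model such as GloVe or fastText.

```python
import numpy as np

def sample_noise(dim, epsilon, rng):
    """Draw a noise vector with density proportional to exp(-epsilon * ||z||).

    Assumed construction: uniform direction on the unit sphere,
    magnitude from a Gamma(dim, 1/epsilon) distribution.
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return magnitude * direction

def perturb_word(word, vocab, embeddings, epsilon, rng):
    """Add noise to the word's embedding and return the nearest vocabulary word."""
    idx = vocab.index(word)
    noisy = embeddings[idx] + sample_noise(embeddings.shape[1], epsilon, rng)
    # Euclidean distance from the noisy point to every word vector.
    distances = np.linalg.norm(embeddings - noisy, axis=1)
    return vocab[int(np.argmin(distances))]

# Hypothetical toy vocabulary and embeddings (not real GloVe/fastText vectors).
vocab = ["london", "paris", "berlin", "apple"]
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [-1.0, 0.5, 0.3],
])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for eps in (1.0, 10.0, 100.0):
        samples = [perturb_word("paris", vocab, embeddings, eps, rng) for _ in range(5)]
        print(eps, samples)
```

In this sketch, a smaller ε produces larger noise, so the output word is more often different from the input (more privacy, less utility); a very large ε leaves the word almost always unchanged. That is the tradeoff the abstract refers to when it discusses varying values of ε.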