Showing 1–14 of 14 results for author: Feyisetan, O

  1. arXiv:2408.11852

    cs.CL cs.AI cs.LG

    Fast Training Dataset Attribution via In-Context Learning

    Authors: Milad Fotouhi, Mohammad Taha Bahadori, Oluwaseyi Feyisetan, Payman Arabshahi, David Heckerman

    Abstract: We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2107.07928

    cs.CR

    TEM: High Utility Metric Differential Privacy on Text

    Authors: Ricardo Silva Carvalho, Theodore Vasiloudis, Oluwaseyi Feyisetan

    Abstract: Ensuring the privacy of users whose data are used to train Natural Language Processing (NLP) models is necessary to build and maintain customer trust. Differential Privacy (DP) has emerged as the most successful method to protect the privacy of individuals. However, applying DP to the NLP domain comes with unique challenges. The most successful previous methods use a generalization of DP for metri…

    Submitted 16 July, 2021; originally announced July 2021.

  3. arXiv:2107.07923

    cs.CR

    BRR: Preserving Privacy of Text Data Efficiently on Device

    Authors: Ricardo Silva Carvalho, Theodore Vasiloudis, Oluwaseyi Feyisetan

    Abstract: With the use of personal devices connected to the Internet for tasks such as searches and shopping becoming ubiquitous, ensuring the privacy of the users of such services has become a requirement in order to build and maintain customer trust. While text privatization methods exist, they require the existence of a trusted party that collects user data before applying a privatization method to prese…

    Submitted 16 July, 2021; originally announced July 2021.

  4. arXiv:2107.03022

    cs.LG

    Reconstructing Test Labels from Noisy Loss Functions

    Authors: Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier

    Abstract: Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. In a recent line of research, label inference was introduced as the problem of reconstructing the ground truth labels of this private dataset from just the (possibly perturbed) cross-entropy loss function values evaluated at chosen prediction vectors (without any other access to the…

    Submitted 30 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at NeurIPS 2021 Workshop on Privacy in Machine Learning (PriML)

  5. arXiv:2105.08266

    cs.LG

    Label Inference Attacks from Log-loss Scores

    Authors: Abhinav Aggarwal, Shiva Prasad Kasiviswanathan, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier

    Abstract: Log-loss (also known as cross-entropy loss) metric is ubiquitously used across machine learning applications to assess the performance of classification algorithms. In this paper, we investigate the problem of inferring the labels of a dataset from single (or multiple) log-loss score(s), without any other access to the dataset. Surprisingly, we show that for any finite number of label classes, it…

    Submitted 11 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: Accepted at ICML 2021

  6. arXiv:2104.11838

    cs.CL

    On a Utilitarian Approach to Privacy Preserving Text Generation

    Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier

    Abstract: Differentially-private mechanisms for text generation typically add carefully calibrated noise to input words and use the nearest neighbor to the noised input as the output word. When the noise is small in magnitude, these mechanisms are susceptible to reconstruction of the original sensitive text. This is because the nearest neighbor to the noised input is likely to be the original input. To miti…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 10 pages, 3 figures

  7. arXiv:2012.05403

    cs.LG cs.CL cs.CR

    Research Challenges in Designing Differentially Private Text Generation Mechanisms

    Authors: Oluwaseyi Feyisetan, Abhinav Aggarwal, Zekun Xu, Nathanael Teissier

    Abstract: Accurately learning from user data while ensuring quantifiable privacy guarantees provides an opportunity to build better Machine Learning (ML) models while maintaining user trust. Recent literature has demonstrated the applicability of a generalized form of Differential Privacy to provide guarantees over text queries. Such mechanisms add privacy preserving noise to vectorial representations of te…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: 14 pages, 1 figure

  8. arXiv:2010.11947

    cs.CL cs.CR cs.LG stat.ML

    A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric

    Authors: Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier

    Abstract: Balancing the privacy-utility tradeoff is a crucial requirement of many practical machine learning systems that deal with sensitive customer data. A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space, perturbed by sampling a spherical noise from an appropriate distribution, and then projected back to the di…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 11 pages, 7 figures

  9. arXiv:2009.12718

    cs.LG cs.CR stat.ML

    Differentially Private Adversarial Robustness Through Randomized Perturbations

    Authors: Nan Xu, Oluwaseyi Feyisetan, Abhinav Aggarwal, Zekun Xu, Nathanael Teissier

    Abstract: Deep Neural Networks, despite their great success in diverse domains, are provably sensitive to small perturbations on correctly classified examples and lead to erroneous predictions. Recently, it was proposed that this behavior can be combatted by optimizing the worst case loss function over all possible substitutions of training examples. However, this can be prone to weighing unlikely substitut…

    Submitted 26 September, 2020; originally announced September 2020.

  10. arXiv:2009.08559

    cs.LG cs.CR stat.ML

    On Primes, Log-Loss Scores and (No) Privacy

    Authors: Abhinav Aggarwal, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier

    Abstract: Membership Inference Attacks exploit the vulnerabilities of exposing models trained on customer data to queries by an adversary. In a recently proposed implementation of an auditing tool for measuring privacy leakage from sensitive datasets, more refined aggregates like the Log-Loss scores are exposed for simulating inference attacks as well as to assess the total privacy leakage based on the adve…

    Submitted 17 September, 2020; originally announced September 2020.

  11. arXiv:1910.08917

    cs.LG cs.CL cs.CR stat.ML

    Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text

    Authors: Oluwaseyi Feyisetan, Tom Diethe, Thomas Drake

    Abstract: Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non Hamming distance metrics. In this…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted at ICDM 2019

  12. arXiv:1910.08902

    cs.LG cs.CL cs.CR stat.ML

    Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations

    Authors: Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, Tom Diethe

    Abstract: Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy preserving text perturbation using the notion of dx-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to v…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted at WSDM 2020

  13. arXiv:1903.11112

    cs.LG cs.CL stat.ML

    Privacy-preserving Active Learning on Sensitive Data for User Intent Classification

    Authors: Oluwaseyi Feyisetan, Thomas Drake, Borja Balle, Tom Diethe

    Abstract: Active learning holds promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling. This presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy preserving active learning with quantifiable guarantees. W…

    Submitted 26 March, 2019; originally announced March 2019.

    Comments: To appear at PAL: Privacy-Enhancing Artificial Intelligence and Language Technologies as part of the AAAI Spring Symposium Series (AAAI-SSS 2019)

  14. arXiv:1901.05670

    cs.CY

    Beyond monetary incentives: experiments in paid microtask contests modelled as continuous-time markov chains

    Authors: Oluwaseyi Feyisetan, Elena Simperl

    Abstract: In this paper, we aim to gain a better understanding into how paid microtask crowdsourcing could leverage its appeal and scaling power by using contests to boost crowd performance and engagement. We introduce our microtask-based annotation platform Wordsmith, which features incentives such as points, leaderboards and badges on top of financial remuneration. Our analysis focuses on a particular typ…

    Submitted 17 January, 2019; originally announced January 2019.