Extended abstract · DOI: 10.1145/3570991.3571020

Techniques for Privacy-Preserving Data Aggregation in an Untrusted Distributed Environment

Published: 04 January 2023

Abstract

Differential Privacy (DP) formalizes a framework for sharing information about a group of individuals in a dataset while protecting their privacy. DP, characterized by the parameters (ϵ, δ), provides mathematical guarantees on privacy and is therefore a well-recognized approach to privacy-preserving data analysis and aggregation. DP transformation mechanisms achieve differential privacy by adding randomness to the data, which adversely affects data utility and results in a trade-off between utility and privacy.
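As a concrete illustration of such a randomizing transformation, the standard Laplace mechanism releases a numeric query after adding noise with scale sensitivity/ϵ; the meter readings and sensitivity bound below are made up for illustration, not taken from this work.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release true_value with (epsilon, 0)-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
readings = [3.2, 1.7, 4.4, 2.9]   # e.g. per-household meter readings (illustrative)
true_sum = sum(readings)
# Sensitivity of a bounded sum: the maximum contribution of one individual.
noisy_sum = laplace_mechanism(true_sum, sensitivity=5.0, epsilon=1.0, rng=rng)
```

Smaller ϵ means a larger noise scale, hence stronger privacy but lower utility: the trade-off described above.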
While researchers have developed numerous mechanisms for differentially private data aggregation and sharing, a desirable mechanism should be fault-tolerant (able to compute aggregates over partially shared data) and should obviate both interactive communication among data owners and the need for a trusted aggregator. Prior work addresses some of these properties, but none addresses all of them simultaneously. A key contribution of this thesis is Privacy-Preserving Endpoint Aggregation (PPEA), a distributed, differentially private aggregation mechanism that satisfies all of these desirable characteristics. Experiments demonstrate that PPEA provides utility on par with, and in some cases better than, an existing ‘Central, Trusted’ method called PrivEx: an 11% improvement over PrivEx is observed on the UCI ML Smart Meter dataset [3], and up to 4 times better utility on a synthetic dataset [5]. The set of experiments was designed following the principles of the DPBench benchmark [1]. In addition to experimental validation, PPEA has been validated for feasibility on real-time IoT devices [8]. Implementation issues such as random number generation and on-device data processing were resolved, and the correctness of the implementation was verified by reproducing the expected DPBench trends. Based on the observed memory, time, and power consumption, the implementation is feasible on the Xiaomi MiBand2 and the XD58C Pulse Sensor.
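The abstract does not give PPEA's construction, but the fault-tolerant, trusted-aggregator-free setting it targets can be sketched minimally with local perturbation: each endpoint clamps and noises its own reading before reporting, so the aggregator only ever sees perturbed values and can sum whichever reports arrive. The bounds, readings, and function names here are illustrative assumptions, not the PPEA protocol.

```python
import numpy as np

def perturb_locally(value: float, lower: float, upper: float,
                    epsilon: float, rng: np.random.Generator) -> float:
    # Each endpoint clamps its reading to a public range and adds Laplace
    # noise scaled to that range, so no trusted aggregator is required.
    clamped = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    return clamped + rng.laplace(0.0, scale)

rng = np.random.default_rng(1)
readings = [3.2, 1.7, 4.4, 2.9, 3.8]   # illustrative per-endpoint values
noisy = [perturb_locally(v, 0.0, 5.0, epsilon=1.0, rng=rng) for v in readings]

# Fault tolerance: the aggregator sums whichever reports actually arrive,
# with no interactive rounds among the endpoints.
received = noisy[:4]   # one endpoint dropped out
estimate = sum(received)
```

The price of this simplicity is extra noise in the aggregate, which is the utility gap that a mechanism like PPEA aims to close relative to a central, trusted baseline.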
Any DP data aggregation mechanism must preserve enough data utility for further analysis, and the trade-off between utility and privacy must be addressed through an appropriate choice of the key DP parameters (ϵ, δ). While many researchers have proposed guidelines or ranges for selecting ϵ, none has proposed a systematic method for choosing the values of the DP parameters. We have defined a cost-based profit-maximization model [7], where profit is the difference between the gain obtained by utilizing the data and the loss incurred due to its disclosure. For this, utility has been modeled mathematically, reducing the need to conduct multiple experiments to calculate it, which may not be feasible in many cases due to privacy concerns. The correctness of the mathematical model is verified on multiple real-world and synthetic datasets; the model does not depend on the type of input data and is applicable to multiple data distributions. The mathematically calculated utility is within 1.6% of the experimentally obtained values for all synthetic datasets, and within 4.25% for the London Smart Meter dataset [4].
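A minimal sketch of such a profit-maximization search over ϵ follows. The functional forms for utility and disclosure risk below are hypothetical stand-ins (the actual models in [7] and the risk formulation of [2] differ); the point is only the structure: profit = gain × utility − loss × disclosure risk, maximized over a grid of candidate ϵ values.

```python
import numpy as np

def utility(eps):
    # Hypothetical saturating utility: noise shrinks as eps grows.
    return 1.0 - np.exp(-eps)

def disclosure_risk(eps, n=100):
    # Illustrative posterior probability of identifying one of n
    # individuals; grows with eps (in the spirit of [2], not its formula).
    return np.exp(eps) / (n - 1 + np.exp(eps))

def profit(eps, gain=1000.0, loss=5000.0):
    # Profit = monetary gain from utility minus expected disclosure loss.
    return gain * utility(eps) - loss * disclosure_risk(eps)

grid = np.linspace(0.01, 5.0, 500)
best_eps = grid[np.argmax(profit(grid))]   # recommended epsilon
```

Because both utility and risk increase with ϵ while the gain saturates, the profit curve peaks at an interior ϵ, which is the value the model recommends.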
The cost of privacy loss used in the recommendation method is based on disclosure risk [2], i.e., the direct probability of disclosure, which is more practical and explainable than the constant multiple of ϵ used in the literature. This cost can also be interpreted as the cost of providing insurance against disclosure. The privacy recommendations are also shown to extend conveniently to other standard statistical aggregates, such as the mean and standard deviation, and to groups of multiple statistics computed simultaneously. This has been validated on a synthetic dataset, where the recommended ϵ yields the best profit [7].
This work on the recommendation of privacy parameters enables an analyst to select the best DP technique from a given set of techniques. It can be used to design a generic method that selects the best technique for a given type of analysis, using weights for each analysis type together with the data parameters.
The financial model and other selection methods in the literature depend on the availability of financial information and data parameters, which may not be available in many cases. Another crucial part of this work is therefore a set of fallback methods [6] for selecting the optimal value of ϵ when financial information is unavailable. Three methods are proposed: Linear Scalarization, Least-Squares Optimization, and Upper-Bound Recommendation. The first two enable precise recommendation of the value of ϵ, while the third, elasticity-based method provides a safe upper bound. All three use normalized deviation as a shared unit, allowing disclosure risk and utility to be compared directly when trading them off to select the best value.
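The first fallback method can be sketched as follows. The two criterion curves below (noise magnitude as a utility-loss proxy, ϵ itself as a risk proxy) and the equal weights are illustrative assumptions, not the curves used in [6]; the sketch shows only the mechanics of normalizing both criteria to a shared unit and scalarizing them linearly.

```python
import numpy as np

def normalized(values):
    # Normalized deviation: rescale a criterion to [0, 1] so utility loss
    # and disclosure risk become comparable without financial information.
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

eps_grid = np.linspace(0.1, 4.0, 40)
# Hypothetical criterion curves: noise magnitude falls with eps, risk rises.
utility_loss = 1.0 / eps_grid   # proxy: Laplace noise scale at each eps
risk = eps_grid                 # proxy: disclosure risk grows with eps

# Linear scalarization: minimise a weighted sum of the normalized criteria.
w_utility, w_risk = 0.5, 0.5
score = w_utility * normalized(utility_loss) + w_risk * normalized(risk)
best_eps = eps_grid[np.argmin(score)]   # recommended epsilon
```

Changing the weights shifts the recommendation along the trade-off curve, which is why the precise methods need a stated preference while the upper-bound method only promises a safe ceiling.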
In the future, the recommendation methods can be extended to machine learning. For ML algorithms, utility is measured by metrics such as accuracy, precision, recall, and F1-score, which are difficult to estimate mathematically, so data-driven estimates can be used instead. This would allow our recommendation methods to be applied to ML algorithms and is an interesting extension of this work. Applying the recommendation methods to privacy-preservation techniques other than differential privacy can also be explored.

References

[1]
Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, and Dan Zhang. 2016. Principled Evaluation of Differentially Private Algorithms Using DPBench. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD ’16). ACM, New York, NY, USA, 139–154. https://doi.org/10.1145/2882903.2882931
[2]
Jaewoo Lee and Chris Clifton. 2011. How Much Is Enough? Choosing ϵ for Differential Privacy. In Information Security, Xuejia Lai, Jianying Zhou, and Hui Li (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 325–340.
[3]
M. Lichman. 2013. UCI Machine Learning Repository.
[4]
UK Power Networks. 2014. Smart meters in London. https://www.kaggle.com/hsankesara/time-series-forecasting-using-seasonal-arima/data
[5]
Snehkumar Shahani, Jibi Abraham, and R Venkateswaran. 2017. Distributed Data Aggregation with Privacy Preservation at Endpoint. In International Conference on Management of Data.
[6]
Snehkumar Shahani, Jibi Abraham, and R Venkateswaran. 2021. Selection and Verification of Privacy Parameters for Local Differentially Private Data Aggregation. (2021), 10.
[7]
Snehkumar Shahani, R Venkateswaran, and Jibi Abraham. 2021. Cost-based recommendation of parameters for local differentially private data aggregation. Computers & Security 102 (2021), 102144. https://doi.org/10.1016/j.cose.2020.102144
[8]
Niramay Vaidya, Srishti Shelke, Snehkumar Shahani, and Jibi Abraham. 2021. Validation and Feasibility of Differentially Private Local Aggregation of Real-Time Data Streams from Resource-Constrained Healthcare IoT Edge Devices. (2021).


Information

Published In

CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)
January 2023, 357 pages
ISBN: 9781450397971
DOI: 10.1145/3570991

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 04 January 2023

Qualifiers

• Extended-abstract
• Research
• Refereed limited

Conference

CODS-COMAD 2023

Acceptance Rates

Overall Acceptance Rate: 197 of 680 submissions, 29%
