Extended abstract · DOI: 10.1145/3570991.3571020

Techniques for Privacy-Preserving Data Aggregation in an Untrusted Distributed Environment

Published: 04 January 2023

Abstract

Differential Privacy (DP) formalizes a framework for sharing information about a group of individuals in a dataset while protecting their privacy. DP, characterized by the parameters (ϵ, δ), provides mathematical guarantees on privacy and is therefore a well-recognized approach to privacy-preserving data analysis and aggregation. DP transformation mechanisms achieve differential privacy by adding randomness to the data, which adversely affects data utility and results in a trade-off between utility and privacy.
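As a concrete illustration of such a randomizing transformation, the standard Laplace mechanism releases a numeric query after adding noise with scale sensitivity/ϵ; the meter readings and sensitivity bound below are made up for illustration, not taken from this work.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release true_value with (epsilon, 0)-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
readings = [3.2, 1.7, 4.4, 2.9]   # e.g. per-household meter readings (illustrative)
true_sum = sum(readings)
# Sensitivity of a bounded sum: the maximum contribution of one individual.
noisy_sum = laplace_mechanism(true_sum, sensitivity=5.0, epsilon=1.0, rng=rng)
```

Smaller ϵ means a larger noise scale, hence stronger privacy but lower utility: the trade-off described above.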
While researchers have developed numerous mechanisms for differentially private data aggregation and sharing, a desirable mechanism should be fault-tolerant (able to compute aggregates over partially shared data) and should obviate both interactive communication among data owners and the need for a trusted aggregator. Prior work addresses some of these properties, but none addresses all of them simultaneously. A key contribution of this thesis is Privacy-Preserving Endpoint Aggregation (PPEA), a distributed, differentially private aggregation mechanism that satisfies all of these desirable characteristics. Experiments demonstrate that PPEA provides utility on par with, and in some cases better than, an existing ‘Central, Trusted’ method called PrivEx: an 11% improvement over PrivEx is observed on the UCI ML Smart Meter dataset [3], and up to 4 times better utility on a synthetic dataset [5]. The set of experiments was designed following the principles of the DPBench benchmark [1]. In addition to experimental validation, PPEA has been validated for feasibility on real-time IoT devices [8]. Implementation issues such as random number generation and on-device data processing were resolved, and the correctness of the implementation was verified by reproducing the expected DPBench trends. Based on the observed memory, time, and power consumption, the implementation is feasible on the Xiaomi MiBand2 and the XD58C Pulse Sensor.
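The abstract does not give PPEA's construction, but the fault-tolerant, trusted-aggregator-free setting it targets can be sketched minimally with local perturbation: each endpoint clamps and noises its own reading before reporting, so the aggregator only ever sees perturbed values and can sum whichever reports arrive. The bounds, readings, and function names here are illustrative assumptions, not the PPEA protocol.

```python
import numpy as np

def perturb_locally(value: float, lower: float, upper: float,
                    epsilon: float, rng: np.random.Generator) -> float:
    # Each endpoint clamps its reading to a public range and adds Laplace
    # noise scaled to that range, so no trusted aggregator is required.
    clamped = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    return clamped + rng.laplace(0.0, scale)

rng = np.random.default_rng(1)
readings = [3.2, 1.7, 4.4, 2.9, 3.8]   # illustrative per-endpoint values
noisy = [perturb_locally(v, 0.0, 5.0, epsilon=1.0, rng=rng) for v in readings]

# Fault tolerance: the aggregator sums whichever reports actually arrive,
# with no interactive rounds among the endpoints.
received = noisy[:4]   # one endpoint dropped out
estimate = sum(received)
```

The price of this simplicity is extra noise in the aggregate, which is the utility gap that a mechanism like PPEA aims to close relative to a central, trusted baseline.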
Any DP data aggregation mechanism must preserve enough data utility for further analysis, and the trade-off between utility and privacy must be addressed through an appropriate choice of the key DP parameters (ϵ, δ). While many researchers have proposed guidelines or ranges for selecting ϵ, none has proposed a systematic method for choosing the values of the DP parameters. We have defined a cost-based profit-maximization model [7], where profit is the difference between the gain obtained by utilizing the data and the loss incurred due to its disclosure. For this, utility has been modeled mathematically, reducing the need to conduct multiple experiments to calculate it, which may not be feasible in many cases due to privacy concerns. The correctness of the mathematical model is verified on multiple real-world and synthetic datasets; the model does not depend on the type of input data and is applicable to multiple data distributions. The mathematically calculated utility is within 1.6% of the experimentally obtained values for all synthetic datasets, and within 4.25% for the London Smart Meter dataset [4].
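A minimal sketch of such a profit-maximization search over ϵ follows. The functional forms for utility and disclosure risk below are hypothetical stand-ins (the actual models in [7] and the risk formulation of [2] differ); the point is only the structure: profit = gain × utility − loss × disclosure risk, maximized over a grid of candidate ϵ values.

```python
import numpy as np

def utility(eps):
    # Hypothetical saturating utility: noise shrinks as eps grows.
    return 1.0 - np.exp(-eps)

def disclosure_risk(eps, n=100):
    # Illustrative posterior probability of identifying one of n
    # individuals; grows with eps (in the spirit of [2], not its formula).
    return np.exp(eps) / (n - 1 + np.exp(eps))

def profit(eps, gain=1000.0, loss=5000.0):
    # Profit = monetary gain from utility minus expected disclosure loss.
    return gain * utility(eps) - loss * disclosure_risk(eps)

grid = np.linspace(0.01, 5.0, 500)
best_eps = grid[np.argmax(profit(grid))]   # recommended epsilon
```

Because both utility and risk increase with ϵ while the gain saturates, the profit curve peaks at an interior ϵ, which is the value the model recommends.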
The cost of privacy loss used in the recommendation method is based on disclosure risk [2], i.e., the direct probability of disclosure, which is more practical and explainable than the constant multiple of ϵ used in the literature. This cost can also be interpreted as the cost of providing insurance against disclosure. The privacy recommendations are also shown to extend conveniently to other standard statistical aggregates, such as the mean and standard deviation, and to groups of multiple statistics computed simultaneously. This has been validated on a synthetic dataset, where the recommended ϵ yields the best profit [7].
This work on the recommendation of privacy parameters enables an analyst to select the best DP technique from a given set of techniques. It can be used to design a generic method that selects the best technique for a given type of analysis, using weights for each analysis type together with the data parameters.
The financial model and other selection methods in the literature depend on the availability of financial information and data parameters, which may not be available in many cases. Another crucial part of this work is therefore a set of fallback methods [6] for selecting the optimal value of ϵ when financial information is unavailable. Three methods are proposed: Linear Scalarization, Least-Squares Optimization, and Upper-Bound Recommendation. The first two enable precise recommendation of the value of ϵ, while the third, elasticity-based method provides a safe upper bound. All three use normalized deviation as a shared unit, allowing disclosure risk and utility to be compared directly when trading them off to select the best value.
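The first fallback method can be sketched as follows. The two criterion curves below (noise magnitude as a utility-loss proxy, ϵ itself as a risk proxy) and the equal weights are illustrative assumptions, not the curves used in [6]; the sketch shows only the mechanics of normalizing both criteria to a shared unit and scalarizing them linearly.

```python
import numpy as np

def normalized(values):
    # Normalized deviation: rescale a criterion to [0, 1] so utility loss
    # and disclosure risk become comparable without financial information.
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

eps_grid = np.linspace(0.1, 4.0, 40)
# Hypothetical criterion curves: noise magnitude falls with eps, risk rises.
utility_loss = 1.0 / eps_grid   # proxy: Laplace noise scale at each eps
risk = eps_grid                 # proxy: disclosure risk grows with eps

# Linear scalarization: minimise a weighted sum of the normalized criteria.
w_utility, w_risk = 0.5, 0.5
score = w_utility * normalized(utility_loss) + w_risk * normalized(risk)
best_eps = eps_grid[np.argmin(score)]   # recommended epsilon
```

Changing the weights shifts the recommendation along the trade-off curve, which is why the precise methods need a stated preference while the upper-bound method only promises a safe ceiling.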
In the future, the recommendation methods can be extended to machine learning. For ML algorithms, utility is measured by metrics such as accuracy, precision, recall, and F1-score, which are difficult to estimate mathematically, so data-driven estimates can be used instead. This would allow our recommendation methods to be applied to ML algorithms and is an interesting extension of this work. Applying the recommendation methods to privacy-preservation techniques other than differential privacy can also be explored.

References

[1]
Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, and Dan Zhang. 2016. Principled Evaluation of Differentially Private Algorithms Using DPBench. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD ’16). ACM, New York, NY, USA, 139–154. https://doi.org/10.1145/2882903.2882931
[2]
Jaewoo Lee and Chris Clifton. 2011. How Much Is Enough? Choosing ϵ for Differential Privacy. In Information Security, Xuejia Lai, Jianying Zhou, and Hui Li (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 325–340.
[3]
M. Lichman. 2013. UCI Machine Learning Repository.
[4]
UK Power Networks. 2014. Smart meters in London. https://www.kaggle.com/hsankesara/time-series-forecasting-using-seasonal-arima/data
[5]
Snehkumar Shahani, Jibi Abraham, and R Venkateswaran. 2017. Distributed Data Aggregation with Privacy Preservation at Endpoint. In International Conference on Management of Data.
[6]
Snehkumar Shahani, Jibi Abraham, and R Venkateswaran. 2021. Selection and Verification of Privacy Parameters for Local Differentially Private Data Aggregation. (2021), 10.
[7]
Snehkumar Shahani, R Venkateswaran, and Jibi Abraham. 2021. Cost-based recommendation of parameters for local differentially private data aggregation. Computers & Security 102 (2021), 102144. https://doi.org/10.1016/j.cose.2020.102144
[8]
Niramay Vaidya, Srishti Shelke, Snehkumar Shahani, and Jibi Abraham. 2021. Validation and Feasibility of Differentially Private Local Aggregation of Real-Time Data Streams from Resource-Constrained Healthcare IoT Edge Devices. (2021).


Information

Published In

CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)
January 2023, 357 pages
ISBN: 9781450397971
DOI: 10.1145/3570991

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 04 January 2023

Qualifiers

• Extended-abstract
• Research
• Refereed limited

Conference

CODS-COMAD 2023

Acceptance Rates

Overall Acceptance Rate: 197 of 680 submissions, 29%
