DOI: 10.1145/3664476.3664489
research-article

Compromising anonymity in identity-reserved k-anonymous datasets through aggregate knowledge

Published: 30 July 2024

Abstract

Data processors increasingly rely on external data sources to improve strategic or operational decision-making. Data owners can facilitate this by releasing datasets directly to data processors or indirectly via data spaces. Because data processors often have different needs and the data is sensitive, multiple anonymized versions of an original dataset are often released. However, doing so can introduce severe privacy risks.
This paper demonstrates the privacy risks that emerge when curious, potentially colluding service providers obtain both an identity-reserved and an aggregated k-anonymous version of the same dataset. We build a mathematical model of the attack and demonstrate its applicability to attackers with different goals and computing power. The model is applied to a real-world scenario, and countermeasures are presented to mitigate the attack.

    Published In

    ARES '24: Proceedings of the 19th International Conference on Availability, Reliability and Security
    July 2024
    2032 pages
    ISBN: 9798400717185
    DOI: 10.1145/3664476
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Privacy
    2. anonymity
    3. linkage attack
    4. re-identification

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ARES 2024

    Acceptance Rates

    Overall Acceptance Rate 228 of 451 submissions, 51%
