DOI: 10.1145/1900546.1900567 · NSPW conference proceedings
research-article

Relationships and data sanitization: a study in scarlet

Published: 21 September 2010

Abstract

Research in data sanitization (including anonymization) emphasizes ways to prevent an adversary from desanitizing data. Most work focuses on using mathematical mappings to sanitize data. A few papers examine how to incorporate privacy requirements, in the form of either templates or prioritization. Essentially, these approaches reduce the information that can be gleaned from a data set. In contrast, this paper considers both the need to "desanitize" and the need to support privacy. We consider conflicts between privacy requirements and the needs of analysts examining the redacted data. Our goal is to enable an informed decision about the effects of redacting data and of failing to redact it. We begin with relationships among the data being examined, including relationships with a known data set and with additional external data. Capturing these relationships identifies the desanitization techniques that exploit them and determines the information that must be concealed to thwart those techniques. With that knowledge, a realistic assessment of whether the information and relationships are already widely known or available lets sanitizers decide whether irreversible sanitization is possible and, if so, what to conceal to prevent desanitization.
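To make the idea of an exploitable relationship concrete, here is a minimal Python sketch of one such relationship: attributes shared between a sanitized data set and an external one, used for record linkage on quasi-identifiers. This is an illustration under stated assumptions, not the method the paper proposes. The attribute names (`zip`, `birth`, `sex`), the toy records, and the helpers `link` and `attributes_to_conceal` are all hypothetical. The second helper asks, for each shared attribute, how many re-identifications survive if that attribute alone is suppressed, a crude stand-in for "what must be concealed to thwart" the linkage.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A record is a mapping from attribute name to value (all strings here).
Record = Dict[str, str]


def link(sanitized: List[Record], external: List[Record],
         shared: Sequence[str]) -> Dict[int, Record]:
    """Map each sanitized row index to the single external record that
    agrees with it on every shared attribute (a re-identification)."""
    index: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for rec in external:
        index[tuple(rec[a] for a in shared)].append(rec)
    linked: Dict[int, Record] = {}
    for i, rec in enumerate(sanitized):
        matches = index.get(tuple(rec[a] for a in shared), [])
        if len(matches) == 1:  # exactly one candidate: the row is re-identified
            linked[i] = matches[0]
    return linked


def attributes_to_conceal(sanitized: List[Record], external: List[Record],
                          shared: Sequence[str]) -> Dict[str, int]:
    """For each shared attribute, count the re-identifications that remain
    if that attribute alone is suppressed in the sanitized data."""
    return {a: len(link(sanitized, external, [b for b in shared if b != a]))
            for a in shared}


# Hypothetical data: a "sanitized" table with names removed, and a public list.
sanitized = [
    {"zip": "95616", "birth": "1970-03-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "95616", "birth": "1981-11-20", "sex": "M", "diagnosis": "flu"},
]
external = [
    {"name": "person-A", "zip": "95616", "birth": "1970-03-02", "sex": "F"},
    {"name": "person-B", "zip": "95616", "birth": "1981-11-20", "sex": "M"},
    {"name": "person-C", "zip": "95616", "birth": "1981-11-20", "sex": "F"},
]
shared = ["zip", "birth", "sex"]

if __name__ == "__main__":
    for i, rec in link(sanitized, external, shared).items():
        print(i, rec["name"], sanitized[i]["diagnosis"])
    print(attributes_to_conceal(sanitized, external, shared))
```

On this toy data both sanitized rows link back to named individuals, and suppressing any single attribute still leaves at least one unique match (`{'zip': 2, 'birth': 1, 'sex': 1}`), so more than one attribute would have to be concealed or generalized. Whether that loss of detail is acceptable is the analyst-versus-privacy trade-off the abstract describes; a real assessment would also weigh how widely the external list is already available.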

    Published In

    NSPW '10: Proceedings of the 2010 New Security Paradigms Workshop
    September 2010
    174 pages
    ISBN:9781450304153
    DOI:10.1145/1900546
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Sponsors

    • ACSA: Applied Computing Security Assoc

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data anonymization
    2. ontology
    3. privacy
    4. sanitization

    Conference

    NSPW '10: 2010 New Security Paradigms Workshop
    September 21-23, 2010
    Concord, Massachusetts, USA

    Acceptance Rates

    NSPW '10 paper acceptance rate: 13 of 32 submissions, 41%
    Overall acceptance rate: 98 of 265 submissions, 37%

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 23 Dec 2024
