
Protecting Respondents' Identities in Microdata Release

Published: 01 November 2001

Abstract

Today's globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations today call for the release of specific data (microdata). To protect the anonymity of the entities (called respondents) to which the information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. Deidentifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to reidentify respondents and infer information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of the respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided without compromising the integrity (or truthfulness) of the information released by using generalization and suppression techniques. We introduce the concept of minimal generalization, which captures the property that the release process does not distort the data more than is needed to achieve k-anonymity, and present an algorithm for computing such a generalization. We also discuss possible preference policies for choosing among different minimal generalizations.
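The ideas of k-anonymity, generalization, and suppression can be illustrated with a minimal Python sketch. It is not the algorithm presented in the paper. It assumes the common operational reading that a table is k-anonymous when every combination of quasi-identifier values occurs in at least k rows. The function names, the toy table, and the choice to generalize ZIP codes by masking trailing digits are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def qi_counts(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    return all(n >= k for n in qi_counts(rows, quasi_identifiers).values())

def generalize_zip(rows, digits_kept):
    """Generalization (illustrative): keep the leading digits of the ZIP code, mask the rest."""
    out = []
    for row in rows:
        z = row["zip"]
        out.append({**row, "zip": z[:digits_kept] + "*" * (len(z) - digits_kept)})
    return out

def suppress_outliers(rows, quasi_identifiers, k):
    """Suppression (illustrative): drop rows whose combination occurs fewer than k times."""
    counts = qi_counts(rows, quasi_identifiers)
    return [row for row in rows
            if counts[tuple(row[a] for a in quasi_identifiers)] >= k]

# Hypothetical toy table; the values are invented for this example.
QI = ("race", "sex", "zip")
table = [
    {"race": "asian", "sex": "F", "zip": "94138", "illness": "flu"},
    {"race": "asian", "sex": "F", "zip": "94139", "illness": "cold"},
    {"race": "black", "sex": "M", "zip": "94141", "illness": "asthma"},
    {"race": "black", "sex": "M", "zip": "94142", "illness": "flu"},
    {"race": "white", "sex": "F", "zip": "94139", "illness": "cold"},
]

generalized = generalize_zip(table, digits_kept=3)
released = suppress_outliers(generalized, QI, k=2)

print(is_k_anonymous(table, QI, 2))        # False: every row is unique
print(is_k_anonymous(generalized, QI, 2))  # False: one row is still unique
print(is_k_anonymous(released, QI, 2))     # True: after suppressing that row
```

In this sketch the released values stay truthful: a masked ZIP code is less precise but never false, and a suppressed row is simply absent. Keeping three ZIP digits rather than two or none reflects the spirit of minimal generalization, distorting the data no more than is needed to reach the chosen k.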



Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 13, Issue 6
November 2001
198 pages

Publisher

IEEE Educational Activities Department

United States

Qualifiers

  • Research-article
