DOI: 10.1145/956750.956776
Article

Privacy-preserving k-means clustering over vertically partitioned data

Published: 24 August 2003

Abstract

Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. The key is to obtain valid results, while providing guarantees on the (non)disclosure of data. We present a method for k-means clustering when different sites contain different attributes for a common set of entities. Each site learns the cluster of each entity, but learns nothing about the attributes at other sites.
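
To make the setting concrete, the sketch below illustrates why k-means fits vertically partitioned data, assuming squared Euclidean distance: the distance from an entity to a centroid is the sum of per-site components, each computable from that site's attributes alone. This is not the protocol from the paper. The sums and comparisons here are done in the clear, whereas a privacy-preserving method would have to perform that step without revealing the components. All function names and the sample data are hypothetical.

```python
def partial_sq_dists(local_rows, local_centroids):
    """At one site: squared distance from every entity to every centroid,
    using only the attributes this site holds."""
    return [
        [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in local_centroids]
        for row in local_rows
    ]


def assign_clusters(partials_per_site):
    """Add the per-site components and pick the closest centroid per entity.
    ASSUMPTION: done in the clear for illustration only; this is the step a
    privacy-preserving protocol would need to carry out securely."""
    n = len(partials_per_site[0])
    k = len(partials_per_site[0][0])
    labels = []
    for i in range(n):
        totals = [sum(site[i][j] for site in partials_per_site) for j in range(k)]
        labels.append(min(range(k), key=totals.__getitem__))
    return labels


def update_local_centroids(local_rows, labels, old_centroids):
    """At one site: recompute this site's slice of each centroid."""
    new_centroids = []
    for j, old in enumerate(old_centroids):
        members = [row for row, lab in zip(local_rows, labels) if lab == j]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
    return new_centroids


def vertical_kmeans(site_rows, site_centroids, max_iters=100):
    """Iterate until the cluster assignment stops changing."""
    labels = None
    for _ in range(max_iters):
        partials = [partial_sq_dists(r, c) for r, c in zip(site_rows, site_centroids)]
        new_labels = assign_clusters(partials)
        if new_labels == labels:
            break
        labels = new_labels
        site_centroids = [
            update_local_centroids(r, labels, c)
            for r, c in zip(site_rows, site_centroids)
        ]
    return labels, site_centroids


# Two sites hold different attributes for the same four entities.
site_a = [[1.0], [1.2], [8.0], [8.1]]
site_b = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
# Initial centroids: entities 0 and 2, split by site.
init_a = [[1.0], [8.0]]
init_b = [[0.0, 0.1], [5.0, 5.1]]

labels, centroids = vertical_kmeans([site_a, site_b], [init_a, init_b])
print(labels)
```

Each site ends up knowing the cluster label of every entity and only its own slice of each centroid, which matches the output described in the abstract. What the sketch does not provide is the nondisclosure guarantee, since the per-site distance components are shared openly.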

Published In

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN: 1581137370
DOI: 10.1145/956750

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. privacy

Qualifiers

  • Article

Conference

KDD03

Acceptance Rates

KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
