DOI: 10.1145/1557019.1557126
research-article

Toward autonomic grids: analyzing the job flow with affinity streaming

Published: 28 June 2009

Abstract

The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a dataset, albeit with quadratic computational complexity. This paper, motivated by Autonomic Computing, extends AP to the data streaming framework. First, a hierarchical strategy is used to reduce the complexity to O(N^{1+ε}); the distortion loss incurred is analyzed in relation to the dimension of the data items. Second, a coupling with a change detection test is used to cope with non-stationary data distributions and to rebuild the model as needed. The presented approach, StrAP, is applied to the stream of jobs submitted to the EGEE Grid, providing an understandable description of the job flow and enabling the system administrator to spot some sources of failure online.
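
As an illustration of what a change detection test on a stream can look like, here is a minimal sketch of a Page-Hinkley style test in Python. The abstract does not name the test it uses; the choice of Page-Hinkley is an assumption suggested by references [11] and [16]. The function name `page_hinkley`, the parameters `delta` and `threshold`, and their default values are hypothetical and chosen only for illustration. The monitored signal is taken to be any scalar sequence, for example a hypothetical per-item measure of how badly the current model fits.

```python
def page_hinkley(stream, delta=0.05, threshold=5.0):
    """Flag an upward shift in the mean of a scalar stream.

    Returns the 0-based index of the item that triggers the alarm,
    or None if no change is flagged. `delta` (tolerated drift) and
    `threshold` (alarm level) are illustrative values, not values
    taken from the paper.
    """
    mean = 0.0      # running mean of the values seen so far
    cum = 0.0       # cumulative deviation from the running mean
    cum_min = 0.0   # smallest cumulative deviation seen so far
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t          # incremental mean update
        cum += x - mean - delta         # accumulate deviation minus tolerance
        cum_min = min(cum_min, cum)
        if cum - cum_min > threshold:   # deviation has risen too far above its minimum
            return t - 1
    return None


# Usage: a stationary stream followed by a jump in the mean.
stationary = [0.0] * 100
shifted = [0.0] * 50 + [2.0] * 50
alarm_stationary = page_hinkley(stationary)
alarm_shifted = page_hinkley(shifted)
```

In a streaming clusterer, an alarm of this kind could serve as the trigger for rebuilding the model; how StrAP actually couples the test with AP is described in the paper itself, not in this sketch.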

Supplementary Material

JPG File (p987-zhang.jpg)
MP4 File (p987-zhang.mp4)

References

[1]
F. Bergeaud and S. Mallat. Matching pursuit of images. In ICIP, pages 53--56, 1995.
[2]
F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In SIAM Conference on Data Mining (SDM), pages 326--337, 2006.
[3]
G. Cormode, S. Muthukrishnan, and W. Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE, pages 1036--1045, 2007.
[4]
M. Ester. A density-based algorithm for discovering clusters in large spatial databases with noise. In SIGKDD, pages 226--231, 1996.
[5]
U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An overview. In Advances in Knowledge Discovery and Data Mining, pages 1--34. MIT Press, 1996.
[6]
B. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315:972--976, 2007.
[7]
J. Gama, R. Rocha, and P. Medas. Accurate decision trees for mining high-speed data streams. In SIGKDD, pages 523--528, 2003.
[8]
J. Gama and P. P. Rodrigues. Stream-based electricity load forecast. In PKDD, pages 446--453, 2007.
[9]
S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. TKDE, 15:515--528, 2003.
[10]
Z. Harchaoui, F. Bach, and E. Moulines. Kernel change-point analysis. In NIPS, 2008.
[11]
D. Hinkley. Inference about the change-point from cumulative sum tests. Biometrika, 58:509--523, 1971.
[12]
D. Judd, P. K. McKinley, and A. K. Jain. Large-scale parallel data clustering. IEEE Trans. Pattern Anal. Mach. Intell., 20:871--876, 1998.
[13]
M. Leone, Sumedha, and M. Weigt. Clustering by soft-constraint affinity propagation: Applications to gene-expression data. Bioinformatics, 23:2708, 2007.
[14]
M. Meila. The uniqueness of a good optimum for k-means. In ICML, pages 625--632, 2006.
[15]
S. Muthukrishnan, E. v. d. Berg, and Y. Wu. Sequential change detection on data streams. In ICDM Workshops, 2007.
[16]
E. Page. Continuous inspection schemes. Biometrika, 41:100--115, 1954.
[17]
N. Palatin, A. Leizarowitz, A. Schuster, and R. Wolff. Mining for misconfigured machines in grid systems. In SIGKDD, pages 687--692, 2006.
[18]
C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[19]
I. Rish, M. Brodie, and S. M. et al. Adaptive diagnosis in distributed systems. IEEE Trans. on Neural Networks, 16:1088--1109, 2005.
[20]
G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461--464, 1978.
[21]
J. Villemonteix, E. Vazquez, M. Sidorkiewicz, and E. Walter. Global optimization of expensive-to-evaluate functions: an empirical comparison of two sampling criteria. Journal of Global Optimization, 43(2-3):373--389, 2009.
[22]
X. Zhang, C. Furtlehner, and M. Sebag. Data streaming with affinity propagation. In ECML/PKDD, pages 628--643, 2008.
[23]
J. Andreeva, B. Gaidioz, J. Herrala, et al. Dashboard for the LHC experiments. Journal of Physics: Conference Series, 119, 2008.
[24]
Real Time Monitor: http://gridportal.hep.ph.ic.ac.uk/rtm/.
[25]
X. Zhang, C. Furtlehner, and M. Sebag. INRIA research report in progress.

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. affinity propagation
  2. autonomic computing
  3. online clustering

Qualifiers

  • Research-article

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
