Toward autonomic grids: analyzing the job flow with affinity streaming
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 987-996  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Xiangliang Zhang  INRIA, Université Paris Sud, Orsay, France
Cyril Furtlehner  INRIA, Orsay, France
Julien Perez  Université Paris Sud, Orsay, France
Cecile Germain-Renaud  Université Paris Sud, Orsay, France
Michèle Sebag  CNRS, Orsay, France
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557126

ABSTRACT

The Affinity Propagation (AP) clustering algorithm proposed by Frey and Dueck (2007) provides an understandable, nearly optimal summary of a dataset, albeit with quadratic computational complexity. This paper, motivated by Autonomic Computing, extends AP to the data streaming framework. First, a hierarchical strategy is used to reduce the complexity to O(N^{1+ε}); the distortion loss incurred is analyzed in relation to the dimension of the data items. Second, a coupling with a change detection test is used to cope with non-stationary data distributions and to rebuild the model as needed. The presented approach, StrAP, is applied to the stream of jobs submitted to the EGEE Grid, providing an understandable description of the job flow and enabling the system administrator to spot some sources of failure online.
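
The abstract does not say which change detection test StrAP uses. Since the reference list cites Page (1954) and Hinkley (1971), the following is a minimal sketch that assumes a Page-Hinkley test applied to some scalar summary of the stream. The class name `PageHinkley`, the helper `first_alarm`, and the default values of `delta` and `threshold` are hypothetical choices for illustration and are not taken from the paper.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a scalar stream.

    Illustrative sketch only: parameter defaults are assumed, not from the paper.
    """

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed value)
        self.reset()

    def reset(self):
        """Forget all history, e.g. after the model has been rebuilt."""
        self.n = 0
        self.mean = 0.0      # running mean of the observations
        self.cum = 0.0       # cumulative deviation m_t
        self.cum_min = 0.0   # running minimum M_t of the cumulative deviation

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in `stream`, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 1 at index 50.
    stream = [0.0] * 50 + [1.0] * 50
    print("first alarm at index:", first_alarm(stream))
```

In a streaming clusterer, an alarm of this kind would be the signal to rebuild the model and call `reset()`; what quantity is monitored is a design choice this sketch leaves open.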


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

[2] F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In SIAM Conference on Data Mining (SDM), pages 326-337, 2006.
[3] G. Cormode, S. Muthukrishnan, and W. Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE, pages 1036-1045, 2007.
[4] M. Ester et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In SIGKDD, pages 226-231, 1996.
[6] B. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315:972-976, 2007.
[10] Z. Harchaoui, F. Bach, and E. Moulines. Kernel change-point analysis. In NIPS, 2008.
[11] D. Hinkley. Inference about the change-point from cumulative sum tests. Biometrika, 58:509-523, 1971.
[16] E. Page. Continuous inspection schemes. Biometrika, 41:100-115, 1954.
[19] I. Rish, M. Brodie, S. Ma, et al. Adaptive diagnosis in distributed systems. IEEE Trans. on Neural Networks, 16:1088-1109, 2005.
[20] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461-464, 1978.
[23] J. Andreeva, B. Gaidioz, J. Herrala, et al. Dashboard for the LHC experiments. Journal of Physics: Conference Series, vol. 119, 2008.
[24] Real Time Monitor: http://gridportal.hep.ph.ic.ac.uk/rtm/.
[25] X. Zhang, C. Furtlehner, and M. Sebag. INRIA research report in progress.
