Density-based clustering of data streams at multiple resolutions
Source
ACM Transactions on Knowledge Discovery from Data (TKDD)
Volume 3, Issue 3 (July 2009)
Article No. 14
Year of Publication: 2009
ISSN: 1556-4681
Authors
Li Wan  Nanyang Technological University
Wee Keong Ng  Nanyang Technological University
Xuan Hong Dang  Institute of Infocomm Research, Singapore
Philip S. Yu  University of Illinois at Chicago
Kuan Zhang  Singapore Management University
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 86,   Downloads (12 Months): 237,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1552303.1552307

ABSTRACT

In data stream clustering, it is desirable to have algorithms that are able to detect clusters of arbitrary shape, clusters that evolve over time, and clusters with noise. Existing data stream clustering algorithms are generally based on an online-offline approach: the online component captures synopsis information from the data stream (thus overcoming real-time and memory constraints) and the offline component generates clusters using the stored synopsis. The online-offline approach affects the overall performance of data stream clustering in various ways: the ease of deriving the synopsis from streaming data; the complexity of the data structure for storing and managing the synopsis; and the frequency at which the offline component is used to generate clusters. In this article, we propose an algorithm that (1) computes and updates synopsis information in constant time; (2) allows users to discover clusters at multiple resolutions; (3) determines the right time for users to generate clusters from the synopsis information; (4) generates clusters of higher purity than existing algorithms; and (5) determines the right threshold function for density-based clustering based on the fading model of stream data. To the best of our knowledge, no existing data stream algorithm has all of these features. Experimental results show that our algorithm is able to detect arbitrarily shaped, evolving clusters with high quality.
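
The abstract does not give the details of the synopsis or the fading model, so the following is only an illustrative sketch of the general idea of an online synopsis that is updated in constant time under a fading model. It is not the article's algorithm. It assumes a grid-based synopsis, an exponential fading function 2^(-decay_rate * elapsed_time), and a fixed density threshold; the class name `FadingGridSynopsis` and all parameter names are hypothetical.

```python
import math


class FadingGridSynopsis:
    """Hypothetical online synopsis: one time-decayed density per grid cell.

    Assumptions (not taken from the article): points are mapped to
    axis-aligned grid cells, and a cell's density fades by the factor
    2 ** (-decay_rate * elapsed_time).
    """

    def __init__(self, cell_width=1.0, decay_rate=0.1):
        self.cell_width = cell_width
        self.decay_rate = decay_rate
        # Maps cell coordinates -> (density, time of last update).
        self.cells = {}

    def _cell_of(self, point):
        # Integer grid coordinates of the cell containing the point.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _fade(self, elapsed):
        # Exponential fading factor for the elapsed time.
        return 2.0 ** (-self.decay_rate * elapsed)

    def insert(self, point, timestamp):
        # One dictionary lookup and one write per point: expected
        # constant time for a fixed number of dimensions.
        key = self._cell_of(point)
        density, last = self.cells.get(key, (0.0, timestamp))
        density = density * self._fade(timestamp - last) + 1.0
        self.cells[key] = (density, timestamp)

    def density(self, key, now):
        # Density of a cell faded forward to the time `now`.
        density, last = self.cells.get(key, (0.0, now))
        return density * self._fade(now - last)

    def dense_cells(self, now, threshold):
        # Offline step input: cells whose faded density meets the threshold.
        return {k for k in self.cells if self.density(k, now) >= threshold}


synopsis = FadingGridSynopsis(cell_width=1.0, decay_rate=1.0)
synopsis.insert((0.2, 0.3), timestamp=0)
synopsis.insert((0.7, 0.9), timestamp=1)  # same cell, one time unit later
synopsis.insert((5.5, 5.5), timestamp=2)  # a different cell
print(synopsis.dense_cells(now=2, threshold=0.9))
```

In this sketch the online component touches only the cell that receives the new point, and decay is applied lazily when a cell is read or updated. An offline component would then group adjacent dense cells into clusters; that step, along with the multi-resolution and threshold-selection features named in the abstract, is not shown here.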


