ACM Home Page
Please provide us with feedback. Feedback
Stream data clustering based on grid density and attraction
Full text PdfPdf (3.85 MB)
Source
ACM Transactions on Knowledge Discovery from Data (TKDD) archive
Volume 3 ,  Issue 3  (July 2009) table of contents
Article No. 12  
Year of Publication: 2009
ISSN:1556-4681
Authors
Li Tu  Nanjing University of Aeronautics and Astronautics, Nanjing, China
Yixin Chen  Washington University in St. Louis, St. Louis, MO
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 83,   Downloads (12 Months): 244,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1552303.1552305
What is a DOI?

ABSTRACT

Clustering real-time stream data is an important and challenging problem. Existing algorithms such as CluStream are based on the k-means algorithm. These clustering algorithms have difficulties finding clusters of arbitrary shapes and handling outliers. Further, they require the knowledge of k and user-specified time window. To address these issues, this article proposes D-Stream, a framework for clustering stream data using a density-based approach.

Our algorithm uses an online component that maps each input data record into a grid and an offline component that computes the grid density and clusters the grids based on the density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream and a attraction-based mechanism to accurately generate cluster boundaries.

Exploiting the intricate relationships among the decay factor, attraction, data density, and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Bandyopadhyay, S., Giannella, C., Maulik, U., Kargupta, H., Liu, K., and Datta, S. 2006. Clustering distributed data streams in peer-to-peer environments. Inform. Sci. 176, 14, 1952--1985.
3
 
4
 
5
Chen, Y., Dong, G., Han, J., Pei, J., Wah, B. W., and Wang, J. 2002. OLAPing stream data: Is it feasible? In Proceedings of the Workshop on Research Issues in Data Mining and Knowledge Discovery, ACM SIGMOD. 53--58.
6
 
7
 
8
Giannella, C., Han, J., Pei, J., Yan, X., and Yu, P. S. 2003. Mining frequent patterns in data streams at multiple-time granularities. Next Gen. Data Min., 191--212.
 
9
10
 
11
 
12
 
13
 
14
 
15
 
16
 
17
 
18
 
19
20
 
21
Wang, Z., Wang, B., Zhou, C., and Xu, X. 2004. Clustering data streams on the two-tier structure. Advanced Web Technologies and Applications, 416--425.
 
22