|
ABSTRACT
The rapid growth in the volume of image and video data collections motivates the research of building an index structure in image information retrieval. Constructing an index in the image database poses a very challenging problem due to the facts of image databases containing data with high dimensions, and lack of domain knowledge. ClusterTree is an indexing approach representing clusters generated by any existing clustering approach and do not need any prior knowledge. It is a hierarchy of clusters and subcluster which incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. However, one disadvantage of ClusterTree is that non-clustering data points are often ignored. These non-clustering data points might represent interesting targets in an image database. In this paper, we propose a modified ClusterTree structure(called SS-ClusterTree), which is based on subspace clustering. The SS-ClusterTree includes two kinds of leaf nodes, a cluster leaf node and a noise leaf node. When a new data item is added to the SS-ClusterTree, if it belongs to a cluster, it is inserted into the corresponding the cluster leaf node, otherwise into the noise leaf node. The noise leaf node will be split while its volume is more than a certain threshold. We present a novel updating technique which optimizes the internal structure of the SS-ClusterTree by utilizing the Newton's Universal Law of Gravitation. When a noise node is split, the attraction forces are calculated between every new node and its sibling nodes. These new nodes may be merged by their sibling nodes, if the attraction force between them is the most significant. Meanwhile the nodes intersecting boundaries are updated. This approach guarantees that the SS-ClusterTree always represents the current dataset structure, and helps to identify the pattern hiding in the newly added data. SS-ClusterTree can efficiently support the dynamic insertion and manage the dataset with non-clustering data, and is highly adaptive to any kind of cluster structure. Our experiment results also show that this index structure is effective and efficient.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Ester M, Kriegel H, Sander J, Xu XW. A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. of the 2nd Int'l Conf. on Knowledge Discovery and Data Mining (KDD'96). Portland: AAAI Press, 1996. 226--231.
|
| |
3
|
Hinneburg A, Keim D. An efficient approach to clustering in large multimedia databases with noise. Proc. of the 4th Int'l Conf. on Knowledge Discovery and Data Mining (KDD'98). New York: AAAI Press, 1998. 58--65.
|
| |
4
|
|
| |
5
|
|
| |
6
|
Hinneburg, A., Keim, D.A.: An optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering. The VLDB Journal.1999. 506--517
|
 |
7
|
Rakesh Agrawal , Johannes Gehrke , Dimitrios Gunopulos , Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.94-105, June 01-04, 1998, Seattle, Washington, United States
|
| |
8
|
Karin Kailing, Hans-Peter Kriegel, Peer Kroger. Density-Connected Subspace Clustering for High-Dimensional Data. In Proc. 4th SIAM Int. Conf. on Data Mining, pp.246--257, Lake Buena Vista,FL,2004
|
| |
9
|
Carlotta Domeniconi, D. Papadopoulos, D. Gunopulos, Sheng Ma. Subspace Clustering of High Dimensional Data, SIAM International Conference on Data Mining (SDM), Apr. 2004, http://www.cs.ucr.edu/~dimitris/publications.html
|
| |
10
|
Hongli Xu, De Xu, and Enai Lin. An applicable hierarchical clustering algorithm for content based image retrieval, In Proc. Computer Vision / Computer Graphics Collaboration Techniques and Applications, MIRAGE'07 International Conference, France, Mar. 2007 pp.82--92
|
| |
11
|
W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, "The QBIC Project: Querying Images By Content Using Color, Texture, and Shape," Proc. of the SPIE Conf. on Storage and Retrieval for Image and Video Databases II, 1993. 173--187.
|
 |
12
|
Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
13
|
Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States
|
 |
14
|
|
 |
15
|
Chun-Hung Cheng , Ada Waichee Fu , Yi Zhang, Entropy-based subspace clustering for mining numerical data, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.84-93, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312199]
|
| |
16
|
Sanjay Goil, H. Nagesh, A. Choudhary. MAFIA: Efficient and Scalable Subspace Clustering Clustering for Very Large Data Sets. Technical Report CPDC-TR-9906-010, Northwestern University, 2145 Sheridan Road, Evanston IL 60208, June 1999
|
 |
17
|
|
| |
18
|
Kyoung-Gu Woo, Jeong-Hoon Lee, Myoung-Ho Kim, Yoon-Joon Lee. FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting. Information & Software Technology, VOL 46(4),2004,pp255--271.
|
| |
19
|
WANG Jian-hui, SHEN Zhan, HU Yun-fa. An Applicable and Efficient Clustering Algorithm{J}.Journal of Software,2004,15(5):697--705.
|
|