ABSTRACT
Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, most models base similarity on distance, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least a subset of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovering such clusters of genes is essential to revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, which captures not only the closeness of values of certain leading indicators but also the closeness of the (purchasing, browsing, etc.) patterns exhibited by customers. We introduce an algorithm to detect such clusters and evaluate its effectiveness on several real and synthetic data sets.
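The abstract does not define "coherent pattern" formally. As a minimal sketch, the code below assumes one simple reading: two objects are coherent on a set of dimensions if the change between every pair of those dimensions differs by at most a threshold for the two objects. The names p_score and is_coherent, the threshold delta, and the gene values are all illustrative assumptions, and the check only tests a given candidate; it is not the detection algorithm the paper introduces.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """How differently x and y change between dimensions a and b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_coherent(objects, dims, delta):
    """True if every pair of objects has p_score <= delta on every pair of dims."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(dims, 2)
    )


# Hypothetical expression levels of three genes under five conditions.
gene1 = [10, 50, 30, 70, 20]
gene2 = [110, 150, 130, 170, 120]  # gene1 shifted up by 100: far apart, same pattern
gene3 = [15, 55, 90, 75, 60]       # follows gene1 only on conditions 0, 1 and 3

print(is_coherent([gene1, gene2], range(5), delta=0))   # full pattern match
print(is_coherent([gene1, gene3], range(5), delta=5))   # breaks on conditions 2 and 4
print(is_coherent([gene1, gene3], [0, 1, 3], delta=0))  # match on a subset of dimensions
```

Under these assumptions, gene1 and gene2 are far apart in Euclidean distance yet perfectly coherent, while gene1 and gene3 are coherent only on a subset of the conditions, which is the kind of similarity the abstract describes.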