ABSTRACT
Relational clustering has attracted growing attention because of its impact on important applications that involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. In this paper, we propose a probabilistic model for relational clustering, which also provides a principled framework to unify various important clustering tasks, including traditional attribute-based clustering, semi-supervised clustering, co-clustering, and graph clustering. The proposed model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. Under this model, we propose parametric hard and soft relational clustering algorithms under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, k-partite graph clustering, Bregman k-means, and semi-supervised clustering based on hidden Markov random fields.
CITED BY 7
Lei Tang, Huan Liu, Jianping Zhang, Zohreh Nazeri, Community evolution in dynamic multi-mode networks, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA

Huajing Li, Zaiqing Nie, Wang-Chien Lee, Lee Giles, Ji-Rong Wen, Scalable community discovery on textual data with relations, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Huajing Li, Zhisheng Li, Wang-Chien Lee, Dik Lun Lee, A probabilistic topic-based ranking framework for location-sensitive domain information retrieval, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA

Yu-Ru Lin, Jimeng Sun, Paul Castro, Ravi Konuru, Hari Sundaram, Aisling Kelliher, MetaFac: community discovery via relational hypergraph factorization, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France