A probabilistic framework for relational clustering
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
San Jose, California, USA
SESSION: Research track papers
Pages: 470-479
Year of Publication: 2007
ISBN: 978-1-59593-609-7
Authors
Bo Long  SUNY Binghamton
Zhongfei Mark Zhang  SUNY Binghamton
Philip S. Yu  IBM Watson Research Center
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1281192.1281244

ABSTRACT

Relational clustering has attracted increasing attention because of its impact on important applications that involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. In this paper, we propose a probabilistic model for relational clustering, which also provides a principled framework to unify various important clustering tasks, including traditional attribute-based clustering, semi-supervised clustering, co-clustering, and graph clustering. The proposed model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. Under this model, we propose parametric hard and soft relational clustering algorithms under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, k-partite graph clustering, Bregman k-means, and semi-supervised clustering based on hidden Markov random fields.
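
As a rough illustration of one of the special cases named above, the following is a minimal sketch of hard clustering with a Bregman divergence (Bregman k-means) on ordinary attribute data. It is not the relational algorithm proposed in the paper; the function names, parameters, and toy data are assumptions made for this example. The sketch relies on a standard property of Bregman divergences: the representative that minimizes the total divergence within a cluster is the arithmetic mean of its members, so only the assignment step changes with the choice of divergence.

```python
import numpy as np


def squared_euclidean(X, mu):
    """Bregman divergence generated by phi(x) = ||x||^2 (Gaussian case)."""
    return ((X - mu) ** 2).sum(axis=1)


def generalized_i_divergence(X, mu):
    """Bregman divergence generated by phi(x) = sum_i x_i log x_i.

    Assumes all entries of X and mu are strictly positive (Poisson-like case).
    """
    return (X * np.log(X / mu) - X + mu).sum(axis=1)


def bregman_kmeans(X, k, divergence, n_iter=50, seed=0):
    """Hard clustering sketch: alternate assignment and mean update.

    X          : (n, d) array of data points
    k          : number of clusters
    divergence : function (X, center) -> (n,) array of divergences
    Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points (hypothetical choice).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Assignment step: each point goes to the center with the
        # smallest divergence d(x, center).
        dists = np.stack([divergence(X, c) for c in centers], axis=1)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments are stable, so centers are too
        labels = new_labels
        # Update step: the mean is the optimal representative for any
        # Bregman divergence; empty clusters keep their old center.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    # Toy data: two well-separated groups of strictly positive points.
    X = np.array([[1.0, 1.2], [0.9, 1.1], [1.1, 0.9],
                  [10.0, 10.5], [9.5, 10.2], [10.3, 9.8]])
    labels, centers = bregman_kmeans(X, 2, generalized_i_divergence)
    print(labels)
    print(centers)
```

Passing `squared_euclidean` instead of `generalized_i_divergence` gives ordinary k-means, which is one way to see how a single procedure can cover several exponential family distributions.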

