| Bipartite graph partitioning and data clustering |
| Full text |
Pdf
(1.45 MB)
|
| Source
|
Conference on Information and Knowledge Management
archive
Proceedings of the tenth international conference on Information and knowledge management
table of contents
Atlanta, Georgia, USA
Session: Clustering
table of contents
Pages: 25 - 32
Year of Publication: 2001
ISBN:1-58113-436-3
|
|
Authors
|
|
Hongyuan Zha
|
Penn State Univ., State College, PA
|
|
Xiaofeng He
|
Penn State Univ., State College, PA
|
|
Chris Ding
|
Berkeley National Lab., Berkeley, CA
|
|
Horst Simon
|
Berkeley National Lab., Berkeley, CA
|
|
Ming Gu
|
U.C. Berkeley, Berkeley, CA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 44, Downloads (12 Months): 261, Citation Count: 34
|
|
|
ABSTRACT
Many data types arising from data mining applications can be modeled as bipartite graphs, examples include terms and documents in a text corpus, customers and purchasing items in market basket analysis and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection of our clustering algorithm to correspondence analysis used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
J.-P. Benzecri. Correspondence analysis handbook. Marcel Dekker, 1992.
|
| |
3
|
|
| |
4
|
F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
|
| |
5
|
|
| |
6
|
B. Everitt. Cluster Analysis. Edward Arnold, 1993.
|
| |
7
|
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Math J., 23:298-305, 1973.
|
| |
8
|
|
| |
9
|
A. D. Gordon. Classficataon. Second Edition, Chapman and Hall, 2000.
|
| |
10
|
M.J. Greenacre. Correspondence analysis an practice. Academic Press, 1993.
|
| |
11
|
|
 |
12
|
|
| |
13
|
A. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://wvu.cs.cmu.edu/ mccallum/bow, 1996.
|
| |
14
|
|
 |
15
|
|
| |
16
|
W.N. Venables and B.D. Ripley. Modern Applied Statistics with S-plus. Springer-verlag, 1999.
|
CITED BY 34
|
|
|
|
|
Hui Han , Lee Giles , Hongyuan Zha , Cheng Li , Kostas Tsioutsiouliklis, Two supervised learning approaches for name disambiguation in author citations, Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, June 07-11, 2004, Tuscon, AZ, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bin Gao , Tie-Yan Liu , Xin Zheng , Qian-Sheng Cheng , Wei-Ying Ma, Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
Bin Gao , Tie-Yan Liu , Tao Qin , Xin Zheng , Qian-Sheng Cheng , Wei-Ying Ma, Web image clustering by consistent utilization of visual features and surrounding texts, Proceedings of the 13th annual ACM international conference on Multimedia, November 06-11, 2005, Hilton, Singapore
|
|
|
Bin Gao , Tie-Yan Liu , Guang Feng , Tao Qin , Qian-Sheng Cheng , Wei-Ying Ma, Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning, IEEE Transactions on Knowledge and Data Engineering, v.17 n.9, p.1263-1273, September 2005
|
|
|
Chris Ding , Tao Li , Wei Peng , Haesun Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
Bo Long , Zhongfei (Mark) Zhang , Xiaoyun Wú , Philip S. Yu, Spectral clustering for multi-type relational data, Proceedings of the 23rd international conference on Machine learning, p.585-592, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
Bo Long , Xiaoyun Wu , Zhongfei (Mark) Zhang , Philip S. Yu, Unsupervised learning on k-partite graphs, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bo Long , Zhongfei (Mark) Zhang , Xiaoyun Wu , Philip S. Yu, Relational clustering by symmetric convex coding, Proceedings of the 24th international conference on Machine learning, p.569-576, June 20-24, 2007, Corvalis, Oregon
|
|
|
|
|
|
|
|
|
|
|
|
Yang Song , Ziming Zhuang , Huajing Li , Qiankun Zhao , Jia Li , Wang-Chien Lee , C. Lee Giles, Real-time automatic tag recommendation, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
|
|
|
|
|
|
|
|
|
|
|
|
Lei Tang , Huan Liu , Jianping Zhang , Zohreh Nazeri, Community evolution in dynamic multi-mode networks, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
Bo Long , Mark Zhang , Philip S. Yu, Graph partitioning based on link distributions, Proceedings of the 22nd national conference on Artificial intelligence, p.578-583, July 22-26, 2007, Vancouver, British Columbia, Canada
|
|
|
|
|
|
Lin Li , Zhenglu Yang , Ling Liu , Masaru Kitsuregawa, Query-URL bipartite based approach to personalized query recommendation, Proceedings of the 23rd national conference on Artificial intelligence, p.1189-1194, July 13-17, 2008, Chicago, Illinois
|
|
|
|
|