|
ABSTRACT
Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
D. Boley. Hierarchical taxonomies using divisive partitioning. Technical Report TR-98-012, University of Minnesota, 1998.
|
| |
3
|
D. Boley, M. Gini, R. Gross, E.-H. Has, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore. Document categorization and query generation on the World Wide Web using WebACE. AI Review, 1998.
|
 |
4
|
|
 |
5
|
Douglass R. Cutting , David R. Karger , Jan O. Pedersen , John W. Tukey, Scatter/Gather: a cluster-based approach to browsing large document collections, Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, p.318-329, June 21-24, 1992, Copenhagen, Denmark
[doi> 10.1145/133160.133214]
|
| |
6
|
I. S. Dhillon, J. Fan, and Y. Guan. Efficient clustering of very large document collections. In V. K. R. Grossman, C. Kamath and R. Namburu, editors, Data Mining for Scientific and Engineering Applications. Kluwer Academic Publishers, 2001.
|
| |
7
|
|
| |
8
|
W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM Journal of Research and Development, 17:420-425, 1973.
|
| |
9
|
|
| |
10
|
C. M. Fiduccia and R. M. Mattheyses. A linear time heuristic for improving network partitions. Technical Report 82CRD130, GE Corporate Research, 1982.
|
| |
11
|
M. Fiedler. Algebraic connectivity of graphs. Czecheslovak Mathematical Journal, 23:298-305, 1973.
|
| |
12
|
|
| |
13
|
G. H. Golub and C. F. V. Loan. Matrix computations. Johns Hopkins University Press, 3rd edition, 1996.
|
| |
14
|
L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on CAD, 11:1074-1085, 1992.
|
| |
15
|
K. M. Hall. An r-dimensional quadratic placement algorithm. Management Science, 11(3):219-229, 1970.
|
| |
16
|
R. V. Katter. Study of document representations: Multidimensional scaling of indexing terms. System Development Corporation, Santa Monica, CA, 1967.
|
| |
17
|
B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 29(2):291-307, 1970.
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
A. Strehl, J. Ghosh, and R. Mooney. Impact of similarity measures on web-page clustering. In AAAI 2000 Workshop on AI for Web Search, 2000.
|
| |
24
|
|
| |
25
|
|
CITED BY 91
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bin Gao , Tie-Yan Liu , Xin Zheng , Qian-Sheng Cheng , Wei-Ying Ma, Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bin Gao , Tie-Yan Liu , Tao Qin , Xin Zheng , Qian-Sheng Cheng , Wei-Ying Ma, Web image clustering by consistent utilization of visual features and surrounding texts, Proceedings of the 13th annual ACM international conference on Multimedia, November 06-11, 2005, Hilton, Singapore
|
|
|
|
|
|
Bin Gao , Tie-Yan Liu , Guang Feng , Tao Qin , Qian-Sheng Cheng , Wei-Ying Ma, Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning, IEEE Transactions on Knowledge and Data Engineering, v.17 n.9, p.1263-1273, September 2005
|
|
|
Chris Ding , Tao Li , Wei Peng , Haesun Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
Lu Liu , Lifeng Sun , Yong Rui , Yao Shi , Shiqiang Yang, Web video topic discovery and tracking via bipartite graph reinforcement model, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
Tao Qin , Tie-Yan Liu , Xu-Dong Zhang , De-Sheng Wang , Wen-Ying Xiong , Hang Li, Learning to rank relational objects and its application to web search, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bo Long , Zhongfei (Mark) Zhang , Xiaoyun Wú , Philip S. Yu, Spectral clustering for multi-type relational data, Proceedings of the 23rd international conference on Machine learning, p.585-592, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
Alice X. Zheng , Michael I. Jordan , Ben Liblit , Mayur Naik , Alex Aiken, Statistical debugging: simultaneous identification of multiple bugs, Proceedings of the 23rd international conference on Machine learning, p.1105-1112, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
|
|
|
Bo Long , Xiaoyun Wu , Zhongfei (Mark) Zhang , Philip S. Yu, Unsupervised learning on k-partite graphs, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Zheng-Yu Niu , Dong-Hong Ji , Chew Lim Tan, A semi-supervised feature clustering algorithm with application to word sense disambiguation, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, p.907-914, October 06-08, 2005, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bo Long , Zhongfei (Mark) Zhang , Xiaoyun Wu , Philip S. Yu, Relational clustering by symmetric convex coding, Proceedings of the 24th international conference on Machine learning, p.569-576, June 20-24, 2007, Corvalis, Oregon
|
|
|
|
|
|
Carlotta Domeniconi , Dimitrios Gunopulos , Sheng Ma , Bojun Yan , Muna Al-Razgan , Dimitris Papadopoulos, Locally adaptive metrics for clustering high dimensional data, Data Mining and Knowledge Discovery, v.14 n.1, p.63-97, February 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yang Song , Ziming Zhuang , Huajing Li , Qiankun Zhao , Jia Li , Wang-Chien Lee , C. Lee Giles, Real-time automatic tag recommendation, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lei Tang , Huan Liu , Jianping Zhang , Zohreh Nazeri, Community evolution in dynamic multi-mode networks, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bo Long , Mark Zhang , Philip S. Yu, Graph partitioning based on link distributions, Proceedings of the 22nd national conference on Artificial intelligence, p.578-583, July 22-26, 2007, Vancouver, British Columbia, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haofen Wang , Yan Liang , Linyun Fu , Gui-Rong Xue , Yong Yu, Efficient query expansion for advertisement search, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|
|
|
Bo Long , Mark Zhang , Philip S. Yu , Tianbing Xu, Clustering on complex graphs, Proceedings of the 23rd national conference on Artificial intelligence, p.659-664, July 13-17, 2008, Chicago, Illinois
|
|
|
|
|
|
|
|
|
|
|