ACM Home Page
Please provide us with feedback. Feedback
Bipartite graph partitioning and data clustering
Full text PdfPdf (1.45 MB)
Source Conference on Information and Knowledge Management archive
Proceedings of the tenth international conference on Information and knowledge management table of contents
Atlanta, Georgia, USA
Session: Clustering table of contents
Pages: 25 - 32  
Year of Publication: 2001
ISBN:1-58113-436-3
Authors
Hongyuan Zha  Penn State Univ., State College, PA
Xiaofeng He  Penn State Univ., State College, PA
Chris Ding  Berkeley National Lab., Berkeley, CA
Horst Simon  Berkeley National Lab., Berkeley, CA
Ming Gu  U.C. Berkeley, Berkeley, CA
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 44,   Downloads (12 Months): 261,   Citation Count: 34
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502585.502591
What is a DOI?

ABSTRACT

Many data types arising from data mining applications can be modeled as bipartite graphs, examples include terms and documents in a text corpus, customers and purchasing items in market basket analysis and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection of our clustering algorithm to correspondence analysis used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
J.-P. Benzecri. Correspondence analysis handbook. Marcel Dekker, 1992.
 
3
 
4
F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
 
5
 
6
B. Everitt. Cluster Analysis. Edward Arnold, 1993.
 
7
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Math J., 23:298-305, 1973.
 
8
 
9
A. D. Gordon. Classficataon. Second Edition, Chapman and Hall, 2000.
 
10
M.J. Greenacre. Correspondence analysis an practice. Academic Press, 1993.
 
11
12
 
13
A. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://wvu.cs.cmu.edu/ mccallum/bow, 1996.
 
14
15
 
16
W.N. Venables and B.D. Ripley. Modern Applied Statistics with S-plus. Springer-verlag, 1999.

CITED BY  34

Collaborative Colleagues:
Hongyuan Zha: colleagues
Xiaofeng He: colleagues
Chris Ding: colleagues
Horst Simon: colleagues
Ming Gu: colleagues