ACM Home Page
Please provide us with feedback. Feedback
Co-clustering documents and words using bipartite spectral graph partitioning
Full text PdfPdf (538 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Francisco, California
Pages: 269 - 274  
Year of Publication: 2001
ISBN:1-58113-391-X
Author
Inderjit S. Dhillon  University of Texas, Austin, TX
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI : American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 51,   Downloads (12 Months): 356,   Citation Count: 91
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/502512.502550
What is a DOI?

ABSTRACT

Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
D. Boley. Hierarchical taxonomies using divisive partitioning. Technical Report TR-98-012, University of Minnesota, 1998.
 
3
D. Boley, M. Gini, R. Gross, E.-H. Has, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore. Document categorization and query generation on the World Wide Web using WebACE. AI Review, 1998.
4
5
 
6
I. S. Dhillon, J. Fan, and Y. Guan. Efficient clustering of very large document collections. In V. K. R. Grossman, C. Kamath and R. Namburu, editors, Data Mining for Scientific and Engineering Applications. Kluwer Academic Publishers, 2001.
 
7
 
8
W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM Journal of Research and Development, 17:420-425, 1973.
 
9
 
10
C. M. Fiduccia and R. M. Mattheyses. A linear time heuristic for improving network partitions. Technical Report 82CRD130, GE Corporate Research, 1982.
 
11
M. Fiedler. Algebraic connectivity of graphs. Czecheslovak Mathematical Journal, 23:298-305, 1973.
 
12
 
13
G. H. Golub and C. F. V. Loan. Matrix computations. Johns Hopkins University Press, 3rd edition, 1996.
 
14
L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on CAD, 11:1074-1085, 1992.
 
15
K. M. Hall. An r-dimensional quadratic placement algorithm. Management Science, 11(3):219-229, 1970.
 
16
R. V. Katter. Study of document representations: Multidimensional scaling of indexing terms. System Development Corporation, Santa Monica, CA, 1967.
 
17
B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 29(2):291-307, 1970.
 
18
 
19
 
20
21
 
22
 
23
A. Strehl, J. Ghosh, and R. Mooney. Impact of similarity measures on web-page clustering. In AAAI 2000 Workshop on AI for Web Search, 2000.
 
24
 
25

CITED BY  91

Collaborative Colleagues:
Inderjit S. Dhillon: colleagues