| Document clustering by concept factorization |
| Full text |
Pdf
(220 KB)
|
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Sheffield, United Kingdom
SESSION: Clustering
table of contents
Pages: 202 - 209
Year of Publication: 2004
ISBN:1-58113-881-4
|
|
Authors
|
|
Wei Xu
|
NEC Laboratories America, Inc., Cupertino, CA
|
|
Yihong Gong
|
NEC Laboratories America, Inc., Cupertino, CA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 131, Citation Count: 14
|
|
|
ABSTRACT
In this paper, we propose a new data clustering method called concept factorization that models each concept as a linear combination of the data points, and each data point as a linear combination of the concepts. With this model, the data clustering task is accomplished by computing the two sets of linear coefficients, and this linear coefficients computation is carried out by finding the non-negative solution that minimizes the reconstruction error of the data points. The cluster label of each data point can be easily derived from the obtained linear coefficients. This method differs from the method of clustering based on non-negative matrix factorization (NMF) \citeXu03 in that it can be applied to data containing negative values and the method can be implemented in the kernel space. Our experimental results show that the proposed data clustering method and its variations performs best among 11 algorithms and their variations that we have evaluated on both TDT2 and Reuters-21578 corpus. In addition to its good performance, the new method also has the merit in its easy and reliable derivation of the clustering results.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
P. K. Chan, D. F. Schlag, and J. Y. Zien. Spectral k-way ratio-cut partitioning an clustering. IEEE Trans. Computer-Aided Design, 13:1088--1096, Sep. 1994.
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
| |
7
|
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788--791, 1999.
|
 |
8
|
|
| |
9
|
K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Networks, 12:181--202, 3 2001.
|
| |
10
|
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, volume 14, 2002.
|
| |
11
|
F. Sha, L. K. Saul, and D. D. Lee. Multiplicative updates for nonnegative quadratic programming in support vector machines. In Advances in Neural Information Processing Systems, volume 14, 2002.
|
| |
12
|
|
 |
13
|
|
| |
14
|
H. Zha, C. Ding, M. Gu, X. He, and H. Simon. Spectral relaxation for k-means clustering. In Advances in Neural Information Processing Systems, volume 14, 2002.
|
 |
15
|
Hongyuan Zha , Xiaofeng He , Chris Ding , Horst Simon , Ming Gu, Bipartite graph partitioning and data clustering, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
[doi> 10.1145/502585.502591]
|
| |
16
|
J. Y. Zien, M. D. F. Schlag, and P. K. Chan. Multilevel spectral hypergraph partitioning with artibary vertex sizes. IEEE Trans. Computer-Aided Design, 18:1389--1399, Sep. 1999.
|
CITED BY 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yun Chi , Shenghuo Zhu , Xiaodan Song , Junichi Tatemura , Belle L. Tseng, Structural and temporal analysis of the blogosphere through community factorization, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bin Cao , Dou Shen , Jian-Tao Sun , Xuanhui Wang , Qiang Yang , Zheng Chen, Detect and track latent factors with online nonnegative matrix factorization, Proceedings of the 20th international joint conference on Artifical intelligence, p.2689-2694, January 06-12, 2007, Hyderabad, India
|
|
|
|
|