Document clustering based on non-negative matrix factorization
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Clustering
Pages: 267 - 273
Year of Publication: 2003
ISBN: 1-58113-646-3

Authors
Wei Xu, NEC Laboratories America, Inc., Cupertino, CA
Xin Liu, NEC Laboratories America, Inc., Cupertino, CA
Yihong Gong, NEC Laboratories America, Inc., Cupertino, CA

Bibliometrics
Downloads (6 Weeks): 53, Downloads (12 Months): 530, Citation Count: 61

ABSTRACT
In this paper, we propose a novel document clustering method based on the non-negative factorization of the term-document matrix of the given document corpus. In the latent semantic space derived by the non-negative matrix factorization (NMF), each axis captures the base topic of a particular document cluster, and each document is represented as an additive combination of the base topics. The cluster membership of each document can be easily determined by finding the base topic (the axis) on which the document has the largest projection value. Our experimental evaluations show that the proposed method outperforms latent semantic indexing and spectral clustering methods not only in the ease and reliability with which clustering results are derived, but also in clustering accuracy.
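The assignment rule described in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation and rests on several assumptions: the term-document matrix X (terms × documents) is factored as X ≈ U Vᵀ using Lee and Seung's multiplicative update rules for the squared Frobenius error; the columns of U (the base topics) are rescaled to unit length, with the matching columns of V scaled up so the product is unchanged, before projection values are compared; and the iteration count, seed, and helper names (`nmf`, `normalize_factors`, `cluster_documents`) are hypothetical choices. The toy matrix is invented for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two row-major matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(x: Matrix, k: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Approximate the non-negative m x n matrix x as U V^T, with U (m x k) and
    V (n x k) non-negative, using Lee-Seung multiplicative updates that do not
    increase the squared Frobenius error ||x - U V^T||^2 (assumed solver)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    u = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    v = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    xt = transpose(x)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), element-wise; eps avoids division by zero
        num = matmul(x, v)
        den = matmul(u, matmul(transpose(v), v))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        # V <- V * (X^T U) / (V U^T U), element-wise
        num = matmul(xt, u)
        den = matmul(v, matmul(transpose(u), u))
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return u, v


def normalize_factors(u: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scale each column of U to unit Euclidean length and multiply the matching
    column of V by the same norm, so the product U V^T is unchanged."""
    k = len(u[0])
    norms = [sum(row[j] ** 2 for row in u) ** 0.5 for j in range(k)]
    u = [[row[j] / norms[j] if norms[j] > 0 else 0.0 for j in range(k)] for row in u]
    v = [[row[j] * norms[j] for j in range(k)] for row in v]
    return u, v


def cluster_documents(x: Matrix, k: int, **kwargs) -> List[int]:
    """Assign each document (column of x) to the base topic (column of U) on
    which it has the largest coefficient in the rescaled V."""
    u, v = normalize_factors(*nmf(x, k, **kwargs))
    return [max(range(k), key=lambda j: row[j]) for row in v]


# Toy term-document matrix: rows are terms, columns are documents.
# Documents 0-1 use only terms 0-2; documents 2-3 use only terms 3-5.
x = [
    [3.0, 6.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 4.0],
    [0.0, 0.0, 3.0, 6.0],
    [0.0, 0.0, 1.0, 2.0],
]
labels = cluster_documents(x, 2)
print(labels)
```

Label numbers are arbitrary, since any permutation of the base topics gives an equally good factorization, so only the grouping matters: documents drawing on the same vocabulary should receive the same label. For real corpora a library routine such as scikit-learn's `sklearn.decomposition.NMF` would replace the hand-written updates; the pure-Python version only makes each step visible.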
CITED BY 61

Chris Ding, Tao Li, Wei Peng, Haesun Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA

Xuanhui Wang, Jian-Tao Sun, Zheng Chen, ChengXiang Zhai, Latent semantic analysis for multiple-type interrelated data objects, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA

Yun Chi, Shenghuo Zhu, Xiaodan Song, Junichi Tatemura, Belle L. Tseng, Structural and temporal analysis of the blogosphere through community factorization, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA

Dominik Schnitzer, Tim Pohle, Peter Knees, Gerhard Widmer, One-touch access to music on mobile devices, Proceedings of the 6th international conference on Mobile and ubiquitous multimedia, p.103-109, December 12-14, 2007, Oulu, Finland

Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai, Modeling hidden topics on document manifold, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tseng, Efficient computation of personal aggregate queries on blogs, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA

Dingding Wang, Shenghuo Zhu, Tao Li, Yun Chi, Yihong Gong, Integrating clustering and multi-document summarization to improve document understanding, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Deng Cai, Xuanhui Wang, Xiaofei He, Probabilistic dyadic data analysis with local and global consistency, Proceedings of the 26th Annual International Conference on Machine Learning, p.105-112, June 14-18, 2009, Montreal, Quebec, Canada

Chris Ding, Tao Li, Wei Peng, Nonnegative matrix factorization and probabilistic latent semantic indexing: equivalence, chi-square statistic, and a hybrid method, Proceedings of the 21st national conference on Artificial intelligence, p.342-347, July 16-20, 2006, Boston, Massachusetts

Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Yihong Gong, A latent topic model for linked documents, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA

Kenji Hosoda, Masataka Watanabe, Heiko Wersing, Edgar Körner, Hiroshi Tsujino, Hiroshi Tamura, Ichiro Fujita, A model for learning topographically organized parts-based representations of objects in visual cortex: Topographic nonnegative matrix factorization, Neural Computation, v.21 n.9, p.2605-2633, September 2009