Keyword-based document clustering
Source: Annual Meeting of the ACL, Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages, Volume 11, Sapporo, Japan
Pages: 132-137
Year of Publication: 2003
Author: Seung-Shik Kang, Kookmin University & AITrc, Chungnung-dong, Songbuk-gu, Seoul, Korea
Publisher: Association for Computational Linguistics, Morristown, NJ, USA
ABSTRACT
Document clustering groups related documents into clusters by evaluating the similarity between each document and the representatives of the clusters. Terms and their discriminating features are the main clues for clustering, and these discriminating features are based on term and document frequencies. Feature selection based on frequency statistics alone limits how far the clustering algorithm can be improved, because it does not consider the contents of the objects being clustered. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.
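The frequency-based baseline described above, which compares each document with cluster representatives using weights derived from term and document frequencies, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's keyword-based algorithm: it assumes tf-idf weights (with a `1 + log(N/df)` idf so that no weight is zero), cosine similarity, cluster centroids as representatives, and a single-pass assignment controlled by a hypothetical `threshold` parameter. The content-based keyword weighting proposed in the paper is not specified in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Frequency-based weights (assumed scheme): tf * (1 + log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: f * (1.0 + math.log(n / df[t])) for t, f in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(members):
    """Cluster representative: the mean of its members' weight vectors."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def single_pass_cluster(docs, threshold=0.3):
    """Assign each document to the most similar cluster representative,
    or start a new cluster if no similarity reaches the (hypothetical) threshold."""
    vectors = tfidf_vectors(docs)
    clusters = []  # lists of document indices
    reps = []      # one centroid per cluster
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, r) for r in reps]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            clusters[best].append(i)
            reps[best] = centroid([vectors[j] for j in clusters[best]])
        else:
            clusters.append([i])
            reps.append(dict(vec))
    return clusters


if __name__ == "__main__":
    # Toy, pre-tokenized documents (illustrative data only).
    docs = [
        ["cluster", "document", "similarity"],
        ["document", "cluster", "centroid"],
        ["korean", "morphology", "noun"],
        ["noun", "korean", "postposition"],
    ]
    print(single_pass_cluster(docs))  # [[0, 1], [2, 3]]
```

Because the weights here depend only on frequency statistics, two documents count as similar purely by shared terms; the abstract's argument is that weighting keywords by their content would refine exactly the `tfidf_vectors` step while leaving the similarity and assignment steps unchanged.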