Keyword-based document clustering
Source: Annual Meeting of the ACL
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages, Volume 11
Sapporo, Japan
Pages: 132-137
Year of Publication: 2003
Author
Seung-Shik Kang  Kookmin University & AITrc, Chungnung-dong, Songbuk-gu, Seoul, Korea
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 55,   Citation Count: 1
DOI: 10.3115/1118935.1118952

ABSTRACT

Document clustering groups related documents into clusters based on the similarity between each document and the cluster representatives. Terms and their discriminating features are the main clues for clustering, and these features are based on term and document frequencies. Feature selection based on frequency statistics limits how much the clustering algorithm can be improved, because it does not take the contents of the objects being clustered into account. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.
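The abstract contrasts frequency-based term weighting with the proposed content-based keyword weighting but gives no formulas. As a hedged illustration of the frequency-based baseline only, the sketch below weights terms with a common tf-idf variant, represents each cluster by the centroid of its members, and assigns documents by cosine similarity using single-pass clustering. The tf-idf formula, the single-pass strategy, the function names (`tfidf_vectors`, `cosine`, `centroid`, `single_pass`) and the `threshold` value are all assumptions chosen for illustration; this is not the author's keyword-based algorithm.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Frequency-based weighting (assumed variant): tf * (log(N / df) + 1).

    Each doc is a list of tokens; returns one sparse dict term -> weight per doc.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Cluster representative: the component-wise mean of its member vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def single_pass(docs, threshold=0.3):
    """Single-pass clustering (hypothetical threshold): join the most similar
    cluster if its similarity reaches the threshold, else start a new cluster."""
    vectors = tfidf_vectors(docs)
    clusters = []   # member document indices per cluster
    centroids = []  # one representative vector per cluster
    for i, v in enumerate(vectors):
        best, best_sim = -1, -1.0
        for c, rep in enumerate(centroids):
            s = cosine(v, rep)
            if s >= threshold and s > best_sim:
                best, best_sim = c, s
        if best < 0:
            clusters.append([i])
            centroids.append(dict(v))
        else:
            clusters[best].append(i)
            centroids[best] = centroid([vectors[j] for j in clusters[best]])
    return clusters


docs = [
    ["apple", "banana", "fruit"],
    ["banana", "fruit", "juice"],
    ["engine", "car", "wheel"],
    ["car", "wheel", "road"],
]
print(single_pass(docs))  # [[0, 1], [2, 3]]
```

With the assumed threshold of 0.3, the two fruit documents and the two vehicle documents end up in separate clusters; raising the threshold to 0.9 leaves every document in its own cluster. In this sketch, trying a different weighting scheme only requires replacing `tfidf_vectors`, since the similarity and assignment steps consume whatever weights it returns.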

