An effective document clustering method using user-adaptable distance metrics
Source: Symposium on Applied Computing
Proceedings of the 2002 ACM Symposium on Applied Computing
Madrid, Spain
SESSION: A.I. and computational logic
Pages: 16-20
Year of Publication: 2002
ISBN: 1-58113-445-2
Authors
Han-joon Kim  Seoul National University, San 56-1, Shillim-dong, Gwanak-gu, Seoul, Korea
Sang-goo Lee  Seoul National University, San 56-1, Shillim-dong, Gwanak-gu, Seoul, Korea
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 40,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/508791.508796

ABSTRACT

Document clustering is inherently an unsupervised learning process that organizes document (or text) data into distinct groups without depending on pre-specified knowledge. However, real-world applications, such as building a topical hierarchy for a large document collection, need to perform clustering under various kinds of constraints. This paper presents a new type of supervised clustering to organize information in a way that reflects knowledge provided by a user. As a means by which external human knowledge can be incorporated into the clustering process, a quadratic form distance metric is employed that contains a weight matrix. Also, we propose a way of representing knowledge to guide the clustering process and a variant of the gradient descent search technique to find a user-specific weight matrix under the hierarchical clustering strategy.
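
To make the role of the weight matrix concrete, the following is a minimal sketch of a quadratic form distance, assuming the standard definition d_W(x, y) = sqrt((x - y)^T W (x - y)). The abstract does not give the paper's exact formulation, so the function name, the three-term vocabulary, and the weight values below are illustrative assumptions, not the authors' method. The sketch only shows how off-diagonal weights can pull documents that use related terms closer together; it does not attempt the gradient descent search for W.

```python
import math


def quadratic_form_distance(x, y, weights):
    """Return sqrt((x - y)^T W (x - y)) for term vectors x, y and matrix W.

    `weights` is assumed to be a symmetric, positive semi-definite
    matrix given as a list of rows (an assumption of this sketch).
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product W * diff.
    w_diff = [sum(w * d for w, d in zip(row, diff)) for row in weights]
    squared = sum(d * wd for d, wd in zip(diff, w_diff))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical 3-term vocabulary; each document uses a single term.
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.0, 1.0, 0.0]

# Identity weights reduce the metric to ordinary Euclidean distance.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical user-adapted weights: terms 0 and 1 are treated as related.
linked = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

euclidean = quadratic_form_distance(doc_a, doc_b, identity)
adapted = quadratic_form_distance(doc_a, doc_b, linked)
```

With identity weights the two documents are as far apart as any pair of single-term documents; with the adapted weights the distance shrinks, so a hierarchical clustering step would merge them earlier. In a user-adaptable setting, W would be learned from the user's feedback rather than fixed by hand as it is here.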



