An effective document clustering method using user-adaptable distance metrics
Source
Proceedings of the 2002 ACM Symposium on Applied Computing, Madrid, Spain
Session: A.I. and computational logic
Pages: 16-20
Year of Publication: 2002
ISBN: 1-58113-445-2
Authors
Han-joon Kim, Seoul National University, San 56-1, Shillim-dong, Gwanak-gu, Seoul, Korea
Sang-goo Lee, Seoul National University, San 56-1, Shillim-dong, Gwanak-gu, Seoul, Korea
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 39, Citation Count: 5
|
|
|
ABSTRACT
Document clustering is inherently an unsupervised learning process that organizes document (or text) data into distinct groups without depending on pre-specified knowledge. However, real-world applications, such as building a topical hierarchy for a large document collection, need to perform clustering under various kinds of constraints. This paper presents a new type of supervised clustering that organizes information in a way that reflects knowledge provided by a user. To incorporate this external human knowledge into the clustering process, we employ a quadratic form distance metric containing a weight matrix. We also propose a way of representing knowledge to guide the clustering process, and a variant of gradient descent search that finds a user-specific weight matrix within a hierarchical clustering strategy.
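The core device described above, a quadratic form distance d_W(x, y) = sqrt((x - y)^T W (x - y)) whose weight matrix W is tuned to user knowledge, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' method: user knowledge is modeled here as hypothetical "same group" and "different group" document pairs, and W is fit by plain projected gradient descent on a hinge-style loss, standing in for the paper's own gradient descent variant, which the abstract does not detail. The function names and parameters (`learn_weight_matrix`, `margin`, `lr`, `n_iter`) are illustrative only.

```python
import numpy as np


def quadratic_form_distance(x, y, W):
    """d_W(x, y) = sqrt((x - y)^T W (x - y)).

    W is assumed symmetric positive semidefinite; W = I gives Euclidean distance.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(max(diff @ W @ diff, 0.0)))


def project_psd(W):
    """Project onto the positive semidefinite cone by clipping negative eigenvalues."""
    W = (W + W.T) / 2.0                       # enforce symmetry first
    eigvals, eigvecs = np.linalg.eigh(W)
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals) @ eigvecs.T    # V diag(lambda) V^T


def learn_weight_matrix(X, same_pairs, diff_pairs, margin=2.0, lr=0.1, n_iter=100):
    """Hypothetical projected gradient descent for the weight matrix W.

    Loss (an assumption, not taken from the paper):
        sum over same_pairs of d_W^2 + sum over diff_pairs of max(0, margin - d_W^2).
    The gradient of d_W^2 with respect to W is the outer product (x_i - x_j)(x_i - x_j)^T.
    """
    X = np.asarray(X, dtype=float)
    W = np.eye(X.shape[1])                    # start from the Euclidean metric
    for _ in range(n_iter):
        grad = np.zeros_like(W)
        for i, j in same_pairs:               # pull user-grouped documents together
            d = X[i] - X[j]
            grad += np.outer(d, d)
        for i, j in diff_pairs:               # push user-separated documents apart
            d = X[i] - X[j]
            if d @ W @ d < margin:            # hinge term is active only inside the margin
                grad -= np.outer(d, d)
        W = project_psd(W - lr * grad)        # gradient step, then restore PSD
    return W


if __name__ == "__main__":
    # Toy term vectors: feature 0 separates topics, feature 1 is noise.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    W = learn_weight_matrix(X, same_pairs=[(0, 1)], diff_pairs=[(0, 2)])
    print(np.round(W, 2))
    print(quadratic_form_distance(X[0], X[1], W), quadratic_form_distance(X[0], X[2], W))
```

Projecting onto the positive semidefinite cone after every step keeps d_W a valid pseudometric, so the learned W can be dropped into an ordinary hierarchical agglomerative clustering routine in place of Euclidean distance. In the toy run, the noise feature's weight is driven toward zero while the topical feature's weight grows.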