ABSTRACT
Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering is introduced. The complexity and cost analysis of the algorithm together with an investigation of its expected behavior are presented. Through empirical testing it is shown that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
CITED BY 24
Moses Charikar, Chandra Chekuri, Tomás Feder, Rajeev Motwani, Incremental clustering and dynamic information retrieval, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.626-635, May 04-06, 1997, El Paso, Texas, United States
Javed Aslam, Katya Pelekhov, Daniela Rus, Using star clusters for filtering, Proceedings of the ninth international conference on Information and knowledge management, p.306-313, November 06-11, 2000, McLean, Virginia, United States
Edward A. Fox, Robert K. France, Eskinder Sahle, Amjad Daoud, Ben E. Cline, Development of a modern OPAC: from REVTOLC to MARIAN, Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, p.248-259, June 27-July 01, 1993, Pittsburgh, Pennsylvania, United States
Javed Aslam, Katya Pelekhov, Daniela Rus, Static and dynamic information organization with star clusters, Proceedings of the seventh international conference on Information and knowledge management, p.208-217, November 02-07, 1998, Bethesda, Maryland, United States
Javed Aslam, Katya Pelekhov, Daniela Rus, A practical clustering algorithm for static and dynamic information organization, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.51-60, January 17-19, 1999, Baltimore, Maryland, United States
Fazli Can, Seyit Kocberber, Ozgur Baglioglu, Suleyman Kardas, Huseyin Cagdas Ocalan, Erkan Uyar, Bilkent news portal: a personalizable system with new event detection and tracking capabilities, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
REVIEW
Reviewer: Caroline Merriam Eastman
An algorithm that incrementally revises clusters of documents as
additions are made to a database is described. It is based on the use of
the cover coefficient concept to measure similarities among documents.
The terminology used i