|
ABSTRACT
In this paper, we discuss the application of a recent hierarchic clustering algorithm to the automatic classification of files of documents. Whereas most hierarchic clustering algorithms involve the generation and updating of an inter-object dissimilarity matrix, this new algorithm is based upon a series of nearest neighbor searches. Such an approach is appropriate to several clustering methods, including Ward's method which has been shown to perform well in experimental studies of hierarchic document clustering. A description is given of heuristics which can increase the efficiency of the new algorithm when it is used to cluster three document collections by Ward's method.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
BENT80
|
|
| |
CROF77
|
Croft, W.B. (1977), "Clustering Large Files of Documents Usln8 the 5i,81 e-L#nk Hethod," Journal of the #merlcan Soetety for Infoneaf.:Lon S#tence, 28, 3" 1-3#q.
|
| |
CROF80
|
Croft, W.B. (1980), "A Hodel of (:luster Searching Using Classsi~1cation," information S st..#,,.#_#, 5, 189-1 95.
|
| |
DAYE84
|
Day, W.B.E and Edelabrunner, H. (198#), "Efftolent Algorithms for #glomerattve Hlerarchioal ClusterlnE Hethoda,e Journal of Classification, 1, 7-2q.
|
| |
DEFA77
|
} De lays, O. (1977), "# F#ft~ lent A1gorlthm for a Ccmplete Link Hethod," Journa______11, 20, 93-95.
|
 |
FRIE77
|
|
| |
GOFF69
|
Goff#en, W. (1969), "# Indlreot Hethod of Znfomation Retrieval," #nformatlon $toral#e and Retr i ev al, q, 363-3 73.
|
| |
GRIF84
|
Grifflths, A. , Robinson, L.A. and WilIett, P. (198JI), "Hierarehle Agglomerative C1 ustering #Mthods for #utomatto #eument Classification ," Journal of Documentation, qO, 175-205.
|
| |
GRIF86
|
|
| |
JARD71a
|
Jard ine, N. and Sib son, R. (1971), Ha%hematlcal Taxonomy New York: Wiley.
|
| |
JARD71b
|
Jardine, tl. and Van RlJsbergen, C.J. (1971), "The Use of Hierarchic Clustering in Information Retrieval," Information #orage and Retrieval, 7, 217-2q0.
|
| |
KITT78
|
Kittler, J. (1978), "A Method for Determining K-Nearest NeIEhbotrs ," K ybernetes, 7, 313-315.
|
| |
LANC67
|
Lance, G.N. and Williams, W.T. (1967), "A General Theory of Classificatory Sorting Strat#les. I. Hierarchical Systems," Jo ur na__#l, 9, 373-3 80.
|
| |
MURT82
|
HurtaKh, F. (1982), "A Very Fast, Exaat Nearest NelEhbour Algorithm for Use in Information Retrieval," Information Technology: Research and Development, I, 275-283.
|
| |
MURT83
|
Murtagh, F. (1983), "A SUrvey of Peoent Advances in Hierarchical Clustering Algorithms," Com#uter Journal, 26, 354-359.
|
| |
MURT84a
|
rlurtagh, F. (198q), "Complexities of Hierarchic Clustering Algorithms: State of the Art," Computational Statistlcs Quarterly, 1, 101-1 13.
|
| |
MURT84b
|
MurtsEh, F. (1984), "A Revlew of Fast Techniques for Nearest Neighbour Searching," in ~(#F3TAT 198q, Vienna: Physloa-Verl#.
|
| |
NORE77
|
Moreault, T., F#I1, H. and MeGill, H.J. (1977), "Autcmatle Ranked Output from Boolean Searches in SIRE," Journal of the #eriean Soelety for Information Science, 28, 333#339.
|
| |
PERR83
|
Perry, S.A. and Willett, P. (1983), "A Review of the Use of Inverted Files for Best Match Retr lev al in In form ation Retr iev al Systems," Journal of Infomatlon Science 6, 59- 66.
|
| |
POGU84
|
Pogue, C.A. and Willett, P. (198q),-An Evaluatt.on of Doeumen# Retrieval from Serial Files Using the ICL I# sir Ib u# ed #ray Processor ," Online Review 8, 569-58q.
|
| |
SALT71
|
Salton, G. (1971), The S1ART Retrieval System, New Jersey: Prentlee-4lall.
|
| |
SALT81
|
|
| |
SALT83
|
|
| |
SIBS73
|
Sibson, R. (1973), "BLINK: an Optimally Efflclent A#gorlthm for the Slngle-bink C#usber Method ," Computer Journal, 1 6, #0-3#.
|
 |
SMEA81
|
|
| |
VANR74
|
Van R#j#beraen, C.J. (1974), "Further Experiments with Hierarehie Clustering in Document Retrieval ," Information Storage and Retrieval, 10, 1-1#.
|
| |
VANR75
|
Van Pdjsbergen, C.J. and Croft, W.B. (1975), "Document Clustering: an Evaluation of some Experiments with the Cran field 1gO0 Collection," Information StorsEe and Retrieval, 11, 171-182.
|
| |
VANR79
|
|
| |
WEIS81
|
|
CITED BY 15
|
|
|
|
|
Douglass R. Cutting , David R. Karger , Jan O. Pedersen , John W. Tukey, Scatter/Gather: a cluster-based approach to browsing large document collections, Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, p.318-329, June 21-24, 1992, Copenhagen, Denmark
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|