ACM Home Page
Please provide us with feedback. Feedback
Hierarchic document classification using Ward's clustering method
Full text PdfPdf (974 KB)
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Palazzo dei Congressi, Pisa, Italy
Pages: 149 - 156  
Year of Publication: 1986
ISBN:0-89791-187-3
Authors
A. El-Hamdouchi  Sheffield University, Western Bank, Sheffield, S10 2TN, UK
P. Willett  Sheffield University, Western Bank, Sheffield, S10 2TN, UK
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 18,   Downloads (12 Months): 127,   Citation Count: 15
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/253168.253200
What is a DOI?

ABSTRACT

In this paper, we discuss the application of a recent hierarchic clustering algorithm to the automatic classification of files of documents. Whereas most hierarchic clustering algorithms involve the generation and updating of an inter-object dissimilarity matrix, this new algorithm is based upon a series of nearest neighbor searches. Such an approach is appropriate to several clustering methods, including Ward's method which has been shown to perform well in experimental studies of hierarchic document clustering. A description is given of heuristics which can increase the efficiency of the new algorithm when it is used to cluster three document collections by Ward's method.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

BENT80
 
CROF77
Croft, W.B. (1977), "Clustering Large Files of Documents Usln8 the 5i,81 e-L#nk Hethod," Journal of the #merlcan Soetety for Infoneaf.:Lon S#tence, 28, 3" 1-3#q.
 
CROF80
Croft, W.B. (1980), "A Hodel of (:luster Searching Using Classsi~1cation," information S st..#,,.#_#, 5, 189-1 95.
 
DAYE84
Day, W.B.E and Edelabrunner, H. (198#), "Efftolent Algorithms for #glomerattve Hlerarchioal ClusterlnE Hethoda,e Journal of Classification, 1, 7-2q.
 
DEFA77
} De lays, O. (1977), "# F#ft~ lent A1gorlthm for a Ccmplete Link Hethod," Journa______11, 20, 93-95.
FRIE77
 
GOFF69
Goff#en, W. (1969), "# Indlreot Hethod of Znfomation Retrieval," #nformatlon $toral#e and Retr i ev al, q, 363-3 73.
 
GRIF84
Grifflths, A. , Robinson, L.A. and WilIett, P. (198JI), "Hierarehle Agglomerative C1 ustering #Mthods for #utomatto #eument Classification ," Journal of Documentation, qO, 175-205.
 
GRIF86
 
JARD71a
Jard ine, N. and Sib son, R. (1971), Ha%hematlcal Taxonomy New York: Wiley.
 
JARD71b
Jardine, tl. and Van RlJsbergen, C.J. (1971), "The Use of Hierarchic Clustering in Information Retrieval," Information #orage and Retrieval, 7, 217-2q0.
 
KITT78
Kittler, J. (1978), "A Method for Determining K-Nearest NeIEhbotrs ," K ybernetes, 7, 313-315.
 
LANC67
Lance, G.N. and Williams, W.T. (1967), "A General Theory of Classificatory Sorting Strat#les. I. Hierarchical Systems," Jo ur na__#l, 9, 373-3 80.
 
MURT82
HurtaKh, F. (1982), "A Very Fast, Exaat Nearest NelEhbour Algorithm for Use in Information Retrieval," Information Technology: Research and Development, I, 275-283.
 
MURT83
Murtagh, F. (1983), "A SUrvey of Peoent Advances in Hierarchical Clustering Algorithms," Com#uter Journal, 26, 354-359.
 
MURT84a
rlurtagh, F. (198q), "Complexities of Hierarchic Clustering Algorithms: State of the Art," Computational Statistlcs Quarterly, 1, 101-1 13.
 
MURT84b
MurtsEh, F. (1984), "A Revlew of Fast Techniques for Nearest Neighbour Searching," in ~(#F3TAT 198q, Vienna: Physloa-Verl#.
 
NORE77
Moreault, T., F#I1, H. and MeGill, H.J. (1977), "Autcmatle Ranked Output from Boolean Searches in SIRE," Journal of the #eriean Soelety for Information Science, 28, 333#339.
 
PERR83
Perry, S.A. and Willett, P. (1983), "A Review of the Use of Inverted Files for Best Match Retr lev al in In form ation Retr iev al Systems," Journal of Infomatlon Science 6, 59- 66.
 
POGU84
Pogue, C.A. and Willett, P. (198q),-An Evaluatt.on of Doeumen# Retrieval from Serial Files Using the ICL I# sir Ib u# ed #ray Processor ," Online Review 8, 569-58q.
 
SALT71
Salton, G. (1971), The S1ART Retrieval System, New Jersey: Prentlee-4lall.
 
SALT81
 
SALT83
 
SIBS73
Sibson, R. (1973), "BLINK: an Optimally Efflclent A#gorlthm for the Slngle-bink C#usber Method ," Computer Journal, 1 6, #0-3#.
SMEA81
 
VANR74
Van R#j#beraen, C.J. (1974), "Further Experiments with Hierarehie Clustering in Document Retrieval ," Information Storage and Retrieval, 10, 1-1#.
 
VANR75
Van Pdjsbergen, C.J. and Croft, W.B. (1975), "Document Clustering: an Evaluation of some Experiments with the Cran field 1gO0 Collection," Information StorsEe and Retrieval, 11, 171-182.
 
VANR79
 
WEIS81

CITED BY  15
Collaborative Colleagues:
A. El-Hamdouchi: colleagues
P. Willett: colleagues