Scatter/Gather: a cluster-based approach to browsing large document collections
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Copenhagen, Denmark
Pages: 318 - 329  
Year of Publication: 1992
ISBN:0-89791-523-2
Authors
Douglass R. Cutting  Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA
David R. Karger  Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA and Stanford University
Jan O. Pedersen  Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA
John W. Tukey  Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA and Princeton University
Sponsors
Royal School of Lib.
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/133160.133214

ABSTRACT

Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval. We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.
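The abstract describes a browsing loop in which the collection is clustered (scattered), the user picks clusters of interest, and the picked documents are pooled (gathered) and clustered again. The following is a minimal Python sketch of that loop, not the algorithm from the full text. All function names, the cosine measure over sparse term-weight dictionaries, and the sample-then-assign strategy are assumptions made for illustration. The idea shown is that an expensive merging step can be confined to a small random sample, after which one pass assigns every document to its nearest center.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def agglomerate(vectors, k):
    """Naive greedy merging down to k groups (hypothetical stand-in).

    This is slower than quadratic, so it is only run on a small sample.
    """
    groups = [[v] for v in vectors]
    while len(groups) > k:
        cents = [centroid(g) for g in groups]
        pairs = ((i, j) for i in range(len(groups))
                 for j in range(i + 1, len(groups)))
        i, j = max(pairs, key=lambda p: cosine(cents[p[0]], cents[p[1]]))
        groups[i].extend(groups.pop(j))
    return [centroid(g) for g in groups]


def scatter(docs, k, seed=0):
    """Partition docs into at most k clusters of document indices."""
    rng = random.Random(seed)
    # Assumed sample size: about sqrt(k * n), but never fewer than k.
    size = min(len(docs), max(k, int(math.sqrt(k * len(docs)))))
    centers = agglomerate(rng.sample(docs, size), k)
    clusters = [[] for _ in centers]
    # One pass over the full collection: k similarity computations per doc.
    for idx, d in enumerate(docs):
        nearest = max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
        clusters[nearest].append(idx)
    return [c for c in clusters if c]


def gather(clusters, chosen):
    """Pool the document indices of the clusters the user selected."""
    return sorted(i for c in chosen for i in clusters[c])
```

In an interactive session, the indices returned by `gather` would select the sub-collection passed to the next `scatter` call, so each round narrows the set of documents being browsed.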


