Constant interaction-time scatter/gather browsing of very large document collections
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Pittsburgh, Pennsylvania, United States
Pages: 126-134
Year of Publication: 1993
ISBN: 0-89791-605-0
Authors
Douglass R. Cutting, David R. Karger, Jan O. Pedersen
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/160688.160706

ABSTRACT

The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme and an example of its use over the Tipster collection are presented.
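
The abstract states only that the scheme relies on a precomputed cluster hierarchy. As a rough illustration of why such a hierarchy can make each browsing step independent of collection size, the Python sketch below keeps a bounded number of hierarchy nodes in hand and regroups those nodes instead of the underlying documents. Everything in it is an assumption made for illustration and is not taken from the paper: the `Node` structure, the `expand` and `scatter` functions, the `budget` parameter, and the size-balancing rule in `scatter`, which is only a placeholder for a real clustering step.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a precomputed cluster hierarchy (hypothetical structure)."""
    label: str
    size: int  # number of documents beneath this node
    children: list["Node"] = field(default_factory=list)


def expand(nodes, budget):
    """Replace the largest expandable node by its children until about
    `budget` nodes are held. The work depends on `budget`, not on how many
    documents the nodes cover."""
    frontier = list(nodes)
    while len(frontier) < budget:
        expandable = [n for n in frontier if n.children]
        if not expandable:
            break
        biggest = max(expandable, key=lambda n: n.size)
        if len(frontier) - 1 + len(biggest.children) > budget:
            break  # expanding further would exceed the budget
        frontier.remove(biggest)
        frontier.extend(biggest.children)
    return frontier


def scatter(nodes, k):
    """Placeholder for clustering: deal nodes into k groups, largest first,
    always into the group that currently covers the fewest documents."""
    groups = [[] for _ in range(k)]
    for n in sorted(nodes, key=lambda n: n.size, reverse=True):
        min(groups, key=lambda g: sum(m.size for m in g)).append(n)
    return groups


# Toy hierarchy over 8 documents (made-up data).
root = Node("all", 8, [
    Node("a", 5, [
        Node("a1", 3, [Node("a1x", 2), Node("a1y", 1)]),
        Node("a2", 2),
    ]),
    Node("b", 3, [Node("b1", 2), Node("b2", 1)]),
])

frontier = expand([root], budget=4)   # bounded set of nodes to regroup
groups = scatter(frontier, k=2)       # "scatter" step over nodes only
gathered = groups[0]                  # user "gathers" one group
next_frontier = expand(gathered, budget=4)
```

In this sketch each interaction touches at most `budget` nodes, so its cost stays fixed as the collection grows, while building the hierarchy itself is left to an offline preprocessing step.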


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Donna Harman. The TIPSTER evaluation corpus. CDROM disks of computer readable text, 1992. Available from the Linguistic Data Consortium.
 
3
G. Salton. The SMART Retrieval System. Prentice-Hall, Englewood Cliffs, N.J., 1971.
 
4
R. Sibson. SLINK: an optimally efficient algorithm for the single link cluster method. Computer Journal, 16:30-34, 1973.
 
5
P. Willett. Document clustering using an inverted file approach. Journal of Information Science, 2:223-231, 1980.

Collaborative Colleagues:
Douglass R. Cutting
David R. Karger
Jan O. Pedersen