ABSTRACT
The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme, and an example of its use over the Tipster collection are presented.
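To illustrate why a precomputed cluster hierarchy can make interaction time independent of collection size, here is a minimal Python sketch. It is not the paper's algorithm: the abstract says only that a hierarchy is constructed, so the `Node` class, the `expand` function, the budget `m`, and the rule of always splitting the largest node are assumptions made for illustration. The idea is that each browsing step handles at most about `m` hierarchy nodes, however many documents lie beneath them.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """One node of a precomputed cluster hierarchy (hypothetical structure)."""
    label: str
    size: int                          # number of documents under this node
    children: list = field(default_factory=list)


def expand(selected, m):
    """Refine the selected nodes until at least m nodes are in hand.

    Repeatedly replaces the largest internal node with its children.
    The work done depends on m and the branching factor, not on how
    many documents the nodes summarize.
    """
    tie = count()                      # tie-breaker so Node objects are never compared
    heap = [(-n.size, next(tie), n) for n in selected if n.children]
    heapq.heapify(heap)
    leaves = [n for n in selected if not n.children]
    while heap and len(heap) + len(leaves) < m:
        _, _, node = heapq.heappop(heap)
        for child in node.children:
            if child.children:
                heapq.heappush(heap, (-child.size, next(tie), child))
            else:
                leaves.append(child)
    return leaves + [n for _, _, n in heap]
```

In a browsing session under these assumptions, the nodes the user gathers would be passed to `expand`, and the resulting bounded set of nodes would then be reclustered for display in the next scatter step.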
REFERENCES
[1] Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey. Scatter/Gather: a cluster-based approach to browsing large document collections. Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, p.318-329, June 21-24, 1992, Copenhagen, Denmark. doi:10.1145/133160.133214
[2] Donna Harman. The TIPSTER evaluation corpus. CDROM disks of computer readable text, 1992. Available from the Linguistic Data Consortium.
[3] G. Salton. The SMART Retrieval System. Prentice-Hall, Englewood Cliffs, N.J., 1971.
[4] R. Sibson. SLINK: an optimally efficient algorithm for the single link cluster method. Computer Journal, 16:30-34, 1973.
[5] P. Willett. Document clustering using an inverted file approach. Journal of Information Science, 2:223-231, 1980.
CITED BY 60
Javed Aslam, Katya Pelekhov, Daniela Rus, A practical clustering algorithm for static and dynamic information organization, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.51-60, January 17-19, 1999, Baltimore, Maryland, United States
Javed Aslam, Katya Pelekhov, Daniela Rus, Using star clusters for filtering, Proceedings of the ninth international conference on Information and knowledge management, p.306-313, November 06-11, 2000, McLean, Virginia, United States
Lucy Terry Nowell, Robert K. France, Deborah Hix, Lenwood S. Heath, Edward A. Fox, Visualizing search results: some alternatives to query-document similarity, Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, p.67-75, August 18-22, 1996, Zurich, Switzerland
Moses Charikar, Chandra Chekuri, Tomás Feder, Rajeev Motwani, Incremental clustering and dynamic information retrieval, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.626-635, May 04-06, 1997, El Paso, Texas, United States
Peter Pirolli, Patricia Schank, Marti Hearst, Christine Diehl, Scatter/gather browsing communicates the topic structure of a very large text collection, Proceedings of the SIGCHI conference on Human factors in computing systems: common ground, p.213-220, April 13-18, 1996, Vancouver, British Columbia, Canada
Yahiko Kambayashi, Kaoru Katayama, Toshihiro Kakimoto, Hajime Iwamoto, Flexible search functions for multimedia data with text and other auxiliary data, Proceedings of the 1998 ACM symposium on Applied Computing, p.498-504, February 27-March 01, 1998, Atlanta, Georgia, United States
Anton Leuski, Chin-Yew Lin, Liang Zhou, Ulrich Germann, Franz Josef Och, Eduard Hovy, Cross-lingual C*ST*RD: English access to Hindi information, ACM Transactions on Asian Language Information Processing (TALIP), v.2 n.3, p.245-269, September 2003
Javed Aslam, Katya Pelekhov, Daniela Rus, Static and dynamic information organization with star clusters, Proceedings of the seventh international conference on Information and knowledge management, p.208-217, November 02-07, 1998, Bethesda, Maryland, United States
Ramana Rao, Jan O. Pedersen, Marti A. Hearst, Jock D. Mackinlay, Stuart K. Card, Larry Masinter, Per-Kristian Halvorsen, George C. Robertson, Rich interaction in the digital library, Communications of the ACM, v.38 n.4, p.29-39, April 1995
Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, Jinwen Ma, Learning to cluster web search results, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, July 25-29, 2004, Sheffield, United Kingdom
Ron Weiss, Bienvenido Vélez, Mark A. Sheldon, HyPursuit: a hierarchical network search engine that exploits content-link hypertext clustering, Proceedings of the seventh ACM conference on Hypertext, p.180-193, March 16-20, 1996, Bethesda, Maryland, United States
Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan, Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases, Proceedings of the 23rd International Conference on Very Large Data Bases, p.446-455, August 25-29, 1997