Dynamicity vs. effectiveness: studying online clustering for scatter/gather
Source: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (Annual ACM Conference on Research and Development in Information Retrieval), Boston, MA, USA
Session: Classification and clustering
Pages: 19-26
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Bibliometrics: Downloads (6 Weeks): 109, Downloads (12 Months): 283, Citation Count: 0
ABSTRACT
We proposed and implemented a novel clustering algorithm called LAIR2, which has a constant average running time for on-the-fly Scatter/Gather browsing [4]. Our experiments showed that, running on a single processor, the LAIR2 online clustering algorithm was several hundred times faster than a parallel Buckshot algorithm running on multiple processors [11]. This paper reports on a study that examined the effectiveness of the LAIR2 algorithm in terms of clustering quality and its impact on retrieval performance. We conducted a user study with 24 subjects to evaluate on-the-fly LAIR2 clustering in Scatter/Gather search tasks, comparing its performance with that of the Buckshot algorithm, a classic method for Scatter/Gather browsing [4]. Results showed significant differences in subjective perceptions of clustering quality: subjects perceived that the LAIR2 algorithm produced significantly better clusters than the Buckshot method did. Subjects also felt that the tasks took less effort to complete with the LAIR2 system and that it was more effective in helping them with the tasks. Interesting patterns also emerged from subjects' comments in the final open-ended questionnaire. We discuss implications and directions for future research.
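The Buckshot baseline referenced above [4] follows a well-known two-phase scheme: run hierarchical agglomerative clustering (HAC) on a random sample of roughly sqrt(kn) of the n documents to obtain k seed clusters, then assign every document to its nearest seed centroid. The sketch below illustrates only that general scheme; it is not the implementation used in this study and does not show LAIR2. It assumes documents are dense term vectors compared by cosine similarity and uses group-average linkage; all function names (`cosine`, `centroid`, `hac_group_average`, `buckshot`) are hypothetical, and the merge loop is deliberately naive for readability.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hac_group_average(vectors, k):
    """Naive group-average HAC: repeatedly merge the most similar pair of
    clusters until k remain. Returns lists of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best_sim, best_pair = -math.inf, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best_sim:
                    best_sim, best_pair = avg, (i, j)
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def buckshot(docs, k, seed=0):
    """Buckshot-style clustering (sketch): HAC on a random sample of about
    sqrt(k*n) documents, then assign every document to its nearest centroid."""
    n = len(docs)
    sample_size = min(n, max(k, int(math.sqrt(k * n))))
    rng = random.Random(seed)
    sample = [docs[i] for i in rng.sample(range(n), sample_size)]
    groups = hac_group_average(sample, k)
    centers = [centroid([sample[i] for i in g]) for g in groups]
    # Assignment phase: one similarity per (document, centroid) pair.
    labels = [max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
              for d in docs]
    return labels, centers


if __name__ == "__main__":
    docs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05], [0.05, 1]]
    labels, centers = buckshot(docs, k=2)
    print(labels)
```

Because HAC must compare at least every pair of the m items it clusters, keeping the sample at m ≈ sqrt(kn) bounds that phase at roughly kn similarity computations for an efficient HAC implementation, and the assignment phase also costs kn. This is why Buckshot is usually described as linear in n for fixed k, yet still has to be rerun on every Scatter/Gather iteration.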
REFERENCES
[1] J. Allan. Hard track overview in TREC 2005: High accuracy retrieval from documents. In TREC '05: Proceedings of the Text REtrieval Conference, 2005.
[4] Douglass R. Cutting, David R. Karger, Jan O. Pedersen, and John W. Tukey. Scatter/Gather: a cluster-based approach to browsing large document collections. In Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 318-329, Copenhagen, Denmark, June 21-24, 1992. doi:10.1145/133160.133214
[5] Douglass R. Cutting, David R. Karger, and Jan O. Pedersen. Constant interaction-time scatter/gather browsing of very large document collections. In Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pages 126-134, Pittsburgh, Pennsylvania, United States, June 27-July 1, 1993. doi:10.1145/160688.160706
[6] J. Han, M. Kamber, and A. L. H. Tung. Spatial clustering methods in data mining: a survey. New York, 2001.
[7] M. A. Hearst. Modern Information Retrieval, chapter 10, pages 257-324. Addison-Wesley Longman Publishing, 2004.
[8] M. A. Hearst, D. R. Karger, and J. O. Pedersen. Scatter/Gather as a tool for the navigation of retrieval results. In Working Notes of the AAAI Fall Symposium on AI Applications in Knowledge Navigation, 1995.
[10] Eric C. Jensen, Steven M. Beitzel, Angelo J. Pilotto, Nazli Goharian, and Ophir Frieder. Parallelizing the buckshot algorithm for efficient document clustering. In Proceedings of the eleventh international conference on Information and knowledge management, McLean, Virginia, USA, November 4-9, 2002. doi:10.1145/584792.584919
[11] W. Ke, J. Mostafa, and Y. Liu. Toward responsive visualization services for Scatter/Gather browsing. In ASIS&T '08: Proceedings of the annual meeting of the American Society for Information Science and Technology, 2008.
[12] A. J. Kleiboemer, M. B. Lazear, and J. O. Pedersen. Tailoring a retrieval system for naive users. In Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval (SDAIR), Las Vegas, NV, 1996.
[14] T. Liu, S. Liu, Z. Cheng, and W.-Y. Ma. An evaluation on feature selection for text clustering. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
[15] P. Ogilvie and J. P. Callan. Experiments using the Lemur toolkit. In Text REtrieval Conference (TREC), 2001.
[16] Peter Pirolli, Patricia Schank, Marti Hearst, and Christine Diehl. Scatter/gather browsing communicates the topic structure of a very large text collection. In Proceedings of the SIGCHI conference on Human factors in computing systems: common ground, pages 213-220, Vancouver, British Columbia, Canada, April 13-18, 1996. doi:10.1145/238386.238489
[17] Nachiketa Sahoo, Jamie Callan, Ramayya Krishnan, George Duncan, and Rema Padman. Incremental hierarchical clustering of text documents. In Proceedings of the 15th ACM international conference on Information and knowledge management, Arlington, Virginia, USA, November 6-11, 2006. doi:10.1145/1183614.1183667
[18] M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In KDD Workshop on Text Mining, 2000.
INDEX TERMS
Primary Classification: H. Information Systems > H.3 Information Storage and Retrieval > H.3.3 Information Search and Retrieval
Subjects: Clustering
Additional Classification: H. Information Systems > H.3 Information Storage and Retrieval > H.3.3 Information Search and Retrieval
Subjects: Clustering; Search process
General Terms: Algorithms, Experimentation, Human Factors, Performance
Keywords: clustering, effectiveness, efficiency, exploratory search, interactive visualization, scalability, scatter/gather, user study