Dynamicity vs. effectiveness: studying online clustering for scatter/gather
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Classification and clustering
Pages 19-26
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Weimao Ke, University of North Carolina, Chapel Hill, NC, USA
Cassidy R. Sugimoto, University of North Carolina, Chapel Hill, NC, USA
Javed Mostafa, University of North Carolina, Chapel Hill, NC, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571947

ABSTRACT

We proposed and implemented a novel clustering algorithm called LAIR2, which has constant average running time for on-the-fly Scatter/Gather browsing [4]. Our experiments showed that, running on a single processor, the LAIR2 online clustering algorithm was several hundred times faster than a parallel Buckshot algorithm running on multiple processors [11]. This paper reports on a study that examined the effectiveness of the LAIR2 algorithm in terms of clustering quality and its impact on retrieval performance. We conducted a user study with 24 subjects to evaluate on-the-fly LAIR2 clustering in Scatter/Gather search tasks, comparing its performance to that of the Buckshot algorithm, a classic method for Scatter/Gather browsing [4]. Results showed significant differences in subjective perceptions of clustering quality: subjects perceived that the LAIR2 algorithm produced significantly better clusters than the Buckshot method did. Subjects also felt that completing the tasks took less effort with the LAIR2 system and that it was more effective in helping them with the tasks. Interesting patterns also emerged from subjects' comments in the final open-ended questionnaire. We discuss implications and future research.


REFERENCES

 
[1] J. Allan. HARD track overview in TREC 2005: High accuracy retrieval from documents. In TREC '05: Proceedings of the Text REtrieval Conference, 2005.
[2]
[3]
[4]
[5]
[6] J. Han, M. Kamber, and A. K. H. Tung. Spatial clustering methods in data mining: a survey. New York, 2001.
[7] M. A. Hearst. Modern Information Retrieval, chapter 10, pages 257-324. Addison-Wesley Longman Publishing, 2004.
[8] M. A. Hearst, D. R. Karger, and J. O. Pedersen. Scatter/Gather as a tool for the navigation of retrieval results. In Working Notes AAAI Fall Symp. AI Applications in Knowledge Navigation, 1995.
[9]
[10]
[11] W. Ke, J. Mostafa, and Y. Liu. Toward responsive visualization services for Scatter/Gather browsing. In ASIS&T '08: Proceedings of the annual meeting of the American Society for Information Science and Technology 2008, 2008.
[12] A. J. Kleiboemer, M. B. Lazear, and J. O. Pedersen. Tailoring a retrieval system for naive users. In Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval (SDAIR), Las Vegas, NV, 1996.
[13]
[14] T. Liu, S. Liu, Z. Chen, and W.-Y. Ma. An evaluation on feature selection for text clustering. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
[15] P. Ogilvie and J. P. Callan. Experiments using the Lemur toolkit. In Text REtrieval Conference (TREC), 2001.
[16]
[17]
[18] M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In KDD Workshop on Text Mining, 2000.
[19]
[20]
