ACM Home Page
Please provide us with feedback. Feedback
Incremental cluster-based retrieval using compressed cluster-skipping inverted files
Full text PdfPdf (516 KB)
Source
ACM Transactions on Information Systems (TOIS) archive
Volume 26 ,  Issue 3  (June 2008) table of contents
Article No. 15  
Year of Publication: 2008
ISSN:1046-8188
Authors
Ismail Sengor Altingovde  Bilkent University, Ankara, Turkey
Engin Demir  Bilkent University, Ankara, Turkey
Fazli Can  Bilkent University, Ankara, Turkey
Özgür Ulusoy  Bilkent University, Ankara, Turkey
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 18,   Downloads (12 Months): 217,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1361684.1361688
What is a DOI?

ABSTRACT

We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Altingovde, I. S., Can, F., and Ulusoy, Ö. 2006. Algorithms for within-cluster searches using inverted files. In Proceedings of the International Symposium on Computer and Information Sciences (ISCIS). 707--716.
2
3
 
4
5
6
7
 
8
 
9
10
11
12
 
13
Cacheda, F. and Baeza-Yates, R. 2004. An optimistic model for searching Web directories. In Proceedings of the 26th European Conference on IR Research (ECIR). 364--377.
 
14
Cacheda, F., Carneiro, V., Guerrero, C., and Viña, Á. 2003. Optimization of restricted searches in Web directories using hybrid data structures. In Proceedings of the 25th European Conference on IR Research (ECIR). 436--451.
 
15
Cambazoglu, B. B. 2006. Models and algorithms for parallel text retrieval. Ph.D. thesis, Bilkent University, Ankara, Turkey.
 
16
17
 
18
 
19
20
 
21
Croft, W. B. 1980. A model of cluster searching based on classification. Inf. Syst. 5, 3, 189--195.
 
22
 
23
 
24
Jardine, N. and van Rijsbergen, C. J. 1971. The use of hierarchical clustering in information retrieval. Inf. Storage Retr. 7, 217--240.
 
25
Lester, N., Moffat, A., Webber, W., and Zobel, J. 2005. Space-Limited ranked query evaluation using adaptive pruning. In Proceedings of the 6th International Conference on Web Information Systems Engineering, New York, 470--477.
26
 
27
28
 
29
 
30
 
31
 
32
 
33
 
34
 
35
 
36
37
 
38
Silvestri, F. 2007. Sorting out the document identifier assignment problem. In Proceedings of the 29th European Conference on IR Research (ECIR). 101--112.
 
39
Stanford University. 2007. Webbase project Web site. www-diglib.stanford.edu/~testbed/doc2/WebBase.
40
 
41
Trec 2006. Trec. Web site. http://trec.nist.gov.
 
42
Tombros, A. 2002. The effectiveness of query-based hierarchic clustering of documents for information retrieval. Ph.D. thesis, University of Glasgow, Glasgow, UK.
 
43
44
 
45
46
 
47
 
48
 
49
Zettair 2007. The Zettair search engine. http://www.seg.rmit.edu.au/zettair/.
50


Collaborative Colleagues:
Ismail Sengor Altingovde: colleagues
Engin Demir: colleagues
Fazli Can: colleagues
Özgür Ulusoy: colleagues