ACM Home Page
Please provide us with feedback. Feedback
Fast phrase querying with combined indexes
Full text PdfPdf (192 KB)
Purchase a print copy Purchase a print copy

Source ACM Transactions on Information Systems (TOIS) archive
Volume 22 ,  Issue 4  (October 2004) table of contents
Pages: 573 - 594  
Year of Publication: 2004
ISSN:1046-8188
Authors
Hugh E. Williams  RMIT University, Melbourne, Australia
Justin Zobel  RMIT University, Melbourne, Australia
Dirk Bahle  RMIT University, Melbourne, Australia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 137,   Citation Count: 10
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1028099.1028102
What is a DOI?

ABSTRACT

Search engines need to evaluate queries extremely fast, a challenging task given the quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this article we consider how phrase queries can be efficiently supported with low disk overheads. Our previous research has shown that phrase queries can be rapidly evaluated using nextword indexes, but these indexes are twice as large as conventional inverted files. Alternatively, special-purpose phrase indexes can be used, but it is not feasible to index all phrases. We propose combinations of nextword indexes and phrase indexes with inverted files as a solution to this problem. Our experiments show that combined use of a partial nextword, partial phrase, and conventional inverted index allows evaluation of phrase queries in a quarter the time required to evaluate such queries with an inverted file alone; the additional space overhead is only 26% of the size of the inverted file.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Bahle, D. 2003. Efficient phrase querying. Ph.D. dissertation. School of Computer Science and Information Technology, RMIT, Melbourne, Australia.
 
3
Bahle, D., Williams, H. E., and Zobel, J. 2001a. Compaction techniques for nextword indexes. In Proceedings of the String Processing and Information Retrieval Symposium (San Rafael, Chile). IEEE Computer Society Press, Los Alamitos, CA, 33--45.
 
4
 
5
 
6
Clarke, C. L., Cormack, G. V., and Tudhope, E. A. 1997. Relevance ranking for one- to three-term queries. In Proceedings of the Fifth RIAO International Conference "Recherche d'Information Assistee par Ordinateur" (Montreal P.Q., Canada). 388--400.
7
8
 
9
 
10
 
11
12
13
14
 
15
16
17
 
18
 
19
Spink, A. and Xu, J. 2000. Selected results from a large study of Web searching: The Excite study. Informat. Res. 6, 1. Available online at: http://InformationR.net/ir/6-1/paper90.html.
 
20
Voorhees, E. M. and Harman, D. K. 2001. Overview of TREC 2001. In The Tenth Text REtrieval Conference (TREC 2001), E. M. Voorhees and D. K. Harman, Eds. NIST Spec. pub. 500-250. National Institute of Standards and Technology, Gaithersburg, MD, 1--15.
 
21
Williams, H. E. and Zobel, J. 1999. Compressing integers for fast file access. Comput. J. 42, 3, 193--201.
 
22
Williams, H. E., Zobel, J., and Anderson, P. 1999. What's next? Index structures for efficient phrase querying. In Proceedings of the Australasian Database Conference (Auckland, New Zealand), M. Orlowska, Ed. Springer-Verlag, Berlin, Germany, 141--152.
 
23
24
 
25

CITED BY  10
 
 
 

Collaborative Colleagues:
Hugh E. Williams: colleagues
Justin Zobel: colleagues
Dirk Bahle: colleagues