ACM Home Page
Please provide us with feedback. Feedback
Pruning policies for two-tiered inverted index with correctness guarantee
Full text PdfPdf (232 KB)
Source
Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Amsterdam, The Netherlands
SESSION: Managing memory table of contents
Pages: 191 - 198  
Year of Publication: 2007
ISBN:978-1-59593-597-7
Authors
Alexandros Ntoulas  Microsoft
Junghoo Cho  UCLA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 101,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1277741.1277776
What is a DOI?

ABSTRACT

The Web search engines maintain large-scale inverted indexes which are queried thousands of times per second by users eager for information. In order to cope with the vast amounts of query loads, search engines prune their index to keep documents that are likely to be returned as top results, and use this pruned index to compute the first batches of results. While this approach can improve performance by reducing the size of the index, if we compute the top results only from the pruned index we may notice a significant degradation in the result quality: if a document should be in the top results but was not included in the pruned index, it will be placed behind the results computed from the pruned index. Given the fierce competition in the online search market, this phenomenon is clearly undesirable.

In this paper, we study how we can avoid any degradation of result quality due to the pruning-based performance optimization, while still realizing most of its benefit. Our contribution is a number of modifications in the pruning techniques for creating the pruned index and a new result computation algorithm that guarantees that the top-matching pages are always placed at the top search results, even though we are computing the first batch from the pruned index most of the time. We also show how to determine the optimal size of a pruned index and we experimentally evaluate our algorithms on a collection of 130 million Web pages.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
N. Bruno, L. Gravano, and A. Marian. Evaluating top-k queries over web-accessible databases. In ICDE, 2002.
 
5
S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In CIKM, 2006.
6
7
8
 
9
 
10
Open directory. http://www.dmoz.org.
11
12
13
 
14
 
15
 
16
B. J. Jansen and A. Spink. An analysis of web documents retrieved and viewed. In International Conf. on Internet Computing, 2003.
17
18
19
 
20
21
 
22
Looksmart inc. http://www.looksmart.com.
23
24
25
 
26
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University.
 
27
 
28
M. Richardson and P. Domingos. The intelligent surfer: Probabilistic combination of link and content information in pagerank. In Advances in Neural Information Processing Systems, 2002.
 
29
S. Robertson and K. Spärck-Jones. Relevance weighting of search terms. Journal of the American Society for Information Science, 27:129--46, 1976.
 
30
31
 
32
 
33

CITED BY  9

Collaborative Colleagues:
Alexandros Ntoulas: colleagues
Junghoo Cho: colleagues