ACM Home Page
Please provide us with feedback. Feedback
Design trade-offs for search engine caching
Full text PdfPdf (1.36 MB)
Source
ACM Transactions on the Web (TWEB) archive
Volume 2 ,  Issue 4  (October 2008) table of contents
Article No. 20  
Year of Publication: 2008
ISSN:1559-1131
Authors
Ricardo Baeza-Yates  Yahoo! Research, Barcelona, Spain
Aristides Gionis  Yahoo! Research, Barcelona, Spain
Flavio P. Junqueira  Yahoo! Research, Barcelona, Spain
Vanessa Murdock  Yahoo! Research, Barcelona, Spain
Vassilis Plachouras  Yahoo! Research, Barcelona, Spain
Fabrizio Silvestri  ISTI -- CNR, Pisa, Italy
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 29,   Downloads (12 Months): 399,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1409220.1409223
What is a DOI?

ABSTRACT

In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
Baeza-Yates, R., Junqueira, F., Plachouras, V., and Witschel, H. F. 2007. Admission policies for caches of search engine results. In Proceedings of the 14th International Symposium on String Processing and Information Retrieval (SPIRE'07). Lecture Notes in Computer Science, Vol. 4726, 74--85.
 
4
Baeza-Yates, R. and Saint-Jean, F. 2003. A three level search engine index based in query log distribution. In Proceedings of the 10th International Symposium on String Processing and Information Retrieval (SPIRE'03). Lecture Notes in Computer Science, Vol. 2857, 56--65.
5
 
6
7
8
 
9
10
 
11
12
 
13
14
15
16
 
17
Markatos, E. P. 2001. On caching search engine query results. Comput. Commun. 24, 2, 137--143.
18
 
19
Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., and Lioma, C. 2006. Terrier: a high performance and scalable information retrieval platform. In SIGIR Workshop on Open Source Information Retrieval.
20
21
22
23
24
25
26
 
27
 
28
Xie, Y. and O'Hallaron, D. R. 2002. Locality in search engine queries and its implications for caching. In Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'02).
 
29
Young, N. E. 2002. On-line file caching. Algorithmica 33, 3, 371--383.
30

Collaborative Colleagues:
Ricardo Baeza-Yates: colleagues
Aristides Gionis: colleagues
Flavio P. Junqueira: colleagues
Vanessa Murdock: colleagues
Vassilis Plachouras: colleagues
Fabrizio Silvestri: colleagues