Design trade-offs for search engine caching
Source
ACM Transactions on the Web (TWEB), Volume 2, Issue 4 (October 2008), Article No. 20
Year of Publication: 2008
ISSN: 1559-1131
Authors
Ricardo Baeza-Yates, Yahoo! Research, Barcelona, Spain
Aristides Gionis, Yahoo! Research, Barcelona, Spain
Flavio P. Junqueira, Yahoo! Research, Barcelona, Spain
Vanessa Murdock, Yahoo! Research, Barcelona, Spain
Vassilis Plachouras, Yahoo! Research, Barcelona, Spain
Fabrizio Silvestri, ISTI-CNR, Pisa, Italy
ABSTRACT
In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
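The static vs. dynamic distinction mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method or data: the function names, the toy query streams, and the choice of LRU as the dynamic policy are all assumptions made for illustration. A static cache is filled once from a past (training) log and never changes, while a dynamic cache updates its contents as queries arrive.

```python
from collections import Counter, OrderedDict


def static_hit_rate(train, test, capacity):
    """Hit rate of a static cache (hypothetical helper).

    The cache is filled once with the `capacity` most frequent queries
    of the training log and is never updated while replaying `test`.
    """
    cached = {query for query, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for query in test if query in cached)
    return hits / len(test)


def lru_hit_rate(test, capacity):
    """Hit rate of a dynamic cache using LRU eviction (assumed policy)."""
    cache = OrderedDict()
    hits = 0
    for query in test:
        if query in cache:
            hits += 1
            cache.move_to_end(query)       # mark as most recently used
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(test)


# Toy query streams, invented for this example only.
train = ["a", "a", "b", "a", "c", "b", "d"]
test = ["a", "e", "b", "f", "a", "g", "b", "a"]

static_rate = static_hit_rate(train, test, capacity=2)
lru_rate = lru_hit_rate(test, capacity=2)
print(f"static: {static_rate:.3f}, LRU: {lru_rate:.3f}")
```

On this contrived stream the static cache does well because the popular queries in the training log stay popular in the test log, while one-off queries keep evicting them from the small LRU cache. Real query logs may behave differently, and the sketch says nothing about posting-list caching or about how to split a cache between answers and posting lists.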