| Improved techniques for result caching in web search engines |
| Full text |
Pdf
(1.02 MB)
|
Source
|
International World Wide Web Conference
archive
Proceedings of the 18th international conference on World wide web
table of contents
Madrid, Spain
SESSION: Search/session: caching and indices
table of contents
Pages 431-440
Year of Publication: 2009
ISBN:978-1-60558-487-4
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 69, Downloads (12 Months): 248, Citation Count: 1
|
|
|
ABSTRACT
Query processing is a major cost factor in operating large web search engines. In this paper, we study query result caching, one of the main techniques used to optimize query processing performance. Our first contribution is a study of result caching as a weighted caching problem. Most previous work has focused on optimizing cache hit ratios, but given that processing costs of queries can vary very significantly we argue that total cost savings also need to be considered. We describe and evaluate several algorithms for weighted result caching, and study the impact of Zipf-based query distributions on result caching. Our second and main contribution is a new set of feature-based cache eviction policies that achieve significant improvements over all previous methods, substantially narrowing the existing performance gap to the theoretically optimal (clairvoyant) method. Finally, using the same approach, we also obtain performance gains for the related problem of inverted list caching.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Ricardo Baeza-Yates , Aristides Gionis , Flavio Junqueira , Vanessa Murdock , Vassilis Plachouras , Fabrizio Silvestri, The impact of caching on search engines, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
[doi> 10.1145/1277741.1277775]
|
| |
2
|
R.Baeza-Yates, F.Junqueira, V.Plachouras, and H.Witschel. Admission policies for caches of search engine results. In Proc. of the 14th String Processing and Information Retrieval Symposium, Sept. 2007.
|
| |
3
|
R.Baeza-Yates and F.Saint-Jean. A three-level search-engine index based in query log distribution. In Proc. of the 10th String Processing and Information Retrieval Symposium, Sept. 2003.
|
| |
4
|
|
| |
5
|
|
| |
6
|
S.F. Chen and J.Goodman. An empirical study of smoothing techniques for language modeling, 1996.
|
 |
7
|
|
| |
8
|
S.Garcia. Search Engine Optimisation Using Past Queries. Ph.D Thesis, Department of Computer Science and Information Technology, RMIT University, 2007.
|
 |
9
|
Björn T. Jónsson , Michael J. Franklin , Divesh Srivastava, Interaction of query evaluation and buffer management for information retrieval, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.118-129, June 01-04, 1998, Seattle, Washington, United States
|
 |
10
|
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
| |
15
|
E.Markatos. On caching search engine query results. In 5th International Web Caching and Content Delivery Workshop, May 2000.
|
 |
16
|
Paricia Correia Saraiva , Edleno Silva de Moura , Novio Ziviani , Wagner Meira , Rodrigo Fonseca , Berthier Riberio-Neto, Rank-preserving two-level caching for scalable search engines, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.51-58, September 2001, New Orleans, Louisiana, United States
[doi> 10.1145/383952.383959]
|
| |
17
|
C.Silverstein, M.Henzinger, H.Marais, and M.Moricz. Analysis of a Very Large AltaVista Query Log. Technical Report 014, SRC (Digital, Palo Alto), Oct. 1998.
|
| |
18
|
Y.Xie and D.O'Hallaron. Locality in search engine queries and its implications for caching. In Proc. IEEE Infocom, 2002.
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
|