|
ABSTRACT
This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed to efficiently exploit the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of the most frequently submitted queries and stores them in a static, read-only portion of the cache. The remaining entries of the cache are dynamically managed according to a given replacement policy and are used for those queries that cannot be satisfied by the static portion. Moreover, we improve the hit ratio of SDC by using an adaptive prefetching strategy, which anticipates future requests by introducing a limited overhead over the back-end WSE. We experimentally demonstrate the superiority of SDC over purely static and dynamic policies by measuring the hit ratio achieved on three large query logs by varying the cache parameters and the replacement policy used for managing the dynamic part of the cache. Finally, we deploy and measure the throughput achieved by a concurrent version of our caching system. Our tests show how the SDC cache can be efficiently exploited by many threads that concurrently serve the queries of different users.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Steven M. Beitzel , Eric C. Jensen , Abdur Chowdhury , David Grossman , Ophir Frieder, Hourly analysis of a very large topically categorized web query log, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, July 25-29, 2004, Sheffield, United Kingdom
[doi> 10.1145/1008992.1009048]
|
| |
3
|
Hölscher, C. 1998. How Internet experts search for information on the Web. In Proceedings of WebNet 98---World Conference on the WWW and Internet & Intranet (Orlando, FL, Nov. 7--12).
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
Markatos, E. P. 2000. On caching search engine results. In Proceedings of the 5th International Web Caching and Content Delivery Workshop. Go online to http://www.iwcw.org/2000/Proceedings/proceedings.html.
|
| |
9
|
Markatos, E. P. 2001. On caching search engine results. Comput. Commun. 24, 2, 137--143.
|
| |
10
|
Moffat, A. and Zobel, J. 2004. What does it mean to “measure performance”? In Proceedings of the International Conference on Web Informations Systems, X. Zhou, S. Su, M. P. Papazoglou, M. E. Owlowska, and K. Jeffrey, Eds. Lecture Notes in Computer Science, vol. 3306. Springer, Berlin, Germany, 1--12.
|
 |
11
|
Elizabeth J. O'Neil , Patrick E. O'Neil , Gerhard Weikum, The LRU-K page replacement algorithm for database disk buffering, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.297-306, May 25-28, 1993, Washington, D.C., United States
|
| |
12
|
Orlando, S., Perego, R., and Silvestri, F. 2001. Design of a parallel and distributed Web search engine. In ParCo2001: Proceedings of the International Conference Parallel Computing: Advances and Current Issues. Imperial College Press, London, U.K., 197--204.
|
 |
13
|
|
 |
14
|
|
 |
15
|
|
 |
16
|
Paricia Correia Saraiva , Edleno Silva de Moura , Novio Ziviani , Wagner Meira , Rodrigo Fonseca , Berthier Riberio-Neto, Rank-preserving two-level caching for scalable search engines, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.51-58, September 2001, New Orleans, Louisiana, United States
[doi> 10.1145/383952.383959]
|
 |
17
|
|
| |
18
|
Silvestri, F. 2004. High performance issues in Web search engines: Algorithms and techniques. Ph.D. dissertation. Università degli Studi di Pisa---Facoltà di Informatica, Pisa, Italy.
|
| |
19
|
|
| |
20
|
|
| |
21
|
Xie, Y. and O'Hallaron, D. 2002. Locality in search engine queries and its implications for caching. In Proceedings of IEEE INFOCOM 2002: The 21st Annual Joint Conference of the IEEE Computer and Communications Societies.
|
CITED BY 19
|
|
|
|
|
|
|
|
|
|
|
Ricardo Baeza-Yates , Aristides Gionis , Flavio Junqueira , Vanessa Murdock , Vassilis Plachouras , Fabrizio Silvestri, The impact of caching on search engines, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
|
|
|
Ricardo Baeza-Yates , Aristides Gionis , Flavio P. Junqueira , Vanessa Murdock , Vassilis Plachouras , Fabrizio Silvestri, Design trade-offs for search engine caching, ACM Transactions on the Web (TWEB), v.2 n.4, p.1-28, October 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fabrizio Falchi , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Fausto Rabitti, A metric cache for similarity search, Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, October 30-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
Fabrizio Falchi , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Fausto Rabitti, Caching content-based queries for robust and efficient image retrieval, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
Sandeep Pandey , Andrei Broder , Flavio Chierichetti , Vanja Josifovski , Ravi Kumar , Sergei Vassilvitskii, Nearest-neighbor caching for content-match applications, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
|
|
|
|
|