|
ABSTRACT
We study the caching of query result pages in Web search engines. Popular search engines receive millions of queries per day, and efficient policies for caching query results may enable them to lower their response time and reduce their hardware requirements. We present PDC (probability driven cache), a novel scheme tailored for caching search results, that is based on a probabilistic model of search engine users. We then use a trace of over seven million queries submitted to the search engine AltaVista to evaluate PDC, as well as traditional LRU and SLRU based caching schemes. The trace driven simulations show that PDC outperforms the other policies. We also examine the prefetching of search results, and demonstrate that prefetching can increase cache hit ratios by 50% for large caches, and can double the hit ratios of small caches. When integrating prefetching into PDC, we attain hit ratios of over 0.53.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
L. A. Adamic and B. A. Huberman. The nature of markets in the world wide web. Quarterly Journal of Economic Commerce, 1:5--12, 2000.
|
| |
2
|
|
| |
3
|
Andrei Broder , Ravi Kumar , Farzin Maghoul , Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata , Andrew Tomkins , Janet Wiener, Graph structure in the Web, Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking, p.309-320, June 2000, Amsterdam, The Netherlands
|
| |
4
|
P. Cao, J. Zhang, and K. Beach. Active cache: Caching dynamic contents on the web. In Proc. of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware '98), pages 373--388, 1998.
|
| |
5
|
|
| |
6
|
|
 |
7
|
Björn T. Jónsson , Michael J. Franklin , Divesh Srivastava, Interaction of query evaluation and buffer management for information retrieval, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.118-129, June 01-04, 1998, Seattle, Washington, United States
|
| |
8
|
|
| |
9
|
|
| |
10
|
R. Lempel and S. Moran. Optimizing result prefetching in web search engines with segmented indices. In Proc. 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002.
|
| |
11
|
E. P. Markatos. On caching search engine query results. Proceedings of the 5th International Web Caching and Content Delivery Workshop, May 2000.
|
| |
12
|
M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Invited Talk in the 39th Annual Allerton Conference on Communication, Control and Computing, October 2001.
|
 |
13
|
Paricia Correia Saraiva , Edleno Silva de Moura , Novio Ziviani , Wagner Meira , Rodrigo Fonseca , Berthier Riberio-Neto, Rank-preserving two-level caching for scalable search engines, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.51-58, September 2001, New Orleans, Louisiana, United States
[doi> 10.1145/383952.383959]
|
| |
14
|
C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large altavista query log. Technical Report 1998-014, Compaq Systems Research Center, October 1998.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
CITED BY 34
|
|
|
|
|
|
|
|
|
|
|
Sandeep Pandey , Sourashis Roy , Christopher Olston , Junghoo Cho , Soumen Chakrabarti, Shuffling a stacked deck: the case for partially randomized ranking of search engine results, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
Steven M. Beitzel , Eric C. Jensen , Abdur Chowdhury , David Grossman , Ophir Frieder, Hourly analysis of a very large topically categorized web query log, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, July 25-29, 2004, Sheffield, United Kingdom
|
|
|
|
|
|
|
|
|
Anirban Dasgupta , Ravi Kumar , Prabhakar Raghavan , Andrew Tomkins, Variable latent semantic indexing, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ricardo Baeza-Yates , Aristides Gionis , Flavio Junqueira , Vanessa Murdock , Vassilis Plachouras , Fabrizio Silvestri, The impact of caching on search engines, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
|
|
|
|
|
|
|
|
|
Ricardo Baeza-Yates , Aristides Gionis , Flavio P. Junqueira , Vanessa Murdock , Vassilis Plachouras , Fabrizio Silvestri, Design trade-offs for search engine caching, ACM Transactions on the Web (TWEB), v.2 n.4, p.1-28, October 2008
|
|
|
|
|
|
|
|
|
|
|
|
Ronny Lempel , Yosi Mass , Shila Ofek-Koifman , Dafna Sheinwald , Yael Petruschka , Ron Sivan, Just in time indexing for up to the second search, Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, November 06-10, 2007, Lisbon, Portugal
|
|
|
|
|
|
Fabrizio Falchi , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Fausto Rabitti, A metric cache for similarity search, Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, October 30-30, 2008, Napa Valley, California, USA
|
|
|
Doug Downey , Susan Dumais , Dan Liebling , Eric Horvitz, Understanding the relationship between searchers' queries and information goals, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
Fabrizio Falchi , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Fausto Rabitti, Caching content-based queries for robust and efficient image retrieval, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
Sandeep Pandey , Andrei Broder , Flavio Chierichetti , Vanja Josifovski , Ravi Kumar , Sergei Vassilvitskii, Nearest-neighbor caching for content-match applications, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
|
|
|
|
|
|
|
|