ABSTRACT
The proliferation of the World Wide Web has brought information retrieval (IR) techniques to the forefront of search technology. To the average computer user, “searching” now means using IR-based systems to find information on the WWW or in other document collections. IR query evaluation methods and workloads differ significantly from those found in database systems. In this paper, we focus on three such differences. First, due to the inherent fuzziness of the natural language used in IR queries and documents, an additional degree of flexibility is permitted in evaluating queries. Second, IR query evaluation algorithms tend to have access patterns that cause problems for traditional buffer replacement policies. Third, IR search is often an iterative process, in which a query is repeatedly refined and resubmitted by the user. Based on these differences, we develop two complementary techniques to improve the efficiency of IR queries: (1) buffer-aware query evaluation, which alters the query evaluation process based on the current contents of buffers; and (2) ranking-aware buffer replacement, which incorporates knowledge of the query processing strategy into replacement decisions. In a detailed performance study, we show that using either of these techniques yields significant performance benefits and that, in many cases, combining them produces even further improvements.