ABSTRACT
Our current concern is a scalable infrastructure for information retrieval (IR) with up-to-date retrieval results in the presence of frequent, continuous updates. Timely processing of updates is important in novel application domains, e.g., e-commerce. We want to use off-the-shelf hardware and software as much as possible. These issues are challenging, given the additional requirement that the resulting system must scale well. We have built PowerDB-IR, a system that has the characteristics sought. This paper describes its design, implementation, and evaluation. PowerDB-IR is a coordination layer for a database cluster. The rationale behind a database cluster is to 'scale out', i.e., to add further cluster nodes whenever necessary for better performance. We build on IR-to-database mappings and service decomposition to support high-level parallelism. We follow a three-tier architecture with the database cluster as the bottom layer for storage management. The middle tier provides IR-specific processing and update services. PowerDB-IR has the following features: It allows documents to be inserted and retrieved concurrently, and it ensures freshness with almost no overhead. Alternative physical data organization schemes provide adequate performance for different workloads. Query processing techniques for the different data organizations efficiently integrate the ranked retrieval results from the cluster nodes. We have run extensive experiments with our prototype using commercial database systems and middleware software products. The main result is that PowerDB-IR shows surprisingly ideal scalability and low response times.
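As an illustration of what integrating ranked retrieval results from the cluster nodes can look like, the following is a minimal sketch and not the PowerDB-IR implementation. It assumes a document-wise partitioning, so that each node returns its own hits already sorted by descending score and no document appears on more than one node. The function name `merge_top_k` and the sample document identifiers and scores are hypothetical.

```python
import heapq
from itertools import islice


def merge_top_k(node_results, k):
    """Merge per-node ranked lists into a global top-k list.

    Assumption: every list in node_results holds (doc_id, score) pairs
    sorted by descending score, and document sets are disjoint across nodes.
    """
    # heapq.merge expects ascending order, so negate the score in the key.
    merged = heapq.merge(*node_results, key=lambda hit: -hit[1])
    # Only the first k hits are pulled from the lazy merge.
    return list(islice(merged, k))


# Hypothetical ranked results from three cluster nodes.
node_a = [("d3", 0.92), ("d7", 0.40)]
node_b = [("d1", 0.85), ("d9", 0.10)]
node_c = [("d5", 0.60)]

top = merge_top_k([node_a, node_b, node_c], k=3)
print(top)
```

Because the merge is lazy, the coordinator only consumes as many hits from each node as are needed to fill the global top k. Under a term-wise partitioning, partial scores for the same document would first have to be summed across nodes, which this sketch does not cover.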