PowerDB-IR: information retrieval on top of a database cluster
Source: Conference on Information and Knowledge Management
Proceedings of the tenth international conference on Information and knowledge management
Atlanta, Georgia, USA
Session: Similarity Measures
Pages: 411 - 418  
Year of Publication: 2001
ISBN:1-58113-436-3
Authors
Torsten Grabs  ETH Zurich, Zurich, Switzerland
Klemens Böhm  ETH Zurich, Zurich, Switzerland
Hans-Jörg Schek  ETH Zurich, Zurich, Switzerland
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/502585.502655

ABSTRACT

Our current concern is a scalable infrastructure for information retrieval (IR) that delivers up-to-date retrieval results in the presence of frequent, continuous updates. Timely processing of updates is important in novel application domains, e.g., e-commerce. We want to use off-the-shelf hardware and software as much as possible. These issues are challenging, given the additional requirement that the resulting system must scale well. We have built PowerDB-IR, a system that has the characteristics sought. This paper describes its design, implementation, and evaluation. PowerDB-IR is a coordination layer for a database cluster. The rationale behind a database cluster is to 'scale out', i.e., to add further cluster nodes whenever necessary for better performance. We build on IR-to-database mappings and service decomposition to support high-level parallelism. We follow a three-tier architecture with the database cluster as the bottom layer for storage management. The middle tier provides IR-specific processing and update services. PowerDB-IR has the following features: it allows documents to be inserted and retrieved concurrently, and it ensures freshness with almost no overhead. Alternative physical data organization schemes provide adequate performance for different workloads. Query processing techniques for the different data organizations efficiently integrate the ranked retrieval results from the cluster nodes. We have run extensive experiments with our prototype using commercial database systems and middleware software products. The main result is that PowerDB-IR shows surprisingly ideal scalability and low response times.
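
The idea of mapping IR data to database relations and integrating ranked results from several cluster nodes can be illustrated with a small sketch. It is not the PowerDB-IR implementation: the table layout (term, doc_id, tf), the toy score (summed term frequency), the function names, and the use of in-memory SQLite databases as stand-ins for cluster nodes are all assumptions made for illustration. The sketch also assumes that documents are partitioned across nodes, so each document is scored entirely on one node; this is only one of the possible data organizations.

```python
import heapq
import sqlite3


def make_node(postings):
    """Create one hypothetical cluster node: an in-memory database that
    stores an inverted list as a relation (term, doc_id, tf)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    db.executemany("INSERT INTO postings VALUES (?, ?, ?)", postings)
    db.commit()
    return db


def node_top_k(db, terms, k):
    """Rank the documents stored on one node by summed term frequency.
    This toy score stands in for a real IR ranking function."""
    placeholders = ",".join("?" for _ in terms)
    sql = (
        "SELECT doc_id, SUM(tf) AS score FROM postings "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY score DESC, doc_id ASC LIMIT ?"
    )
    return db.execute(sql, (*terms, k)).fetchall()


def search(nodes, terms, k):
    """Ask every node for its local top k, then merge into a global top k.
    Correct only because each document lives on exactly one node."""
    candidates = []
    for db in nodes:
        candidates.extend(node_top_k(db, terms, k))
    # Highest score first; ties are broken by the smaller doc_id.
    return heapq.nsmallest(k, candidates, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    nodes = [
        make_node([("database", 1, 3), ("cluster", 1, 1), ("database", 2, 1)]),
        make_node([("database", 3, 2), ("cluster", 3, 4), ("cluster", 4, 1)]),
    ]
    print(search(nodes, ["database", "cluster"], 2))
```

Under this document-partitioned layout, adding a node only adds one more local top-k query to the merge step. If the inverted list were instead partitioned by term, each node would return partial scores that have to be summed per document before ranking, so the merge step would look different.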



