|
ABSTRACT
This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
A. Kemper , G. Moerkotte , K. Peithner , M. Steinbrunn, Optimizing disjunctive queries with expensive predicates, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.336-347, May 24-27, 1994, Minneapolis, Minnesota, United States
|
 |
4
|
|
| |
5
|
|
 |
6
|
Vagelis Hristidis , Nick Koudas , Yannis Papakonstantinou, PREFER: a system for the efficient execution of multi-parametric ranked queries, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.259-270, May 21-24, 2001, Santa Barbara, California, United States
|
 |
7
|
|
| |
8
|
|
| |
9
|
S. Nepal and M. Ramakrishna. Query processing issues in image (multimedia) databases. ICDE 1999, pages 22-29, 1999.
|
 |
10
|
|
| |
11
|
K. Chang and S. Hwang. Minimal probing: Supporting expensive predicates for top-k queries. Technical Report UIUCDCS-R-2001-2258, University of Illinois, December 2001.
|
| |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
N. Bruno, L. Gravano, and A. Marian. Evaluating top-k queries over web-accessible databases. ICDE 2002, 2002
|
| |
18
|
|
| |
19
|
L. Zadeh. Fuzzy sets. Information and Control, 8:338-353, 1965.
|
| |
20
|
|
| |
21
|
|
| |
22
|
The PostgreSQL Global Development Group. The PostgreSQL 7.1. Reference Manual.
|
CITED BY 55
|
|
|
|
|
|
|
|
|
|
|
Ihab F. Ilyas , Rahul Shah , Walid G. Aref , Jeffrey Scott Vitter , Ahmed K. Elmagarmid, Rank-aware query optimization, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gültekin Özsoyoǧlu , Ismail Sengör Altingövde , Abdullah Al-Hamdani , Selma Ayşe Özel , Özgür Ulusoy , Zehra Meral özsoyoǧlu, Querying web metadata: Native score management and text support in databases, ACM Transactions on Database Systems (TODS), v.29 n.4, p.581-634, December 2004
|
|
|
|
|
|
Arjun Dasgupta , Nan Zhang , Gautam Das , Surajit Chaudhuri, Privacy preservation of aggregates in hidden databases: why and how?, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
|
|
|
Ihab F. Ilyas , Walid G. Aref , Ahmed K. Elmagarmid , Hicham G. Elmongui , Rahul Shah , Jeffrey Scott Vitter, Adaptive rank-aware query optimization in relational databases, ACM Transactions on Database Systems (TODS), v.31 n.4, p.1257-1304, December 2006
|
|
|
|
|
|
|
|
|
|
|
|
Holger Bast , Debapriyo Majumdar , Ralf Schenkel , Martin Theobald , Gerhard Weikum, IO-Top-k: index-access optimized top-k query processing, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
Zhen Zhang , Seung-won Hwang , Kevin Chen-Chuan Chang , Min Wang , Christian A. Lang , Yuan-chi Chang, Boolean + ranking: querying a database by k-constrained optimization, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
Caetano Traina, Jr. , Agma J. M. Traina , Marcos R. Vieira , Adriano S. Arantes , Christos Faloutsos, Efficient processing of complex similarity queries in RDBMS through query rewriting, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
G. Özsoyoǧlu , A. Al-Hamdani , I. S. Altingövde , S. A. Özel , Ö. Ulusoy , Z. M. Özsoyoǧlu, Sideway value algebra for object-relational databases, Proceedings of the 28th international conference on Very Large Data Bases, p.59-70, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thomas Neumann , Matthias Bender , Sebastian Michel , Ralf Schenkel , Peter Triantafillou , Gerhard Weikum, Distributed top-k aggregation queries at large, Distributed and Parallel Databases, v.26 n.1, p.3-27, August 2009
|
|