|
ABSTRACT
In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the "top k" tuples that best match the given attribute values. In this paper, we study the advantages and limitations of processing a top-k query by translating it into a single range query that a traditional relational database management system (RDBMS) can process efficiently. In particular, we study how to determine a range query to evaluate a top-k query by exploiting the statistics available to an RDBMS, and the impact of the quality of these statistics on the retrieval efficiency of the resulting scheme. We also report the first experimental evaluation of the mapping strategies over a real RDBMS, namely over Microsoft's SQL Server 7.0. The experiments show that our new techniques are robust and significantly more efficient than previously known strategies requiring at least one sequential scan of the data sets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Ron Avnur , Joseph M. Hellerstein , Bruce Lo , Chris Olston , Bhaskaran Raman , Vijayshankar Raman , Tali Roth , Kirk Wylie, CONTROL: continuous output and navigation technology with refinement on-line, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.567-569, June 01-04, 1998, Seattle, Washington, United States
|
| |
3
|
|
| |
4
|
|
| |
5
|
Blake, C. and Merz, C. 1998. UCI repository of machine learning databases.
|
| |
6
|
Bruno, N., Chaudhuri, S., and Gravano, L. 2000. Performance of multiattribute top-k queries on relational systems. Tech. Rep. CUCS-021-00, Columbia Univ., New York.
|
 |
7
|
Nicolas Bruno , Surajit Chaudhuri , Luis Gravano, STHoles: a multidimensional workload-aware histogram, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.211-222, May 21-24, 2001, Santa Barbara, California, United States
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
Dimitrios Gunopulos , George Kollios , Vassilis J. Tsotras , Carlotta Domeniconi, Approximating multi-dimensional aggregate range queries over real attributes, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.463-474, May 15-18, 2000, Dallas, Texas, United States
|
 |
18
|
|
 |
19
|
Vagelis Hristidis , Nick Koudas , Yannis Papakonstantinou, PREFER: a system for the efficient execution of multi-parametric ranked queries, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.259-270, May 21-24, 2001, Santa Barbara, California, United States
|
| |
20
|
|
 |
21
|
|
 |
22
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
 |
23
|
|
| |
24
|
Muralikrishna, M. and DeWitt, D. J. 1988. Equi-depth histograms for estimating selectivity factors for multidimensional queries. In Proceedings of the 1988 ACM International Conference on Management of Data (SIGMOD'88). ACM, New York.
|
 |
25
|
|
| |
26
|
|
 |
27
|
Bernd-Uwe Pagel , Hans-Werner Six , Heinrich Toben , Peter Widmayer, Towards an analysis of range query performance in spatial data structures, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.214-221, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153878]
|
 |
28
|
|
| |
29
|
|
 |
30
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
31
|
|
| |
32
|
|
 |
33
|
|
| |
34
|
|
CITED BY 43
|
|
|
|
|
|
|
|
Ihab F. Ilyas , Rahul Shah , Walid G. Aref , Jeffrey Scott Vitter , Ahmed K. Elmagarmid, Rank-aware query optimization, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ihab F. Ilyas , Walid G. Aref , Ahmed K. Elmagarmid , Hicham G. Elmongui , Rahul Shah , Jeffrey Scott Vitter, Adaptive rank-aware query optimization in relational databases, ACM Transactions on Database Systems (TODS), v.31 n.4, p.1257-1304, December 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Surajit Chaudhuri , Gautam Das , Vagelis Hristidis , Gerhard Weikum, Probabilistic ranking of database query results, Proceedings of the Thirtieth international conference on Very large data bases, p.888-899, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carina F. Dorneles , Carlos A. Heuser , Viviane Moreira Orengo , Altigran S. da Silva , Edleno S. de Moura, A strategy for allowing meaningful and comparable scores in approximate matching, Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, November 06-10, 2007, Lisbon, Portugal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thomas Neumann , Matthias Bender , Sebastian Michel , Ralf Schenkel , Peter Triantafillou , Gerhard Weikum, Distributed top-k aggregation queries at large, Distributed and Parallel Databases, v.26 n.1, p.3-27, August 2009
|
|
|
Carina F. Dorneles , Marcos Freitas Nunes , Carlos A. Heuser , Viviane P. Moreira , Altigran S. da Silva , Edleno S. de Moura, A strategy for allowing meaningful and comparable scores in approximate matching, Information Systems, v.34 n.8, p.740-756, December, 2009
|
|