|
ABSTRACT
A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
Blake, C. and Merz, C. 1998. UCI repository of machine learning databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/covtype.
|
 |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
| |
15
|
Freedman, D., Pisani, R., and Purves, R. 1997. Statistics. W.W. Norton & Company; 3rd edition.
|
 |
16
|
|
| |
17
|
Gravano, L., Marian, A., and Bruno, N. 2002. Evaluating top-k queries over web-accessible databases. Tech. Rep., Columbia Univ., New York.
|
| |
18
|
|
 |
19
|
|
 |
20
|
Vagelis Hristidis , Nick Koudas , Yannis Papakonstantinou, PREFER: a system for the efficient execution of multi-parametric ranked queries, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.259-270, May 21-24, 2001, Santa Barbara, California, United States
|
 |
21
|
A. Kemper , G. Moerkotte , K. Peithner , M. Steinbrunn, Optimizing disjunctive queries with expensive predicates, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.336-347, May 24-27, 1994, Minneapolis, Minnesota, United States
|
| |
22
|
Mills, D. 1983. Internet delay experiments; RFC 889. In ARPANET Working Group Requests for Comments. Number 889. SRI International, Menlo Park, Calif.
|
| |
23
|
|
| |
24
|
|
| |
25
|
Michael Ortega , Yong Rui , Kaushik Chakrabarti , Kriengkrai Porkaew , Sharad Mehrotra , Thomas S. Huang, Supporting Ranked Boolean Similarity Queries in MARS, IEEE Transactions on Knowledge and Data Engineering, v.10 n.6, p.905-925, November 1998
[doi> 10.1109/69.738357]
|
| |
26
|
|
CITED BY 30
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Holger Bast , Debapriyo Majumdar , Ralf Schenkel , Martin Theobald , Gerhard Weikum, IO-Top-k: index-access optimized top-k query processing, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Partha Pratim Talukdar , Marie Jacob , Muhammad Salman Mehmood , Koby Crammer , Zachary G. Ives , Fernando Pereira , Sudipto Guha, Learning to create data-integrating queries, Proceedings of the VLDB Endowment, v.1 n.1, August 2008
|
|
|
|
|
|
Demetrios Zeinalipour-Yazti , Zografoula Vagena , Vana Kalogeraki , Dimitrios Gunopulos , Vassilis J. Tsotras , Michail Vlachos , Nick Koudas , Divesh Srivastava, Finding the K highest-ranked answers in a distributed network, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.53 n.9, p.1431-1449, June, 2009
|
|
|
Thomas Neumann , Matthias Bender , Sebastian Michel , Ralf Schenkel , Peter Triantafillou , Gerhard Weikum, Distributed top-k aggregation queries at large, Distributed and Parallel Databases, v.26 n.1, p.3-27, August 2009
|
|
|
|
|