|
ABSTRACT
Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
Roger N. Kline and Michael D. Soo, "The TIMEIT Temporal Database Testbed," 1998. www.cs.auc.dk/TimeCenter/software.htm.
|
| |
12
|
|
| |
13
|
T. Y. Cliff Leung and Richard R. Muntz, "Generalized Data Stream Indexing and Temporal Query Processing," in International Workshop on Research Issues in Data Engineering: Transaction and Query Processing, Philip S. Yu (ed.), pp. 124-131, Tempe, AZ, February, 1992.
|
| |
14
|
|
| |
15
|
Wei Li, Dengfeng Gao and Richard T. Snodgrass, "Skew Handling Techniques in Sort-Merge Join," 2001. http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-62.pdf.
|
 |
16
|
|
| |
17
|
|
 |
18
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
19
|
|
| |
20
|
|
 |
21
|
Chun Zhang , Jeffrey Naughton , David DeWitt , Qiong Luo , Guy Lohman, On supporting containment queries in relational database management systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.425-436, May 21-24, 2001, Santa Barbara, California, United States
|
| |
22
|
Thomas Zurek, Parallel Temporal Nested-Loop Joins, Ph.D. Dissertation, Dept. of Computer Science, Edinburgh University, 1996.
|
CITED BY 3
|
|
|
|
|
Jens-Peter Dittrich , Bernhard Seeger , David Scot Taylor , Peter Widmayer, On producing join results early, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.134-142, June 09-11, 2003, San Diego, California
|
|
|
|
|