| Optimal splitters for database partitioning with size bounds |
| Full text |
Pdf
(688 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 361
archive
Proceedings of the 12th International Conference on Database Theory
table of contents
St. Petersburg, Russia
SESSION: Data structures and algorithms
table of contents
Pages 98-110
Year of Publication: 2009
ISBN:978-1-60558-423-2
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 64, Citation Count: 0
|
|
|
ABSTRACT
Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity O(k lg2 N), where k is the number of splitters requested and N is the size of the data set. We show how the algorithm can be extended to pairs of tables, so that joins can be partitioned into work units that have balanced cost. We demonstrate experimentally (a) that finding the optimal set of splitters can be done efficiently, and (b) that using the precomputed splitters can improve the time to sort a data set by up to 76%, with particular benefits in the presence of a few heavy hitters.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Agrawal and A. Swami. A one-pass space-efficient algorithm for finding quantiles. In COMAD, 1995.
|
 |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
J. Doweck. Inside intel core microarchitecture. In Hot Chips, August 2006.
|
 |
6
|
|
| |
7
|
|
 |
8
|
Jim Gray , Prakash Sundaresan , Susanne Englert , Ken Baclawski , Peter J. Weinberger, Quickly generating billion-record synthetic databases, Proceedings of the 1994 ACM SIGMOD international conference on Management of data, p.243-252, May 24-27, 1994, Minneapolis, Minnesota, United States
|
| |
9
|
A. P. Gurajada and J. Srivastava. Equidepth partitioning of a data set based on finding its medians. In Applied Computing, pages 92--101, 1991.
|
| |
10
|
|
| |
11
|
|
| |
12
|
J. Huang and Y. Chow. Parallel sorting and data partitioning by sampling. In Proc. of Symposium on Parallel Algorithms and Architectures, pages 627--631, 1983.
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
| |
18
|
|
| |
19
|
|
 |
20
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
26
|
P. Sanders and S. Winkel. Super scalar sample sort. In European Symposium on Algorithms, pages 784--796, 2004.
|
| |
27
|
Mike Stonebraker , Daniel J. Abadi , Adam Batkin , Xuedong Chen , Mitch Cherniack , Miguel Ferreira , Edmond Lau , Amerson Lin , Sam Madden , Elizabeth O'Neil , Pat O'Neil , Alex Rasin , Nga Tran , Stan Zdonik, C-store: a column-oriented DBMS, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|