ACM Home Page
Please provide us with feedback. Feedback
Optimal splitters for database partitioning with size bounds
Full text PdfPdf (688 KB)
Source ACM International Conference Proceeding Series; Vol. 361 archive
Proceedings of the 12th International Conference on Database Theory table of contents
St. Petersburg, Russia
SESSION: Data structures and algorithms table of contents
Pages 98-110  
Year of Publication: 2009
ISBN:978-1-60558-423-2
Authors
Kenneth A. Ross  Columbia University, New York, NY
John Cieslewicz  Columbia University, New York, NY
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 64,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1514894.1514907
What is a DOI?

ABSTRACT

Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity O(k lg2 N), where k is the number of splitters requested and N is the size of the data set. We show how the algorithm can be extended to pairs of tables, so that joins can be partitioned into work units that have balanced cost. We demonstrate experimentally (a) that finding the optimal set of splitters can be done efficiently, and (b) that using the precomputed splitters can improve the time to sort a data set by up to 76%, with particular benefits in the presence of a few heavy hitters.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Agrawal and A. Swami. A one-pass space-efficient algorithm for finding quantiles. In COMAD, 1995.
2
 
3
 
4
 
5
J. Doweck. Inside intel core microarchitecture. In Hot Chips, August 2006.
6
 
7
8
 
9
A. P. Gurajada and J. Srivastava. Equidepth partitioning of a data set based on finding its medians. In Applied Computing, pages 92--101, 1991.
 
10
 
11
 
12
J. Huang and Y. Chow. Parallel sorting and data partitioning by sampling. In Proc. of Symposium on Parallel Algorithms and Architectures, pages 627--631, 1983.
 
13
14
 
15
 
16
 
17
 
18
 
19
20
 
21
 
22
 
23
 
24
25
 
26
P. Sanders and S. Winkel. Super scalar sample sort. In European Symposium on Algorithms, pages 784--796, 2004.
 
27
Collaborative Colleagues:
Kenneth A. Ross: colleagues
John Cieslewicz: colleagues