| Iceberg-cube computation with PC clusters |
| Full text |
Pdf
(278 KB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 2001 ACM SIGMOD international conference on Management of data
table of contents
Santa Barbara, California, United States
Pages: 25 - 36
Year of Publication: 2001
ISBN:1-58113-332-4
Also published in ...
|
|
Authors
|
|
Raymond T. Ng
|
Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC
|
|
Alan Wagner
|
Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC
|
|
Yu Yin
|
Univ British Columbia, 2366 Main Mall, UBC, Vancouver, BC
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 1, Downloads (12 Months): 47, Citation Count: 11
|
|
|
ABSTRACT
In this paper, we investigate the approach of using low cost PC cluster to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets where it is assumed that the total cube has net been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation and I/0. Our main contribution is the development and a comprehensive evaluation of various novel, parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/0 by outputting results in a more efficient way; (3) Algorithm ASL, which maintains cells in a cuboid in a skiplist, is designed to put the utmost priority on load balancing; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible.
We present a thorough performance evaluation on all these algorithms on a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that it is not a one-algorithm-fit-all situation. We recommend a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Sameet Agarwal , Rakesh Agrawal , Prasad Deshpande , Ashish Gupta , Jeffrey F. Naughton , Raghu Ramakrishnan , Sunita Sarawagi, On the Computation of Multidimensional Aggregates, Proceedings of the 22th International Conference on Very Large Data Bases, p.506-521, September 03-06, 1996
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, Proceedings of the Twelfth International Conference on Data Engineering, p.152-159, February 26-March 01, 1996
|
| |
8
|
|
 |
9
|
Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
10
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
| |
11
|
M. Kamber, J. Han and J. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. In Proc. 1997 KDD, pp. 207-210.
|
 |
12
|
Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.13-24, June 01-04, 1998, Seattle, Washington, United States
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
Yihong Zhao , Prasad M. Deshpande , Jeffrey F. Naughton, An array-based algorithm for simultaneous multidimensional aggregates, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.159-170, May 11-15, 1997, Tucson, Arizona, United States
|
CITED BY 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ying Chen , Frank Dehne , Todd Eavis , Andrew Rau-Chaplin, PnP: sequential, external memory, and parallel iceberg cube computation, Distributed and Parallel Databases, v.23 n.2, p.99-126, April 2008
|
|
|
Laks V. S. Lakshmanan , Raymond T. Ng , Christine Xing Wang , Xiaodong Zhou , Theodore J. Johnson, The generalized MDL approach for summarization, Proceedings of the 28th international conference on Very Large Data Bases, p.766-777, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|