|
ABSTRACT
A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover techniques required for most popular aggregation operations, such as those supported by SQL.
For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at run-time. By maintaining auxiliary information which is of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query. Alternatively, one can keep auxiliary information which is 1/bd of the size of the d-dimensional data cube. Response to a range query may now require access to some cells of the data cube in addition to the access to the auxiliary information, but the overall time complexity is typically reduced significantly. We also discuss how the precomputed information is incrementally updated by batching updates to the data cube. Finally, we present algorithms for choosing the subset of the data cube dimensions for which the auxiliary information is computed and the blocking factor to use for each such subset.
Our approach to answering range-max queries is based on precomputed max over balanced hierarchical tree structures. We use a branch-and-bound-like procedure to speed up the finding of max in a region. We also show that with a branch-and-bound procedure, the average-case complexity is much smaller than the worst-case complexity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
AAD+96
|
Sameet Agarwal , Rakesh Agrawal , Prasad Deshpande , Ashish Gupta , Jeffrey F. Naughton , Raghu Ramakrishnan , Sunita Sarawagi, On the Computation of Multidimensional Aggregates, Proceedings of the 22th International Conference on Very Large Data Bases, p.506-521, September 03-06, 1996
|
| |
AGS97
|
|
 |
Ben80
|
|
 |
BF79
|
|
 |
BKSS90
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
BR91
|
M. Berger and I. Regoutsos. An algorithm point clustering and grid generation. IEEE Transactions on Systems, Man and Cybernetics, 21(5):1278-86, 1991.
|
 |
Cha90
|
|
| |
CLS86
|
G.D. Cohen, A.C. Lobstein, and N.j.A. Sloane. b-k~her results on the covering radius of codes. IEJEB Trans. Information Theory, IT-32(5):680- 694, September 1986.
|
| |
CM89
|
|
| |
Cod93
|
E.F. Codd. Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate. Technical report, E. F. Codd and Associates, 1993.
|
 |
Col96
|
|
 |
Com79
|
|
 |
CR89
|
|
| |
CS94
|
|
| |
GBLP96
|
Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, Proceedings of the Twelfth International Conference on Data Engineering, p.152-159, February 26-March 01, 1996
|
| |
GHQ95
|
|
| |
GHRU97
|
|
 |
HBA97
|
Ching-Tien Ho , Jehoshua Bruck , Rakesh Agrawal, Partial-sum queries in OLAP data cubes using covering codes, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.228-237, May 11-15, 1997, Tucson, Arizona, United States
[doi> 10.1145/263661.263686]
|
 |
HRU96
|
Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
IBM95
|
IBM. DB~ SQL Reference }or Common Servers Version ~, 1995.
|
| |
JD88
|
|
| |
JS96
|
T. Johnson and D. Shasha. Hierarchically sprit cube forests for decision support: description and tuned design, 1996. Working Paper.
|
| |
Lom95
|
D. Lomet, editor. Special Issue on Materialized Views and Data Warehousing. IEEE Data Engineering Bulletin, 18(2), June 1995.
|
| |
Meh84
|
|
| |
Mic92
|
|
| |
Mit70
|
L. Mitten. Branch and bound methods: General formulation and properties. Operations Research, 18:24-34, 1970.
|
| |
OLA96
|
The CLAP Council. MD-API the OLAP Applzcation Program Interface Version 0.5 Specification, September 1996.
|
| |
RND77
|
E.M. Reingold, J. Nievergelt, and N. Dec. Combinatorial Algorithms. Prentice-Hall, Englewood Cliffs. N J, 1977.
|
| |
Sam89
|
|
| |
SAM96
|
|
| |
SB95
|
P. Schroeter and j. Bigun. Hierarchical image segmentation by multi-dimensional clustering and orientation-adaptive boundary refinement. Pattern Recognition, 25(5):695-709, May 1995.
|
| |
SDNR96
|
|
| |
SR96
|
B. Salzberg and A. Reuter. Indexing for aggregation, 1996. Working Paper.
|
| |
STL89
|
|
 |
Vai85
|
|
 |
WL85
|
|
| |
Yao85
|
Andrew Yao. On the complexity of maintaining partial sums. SIAM J. Computing, 14(2):277- 288, May 1985.
|
| |
YL95
|
|
CITED BY 68
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ching-Tien Ho , Jehoshua Bruck , Rakesh Agrawal, Partial-sum queries in OLAP data cubes using covering codes, Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.228-237, May 11-15, 1997, Tucson, Arizona, United States
|
|
|
Jeffrey Scott Vitter , Min Wang , Bala Iyer, Data cube approximation and histograms via wavelets, Proceedings of the seventh international conference on Information and knowledge management, p.96-104, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
John R. Smith , Vittorio Castelli , Anant Jhingran , Chung-Sheng Li, Dynamic assembly of views in data cubes, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.274-283, June 01-04, 1998, Seattle, Washington, United States
|
|
|
S. Geffner , D. Agrawal , A. El Abbadi , T. Smith, Browsing large digital library collections using classification hierarchies, Proceedings of the eighth international conference on Information and knowledge management, p.195-201, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
Yi-Leh Wu , Divyakant Agrawal , Amr El Abbadi, Using wavelet decomposition to support progressive and approximate range-sum queries over data cubes, Proceedings of the ninth international conference on Information and knowledge management, p.414-421, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
Zheng Xuan Loh , Tok Wang Ling , Chuan Heng Ang , Sin Yeung Lee, Analysis of pre-computed partition top method for range top-k queries in OLAP data cubes, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xuemin Lin , Qing Liu , Yidong Yuan , Xiaofang Zhou, Multiscale histograms: summarizing topological relations in large spatial datasets, Proceedings of the 29th international conference on Very large data bases, p.814-825, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
Cuiping Li , Beng Chin Ooi , Anthony K. H. Tung , Shan Wang, DADA: a data cube for dominant relationship analysis, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|