| Strategies for processing ad hoc queries on large data warehouses |
| Full text |
Pdf
(245 KB)
|
| Source
|
Data Warehousing and OLAP
archive
Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP
table of contents
McLean, Virginia, USA
Pages: 72 - 79
Year of Publication: 2002
ISBN:1-58113-590-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 68, Citation Count: 5
|
|
|
ABSTRACT
As data warehousing applications grow in size, existing data organizations and access strategies, such as relational tables and B-tree indexes, are becoming increasingly ineffective. The two primary reasons for this are that these datasets involve many attributes and the queries on the data usually involve conditions on small subsets of the attributes. Two strategies are known to address these difficulties well, namely vertical partitioning and bitmap indexes. In this paper, we summarize our experience of implementing a number of bitmap index schemes on vertically partitioned data tables. One important observation is that simply scanning the vertically partitioned data tables is often more efficient than using B-tree based indexes to answer ad hoc range queries on static datasets. For these range queries, compressed bitmap indexes are in most cases more efficient than scanning vertically partitioned tables. We evaluate the performance of two different compression schemes for bitmap indexes stored is various ways. Using the compression scheme called Word-Aligned Hybrid Code (WAH) to store the bitmaps in plain files shows the best overall performance for bitmap indexes. Tests indicate that our bitmap index strategy based on WAH is not only efficient for attributes of low cardinality, say, <100, but also for high-cardinality attributes with 200,000 or more distinct values.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
G. Antoshenkov. Byte-Aligned Bitmap Compression. Technical Report, Oracle Corp., 1994. U.S. Patent number 5,363,098.
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
M. Jürgens and H.-J. Lenz. Tree Based Indexes vs. Bitmap Indexes - a Performance Study. International Journal of Cooperative Information Systems, 10(3):355--376, 2001.
|
| |
12
|
V. Markl and R. Bayer. Processing Relational OLAP Queries with UB-Trees and Multidimensional Hierarchical Clustering. In Proceedings of DMDW 2000, June 5--6, 2000.
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
K. Wu, E. J. Otoo, A. Shoshani, and H. Nordberg. Notes on Design and Implementation of Compressed Bit Vectors. Technical Report LBNL/PUB-3161, Lawrence Berkeley National Laboratory, Berkeley, CA, 2001.
|
CITED BY 6
|
|
|
|
|
Oliver Rübel , Prabhat , Kesheng Wu , Hank Childs , Jeremy Meredith , Cameron G. R. Geddes , Estelle Cormier-Michel , Sean Ahern , Gunther H. Weber , Peter Messmer , Hans Hagen , Bernd Hamann , E. Wes Bethel, High performance multivariate visual data exploration for extremely large data, Proceedings of the 2008 ACM/IEEE conference on Supercomputing, November 15-21, 2008, Austin, Texas
|
|
|
|
|
|
|
|
|
|
|
|
|
|