ACM Home Page
Please provide us with feedback. Feedback
A new histogram method for sparse attributes: the averaged rectangular attribute cardinality map
Full text PdfPdf (154 KB)
Source ACM International Conference Proceeding Series; Vol. 49 archive
Proceedings of the 1st international symposium on Information and communication technologies table of contents
Dublin, Ireland
SESSION: Data mining and database technology table of contents
Pages: 119 - 125  
Year of Publication: 2003
Authors
B. John Oommen  Carleton University, Ottawa; Canada
Jing Chen  Carleton University, Ottawa; Canada
Publisher
Trinity College Dublin 
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 21,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

Most current Database Management Systems (DBMS) use histograms in their query optimization, and in approximating query result sizes. This is because they can be utilized in determining efficient query evaluation plans. All the existing methods perform poorly when the attributes of a relation are very sparsely distributed, also called the "sparse data cases". These cases are the worst-cases scenarios for attributes with skewed distributions. In this paper, we propose a novel histogram-based algorithm, namely the Averaged Rectangular Attribute Cardinality Map (Averaged R-ACM), and demonstrate its performance in estimating query result sizes for the sparse data cases. Our proposed algorithm combines the advantages of the traditional widely-used histogram-based algorithm, namely the Equi-width histogram, and a relatively new algorithm, namely the R-ACM2 introduced in [Thi99]. The goals of compacting the sparse data distribution and of obtaining accurate estimates of query result sizes are achieved by utilizing this algorithm. The superiority of this algorithm is also validated by an extensive set of experiments. And the entire set of experimental results obtained by integrating the underlying algorithm and other histogram-based algorithms into the ORACLE query optimizer can be found in [Che03].


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
{Che03} J. Chen, On Utilizing New Histogram-Based Methods for Query Optimization, MCs. Thesis, School of Computer Science, Carleton Univ., Ottawa, Canada, February 2003.
2
 
3
 
4
{Ioa95} Yannis Ioannidis, "Histogram-Based Solutions to Diverse Databas Estimation Problems", In IEEE Computer Society, Vol. 8 No. 3, Sep 1995, pp. 10--18.
5
6
 
7
8
9
 
10
{Thi99} Murali Thiyagarajah, "Attribute Cardinality Maps: New Query Result Size Estimation Techniques for Database Systems", Ph.D. Thesis, School of Computer Science, Carleton Univ., Ottawa, Canada, May 1999.
 
11
{OoR02} B. J. Oommen and L. G. Rueda, "The Efficiency of Histogram-like Techniques for Database Query Optimization", In The Computer Journal, Vol. 45, No. 5, 2002, pp. 494--510.
 
12
 
13
{OoT01} B. J. Oommen, and M. Thiyagarajah, "Benchmarking Attribute Cardinality Maps for Database Systems Using the TPC-D Specifications", To appear in the IEEE Transactions on Systems, Man and Cybernetics.
14
15
16
17

Collaborative Colleagues:
B. John Oommen: colleagues
Jing Chen: colleagues