|
ABSTRACT
Query optimizers in object-relational database management systems typically require users to provide the execution cost models of user-defined functions (UDFs). Despite this need, however, there has been little work done to provide such a model. The existing approaches are static in that they require users to train the model a priori with pregenerated UDF execution cost data. Static approaches can not adapt to changing UDF execution patterns and thus degrade in accuracy when the UDF executions used for generating training data do not reflect the patterns of those performed during operation. This article proposes a new approach based on the recent trend of self-tuning DBMS by which the cost model is maintained dynamically and incrementally as UDFs are being executed online. In the context of UDF cost modeling, our approach faces a number of challenges, that is, it should work with limited memory, work with limited computation time, and adjust to the fluctuations in the execution costs (e.g., caching effect). In this article, we first provide a set of guidelines for developing techniques that meet these challenges, while achieving accurate and fast cost prediction with small overheads. Then, we present two concrete techniques developed under the guidelines. One is an instance-based technique based on the conventional k-nearest neighbor (KNN) technique which uses a multidimensional index like the R*-tree. The other is a summary-based technique which uses the quadtree to store summary values at multiple resolutions. We have performed extensive performance evaluations comparing these two techniques against existing histogram-based techniques and the KNN technique, using both real and synthetic UDFs/data sets. The results show our techniques provide better performance in most situations considered.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
3
|
Bogartz, R. S. 1994. An Introduction to the Analysis of Variance. Praeger Publishers.
|
 |
4
|
|
| |
5
|
Boulos, J., Viemont, Y., and Ono, K. 1997. A neural network approach for query cost evaluation. Trans. Inf. Process. Soc. Japan 38, 12, 2566--2575.
|
 |
6
|
Nicolas Bruno , Surajit Chaudhuri , Luis Gravano, STHoles: a multidimensional workload-aware histogram, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.211-222, May 21-24, 2001, Santa Barbara, California, United States
|
| |
7
|
Francesco Buccafurri , Filippo Furfaro , Domenico Sacca , Cristina Sirangelo, A quad-tree based multiresolution approach for two-dimensional summary data, Proceedings of the 15th international conference on Scientific and statistical database management, p.127-140, July 09-11, 2003, Cambridge, MA
[doi> 10.1109/SSDM.2003.1214974]
|
| |
8
|
Chang, C. L. 1974. Finding prototypes for nearest neighbor classifiers. IEEE Trans. Comput. 23, 11, 1179--1184.
|
| |
9
|
Chaudhuri, S. 1999. Self-tuning databases and application tuning. IEEE Data Eng. Bull. 22, 2, 3--46.
|
| |
10
|
Chaudhuri, S., Christensen, E., and Graefe, G. 1999. Self-tuning technology in microsoft sql server. IEEE Data Eng. Bull. 22, 2, 20--26.
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
 |
15
|
Amol Deshpande , Minos Garofalakis , Rajeev Rastogi, Independence is good: dependency-based histogram synopses for high-dimensional data, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.199-210, May 21-24, 2001, Santa Barbara, California, United States
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
Hart, P. E. 1968. The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 14, 3, 515--516.
|
| |
20
|
He, Z., Lee, B. S., and Snapp, R. 2004. Self-tuning UDF cost-modeling using the memory limited quadtree. In Proceedings of the International Conference on Extending Database Technology (EDBT'04). 513--531.
|
 |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
Jiang, S., Lee, B. S., and He, Z. 2003. The cost modeling of spatial operators using nonparametric regression. Tech. rep. CS-03-17, Department of Computer Science, University of Vermont. (Submitted for publication).
|
| |
25
|
Jolliffe, I. 1986. Principal Component Analysis. Springer-Verlag.
|
 |
26
|
Kihong Kim , Sang K. Cha , Keunjoo Kwon, Optimizing multidimensional index trees for main memory access, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.139-150, May 21-24, 2001, Santa Barbara, California, United States
|
 |
27
|
|
| |
28
|
Lee, B. S., Chen, L., Buzas, J., and Kannoth, V. 2004. Regression-based self-tuning modeling of smooth user-defined function costs for an object-relational database management system query optimizer. Comput. J. 47, 6 (Nov.), 673--693.
|
| |
29
|
Lee, B. S., Kannoth, V., and Buzas, J. 2003. A statistical cost-modeling of financial time series functions for an object-relational DBMS query optimizer. Technical rep. CS-03-10, Department of Computer Science, University of Vermont (March). (Submitted for publication).
|
 |
30
|
Mong Li Lee , Masaru Kitsuregawa , Beng Chin Ooi , Kian-Lee Tan , Anirban Mondal, Towards self-tuning data placement in parallel database systems, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.225-236, May 15-18, 2000, Dallas, Texas, United States
|
 |
31
|
|
 |
32
|
|
| |
33
|
|
| |
34
|
Pennsylvania. Last viewed:6-18-2003. PSADA---Data Download---Urban Areas. Available at URL:http://www.pasda.psu.edu/access/urban.shtml.
|
| |
35
|
|
| |
36
|
Procopiuc, O., Agarwal, P. K., Arge, L., and Vitter, J. S. 2003. Bkd-tree: A dynamic scalable kd-tree. In Proceedings of the 8th International Symposium on Spatial and Temporal Databases. 46--65.
|
| |
37
|
|
 |
38
|
|
| |
39
|
|
| |
40
|
Stone, C. J. 1977. Consistent nonparametric regression. Annals of Statistics 5, 4, 595--645.
|
| |
41
|
Thorburn, W. M. 1915. Occam's razor. Mind 24, 287--288.
|
 |
42
|
|
| |
43
|
VanHorn, D., Lee, B. S., Buzas, J., and Thompson, P. 2003. Metadata-based generation of statistical cost functions for text search. Tech. rep. CS-03-13, Department of Computer Science, University of Vermont.
|
| |
44
|
Wand, M. P. and Jones, M. C. 1995. Kernel Smoothing Monographs on Statistics and Applied Probability. Chapman & Hill.
|
| |
45
|
Wilson, D. L. 1972. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst., Man and Cybern. 2, 4, 408--421.
|
| |
46
|
|
| |
47
|
Zipf, G. K. 1949. Human behavior and the Principle of Least Effort. Addison-Wesley.
|
|