A learning-based approach to estimate statistics of operators in continuous queries: a case study
Source: Data Mining And Knowledge Discovery
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
San Diego, California
SESSION: Data streams II
Pages: 66 - 72
Year of Publication: 2003
Authors
Like Gao  George Mason University, VA
Min Wang  IBM T. J. Watson Research Center, NY
X. Sean Wang  George Mason University, VA
Sriram Padmanabhan  IBM T. J. Watson Research Center, NY
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/882082.882097

ABSTRACT

Statistic estimation, such as output size estimation of operators, is a well-studied subject in the database research community, mainly for the purpose of query optimization. The assumption, however, is that queries are ad hoc, and therefore the emphasis has been on capturing the data distribution. When long-standing continuous queries on a changing database are concerned, a more direct approach, namely building an estimation model for each operator, is possible. In this paper, we propose a novel learning-based method. Our method consists of two steps. The first step is to design a dedicated feature extraction algorithm that can be used incrementally to obtain feature values from the underlying data. The second step is to use a data mining algorithm to generate an estimation model based on the feature values extracted from the historical data. To illustrate the approach, this paper studies the case of similarity-based searches over streaming time series. Experimental results show that this approach provides accurate statistic estimates with low overhead.
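
The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features or which data mining algorithm the paper uses, so everything below is an assumption made for illustration: the feature is a sliding-window mean maintained incrementally, the model is a one-variable least-squares line, and the names SlidingMean, fit_line, and estimate are hypothetical.

```python
from collections import deque


class SlidingMean:
    """Step 1 (assumed feature): mean of the last `size` stream values,
    updated in O(1) per arriving value instead of rescanning the window."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, x):
        # Add the new value, evict the oldest one once the window is full.
        self.window.append(x)
        self.total += x
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def fit_line(features, outputs):
    """Step 2 (assumed learner): ordinary least squares for
    output ~= a * feature + b, fitted on historical observations."""
    n = len(features)
    mean_f = sum(features) / n
    mean_o = sum(outputs) / n
    sxx = sum((f - mean_f) ** 2 for f in features)
    sxy = sum((f - mean_f) * (o - mean_o) for f, o in zip(features, outputs))
    a = sxy / sxx  # requires at least two distinct feature values
    b = mean_o - a * mean_f
    return a, b


def estimate(model, feature):
    """Apply the fitted model to a freshly extracted feature value."""
    a, b = model
    return a * feature + b


# Historical data: feature values paired with observed operator output sizes
# (made-up numbers, purely to exercise the code).
model = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])

extractor = SlidingMean(size=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    current_feature = extractor.push(value)

predicted_size = estimate(model, current_feature)
```

The point of the split is that the per-operator model is fitted once on historical feature values, while the feature itself is cheap to keep current as the stream advances, so each new estimate costs only one feature update and one model evaluation.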


