Wavelet synopsis for data streams: minimizing non-Euclidean error
Source: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (International Conference on Knowledge Discovery and Data Mining), Chicago, Illinois, USA
Session: Research track paper
Pages: 88 - 97
Year of Publication: 2005
ISBN: 1-59593-135-X
ABSTRACT
We consider the wavelet synopsis construction problem for data streams: given n numbers, we wish to estimate the data by constructing a synopsis whose size, say B, is much smaller than n. The B numbers are chosen to minimize a suitable error between the original data and the estimate derived from the synopsis.

Several good one-pass streaming algorithms for wavelet synopsis construction that minimize the l2 error exist. For other error measures, the problem is less understood. We provide the first one-pass, small-space streaming algorithms with provable error guarantees (additive approximation) for minimizing a variety of non-Euclidean error measures, including all weighted lp (including l∞) and relative error lp metrics.

Several previous works proposed solutions (for weighted l2, l∞ and maximum relative error) in which the B synopsis coefficients are restricted to be wavelet coefficients of the data. This restriction yields suboptimal solutions on even fairly simple examples. Other lines of research, such as probabilistic synopsis, imposed restrictions on how the synopsis was arrived at. To the best of our knowledge, this paper is the first to address the general problem, without any restriction on how the synopsis is arrived at, and the first to provide streaming algorithms with guaranteed performance for these classes of error measures.
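The point about restricted coefficients can be illustrated with a small sketch. The code below is not the paper's streaming algorithm; it is an offline toy, assuming an orthonormal Haar basis, a hand-picked four-value input, and B = 1. All function names (`haar_transform`, `inverse_haar`, `lp_error`, `best_restricted_single`) are hypothetical helpers introduced here for illustration. It compares the best synopsis that stores one actual wavelet coefficient of the data against a synopsis that stores a freely chosen value, under the l∞ error.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_transform(data):
    """Orthonormal Haar transform; len(data) must be a power of two."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = [float(x) for x in data]
    length = n
    while length > 1:
        half = length // 2
        out = [0.0] * length
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            out[i] = (a + b) / SQRT2         # scaled average
            out[half + i] = (a - b) / SQRT2  # scaled detail
        coeffs[:length] = out
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Inverse of haar_transform."""
    n = len(coeffs)
    data = list(coeffs)
    length = 1
    while length < n:
        out = [0.0] * (2 * length)
        for i in range(length):
            s, d = data[i], data[length + i]
            out[2 * i] = (s + d) / SQRT2
            out[2 * i + 1] = (s - d) / SQRT2
        data[: 2 * length] = out
        length *= 2
    return data


def lp_error(data, estimate, p):
    """Unweighted l_p error; pass math.inf for the maximum error."""
    diffs = [abs(x - y) for x, y in zip(data, estimate)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def best_restricted_single(data, p):
    """Best B = 1 synopsis that keeps one coefficient of the data unchanged."""
    coeffs = haar_transform(data)
    best = None
    for i, c in enumerate(coeffs):
        synopsis = [0.0] * len(coeffs)
        synopsis[i] = c
        err = lp_error(data, inverse_haar(synopsis), p)
        if best is None or err < best[0]:
            best = (err, i)
    return best


# Toy input chosen for illustration only.
data = [1.0, 4.0, 5.0, 6.0]
restricted_err, idx = best_restricted_single(data, math.inf)

# Unrestricted B = 1 synopsis: store a value that reconstructs to the
# midrange of the data, which is not the data's own coefficient.
midrange = (min(data) + max(data)) / 2.0
free = [midrange * math.sqrt(len(data)), 0.0, 0.0, 0.0]
unrestricted_err = lp_error(data, inverse_haar(free), math.inf)

print("restricted l_inf error:", restricted_err)
print("unrestricted l_inf error:", unrestricted_err)
```

On this input the restricted synopsis keeps the overall-average coefficient and reconstructs every value as 4, for a maximum error of about 3, while the freely chosen value reconstructs every value as 3.5, for a maximum error of about 2.5 (both up to floating-point rounding). The sketch only shows that the restricted choice can be beaten; it makes no claim about how the paper finds good unrestricted values in one pass.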