|
ABSTRACT
Existing hierarchical summarization techniques fail to provide synopses good in terms of relative-error metrics. This paper introduces multiplicative synopses: a summarization paradigm tailored for effective relative-error summarization. This paradigm is inspired from previous hierarchical index-based summarization schemes, but goes beyond them by altering their underlying data representation mechanism. Existing schemes have decomposed the summarized data based on sums and differences of values, resulting in what we call additive synopses. We argue that the incapacity of these models to handle relative-error metrics stems exactly from this additive nature of their representation mechanism. We substitute this additive nature by a multiplicative one. We argue that this is more appropriate for achieving low-relative-error data approximations. We develop an efficient linear-time dynamic programming scheme for one-dimensional multiplicative synopsis construction under general relative-error-based metrics, and a special scheme for the case of maximum relative error. We generalize our schemes to higher data dimensionality and we show a surprising additional benefit gained by our special scheme for maximum relative error in this case. In our experimental study, we verify the higher efficacy of our model on relative-error-oriented summarization problems.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Deepak Agarwal , Dhiman Barman , Dimitrios Gunopulos , Neal E. Young , Flip Korn , Divesh Srivastava, Efficient and effective explanation of change in hierarchical summaries, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
[doi> 10.1145/1281192.1281197]
|
 |
3
|
Nicolas Bruno , Surajit Chaudhuri , Luis Gravano, STHoles: a multidimensional workload-aware histogram, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.211-222, May 21-24, 2001, Santa Barbara, California, United States
|
| |
4
|
|
 |
5
|
|
| |
6
|
S. Chen and A. Nucci. Dynamic nonuniform data approximation in databases with Haar wavelet. Journal of Computers, 2(8):64--76, 2007 (also DCC 2007).
|
| |
7
|
G. Cormode, M. Garofalakis, and D. Sacharidis. Fast approximate wavelet tracking on streams. In EDBT, 2006.
|
 |
8
|
|
 |
9
|
Amol Deshpande , Minos Garofalakis , Rajeev Rastogi, Independence is good: dependency-based histogram synopses for high-dimensional data, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.199-210, May 21-24, 2001, Santa Barbara, California, United States
|
 |
10
|
|
 |
11
|
|
 |
12
|
|
 |
13
|
Anna C. Gilbert , Sudipto Guha , Piotr Indyk , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, Fast, small-space algorithms for approximate histogram maintenance, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
[doi> 10.1145/509907.509966]
|
 |
14
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Marin J. Strauss, Optimal and approximate computation of summary statistics for range aggregates, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.227-236, May 2001, Santa Barbara, California, United States
[doi> 10.1145/375551.375598]
|
| |
15
|
|
| |
16
|
S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. IEEE Transactions on Information Theory, 54(2):811--830, 2008 (also SODA 2006).
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
H. V. Jagadish , Hui Jin , Beng Chin Ooi , Kian-Lee Tan, Global optimization of histograms, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.223-234, May 21-24, 2001, Santa Barbara, California, United States
|
| |
26
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
| |
27
|
|
| |
28
|
P. Karras and N. Mamoulis. The Haar+ tree: a refined synopsis data structure. In ICDE, 2007.
|
 |
29
|
|
| |
30
|
|
 |
31
|
Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Optimal histograms for hierarchical range queries (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.196-204, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335223]
|
 |
32
|
Ju-Hong Lee , Deok-Hwan Kim , Chin-Wan Chung, Multi-dimensional selectivity estimation using compressed histogram information, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.205-214, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
33
|
|
 |
34
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
| |
35
|
M. Muralikrishna and D. J. DeWitt. Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In SIGMOD, 1988.
|
| |
36
|
S. Muthukrishnan. Subquadratic algorithms for workload-aware Haar wavelet synopses. In FSTTCS, 2005.
|
| |
37
|
|
| |
38
|
|
| |
39
|
S. Muthukrishnan, M. Strauss, and X. Zheng. Workload-optimal histograms on streams. In ESA, 2005.
|
| |
40
|
|
 |
41
|
|
| |
42
|
V. Poosala, V. Ganti, and Y. E. Ioannidis. Approximate query answering using histograms. IEEE Data Eng. Bull., 22(4):5--14, 1999.
|
| |
43
|
|
 |
44
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
45
|
|
| |
46
|
|
| |
47
|
|
| |
48
|
E. Terzi and P. Tsaparas. Efficient algorithms for sequence segmentation. In SIAM SDM, 2006.
|
 |
49
|
|
 |
50
|
|
|