|
ABSTRACT
Dimension attributes in data warehouses are typically hierarchical (e.g., geographic locations in sales data, URLs in Web traffic logs). OLAP tools are used to summarize the measure attributes (e.g., total sales) along a dimension hierarchy, and to characterize changes (e.g., trends and anomalies) in a hierarchical summary over time. When thenumber of changes identified is large (e.g., total sales in many stores differed from their expected values), a parsimonious explanation of the most significant changes is desirable. In this paper, we propose a natural model of parsimonious explanation, as a composition of node weights along the root-to-leaf paths in a dimension hierarchy, which permits changes to be aggregated with maximal generalization along the dimension hierarchy. We formalize this model of explaining changes in hierarchical summaries and investigate the problem of identifying optimally parsimonious explanations on arbitrary rooted one dimensional tree hierarchies. We show that such explanations can be computed efficiently in time essentially proportional to the number of leaves and the depth of the hierarchy. Further, our method can produce parsimonious explanations from the output of any statistical model that provides predictions and confidence intervals, making it widely applicable. Our experiments use real data sets to demonstrate the utility and robustness of our proposed model for explaining significant changes, as well as its superior parsimony compared to alternatives.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Sanjeev Arora , László Babai , Jacques Stern , Z. Sweedyk, The hardness of approximate optima in lattices, codes, and systems of linear equations, Journal of Computer and System Sciences, v.54 n.2, p.317-331, April 1997
[doi> 10.1006/jcss.1997.1472]
|
| |
4
|
Dhiman Barman, Flip Korn, Divesh Srivastava, Dimitris Gunopulos, Neal E. Young, and Deepak Agarwal. Parsimonious Explanations of Change in Hierarchical Data. In Proc. of ICDE 2007.
|
| |
5
|
Census (population vs. location), 2000-2004. http://www.census.gov/popest/datasets.htm.
|
 |
6
|
|
| |
7
|
Graham Cormode , Flip Korn , S. Muthukrishnan , Divesh Srivastava, Finding hierarchical heavy hitters in data streams, Proceedings of the 29th international conference on Very large data bases, p.464-475, September 09-12, 2003, Berlin, Germany
|
| |
8
|
Graham Cormode and S. Muthukrishnan. What's new: Finding significant differences in network data streams. In Proc. of IEEE INFOCOM, pages 1534--1545, 2004.
|
 |
9
|
Cristian Estan , Stefan Savage , George Varghese, Automatically inferring patterns of resource consumption in network traffic, Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, August 25-29, 2003, Karlsruhe, Germany
[doi> 10.1145/863955.863972]
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
Panagiotis Karras and Nikos Mamoulis. The Haar+ Tree: a Refined Synopsis Data Structure. In Proc. of the IEEE 23rd ICDE, April 2007.
|
| |
14
|
|
 |
15
|
|
| |
16
|
Laks V. S. Lakshmanan , Raymond T. Ng , Christine Xing Wang , Xiaodong Zhou , Theodore J. Johnson, The generalized MDL approach for summarization, Proceedings of the 28th international conference on Very Large Data Bases, p.766-777, August 20-23, 2002, Hong Kong, China
|
 |
17
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
| |
18
|
S. Muthukrishnan. Subquadratic algorithms for workload-aware Haar wavelet synopses. In Proc. of FSTTCS, 2005.
|
| |
19
|
P. J. Harrison. Exponential smoothing and short-term sales forecasting. Management Science, 13(11):821--842, 1967.
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
S. Hill, D. Agarwal, R. Bell, and C. Volinsky. Building an effective representation for dynamic graphs. Journal of Computational and Graphical Statistics, 15:584--608, 2006.
|
| |
25
|
WorldCup 1998. http://ita.ee.lbl.gov/html/contrib/WorldCup.html.
|
 |
26
|
Yin Zhang , Sumeet Singh , Subhabrata Sen , Nick Duffield , Carsten Lund, Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications, Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, October 25-27, 2004, Taormina, Sicily, Italy
[doi> 10.1145/1028788.1028802]
|
 |
27
|
|
CITED BY 4
|
|
|
|
|
|
|
|
|
|
|
Ajay Anil Mahimkar , Zihui Ge , Aman Shaikh , Jia Wang , Jennifer Yates , Yin Zhang , Qi Zhao, Towards automated performance diagnosis in a large IPTV network, ACM SIGCOMM Computer Communication Review, v.39 n.4, October 2009
|
|