ABSTRACT
Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. Yet analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or, separately, the most profitable products, but sets of stores and products that simultaneously satisfy some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, computing diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Our external-memory algorithm avoids potentially expensive random accesses. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts and 500,000 distinct attribute values in less than an hour on a single-core PC.
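The abstract does not spell out the algorithm, so the following is only a minimal sketch of why dimensions interact and how a diamond can be computed with sequential scans alone. It assumes that a diamond is the largest subset of facts in which every remaining attribute value of dimension d reaches a total of at least thresholds[d] (a COUNT, or a SUM of a nonnegative measure such as profit). Pruning a store can push a product below its threshold, which can in turn push another store below its threshold, so the sketch repeats passes until nothing changes. The function name `diamond_dice`, its parameters, the CSV fact-file layout, and the assumption that per-value totals fit in memory are all hypothetical. This is not the authors' external-memory algorithm.

```python
import csv
import os
import tempfile
from collections import defaultdict


def diamond_dice(path, thresholds, measure_col=None):
    """Hypothetical sketch: prune a CSV fact file down to its diamond.

    Each row stores one attribute value per dimension (columns 0..n-1),
    optionally plus a nonnegative measure in column measure_col.
    thresholds[d] is the minimum total (COUNT if measure_col is None,
    else SUM of the measure) that every kept value of dimension d must reach.
    Returns the path of the file holding the surviving facts (this may be
    the input path itself if nothing needs pruning).
    """
    ndims = len(thresholds)
    current = path
    while True:
        # Sequential scan 1: accumulate one total per attribute value
        # (assumes these per-value totals fit in memory).
        totals = [defaultdict(float) for _ in range(ndims)]
        with open(current, newline="") as f:
            for row in csv.reader(f):
                m = 1.0 if measure_col is None else float(row[measure_col])
                for d in range(ndims):
                    totals[d][row[d]] += m
        # Values below their dimension's threshold are pruned.
        bad = [{v for v, t in totals[d].items() if t < thresholds[d]}
               for d in range(ndims)]
        if not any(bad):
            return current  # fixed point: every remaining value qualifies
        # Sequential scan 2: copy only the facts whose values all survive.
        fd, nxt = tempfile.mkstemp(suffix=".csv")
        with open(current, newline="") as src, \
                os.fdopen(fd, "w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if all(row[d] not in bad[d] for d in range(ndims)):
                    writer.writerow(row)
        if current != path:
            os.remove(current)  # drop the previous intermediate file
        current = nxt
```

Each iteration reads the current facts once to total them and once to filter them, writing survivors sequentially, so no fact is ever fetched by random access. The number of iterations depends on how far pruning cascades between dimensions. For instance, with COUNT thresholds of 2 on both stores and products, dropping a store with a single fact can leave some product with a single fact, which can then leave another store with a single fact, and the loop keeps scanning until no attribute value fails its threshold.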