|
ABSTRACT
For complex data mining queries, query optimization issues arise, similar to those for the traditional database queries. However, few works have applied the cost-based query optimization, which is the key technique in optimizing traditional database queries, on complex mining queries. In this work, we develop a cost-based query optimization framework to an important collection of data mining queries, i.e. frequent pattern mining across multiple databases. Specifically, we make the following contributions: 1) We present a rich class of queries on mining frequent itemsets across multiple datasets supported by a SQL-based mechanism. 2) We present an approach to enumerate all possible query plans for the mining queries, and develop a dynamic programming approach and a branch-and-bound approach based on the enumeration algorithm to find optimal query plans with the least mining cost. 3) We introduce models to estimate the cost of individual mining operators. 4) We evaluate our query optimization techniques on both real and synthetic datasets and show significant performance improvements.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Christan Borgelt. Apriori implementation. http://fuzzy.cs.Uni-Magdeburg.de/borgelt/Software. Version 4.08.
|
 |
4
|
Cristian Bucila , Johannes Gehrke , Daniel Kifer , Walker White, DualMiner: a dual-pruning algorithm for itemsets with constraints, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775054]
|
 |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
Viviane Crestana and Nandit Soparkar. Mining decentralized data repositories. Technical Report CSE-TR-385-99, University of Michigan Department of Electrical Engineering and Computer Science, 1999.
|
 |
10
|
|
| |
11
|
Sašo Džeroski. Multi-relational data mining: an introduction. SIGKDD Explor. Newsl., 5(1):1--16, 2003.
|
| |
12
|
Bart Goethals and Mohammed J. Zaki. Workshop Report on Workshop on Frequent Itemset Mining Implementations (FIMI). 2003.
|
 |
13
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
 |
14
|
|
| |
15
|
|
| |
16
|
Viviane Crestana Jensen and Nandit Soparkar. Algebra based optimization strategies for decentralized mining. Technical Report CSE-TR-437-00, University of Michigan, 2000.
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
 |
20
|
Laks V. S. Lakshmanan , Raymond Ng , Jiawei Han , Alex Pang, Optimization of constrained frequent set queries with 2-variable constraints, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.157-168, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
21
|
|
| |
22
|
|
| |
23
|
Rosa Meo, Marco Botta, and Roberto Esposito. Query rewriting in itemset mining. In Proc. of the 6th International Conference On Flexible Query Answering Systems, pages 111--124, June 2004.
|
 |
24
|
Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.13-24, June 01-04, 1998, Seattle, Washington, United States
|
 |
25
|
M. Otey , S. Parthasarathy , A. Ghoting , G. Li , S. Narravula , D. Panda, Towards NIC-based intrusion detection, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
[doi> 10.1145/956750.956847]
|
 |
26
|
|
 |
27
|
Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal, Integrating association rule mining with relational database systems: alternatives and implications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.343-354, June 01-04, 1998, Seattle, Washington, United States
|
| |
28
|
Ramakrishnan Srikant, Quoc Vu, and Rakesh Agrawal. Mining association rules with item constraints. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, KDD Conference Proceedings, pages 67--73, 1997.
|
 |
29
|
Dick Tsur , Jeffrey D. Ullman , Serge Abiteboul , Chris Clifton , Rajeev Motwani , Svetlozar Nestorov , Arnon Rosenthal, Query flocks: a generalization of association-rule mining, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.1-12, June 01-04, 1998, Seattle, Washington, United States
|
 |
30
|
|
| |
31
|
Craig Utley. Microsoft sql server 9.0 technical articles: Introduction to sql server 2005 data mining. http://technet.microsoft.com/en-us/library/ms345131.aspx.
|
 |
32
|
|
| |
33
|
|
| |
34
|
Y. N. Law, C. R. Luo, H. Wang, and C. Zaniol. Atlas: a turing complete extension of sql for data mining applications and streams. In Posters of the 2003 ACM SIGMOD international conference on Management of data, 2003.
|
| |
35
|
|
| |
36
|
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining., August 1997.
|
|