|
ABSTRACT
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model functions as a black-box, admitting little user interaction in between. We propose, in this paper, an architecture that opens up the black-box, and supports constraint-based, human-centered exploratory mining of associations. The foundation of this architecture is a rich set of constraint constructs, including domain, class, and SQL-style aggregate constraints, which enable users to clearly specify what associations are to be mined. We propose constrained association queries as a means of specifying the constraints to be satisfied by the antecedent and consequent of a mined association.
In this paper, we mainly focus on the technical challenges in guaranteeing a level of performance that is commensurate with the selectivities of the constraints in an association query. To this end, we introduce and analyze two properties of constraints that are critical to pruning: anti-monotonicity and succinctness. We then develop characterizations of various constraints into four categories, according to these properties. Finally, we describe a mining algorithm called CAP, which achieves a maximized degree of pruning for all categories of constraints. Experimental results indicate that CAP can run much faster, in some cases as much as 80 times, than several basic algorithms. This demonstrates how important the succinctness and anti-monotonicity properties are, in delivering the performance guarantee.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
Sergey Brin , Rajeev Motwani , Craig Silverstein, Beyond market baskets: generalizing association rules to correlations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.265-276, May 11-15, 1997, Tucson, Arizona, United States
|
| |
6
|
|
 |
7
|
Takeshi Fukuda , Yasukiko Morimoto , Shinichi Morishita , Takeshi Tokuyama, Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.13-23, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
8
|
Eui-Hong Han , George Karypis , Vipin Kumar, Scalable parallel data mining for association rules, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.277-288, May 11-15, 1997, Tucson, Arizona, United States
|
| |
9
|
|
 |
10
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
 |
11
|
|
| |
12
|
M. Kamber, J. Hail, and J. Y. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. KDD 97, pp 207-210.
|
 |
13
|
Mika Klemettinen , Heikki Mannila , Pirjo Ronkainen , Hannu Toivonen , A. Inkeri Verkamo, Finding interesting rules from large sets of discovered association rules, Proceedings of the third international conference on Information and knowledge management, p.401-407, November 29-December 02, 1994, Gaithersburg, Maryland, United States
[doi> 10.1145/191246.191314]
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
Raymond T. Ng, Laks V.S. Lakshmanan, Jiawei Han, and Alex Pang. Exploratory Mining and Optimization of Constrained Association Queries. Technical Report, University of British Columbia and Concordia University, October 1997.
|
 |
18
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
|
| |
19
|
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
R. Srikant, Q. Vu and R. Agrawal. Mining association rules with item constraints. KDD 97, pp 67-73.
|
| |
24
|
|
| |
25
|
D. Tsur, J. D. Ullman, S. Abitboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining, http://wwwdb.stanford.edu/,.,ullrnan/pub/qflocks.ps, 1-20, Oct. 1997.
|
CITED BY 141
|
|
|
|
|
|
|
|
|
|
|
S. Parthasarathy , M. J. Zaki , M. Ogihara , S. Dwarkadas, Incremental and interactive sequence mining, Proceedings of the eighth international conference on Information and knowledge management, p.251-258, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
|
|
|
|
|
|
Yiming Ma , Bing Liu , Ching Kian Wong , Philip S. Yu , Shuik Ming Lee, Targeting the right students using data mining, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.457-464, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
Biswadeep Nag , Prasad M. Deshpande , David J. DeWitt, Using a knowledge cache for interactive discovery of association rules, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.244-253, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
Anthony K.H. Tung , Hongjun Lu , Jiawei Han , Ling Feng, Breaking the barrier of transactions: mining inter-transaction association rules, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.297-301, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
Bing Liu , Wynne Hsu , Yiming Ma, Mining association rules with multiple minimum supports, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.337-341, August 15-18, 1999, San Diego, California, United States
|
|
|
Venkatesh Ganti , Johannes Gehrke , Raghu Ramakrishnan, A framework for measuring changes in data characteristics, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.126-137, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
Bing Liu , Minqing Hu , Wynne Hsu, Multi-level organization and summarization of the discovered rules, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.208-217, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
Necip Fazil Ayan , Abdullah Uz Tansel , Erol Arkun, An efficient algorithm to update large itemsets with early pruning, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.287-291, August 15-18, 1999, San Diego, California, United States
|
|
|
Minos N. Garofalakis , Rajeev Rastogi , S. Seshadri , Kyuseok Shim, Data mining and the Web: past, present and future, Proceedings of the 2nd international workshop on Web information and data management, p.43-47, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
Ling Feng , Hongjun Lu , Jeffrey Xu Yu , Jiawei Han, Mining inter-transaction associations with templates, Proceedings of the eighth international conference on Information and knowledge management, p.225-233, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
Xiuzhen Zhang , Guozu Dong , Ramamohanarao Kotagiri, Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.310-314, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bing Liu , Wynne Hsu , Yiming Ma, Pruning and summarizing the discovered associations, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.125-134, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Daniel Kifer , Johannes Gehrke , Cristian Bucila , Walker White, How to quickly find a witness, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.272-283, June 09-11, 2003, San Diego, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gao Cong , Anthony K. H. Tung , Xin Xu , Feng Pan , Jiong Yang, FARMER: finding interesting rule groups in microarray datasets, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
Cristian Bucila , Johannes Gehrke , Daniel Kifer , Walker White, DualMiner: a dual-pruning algorithm for itemsets with constraints, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guozhu Dong , Jiawei Han , Joyce M. W. Lam , Jian Pei , Ke Wang , Wei Zou, Mining Constrained Gradients in Large Databases, IEEE Transactions on Knowledge and Data Engineering, v.16 n.8, p.922-938, August 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jian Pei , Jiawei Han , Behzad Mortazavi-Asl , Jianyong Wang , Helen Pinto , Qiming Chen , Umeshwar Dayal , Mei-Chun Hsu, Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach, IEEE Transactions on Knowledge and Data Engineering, v.16 n.11, p.1424-1440, November 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Chang-Kai Hsu , Jyh-Cheng Chang , Maiga Chang , Jia-Sheng Heh, Unsupervised reconstruction mechanism to recover the hypermedia structure of instructional materials on the web based on the association lattice of keywords, Proceedings of the 5th WSEAS International Conference on Distance Learning and Web Engineering, p.151-157, August 23-25, 2005, Corfu Island, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dong Xin , Jiawei Han , Xiaolei Li , Benjamin W. Wah, Star-cubing: computing iceberg cubes by top-down and bottom-up integration, Proceedings of the 29th international conference on Very large data bases, p.476-487, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tongyuan Wang , Bipin C. Desai , Huzhan Zheng , Yanjiang Qiao, Knowledge discovery in Chinese medicine, Proceedings of the 2008 C3S2E conference, May 12-13, 2008, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Francesco Bonchi , Fosca Giannotti , Claudio Lucchese , Salvatore Orlando , Raffaele Perego , Roberto Trasarti, A constraint-based querying system for exploratory pattern discovery, Information Systems, v.34 n.1, p.3-27, March, 2009
|
|
|
|
|
|
|
|
|
María N. Moreno , Saddys Segrera , Vivian F. López , M. José Polo, A method for mining quantitative association rules, Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, p.173-178, September 22-24, 2006, Lisbon, Portugal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|