|
ABSTRACT
In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing tight upper bounds, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to secure existing algorithms from a combinatorial explosion of the number of candidate patterns.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Ramesh C. Agarwal , Charu C. Aggarwal , V. V. V. Prasad, Depth first generation of long patterns, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.108-118, August 20-23, 2000, Boston, Massachusetts, United States
[doi> 10.1145/347090.347114]
|
| |
2
|
|
 |
3
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
4
|
Rakesh Agrawal , Hiekki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
| |
5
|
|
| |
6
|
Agrawal, R. and Srikant, R. 1994b. Fast algorithms for mining association rules. IBM Research Report RJ9839, IBM Alamaden Research Center, San Jose, California. June.
|
| |
7
|
Agrawal, R. and Srikant, R. 1994c. Quest Synthetic Data Generator. IBM Alamaden Research Center, http://www.almaden.ibm.com/software/quest/Resources/index.shtml.
|
 |
8
|
|
| |
9
|
Blake, C. and Merz, C. 1998. UCI Repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Sciences, http://www.ics.uci.edu/~mlearn/MLRepository.html.
|
| |
10
|
Bollobás, B. 1986. Combinatorics. Cambridge University Press.
|
| |
11
|
|
 |
12
|
Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman , Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
|
| |
13
|
|
| |
14
|
Frankl, P. 1984. A new short proof for the Kruskal--Katona theorem. Discrete Mathematics 48, 327--329.
|
| |
15
|
|
| |
16
|
Goethals, B. and Zaki, M. J., Eds. 2003. Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI-03), Melbourne Florida, USA, November 19, 2003. CEUR Workshop Proceedings, vol. 90. http://CEUR-WS.org/Vol-90/.
|
 |
17
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
18
|
Katona, G. 1968. A theorem of finite sets. In Theory Of Graphs. Akadémia Kiadó, 187--207.
|
 |
19
|
|
| |
20
|
Kruskal, J. 1963. The number of simplices in a complex. In Mathematical Optimization Techniques. Univ. of California Press, 251--278.
|
| |
21
|
|
 |
22
|
Junqiang Liu , Yunhe Pan , Ke Wang , Jiawei Han, Mining frequent item sets by opportunistic projection, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775081]
|
| |
23
|
|
 |
24
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
|
| |
25
|
|
| |
26
|
Pei, J., Han, J., and Mao, R. 2000. Closet: An efficient algorithm for mining frequent closed itemsets. ACM SIGMOD'00 Workshop on Research Issues in Data Mining and Knowledge Discovery.
|
| |
27
|
|
| |
28
|
|
| |
29
|
Zaki, M. and Hsiao, C.-J. 2002. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the Second SIAM International Conference on Data Mining, R. Grossman, J. Han, V. Kumar, H. Mannila, and R. Motwani, Eds. SIAM.
|
| |
30
|
Zaki, M., Parthasarathy, S., Ogihara, M., and Li, W. 1997. New algorithms for fast discovery of association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, and D. Pregibon, Eds. AAAI Press, 283--296.
|
 |
31
|
|
CITED BY 6
|
|
|
|
|
|
|
|
|
|
|
Ruoming Jin , Scott McCallen , Yuri Breitbart , Dave Fuhry , Dong Wang, Estimating the number of frequent itemsets in a large database, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|