|
ABSTRACT
Mining maximal frequent itemsets is one of the most fundamental problems in data mining. In this paper we study the complexity-theoretic aspects of maximal frequent itemset mining, from the perspective of counting the number of solutions. We present the first formal proof that the problem of counting the number of distinct maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing strong theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. This result is of particular interest since the associated decision problem of checking the existence of a maximal frequent itemset is in P.We also extend our complexity analysis to other similar data mining problems dealing with complex data structures, such as sequences, trees, and graphs, which have attracted intensive research interests in recent years. Normally, in these problems a partial order among frequent patterns can be defined in such a way as to preserve the downward closure property, with maximal frequent patterns being those without any successor with respect to this partial order. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Ramesh C. Agarwal , Charu C. Aggarwal , V. V. V. Prasad, Depth first generation of long patterns, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.108-118, August 20-23, 2000, Boston, Massachusetts, United States
[doi> 10.1145/347090.347114]
|
| |
2
|
|
| |
3
|
|
| |
4
|
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In SDM, 2002.
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
Dimitrios Gunopulos , Roni Khardon , Heikki Mannila , Sanjeev Saluja , Hannu Toivonen , Ram Sewak Sharma, Discovering all most specific sentences, ACM Transactions on Database Systems (TODS), v.28 n.2, p.140-174, June 2003
[doi> 10.1145/777943.777945]
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
S. O. Kuznetsov. Interpretation on graphs and complexity characteristics of a search for specific patterns. Nauchno-Tekhnicheskaya Informatsiya, Seriya 2 (Automatic Documentation and Mathematical Linguistics), 23(1):23--27, 1989.
|
| |
15
|
|
| |
16
|
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
|
| |
17
|
|
| |
18
|
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing, 12(4):777--788, 1983.
|
 |
19
|
|
| |
20
|
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.
|
| |
21
|
|
| |
22
|
L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189--201, 1979.
|
 |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
| |
27
|
M. J. Zaki and M. Ogihara. Theoretical foundations of association rules. In DMKD, 1998.
|
| |
28
|
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In KDD, 1997.
|
CITED BY 13
|
|
|
|
|
Jian Pei , Yidong Yuan , Xuemin Lin , Wen Jin , Martin Ester , Qing Liu , Wei Wang , Yufei Tao , Jeffrey Xu Yu , Qing Zhang, Towards multidimensional subspace skyline analysis, ACM Transactions on Database Systems (TODS), v.31 n.4, p.1335-1381, December 2006
|
|
|
|
|
|
|
|
|
|
|
|
Lei Chang , Tengjiao Wang , Dongqing Yang , Hua Luan , Shiwei Tang, Efficient algorithms for incremental maintenance of closed sequential patterns in large databases, Data & Knowledge Engineering, v.68 n.1, p.68-106, January, 2009
|
|
|
Bingjun Sun , Prasenjit Mitra , C. Lee Giles, Mining, indexing, and searching for textual chemical molecule information on the web, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
Jian Pei , Haixun Wang , Jian Liu , Ke Wang , Jianyong Wang , Philip S. Yu, Discovering Frequent Closed Partial Orders from Strings, IEEE Transactions on Knowledge and Data Engineering, v.18 n.11, p.1467-1481, November 2006
|
|
|
|
|
|
Ruoming Jin , Scott McCallen , Yuri Breitbart , Dave Fuhry , Dong Wang, Estimating the number of frequent itemsets in a large database, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|
|
Zhenglu Yang , Lin Li , Botao Wang , Masaru Kitsuregawa, Towards efficient dominant relationship exploration of the product items on the web, Proceedings of the 22nd national conference on Artificial intelligence, p.1483-1488, July 22-26, 2007, Vancouver, British Columbia, Canada
|
|