ACM Home Page
Please provide us with feedback. Feedback
The complexity of mining maximal frequent itemsets and maximal frequent patterns
Full text PdfPdf (243 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Seattle, WA, USA
SESSION: Research track papers table of contents
Pages: 344 - 353  
Year of Publication: 2004
ISBN:1-58113-888-1
Author
Guizhen Yang  University at Buffalo, The State University of New York, Buffalo, NY
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 145,   Citation Count: 13
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1014052.1014091
What is a DOI?

ABSTRACT

Mining maximal frequent itemsets is one of the most fundamental problems in data mining. In this paper we study the complexity-theoretic aspects of maximal frequent itemset mining, from the perspective of counting the number of solutions. We present the first formal proof that the problem of counting the number of distinct maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing strong theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. This result is of particular interest since the associated decision problem of checking the existence of a maximal frequent itemset is in P.We also extend our complexity analysis to other similar data mining problems dealing with complex data structures, such as sequences, trees, and graphs, which have attracted intensive research interests in recent years. Normally, in these problems a partial order among frequent patterns can be defined in such a way as to preserve the downward closure property, with maximal frequent patterns being those without any successor with respect to this partial order. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In SDM, 2002.
5
 
6
 
7
 
8
 
9
10
 
11
 
12
 
13
 
14
S. O. Kuznetsov. Interpretation on graphs and complexity characteristics of a search for specific patterns. Nauchno-Tekhnicheskaya Informatsiya, Seriya 2 (Automatic Documentation and Mathematical Linguistics), 23(1):23--27, 1989.
 
15
 
16
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
 
17
 
18
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing, 12(4):777--788, 1983.
19
 
20
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.
 
21
 
22
L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189--201, 1979.
23
 
24
 
25
26
 
27
M. J. Zaki and M. Ogihara. Theoretical foundations of association rules. In DMKD, 1998.
 
28
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In KDD, 1997.

CITED BY  13