|
ABSTRACT
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing such a task to n-ary relations (n ≥ 2) appears as a timely challenge. It may be important for many applications, for example, when adding the time dimension to the popular objects × features binary case. The generality of the task (no assumption being made on the relation arity or on the size of its attribute domains) makes it computationally challenging. We introduce an algorithm called Data-Peeler. From an n-ary relation, it extracts all closed n-sets satisfying given piecewise (anti) monotonic constraints. This new class of constraints generalizes both monotonic and antimonotonic constraints. Considering the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. These good performances must be granted to a new clever enumeration strategy allowing to efficiently enforce the closeness property. The relevance of the extracted closed n-sets is assessed on real-life 3-and 4-ary relations. Beyond natural 3-or 4-ary relations, expanding a relation with an additional attribute can help in enforcing rather abstract constraints such as the robustness with respect to binarization. Furthermore, a collection of closed n-sets is shown to be an excellent starting point to compute a tiling of the dataset.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Foto Afrati , Gautam Das , Aristides Gionis , Heikki Mannila , Taneli Mielikainen , Panayiotis Tsaparas, Mining Chains of Relations, Proceedings of the Fifth IEEE International Conference on Data Mining, p.553-556, November 27-30, 2005
[doi> 10.1109/ICDM.2005.94]
|
 |
2
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
3
|
|
| |
4
|
|
| |
5
|
Boulicaut, J.-F. and Jeudy, B. 2005. Constraint-Based data mining. In The Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach, Eds. Springer, 399--416.
|
| |
6
|
|
| |
7
|
Cerf, L., Besson, J., Robardet, C., and Boulicaut, J.-F. 2008. Data-Peeler: Constraint-Based closed pattern mining in n-ary relations. In Proceedings of the 8th SIAM International Conference on Data Mining (SDM'08). SIAM.
|
| |
8
|
McCluskey, J. 1956. Minimization of Boolean functions. Bell Syst. Tech. J. 35, 5, 1417--1444.
|
| |
9
|
Gao, M., Jiang, J.-H., Jiang, Y., Li, Y., Sinha, S., and Brayton, R. 2000. MVSIS. In Notes of the IEEE International Workshop on Logic Synthesis. IEEE Computer Society.
|
| |
10
|
Garriga, G. C., Khardon, R., and Raedt, L. D. 2007. On mining closed sets in multi-relational data. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07). AAAI Press, 804--809.
|
| |
11
|
Gély, A. 2005. A generic algorithm for generating closed sets of a binary relation. In Proceedings of the 3rd International Conference on Formal Concept Analysis (ICFCA'05). Lecture Notes in Computer Science, Vol, 3403, Springer, 223--234.
|
 |
12
|
|
| |
13
|
|
 |
14
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
15
|
|
| |
16
|
|
 |
17
|
Daxin Jiang , Jian Pei , Murali Ramanathan , Chun Tang , Aidong Zhang, Mining coherent gene clusters from gene-sample-time microarray data, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014101]
|
| |
18
|
Karnaugh, M. 1953. The map method for synthesis of combinational logic circuits. Trans. Amer. Institute Electric. Eng. Part I 72, 9, 593--599.
|
 |
19
|
Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang, Exploratory mining and pruning optimizations of constrained associations rules, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.13-24, June 01-04, 1998, Seattle, Washington, United States
|
 |
20
|
Feng Pan , Gao Cong , Anthony K. H. Tung , Jiong Yang , Mohammed J. Zaki, Carpenter: finding closed patterns in long biological datasets, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
[doi> 10.1145/956750.956832]
|
| |
21
|
|
| |
22
|
|
| |
23
|
Pei, J., Han, J., and Mao, R. 2000. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Workshop on Research Issues in Data Mining and Knowledge Discovery (SIGMOD'00). ACM Press, 21--30.
|
| |
24
|
Pensa, R. G. and Boulicaut, J.-F. 2005. Boolean property encoding for local set pattern discovery: An application to gene expression data analysis. In Local Pattern Detection. Vol. 3539. Springer, 115--134.
|
| |
25
|
Rudell, R. and Sangiovanni-Vincentelli, A. 1985. Espresso-MV: Algorithms for multiple valued logic minimization. In Proceedings of the IEEE Custom International Circuit Conference. IEEE Computer Society, 230--234.
|
| |
26
|
|
 |
27
|
|
 |
28
|
Takeaki Uno , Masashi Kiyomi , Hiroki Arimura, LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining, Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, p.77-86, August 21-21, 2005, Chicago, Illinois
[doi> 10.1145/1133905.1133916]
|
 |
29
|
|
| |
30
|
Wille, R. 1982. Restructuring lattice theory: An approach based on hierarchies of concepts. In Ordered Sets, I. Rival, Ed. Reidel, 445--470.
|
| |
31
|
Zaki, M. J. and Hsiao, C. J. 2002. ChARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining (SDM'02). SIAM.
|
 |
32
|
|
|