|
ABSTRACT
Patterns of contrast are a very important way of comparing multi-dimensional datasets. Such patterns are able to capture regions of high difference between two classes of data, and are useful for human experts and the construction of classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on the use of Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates our ZBDD technique is highly scalable, substantially improves on state of the art mining for emerging patterns and can be effective for discovering complex contrasts from datasets with thousands of attributes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
F. A. Aloul, I. L. Markov, and K. A. Sakallah. MINCE: A static global variable ordering for SAT and BDD. In Int'l Workshop on Logic Synthesis, 2001.
|
| |
2
|
F. A. Aloul, M. N. Mneimneh, and K. Sakallah. ZBDD-based backtrack search SAT solver. In Int'l Workshop on Logic Synthesis, 2002.
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
Hiroshige Fujii , Goichi Ootomo , Chikahiro Hori, Interleaving based variable ordering methods for ordered binary decision diagrams, Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design, p.38-41, November 07-11, 1993, Santa Clara, California, United States
|
 |
14
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
15
|
|
 |
16
|
Haiquan Li , Jinyan Li , Limsoon Wong , Mengling Feng , Yap-Peng Tan, Relative risk and odds ratio: a data mining perspective, Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 13-15, 2005, Baltimore, Maryland
[doi> 10.1145/1065167.1065215]
|
| |
17
|
|
| |
18
|
J. Li, H. Liu, J. R. Downing, A. Yeoh, and L. Wong. Simple rules underlying gene expression profiles of more than six subtypes of Acute Lymphoblastic Leukaemia (ALL) patients. Bioinformatics, 19:71--78, 2003.
|
| |
19
|
J. Li and L. Wong. Emerging patterns and gene expression data. In Proc. of the 12th Workshop on Genome Informatics, pages 3--13, 2001.
|
| |
20
|
J. Li and L. Wong. Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics, 18(10):1406--1407, 2002.
|
| |
21
|
B. Liu, L. P. Ku, and W. Hsu. Discovering interesting holes in data. In Proc. of IJCAI, pages 930--935, 1997.
|
| |
22
|
|
 |
23
|
|
| |
24
|
S. Minato. Zero-suppressed BDDs and their applications. Int'l Journal on Software Tools for Technology Transfer (STTT), 3(2):156--170, 2001.
|
| |
25
|
|
| |
26
|
A. Mishchenko. An introduction to Zero-suppressed Binary Decision Diagrams.
|
| |
27
|
T. M. Mitchell. Generalization as Search. AI, 18(2):203--226, 1982.
|
 |
28
|
Feng Pan , Gao Cong , Anthony K. H. Tung , Jiong Yang , Mohammed J. Zaki, Carpenter: finding closed patterns in long biological datasets, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2003, Washington, D.C.
[doi> 10.1145/956750.956832]
|
| |
29
|
A. Rauzy. Mathematical foundations of minimal cutsets. IEEE Transactions on Reliability, 50(4), 2001.
|
 |
30
|
|
| |
31
|
|
 |
32
|
|
| |
33
|
M. Sebag. Delaying the choice of bias: A disjunctive version space approach. In Proc. of ICML 1996, pages 444--452.
|
| |
34
|
F. Somenzi. CUDD: CU decision diagram package, 1997. Public software, Colorado University, Boulder.
|
| |
35
|
A. Soulet, B. Cramilleux, and F. Rioult. Condensed representation of emerging patterns. In Proc. of PAKDD 04, pages 127--132, 2004.
|
 |
36
|
|
 |
37
|
|
|