ABSTRACT
We present an association rule mining method that extracts high-confidence rules describing interesting gene relationships from microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, which makes many data mining methods impractical because they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense datasets. These algorithms use the support measure to prune infrequent relationships and so reduce the search space. This reliance on support is a major shortcoming: it prunes many potentially interesting rules that have low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high-confidence rules from microarray data. MaxConf is a support-free algorithm that uses the confidence measure directly to prune the search space. Experiments on three microarray datasets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those found by support-based methods.
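To make the distinction between the two measures concrete, the following minimal sketch computes support and confidence under their standard definitions for a toy dataset. It is an illustration only, not the MaxConf algorithm, whose pruning strategy the abstract does not describe. The item names (geneA_up, geneB_up, geneC_up), the ten toy experiments, and the 0.3 support threshold are all hypothetical.

```python
from typing import FrozenSet, List

Experiment = FrozenSet[str]


def support(itemset: FrozenSet[str], experiments: List[Experiment]) -> float:
    """Fraction of experiments that contain every item in `itemset`."""
    if not experiments:
        return 0.0
    hits = sum(1 for e in experiments if itemset <= e)
    return hits / len(experiments)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               experiments: List[Experiment]) -> float:
    """Fraction of experiments containing `antecedent` that also contain `consequent`."""
    with_antecedent = [e for e in experiments if antecedent <= e]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for e in with_antecedent if consequent <= e)
    return hits / len(with_antecedent)


# Hypothetical toy data: each experiment lists the genes observed as up-regulated.
experiments: List[Experiment] = (
    [frozenset({"geneA_up", "geneB_up"})] * 2      # A and B together
    + [frozenset({"geneB_up", "geneC_up"})] * 3    # B without A
    + [frozenset({"geneC_up"})] * 5                # neither A nor B
)

A = frozenset({"geneA_up"})
B = frozenset({"geneB_up"})

rule_support = support(A | B, experiments)        # 2 of 10 experiments
rule_confidence = confidence(A, B, experiments)   # 2 of 2 experiments containing A

MIN_SUPPORT = 0.3  # hypothetical threshold for a support-based miner
pruned_by_support = rule_support < MIN_SUPPORT

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} "
      f"pruned_by_support={pruned_by_support}")
```

In this toy setting the rule geneA_up -> geneB_up holds in every experiment where geneA_up appears, yet a support threshold of 0.3 would discard it. This is the kind of low-support, high-confidence rule the abstract says support-based pruning loses.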
CITED BY
Gaurav Pandey , Gowtham Atluri , Michael Steinbach , Chad L. Myers , Vipin Kumar, An association analysis approach to biclustering, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France