ABSTRACT
The analysis of microarray data has become one of the important topics in bioinformatics over the past several years, and different data mining techniques have been proposed for the analysis of such data. In this paper, we propose to use association rule discovery methods for determining associations among the expression levels of different genes. One of the main problems in discovering these associations is scalability. Microarrays usually contain very large numbers of genes, sometimes in the tens of thousands. Therefore, analysis of such data can generate a very large number of associations, often in the millions. The paper addresses this problem by presenting a method that enables biologists to evaluate these very large numbers of discovered association rules during the post-analysis stage of the data mining process. This is achieved by providing several rule evaluation operators, including rule grouping, filtering, browsing, and data inspection operators, that allow biologists to validate multiple individual gene regulation patterns at a time. By iteratively applying these operators, biologists can explore a significant part of all the initially generated rules in an acceptable period of time and thus answer the biological questions that are of particular interest to them. To validate our method, we tested our system on microarray data from studies of environmental hazards and their influence on gene expression processes. As a result, we managed to answer several questions that were of interest to the biologists who had collected this data.
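To make the idea of post-analysis operators concrete, the following is a minimal sketch and not the system described in the paper. It assumes expression levels have already been discretized into "up" and "down", mines only single-antecedent rules, and uses hypothetical gene names (G1, G2, G3) and hypothetical function names (`mine_pair_rules`, `filter_rules`, `group_rules`). It shows how a filtering operator and a grouping operator might narrow a rule set before a biologist inspects it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: one dict per experiment, mapping a gene to its
# discretized expression level. Gene names and values are invented.
EXPERIMENTS = [
    {"G1": "up", "G2": "up", "G3": "down"},
    {"G1": "up", "G2": "up", "G3": "up"},
    {"G1": "down", "G2": "down", "G3": "down"},
    {"G1": "up", "G2": "up", "G3": "down"},
]


def mine_pair_rules(experiments, min_support=0.5, min_confidence=0.8):
    """Enumerate rules (gene=level) -> (gene=level) above both thresholds."""
    n = len(experiments)
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for exp in experiments:
        items = sorted(exp.items())  # (gene, level) tuples
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields up to two rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append({"if": lhs, "then": rhs,
                              "support": support, "confidence": confidence})
    return rules


def filter_rules(rules, gene):
    """Filtering operator (assumed form): keep rules that mention `gene`."""
    return [r for r in rules if gene in (r["if"][0], r["then"][0])]


def group_rules(rules):
    """Grouping operator (assumed form): bucket rules by the genes involved."""
    groups = defaultdict(list)
    for r in rules:
        groups[frozenset((r["if"][0], r["then"][0]))].append(r)
    return dict(groups)


if __name__ == "__main__":
    rules = mine_pair_rules(EXPERIMENTS, min_confidence=0.6)
    for genes, members in group_rules(filter_rules(rules, "G3")).items():
        print(sorted(genes), len(members))
```

On real microarray data the number of candidate rules grows quickly with the number of genes, which is why operators that act on whole groups of rules, rather than on one rule at a time, are the focus of the method.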
INDEX TERMS
Primary Classification:
I. Computing Methodologies
I.2 ARTIFICIAL INTELLIGENCE
I.2.3 Deduction and Theorem Proving
Subjects: Deduction (e.g., natural, rule-based)
Additional Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.2.4 Systems
Subjects: Rule-based databases
H.2.8 Database applications
Subjects: Data mining
General Terms: Algorithms, Design, Measurement, Performance, Reliability
Keywords: analysis of microarray data, association rules, bioinformatics, expert-driven rule validation, post-processing of discovered rules, rule filtering, rule grouping