ABSTRACT
The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they can only analyze data sets with binary and/or categorical variables, and applying them to real-valued data sets often involves a lossy transformation such as discretization or binarization of the attributes. In this paper, we propose a novel association analysis framework, RAP, for exhaustively and efficiently mining "range support" patterns from real-valued data sets. On the one hand, this framework reduces the loss of information incurred by the binarization- and discretization-based approaches; on the other, it enables the exhaustive discovery of coherent biclusters. We compared the performance of our framework with two standard biclustering algorithms by evaluating the similarity of the cellular functions of the genes constituting the patterns/biclusters derived by these algorithms from microarray data. These experiments show that the real-valued patterns discovered by our framework are better enriched by small, biologically interesting functional classes. Also, through specific examples, we demonstrate the ability of the RAP framework to discover functionally enriched patterns that are not found by the commonly used biclustering algorithm ISA.
The source code and data sets used in this paper, as well as the supplementary material, are available at http://www.cs.umn.edu/vk/gaurav/rap.