ABSTRACT
In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. We also show how to use knowledge about heterogeneity to aid in feature selection.
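The abstract does not spell out the form of the heterogeneous kernel, so the following is only a minimal sketch of one way such a kernel could be built. It assumes each gene is a single feature vector whose first `n_expr` entries are expression measurements and whose remaining entries are a phylogenetic profile. It also assumes that a polynomial kernel is computed on each block separately and the two results are summed. The function names, the `n_expr` parameter, and the choice of a polynomial kernel are illustrative assumptions, not the authors' stated method.

```python
from typing import Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float], degree: int = 2) -> float:
    """Polynomial kernel (x . y + 1)^degree on two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def heterogeneous_kernel(
    x: Sequence[float],
    y: Sequence[float],
    n_expr: int,
    degree: int = 2,
) -> float:
    """Hypothetical heterogeneous kernel (an assumption for illustration).

    The first n_expr features are treated as expression data and the rest
    as a phylogenetic profile. A kernel is evaluated on each block on its
    own and the two values are added, so no polynomial cross terms mix
    features from different data types.
    """
    k_expr = poly_kernel(x[:n_expr], y[:n_expr], degree)
    k_phylo = poly_kernel(x[n_expr:], y[n_expr:], degree)
    return k_expr + k_phylo


if __name__ == "__main__":
    # Two toy genes: 2 expression values followed by 2 profile bits.
    gene_a = [1.0, 2.0, 1.0, 0.0]
    gene_b = [0.5, -1.0, 1.0, 1.0]
    print(heterogeneous_kernel(gene_a, gene_b, n_expr=2))  # 4.25
```

A sum of two valid kernels is itself a valid kernel, so a function of this shape could be passed to any SVM implementation that accepts a user-defined kernel. By contrast, applying one polynomial kernel to the concatenated vector would introduce cross terms between expression and profile features, which is the kind of mixing an explicitly heterogeneous kernel avoids.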