Gene functional classification from heterogeneous data
Source: Annual Conference on Research in Computational Molecular Biology
Proceedings of the fifth annual international conference on Computational biology
Montreal, Quebec, Canada
Pages: 249-255
Year of Publication: 2001
ISBN: 1-58113-353-7
Authors
Paul Pavlidis, Columbia Genome Center, Columbia University
Jason Weston, Barnhill Technologies, Savannah, Georgia
Jinsong Cai, Department of Medical Informatics, Columbia University
William Noble Grundy, Department of Computer Science, Columbia University
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/369133.369228

ABSTRACT

In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. We also show how to use knowledge about heterogeneity to aid in feature selection.
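The abstract does not give the exact form of the "explicitly heterogeneous" kernel. One common way to build such a kernel, assumed here purely for illustration and not necessarily the authors' definition, is to compute a separate kernel on each data type and add the results. Because a sum of valid (positive semidefinite) kernels is itself a valid kernel, an SVM can use the result directly. In the sketch below, the function names, the choice of a Gaussian (RBF) kernel, the `gamma` values, and the feature layout (expression measurements first, then the phylogenetic profile, split at a hypothetical `n_expr`) are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); chosen only for illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def heterogeneous_kernel(x: Vector, y: Vector, n_expr: int,
                         gamma_expr: float = 1.0,
                         gamma_phylo: float = 1.0) -> float:
    """Sum of one kernel on the expression block and one on the phylogenetic block.

    Hypothetical layout: each gene vector is the concatenation
    [expression measurements | phylogenetic profile], and the first
    n_expr entries are the expression data.
    """
    k_expr = rbf_kernel(x[:n_expr], y[:n_expr], gamma_expr)
    k_phylo = rbf_kernel(x[n_expr:], y[n_expr:], gamma_phylo)
    return k_expr + k_phylo


def gram_matrix(genes: List[Vector], n_expr: int) -> List[List[float]]:
    """Pairwise kernel matrix over all genes, e.g. for an SVM with a precomputed kernel."""
    return [[heterogeneous_kernel(a, b, n_expr) for b in genes] for a in genes]


# Usage with two toy genes: 2 expression values + 3 phylogenetic-profile bits each.
genes = [[0.5, -1.0, 1.0, 0.0, 1.0],
         [0.2, -0.4, 0.0, 0.0, 1.0]]
K = gram_matrix(genes, n_expr=2)
```

The design point this illustrates: applying a single RBF kernel to the concatenated vector (with one shared `gamma`) equals the *product* of the per-block RBF kernels, so one data type can dominate or mask the other. Summing the per-block kernels keeps each data type's similarity as a separate additive term. A matrix like `K` could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.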


