ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction
Full text PdfPdf (3.09 MB)
Source IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) archive
Volume 6 ,  Issue 3  (July 2009) table of contents
Pages: 427-437  
Year of Publication: 2009
ISSN:1545-5963
Authors
Marc Hulsman  Delft University of Technology, Delft
Marcel J. T. Reinders  Delft University of Technology, Delft
Dick de Ridder  Delft University of Technology, Delft
Publisher
IEEE Computer Society Press  Los Alamitos, CA, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 60,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: 10.1109/TCBB.2008.137

ABSTRACT

In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
B. Schölkopf, K. Tsuda, and J.-P. Vert, Kernel Methods in Computational Biology. MIT Press, 2004.
 
2
 
3
A. Ben-Hur, C.S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch, "Support Vector Machines and Kernels for Computational Biology," PLoS Computational Biology, vol. 4, no. 10, Oct. 2008.
 
4
 
5
Y. Ho et al., "Systematic Identification of Protein Complexes in Saccharomyces Cerevisiae by Mass Spectrometry," Nature, vol. 415, pp. 180-183, Jan. 2002.
 
6
A.-C. Gavin et al., "Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes," Nature, vol. 415, pp. 141-147, 2002.
 
7
G.D. Bader and C.W.V. Hogue, "Analyzing Yeast Protein-Protein Interaction Data Obtained from Different Sources," Nature Biotechnology, vol. 20, pp. 991-997, 2002.
 
8
C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork, "Comparative Assessment of Large-Scale Data Sets of Protein-Protein Interactions," Nature, vol. 417, pp. 399-403, 2002.
 
9
R. Jansen and M. Gerstein, "Analyzing Protein Function on a Genomic Scale: The Importance of Gold-Standard Positives and Negatives for Network Prediction," Current Opinion in Microbiology , vol. 7, pp. 535-545, 2004.
 
10
G.D. Bader and C.W.V. Hogue, "An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks," BMC Bioinformatics, vol. 4, 2003.
 
11
R. Jansen, Y. Haiyuan, D. Greenbaum, Y. Kluger, N.J. Krogan, S. Chung, A. Emili, M. Snyder, J.F. Greenblatt, and M. Gerstein, "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data," Science, vol. 302, pp. 449-453, 2003.
 
12
L.V. Zhang, S.L. Wong, O.D. King, and F.P. Roth, "Predicting Co-Complexed Protein Pairs Using Genomic and Proteomic Data Integration," BMC Bioinformatics, vol. 5, i38-i46, 2004.
 
13
R.J.P. van Berlo, L.F.A. Wessels, D. de Ridder, and M.J.T. Reinders, "Protein Complex Prediction Using an Integrative Bioinformatics Approach," J. Bioinformatics and Computational Biology, vol. 4, pp. 839-861, 2007.
 
14
Y. Qi, Z. Bar-Joseph, and J. Klein-Seetharaman, "Evaluation of Different Biological Data and Computational Classification Methods for Use in Protein Interaction Prediction," Proteins: Structure, Function, and Bioinformatics, vol. 63, pp. 490-500, 2006.
 
15
 
16
 
17
 
18
 
19
O. Bousquet and D. Herrmann, "On the Complexity of Learning the Kernel Matrix," Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, eds., vol. 15. pp. 415-422, MIT Press, 2003.
 
20
N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On Kernel Target Alignment," Advances in Neural Information Processing Systems, vol. 14, pp. 367-374, MIT Press, 2002.
 
21
P. Pavlidis, J. Weston, C. Jinsong, and W.S. Noble, "Learning Gene Functional Classifications from Multiple Data Types," J. Computational Biology, vol. 9, pp. 401-412, 2002.
 
22
 
23
24
 
25
F. Friedrichs and C. Igel, "Evolutionary Tuning of Multiple SVM Parameters," Proc. European Symp. Artificial Neural Networks (ESANN), pp. 519-524, 2004.
 
26
 
27
 
28
A. Auger and N. Hansen, "A Restart CMA Evolution Strategy with Increasing Population Size," Proc. IEEE Congress Evolutionary Computation (CEC '05), vol. 2, pp. 1769-1776, 2005.
29
 
30
F. Bach, R. Thibaux, and M.I. Jordan, "Computing Regularization Paths for Learning Multiple Kernels," J. Machine Learning Research, vol. 5, pp. 1391-1415, 2004.
 
31
H.W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman, "MIPS: A Database for Genomes and Protein Sequences," Nucleic Acids Research, vol. 28, pp. 37-40, 2000.
 
32
A. Ben-Hur and W.S. Noble, "Choosing Negative Examples for the Prediction of Protein-Protein Interactions," BMC Bioinformatics , vol. 7, no. 1, 2006.
 
33
J. Gollub et al., "The Stanford Microarray Database: Data Access and Quality Assessment Tools," Nucleic Acids Research, vol. 31, no. 1, pp. 94-96, 2003.
 
34
T.R. Hughes et al., "Functional Discovery via a Compendium of Expression Profiles," Cell, vol. 102, pp. 109-126, 2000.
 
35
S.L. Tai, V.M. Boer, P. Daran-Lapujade, M.C. Walsh, J.H. de Winde, J.-M. Daran, and J.T. Pronk, "Two-Dimensional Transcriptome Analysis in Chemostat Cultures: Combinatorial Effects of Oxygen Availability and Macronutrient Limitation in Saccharomyces Cerevisiae," J. Biological Chemistry, vol. 280, no. 1, pp. 437- 447, 2005.
 
36
M.J. Dunham, H. Badrane, T. Ferea, J. Adams, P.O. Brown, F. Rosenzweig, and D. Botstein, "Characteristic Genome Rearrangements in Experimental Evolution of Saccharomyces Cerevisiae," Proc. Nat'l Academy of Sciences USA, vol. 99, pp. 16144-16149, 2002.
 
37
J.D. Lieb, L. Xiaole, D. Botstein, and P.O. Brown, "Promoter-Specific Binding of Rap1 Revealed by Genome-Wide Maps of Protein-DNA Association," Nature Genetics, vol. 28, pp. 327-334, 2001.
 
38
C.T. Harbison et al., "Transcriptional Regulatory Code of a Eukaryotic Genome," Nature, vol. 431, pp. 99-104, Sept. 2004.
 
39
N.J. Krogan et al., "High-Definition Macromolecular Composition of Yeast RNA-Processing Complexes," Molecular Cell, vol. 13, pp. 225-239, 2004.
 
40
I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.-M. Kim, and D. Eisenberg, "DIP, the Database of Interacting Proteins: A Research Tool for Studying Cellular Networks of Protein Interactions," Nucleic Acids Research, vol. 30, no. 1, pp. 303-305, 2002.
 
41
L.J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, "Assessing the Limits of Genomic Data Integration for Predicting Protein Networks," Genome Research, vol. 15, no. 7, pp. 945-953, 2005.
 
42
H. Yu, N.M. Luscombe, H.X. Lu, X. Zhu, Y. Xia, J.-D.J. Han, N. Bertin, S. Chung, M. Vidal, and M. Gerstein, "Annotation Transfer between Genomes: Protein-Protein Interologs and Protein-dna Regulogs," Genome Research, vol. 14, no. 6, pp. 1107-1118, 2004.
 
43
C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machine," http://www.csie.ntu.edu.tw/cjlin/libsvm, 2008.
 
44
R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, and D.M.J. Tax, "PRTools, A Matlab Toolbox for Pattern Recognition," http://prtools.org, 2009.
 
45
C.-W. Hsu, C.-C. Chang, and L. Chih-Jen, "A Practical Guide to Support Vector Classification," technical report, Dept. of Computer Science and Information Eng., Nat'l Taiwan Univ., 2003.
 
46
 
47
S.L. Wong et al., "Combining Biological Networks to Predict Genetic Interactions," Proc. Nat'l Academy of Sciences USA, vol. 101, pp. 15682-15687, 2004.
 
48
49

Collaborative Colleagues:
Marc Hulsman: colleagues
Marcel J. T. Reinders: colleagues
Dick de Ridder: colleagues