ABSTRACT
Choosing the right representation for a problem is important. In this article we introduce a linear genetic programming approach for motif discovery in protein families and present a thorough comparison between our approach and Koza-style genetic programming using automatically defined functions (ADFs). In a study of 45 protein families, we demonstrate that our algorithm, given equal processing resources and no prior knowledge in the shaping of datasets, consistently generates motifs of significantly better quality than those we found using trees as the representation. For several of the studied protein families we evolve motifs comparable to those found in Prosite, a manually curated database of protein motifs. Our linear genome gave better results than Koza-style genetic programming for 37 of the 45 families. The difference is statistically significant for 24 of the families at the 99% confidence level.