Modeling evolutionary fitness for DNA motif discovery
Source
Genetic and Evolutionary Computation Conference
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
Montreal, Québec, Canada
Session: Track 3: Bioinformatics and Computational Biology
Pages 225-232
Year of Publication: 2009
ISBN: 978-1-60558-325-9
Authors
Sven Rahmann, TU Dortmund, Dortmund, Germany
Tobias Marschall, TU Dortmund, Dortmund, Germany
Frank Behler, TU Dortmund, Dortmund, Germany
Oliver Kramer, TU Dortmund, Dortmund, Germany
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1569901.1569933

ABSTRACT

The motif discovery problem consists of finding over-represented patterns in a collection of sequences. Its difficulty stems partly from the many possible ways to define both the motif space to be searched and the notion of over-representation. Since the size of the search space is generally exponential in the motif length, many heuristic methods, including evolutionary algorithms, have been developed. However, comparatively little attention has been devoted to the adequate evaluation of motif quality, especially when comparing motifs of different lengths. We propose an evolution strategy to solve the motif discovery problem based on a new fitness function that simultaneously takes into account (1) the number of motif occurrences, (2) the motif length, and (3) its information content. Experimental results show that the proposed method succeeds in uncovering the correct motif positions and length with high accuracy.
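The abstract names the ingredients of the fitness function but not how they are combined. As a rough illustration only, the Python sketch below computes two of them for DNA: the number of exact occurrences of a candidate consensus string, and the information content of a set of aligned sites. It assumes the common relative-entropy definition against a uniform background (0.25 per base), which may differ from the paper's exact formulation; the function names and the `pseudocount` parameter are hypothetical. It also hints at why length is hard to handle: over {A, C, G, T} there are 4^L consensus strings of length L, and total information content grows with L (up to 2 bits per column) while the number of occurrences tends to shrink, so raw scores for motifs of different lengths are not directly comparable.

```python
import math
from collections import Counter

DNA = "ACGT"
BACKGROUND = 0.25  # assumption: uniform i.i.d. background over A, C, G, T


def count_matrix(sites):
    """Per-column nucleotide counts for equal-length, aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[j] for s in sites) for j in range(width)]


def information_content(sites, pseudocount=0.0):
    """Total information content in bits, summed over columns.

    Each column contributes sum_b p_b * log2(p_b / 0.25), the relative
    entropy against the uniform background, so at most 2 bits per column.
    Only uppercase A, C, G, T are scored.
    """
    n = len(sites)
    total = 0.0
    for column in count_matrix(sites):
        for base in DNA:
            p = (column[base] + pseudocount) / (n + 4 * pseudocount)
            if p > 0:  # 0 * log 0 is taken as 0
                total += p * math.log2(p / BACKGROUND)
    return total


def count_occurrences(motif, sequences):
    """Exact, possibly overlapping occurrences of `motif` (forward strand only)."""
    k = len(motif)
    return sum(
        1
        for seq in sequences
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] == motif
    )
```

For example, `information_content(["ACGT", "ACGT"])` gives 8.0 bits (2 bits for each fully conserved column), four sites that disagree in every position such as `["A", "C", "G", "T"]` give 0.0, and `count_occurrences("ACG", ["ACGACG", "TTT"])` returns 2. A real fitness function would still have to decide how to weigh these quantities against each other and against motif length.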

