ACM Home Page
Please provide us with feedback. Feedback
MDGA: motif discovery using a genetic algorithm
Full text PdfPdf (475 KB)
Source Genetic And Evolutionary Computation Conference archive
Proceedings of the 2005 conference on Genetic and evolutionary computation table of contents
Washington DC, USA
SESSION: Biological applications table of contents
Pages: 447 - 452  
Year of Publication: 2005
ISBN:1-59593-010-8
Authors
Dongsheng Che  University of Georgia, Athens, GA
Yinglei Song  University of Georgia, Athens, GA
Khaled Rasheed  University of Georgia, Athens, GA
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 48,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1068009.1068080
What is a DOI?

ABSTRACT

Computationally identifying transcription factor binding sites in the promoter regions of genes is an important problem in computational biology and has been under intensive research for a decade. To predict the binding site locations efficiently, many algorithms that incorporate either approximate or heuristic techniques have been developed. However, the prediction accuracy is not satisfactory and binding site prediction thus remains a challenging problem. In this paper, we develop an approach that can be used to predict binding site motifs using a genetic algorithm. Based on the generic framework of a genetic algorithm, the approach explores the search space of all possible starting locations of the binding site motifs in different target sequences with a population that undergoes evolution. Individuals in the population compete to participate in the crossovers and mutations occur with a certain probability. Initial experiments demonstrated that our approach could achieve high prediction accuracy in a small amount of computation time. A promising advantage of our approach is the fact that the computation time does not explicitly depend on the length of target sequences and hence may not increase significantly when the target sequences become very long.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Bailey, T.L. and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Stanford, CA, AAAI Press, Bethesda, MD, 1994, 28--36.
 
2
Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator, Genome Research, 14, 6 (June 2004), 1188--1190.
 
3
 
4
Galas, D.J. and Schmitz, A. A DNA footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res., 5, 9 (Sep. 1978), 3157--3170.
 
5
Garner, M. M., and A. Revzin. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res. 9, 13 (July, 1981), 3047--3060.
 
6
Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N., Macisaac, K.D., Danford, T.D., Hannett, N.M., Tagne, J.-B., Reynolds, D.B., Yoo, J., Jennings, E.G., Zeitlinger, J., Pokholok, D.K., Kellis, M., Rolfe, P.A., Takusagawa, K.T., Lander, E.S., Gifford, D.K., Fraenkel, E. and Young, R.A. Transcriptional Regulatory Code of a Eukaryotic Genome. Nature, 431, 7004 (Sep. 2004), 99--104.
 
7
Hertz, G.Z. and Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 7 (July, 1999), 563--577.
 
8
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wooton, J.C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 8 (Oct. 1993), 208--214.
 
9
 
10
Liu, J.S., Neuwald, A.F. and Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90, 432 (Nov. 1995), 1156--1170.
 
11
Liu, X., Brutlag, D.L. and Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 6, 2001, 127--138.
 
12
Notredame, C. and Higgins, D.G. SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 8 (Apr. 1996), 1515--1524
 
13
Roth, F.R., Hughes, J. D., Estep, P. E. and Church G. M. Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-Genome mRNA quantitation. Nature Biotechnology 16, 10 (Oct. 1998), 939--45.
 
14
Stormo, G.D. Computer methods for analyzing sequence recognition of nucleic acids. Annu. Rev. BioChem. 17, 1988, 241--263.
 
15
Stormo,G.D. and Hartzell, G.W. Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86, 4, (Feb. 1989), 1183--1187.


Collaborative Colleagues:
Dongsheng Che: colleagues
Yinglei Song: colleagues
Khaled Rasheed: colleagues