|
ABSTRACT
Computationally identifying transcription factor binding sites in the promoter regions of genes is an important problem in computational biology and has been under intensive research for a decade. To predict the binding site locations efficiently, many algorithms that incorporate either approximate or heuristic techniques have been developed. However, the prediction accuracy is not satisfactory and binding site prediction thus remains a challenging problem. In this paper, we develop an approach that can be used to predict binding site motifs using a genetic algorithm. Based on the generic framework of a genetic algorithm, the approach explores the search space of all possible starting locations of the binding site motifs in different target sequences with a population that undergoes evolution. Individuals in the population compete to participate in the crossovers and mutations occur with a certain probability. Initial experiments demonstrated that our approach could achieve high prediction accuracy in a small amount of computation time. A promising advantage of our approach is the fact that the computation time does not explicitly depend on the length of target sequences and hence may not increase significantly when the target sequences become very long.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Bailey, T.L. and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Stanford, CA, AAAI Press, Bethesda, MD, 1994, 28--36.
|
| |
2
|
Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator, Genome Research, 14, 6 (June 2004), 1188--1190.
|
| |
3
|
|
| |
4
|
Galas, D.J. and Schmitz, A. A DNA footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res., 5, 9 (Sep. 1978), 3157--3170.
|
| |
5
|
Garner, M. M., and A. Revzin. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res. 9, 13 (July, 1981), 3047--3060.
|
| |
6
|
Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N., Macisaac, K.D., Danford, T.D., Hannett, N.M., Tagne, J.-B., Reynolds, D.B., Yoo, J., Jennings, E.G., Zeitlinger, J., Pokholok, D.K., Kellis, M., Rolfe, P.A., Takusagawa, K.T., Lander, E.S., Gifford, D.K., Fraenkel, E. and Young, R.A. Transcriptional Regulatory Code of a Eukaryotic Genome. Nature, 431, 7004 (Sep. 2004), 99--104.
|
| |
7
|
Hertz, G.Z. and Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 7 (July, 1999), 563--577.
|
| |
8
|
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wooton, J.C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 8 (Oct. 1993), 208--214.
|
| |
9
|
|
| |
10
|
Liu, J.S., Neuwald, A.F. and Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90, 432 (Nov. 1995), 1156--1170.
|
| |
11
|
Liu, X., Brutlag, D.L. and Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 6, 2001, 127--138.
|
| |
12
|
Notredame, C. and Higgins, D.G. SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 8 (Apr. 1996), 1515--1524
|
| |
13
|
Roth, F.R., Hughes, J. D., Estep, P. E. and Church G. M. Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-Genome mRNA quantitation. Nature Biotechnology 16, 10 (Oct. 1998), 939--45.
|
| |
14
|
Stormo, G.D. Computer methods for analyzing sequence recognition of nucleic acids. Annu. Rev. BioChem. 17, 1988, 241--263.
|
| |
15
|
Stormo,G.D. and Hartzell, G.W. Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86, 4, (Feb. 1989), 1183--1187.
|
|