|
Warning: The download time has expired please click on the item to try again.
ABSTRACT
This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs are composed of p ⪈ 2 parts separated by constrained spacers These algorithms use a suffix tree for fulfilling this task. They are efficient enough to be able to extract site consensus, such as promoter sequences, from a whole collection of non coding sequences extracted from a genome. In particular, their time complexity scales linearly with N2n where n is the average length of the sequences and N their number. An application with interesting results to the identification of promoter consensus sequences in bacterial genomes is shown.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
O. O. Berg and P. H. yon Hippel. Seleetton of DNA binding sites by regulatory proteins. If. The binding specificity of cyclic AMP receptor protein to recognition sites. J. Mol. B~ol., 200:709-793, 1988.
|
| |
2
|
P. Bieganski, J. Riedl, J. V. Carl~, and E M. P~zel. Generalized suffix trees for biological sequence data: applications and implementations. In Proc. of the .27th Hau~a~ Int. Oonf. on Systems Sc~., pages 35-44. iEEE Comp,ter Society Press, 1994.
|
| |
3
|
A. Brazrna, I. Jona~en, J. Vdo, and E Ukkonen. Predicting gene regulatory elements ;n sdzao on a genomic scale. Gcno,ne Research, 8:1202-1215, 1998.
|
| |
4
|
L R. Cardon and G. D. Stormo. Expectation Maximizalion algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragqnents. J. Mol. Bwl., 223 139-170, 1992.
|
| |
5
|
B. Combrugghe, S. B,mby, and H. Buc. Cyclic AMP receptor protein: role in transcription activation Sczence, 224:831- 838, 1984.
|
| |
6
|
Y. M Fraenkel, Y. Mandel, D Friedherg, and H Margalit. Identification of common motifs in ,nahgned DNA sequences: application to ~schcr~ch~a eoh trp regulon. Oomput. Appl B;osc;., 11.379-387, 1995.
|
| |
7
|
D. J. Galas, M. Eggert, and M S Waterman. Rigorous pattern-recognition methods for DNA sequences. Analysts of promoter sequenee~q from Escher~ch~a colt J. Mot. B:ol., 18l~:117-128, 1985.
|
| |
8
|
C. A. Gross, M. Lon~to, and R. Losiek. Bacterial sigma factors. In S. L. Knight and K. R. Yamamoto, editors, 7Yanscr~phonal Rcgulatlon, volume 1, pages 129-176. Cold Spring Harbor Laboratory Press, 1992.
|
| |
9
|
|
| |
10
|
J. D. Helmann. Compilation and analysis of Baedlus subtdts c~-dependent promoter sequences: evidence for extended contact between RNA polymerase and upstream promoter DNA. Nucleic Acids Res., ~a:2aal-2aao, lgos.
|
| |
11
|
A. K!ingenhoff, K. Frech, K. Qimndt, and T. Werner. Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity. Bsotnformatscs 1, 15:180-186, 1999.
|
| |
12
|
C. E. Lawrence and A. A. Reilly. An expectation maximization (F.M) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: struct., lunar., and genetics, 7:41-51, 1990.
|
| |
13
|
B. Lewin. Genes VI. Oxford University Press, 1997.
|
 |
14
|
|
| |
15
|
M. A. Mulder, H. Zappe, and L. M. 8teyn. Mycobacterial promoters. Tuber. L~ng Dis., 78:211-223, 1997.
|
| |
16
|
O. N. Ozoline, A. A. Deer, and M. V. Arkhipova. Noncanonical sequence elements in the promoter structure, cluster analysis of promoters recognized by Escher, ch,a cot: RNA polymerase. Pluelelc Acids Res., 25:4703-4709, 1998.
|
| |
17
|
|
| |
18
|
M.T. Record, W. S. Reznikoff, M. L. Craig, K. L McQuade, and P. J. Schlax. Esc~erichia coli RNA polymerase a?~ promotets, and the kinetics of the steps of transcription initiation. ID F. C. Neidhardt, editor, Escher~ch~a colt and Salmonella, volume 1, pages 792-820. ASM Press, 1996.
|
| |
19
|
|
| |
20
|
T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht. Information content of binding sites on nucleotide sequences. Y. Mot. Blot., 188:415-431, 1986.
|
| |
21
|
G. D. Stormo and O. W. Hartzell Ill. identifying proteinbinding rotes from unaligned DNA fragments. Proc. Natl. Acad. Scs. USA, 86:1183-1187, 1989.
|
| |
22
|
|
| |
23
|
E. Ukkonen On-line construction of snmx-trees. Algorithm, ca, 14:249-260, 1995.
|
| |
24
|
J. van Helden, B. Andre, and J. Collado-Vides. Extracting regulatory sites from the upstream region of yeast genes by computational a~alysis of oligonueleotide frequencies. J. Mol. B,ot., 281:827-842, 1998.
|
| |
25
|
A. Vanet, L. Marsan, A. Labigne, and M.-F. Sagot. Inferring regulatory elements from a whole genome. An analysis of the o~~ family of promoter signals. }999. submitted to J. Mol. B~ol.
|
| |
26
|
A. Vane{, L. Marsan, and M.-F. Sagot. Promoter sequences and algorithlnical methods for identifying them. Research m Macrab~otogy, 150:1-21, 1999. in press.
|
| |
27
|
T. Werner. Models for prediction and recognition of eukaryotic promoters. Mature. Oenome, 10:168-175, 1999
|
| |
28
|
F. Wolfertstetter, K. Frech, G. Hcrrmann, and T. Werner. Identification of functional elements in unaligned nucleic acid sequence~ by a novel tuple search algorithms. Oomput. Appl. B,osc:., 12:71-80, 1996.
|
CITED BY 8
|
|
Robin Gras , David Hernandez , Patricia Hernandez , Nadine Zangge , Yoann Mescam , Julien Frey , Olivier Martin , Jacques Nicolas , Ron D. Appel, Cooperative Metaheuristics for Exploring Proteomic Data, Artificial Intelligence Review, v.20 n.1-2, p.95-120, October 2003
|
|
|
|
|
|
|
|
|
|
|
|
Costas S. Iliopoulos , Christos Makris , Yannis Panagis , Katerina Perdikuri , Evangelos Theodoridis , Athanasios Tsakalidis, The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications, Fundamenta Informaticae, v.71 n.2,3, p.259-277, August 2006
|
|
|
|
|
|
|
|
|
|
|