ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Extracting structured motifs using a suffix tree—algorithms and application to promoter consensus identification
Full text PdfPdf (796 KB)
Source Annual Conference on Research in Computational Molecular Biology archive
Proceedings of the fourth annual international conference on Computational molecular biology table of contents
Tokyo, Japan
Pages: 210 - 219  
Year of Publication: 2000
ISBN:1-58113-186-0
Authors
Laurent Marsan  Institut Gaspard Monge, Université de Marne la Vallée, 2, rue de la Butte Verte, 93160 - Noisy le Grand
Marie-France Sagot  Institut Gaspard Monge, Université de Marne la Vallée, 2, rue de la Butte Verte, 93160 - Noisy le Grand and Institut Pasteur, Service d'Informatique Scientifique, 28, rue du Dr. Roux, 75324 - Paris Cedex 15
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 46,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/332306.332553
What is a DOI?

Warning: The download time has expired please click on the item to try again.


ABSTRACT

This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs are composed of p ⪈ 2 parts separated by constrained spacers These algorithms use a suffix tree for fulfilling this task. They are efficient enough to be able to extract site consensus, such as promoter sequences, from a whole collection of non coding sequences extracted from a genome. In particular, their time complexity scales linearly with N2n where n is the average length of the sequences and N their number. An application with interesting results to the identification of promoter consensus sequences in bacterial genomes is shown.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
O. O. Berg and P. H. yon Hippel. Seleetton of DNA binding sites by regulatory proteins. If. The binding specificity of cyclic AMP receptor protein to recognition sites. J. Mol. B~ol., 200:709-793, 1988.
 
2
P. Bieganski, J. Riedl, J. V. Carl~, and E M. P~zel. Generalized suffix trees for biological sequence data: applications and implementations. In Proc. of the .27th Hau~a~ Int. Oonf. on Systems Sc~., pages 35-44. iEEE Comp,ter Society Press, 1994.
 
3
A. Brazrna, I. Jona~en, J. Vdo, and E Ukkonen. Predicting gene regulatory elements ;n sdzao on a genomic scale. Gcno,ne Research, 8:1202-1215, 1998.
 
4
L R. Cardon and G. D. Stormo. Expectation Maximizalion algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragqnents. J. Mol. Bwl., 223 139-170, 1992.
 
5
B. Combrugghe, S. B,mby, and H. Buc. Cyclic AMP receptor protein: role in transcription activation Sczence, 224:831- 838, 1984.
 
6
Y. M Fraenkel, Y. Mandel, D Friedherg, and H Margalit. Identification of common motifs in ,nahgned DNA sequences: application to ~schcr~ch~a eoh trp regulon. Oomput. Appl B;osc;., 11.379-387, 1995.
 
7
D. J. Galas, M. Eggert, and M S Waterman. Rigorous pattern-recognition methods for DNA sequences. Analysts of promoter sequenee~q from Escher~ch~a colt J. Mot. B:ol., 18l~:117-128, 1985.
 
8
C. A. Gross, M. Lon~to, and R. Losiek. Bacterial sigma factors. In S. L. Knight and K. R. Yamamoto, editors, 7Yanscr~phonal Rcgulatlon, volume 1, pages 129-176. Cold Spring Harbor Laboratory Press, 1992.
 
9
 
10
J. D. Helmann. Compilation and analysis of Baedlus subtdts c~-dependent promoter sequences: evidence for extended contact between RNA polymerase and upstream promoter DNA. Nucleic Acids Res., ~a:2aal-2aao, lgos.
 
11
A. K!ingenhoff, K. Frech, K. Qimndt, and T. Werner. Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity. Bsotnformatscs 1, 15:180-186, 1999.
 
12
C. E. Lawrence and A. A. Reilly. An expectation maximization (F.M) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: struct., lunar., and genetics, 7:41-51, 1990.
 
13
B. Lewin. Genes VI. Oxford University Press, 1997.
14
 
15
M. A. Mulder, H. Zappe, and L. M. 8teyn. Mycobacterial promoters. Tuber. L~ng Dis., 78:211-223, 1997.
 
16
O. N. Ozoline, A. A. Deer, and M. V. Arkhipova. Noncanonical sequence elements in the promoter structure, cluster analysis of promoters recognized by Escher, ch,a cot: RNA polymerase. Pluelelc Acids Res., 25:4703-4709, 1998.
 
17
 
18
M.T. Record, W. S. Reznikoff, M. L. Craig, K. L McQuade, and P. J. Schlax. Esc~erichia coli RNA polymerase a?~ promotets, and the kinetics of the steps of transcription initiation. ID F. C. Neidhardt, editor, Escher~ch~a colt and Salmonella, volume 1, pages 792-820. ASM Press, 1996.
 
19
 
20
T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht. Information content of binding sites on nucleotide sequences. Y. Mot. Blot., 188:415-431, 1986.
 
21
G. D. Stormo and O. W. Hartzell Ill. identifying proteinbinding rotes from unaligned DNA fragments. Proc. Natl. Acad. Scs. USA, 86:1183-1187, 1989.
 
22
 
23
E. Ukkonen On-line construction of snmx-trees. Algorithm, ca, 14:249-260, 1995.
 
24
J. van Helden, B. Andre, and J. Collado-Vides. Extracting regulatory sites from the upstream region of yeast genes by computational a~alysis of oligonueleotide frequencies. J. Mol. B,ot., 281:827-842, 1998.
 
25
A. Vanet, L. Marsan, A. Labigne, and M.-F. Sagot. Inferring regulatory elements from a whole genome. An analysis of the o~~ family of promoter signals. }999. submitted to J. Mol. B~ol.
 
26
A. Vane{, L. Marsan, and M.-F. Sagot. Promoter sequences and algorithlnical methods for identifying them. Research m Macrab~otogy, 150:1-21, 1999. in press.
 
27
T. Werner. Models for prediction and recognition of eukaryotic promoters. Mature. Oenome, 10:168-175, 1999
 
28
F. Wolfertstetter, K. Frech, G. Hcrrmann, and T. Werner. Identification of functional elements in unaligned nucleic acid sequence~ by a novel tuple search algorithms. Oomput. Appl. B,osc:., 12:71-80, 1996.

CITED BY  8

Collaborative Colleagues:
Laurent Marsan: colleagues
Marie-France Sagot: colleagues