ACM Home Page
Please provide us with feedback. Feedback
Finding motifs using random projections
Full text PdfPdf (147 KB)
Source Annual Conference on Research in Computational Molecular Biology archive
Proceedings of the fifth annual international conference on Computational biology table of contents
Montreal, Quebec, Canada
Pages: 69 - 76  
Year of Publication: 2001
ISBN:1-58113-353-7
Authors
Jeremy Buhler  Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA
Martin Tompa  Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 61,   Citation Count: 26
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/369133.369172
What is a DOI?

Warning: The download time has expired please click on the item to try again.


ABSTRACT

Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif M of length 15, where each planted instance differs from M in 4 positions. Whereas previous algorithms all failed to solve this (15,4)-motif problem. Pevzner and Sze introduced algorithms that succeeded. However, their algorithms failed to solve the considerably more difficult (14,4)-, (16,5)-, and (18,6)-motif problems.

We introduce a novel motif discovery algorithm based on the use of random projections of the input's substrings. Experiments on simulated data demonstrate that this algorithm performs better than existing algorithms and, in particular, typically solves the difficult (14,4)-, (16,5)-, and (18,6)-motif problems quite efficiently. A probabilistic estimate shows that the small values of d for which the algorithm fails to recover the planted (l, d)-motif are in all likelihood inherently impossible to solve. We also present experimental results on realistic biological data by identifying ribosome binding sites in prokaryotes as well as a number of known transcriptional regulatory motifs in eukaryotes.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. D. Andersen, S. J. Taplitz, S. Wong, G. Bristol, B. Larkin, and H. R. Herschman. Metal-dependent binding of a factor in vivo to the metal-responsive elements of the metallothionein 1 gene promoter. Molecular and Cellular Biology, 7:3574-81, 1987.
 
2
3
 
4
 
5
D. S. W. Boam, A. R. Clark, and K. Docherty. Positive and negative regulation of the human insulin gene by multiple trans-acting factors. Journal of Biological Chemistry, 265:8285-96, 1990.
 
6
A. Brazma, I. Jonassen, J. Vilo, and E. Ukkonen. Predicting gene regulatory elements in silico on a genomic scale. Genome Research, 15:1202-1215, 1998.
 
7
J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 2001. To appear.
 
8
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1-38, 1977.
 
9
Y. M. Fraenkel, Y. Mandel, D. Friedberg, and H. Margalit. Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon. Computer Applications in the Biosciences, 11(4):379-387, 1995.
 
10
D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli. Journal of Molecular Biology, 186(1):117-128, Nov. 1985.
 
11
 
12
W. S. Hayes and M. Borodovsky. Deriving ribosomal binding site (RBS) statistical models from unannotated DNA sequences and the use of the RBS model for N-terminal prediction. In Pacific Symposium on Biocomputing, pages 279-290, 1998.
 
13
G. Z. Hertz and G. D. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15(7/8):563-577, July/August 1999.
14
 
15
M. Kozak. Comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles. Microbiological Reviews, 47(1):1-45, 1983.
 
16
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262:208-214, 8 October 1993.
 
17
C. E. Lawrence and A. A. Reilly. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function, and Genetics, 7:41-51, 1990.
 
18
B. Lewin. Genes VI. Oxford University Press, 1997.
 
19
M. Linial, N. Linial, N. Tishby, and G. Yona. Global self-organization of all known protein sequences reveals inherent biological signatures. Journal of Molecular Biology, 268:539-556, 1997.
 
20
C. J. McInerny, J. F. Partridge, G. E. Mikesell, D. P. Creemer, and L. L. Breeden. A novel Mcm1-dependent element in the SWI4, CLN3, CDC6, and CDC47 promoters activates M/G-specific transcription. Genes & Development, 11:1277-1288, 1997.
 
21
A. L. Means and P. G. Farnham. Transcription initiation from the dihydrofolate reductase promoter is positioned by hip1 binding at the initiation site. Molecular and Cellular Biology, 10:653-61, 1990.
 
22
S. Natsan and M. Gilman. Yy1 facilitates the association of serum response factor with the c-fos serum response element. Molecular and Cellular Biology, 15:5975-82, 1995.
 
23
24
25
 
26
 
27
 
28
R. Staden. Methods for discovering novel motifs in nucleic acid sequences. Computer Applications in the Biosciences, 5(4):293-298, 1989.
 
29
 
30
J. van Helden, B. Andr~, and J. Collado-Vides. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology, 281(5):827-842, Sept. 4 1998.
 
31
E. Wingender, P. Dietze, H. Karas, and R. Kn~ppel. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Research, 24(1):238-241, 1996. http://transfac.gbfbraunschweig. de/TRANSFAC/.

CITED BY  26

Collaborative Colleagues:
Jeremy Buhler: colleagues
Martin Tompa: colleagues