A comparative analysis method for detecting binding sites in coding regions
Source: Annual Conference on Research in Computational Molecular Biology
Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology
Berlin, Germany
Pages: 57 - 66  
Year of Publication: 2003
ISBN: 1-58113-635-8
Author
Mathieu Blanchette  University of California, Santa Cruz, Santa Cruz, CA
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/640075.640082

ABSTRACT

While the problem of predicting transcription factor binding sites in a gene's promoter region has been extensively studied, binding sites located in coding regions are also crucial for regulating gene expression but are more difficult to detect. Coding region binding sites are mostly involved in splicing regulation, but also in transcriptional and post-transcriptional regulation. We consider the problem of predicting such binding sites by comparative analysis. Comparative analysis is based on the idea that functional sequences tend to evolve at a slower rate than nonfunctional sequences, making unusually well conserved regions likely to be of interest. The difficulty in applying comparative analysis to the detection of binding sites located in coding sequence is that the whole sequence is under selective pressure, because it needs to code for a functional protein. We present a technique to distinguish between conservation due to constraints on the amino acid product and conservation due to constraints imposed by regulatory factors. More precisely, we show how to calculate the probability of observing a certain degree of conservation among the nucleotides of a given set of orthologous codons, given a set of constraints on the amino acids they need to encode. The algorithms described are implemented in a program called Cosmo, available at http://bio.cs.washington.edu. We ran Cosmo on several genes known to contain exonic splicing enhancers and report the results.
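
To make the central quantity concrete, here is a minimal sketch of a conservation probability for one set of orthologous codons. It is not the Cosmo algorithm, which this abstract does not specify. It rests on simplifying assumptions chosen for illustration only: each species picks its codon independently and uniformly from the synonymous codons of its required amino acid (no phylogeny, no codon usage bias), and "degree of conservation" means the number of codon positions at which all species share the same nucleotide. The names `conserved_positions` and `prob_conservation_at_least` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Map each amino acid (or '*' for stop) to its synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def conserved_positions(codons):
    """Count codon positions (0 to 3) where every codon has the same base."""
    return sum(1 for column in zip(*codons) if len(set(column)) == 1)


def prob_conservation_at_least(amino_acids, observed):
    """P(conserved positions >= observed), given the amino acid each
    species must encode.

    Assumption (illustrative null model): species choose independently
    and uniformly among synonymous codons. Enumeration is exponential in
    the number of species, so this only suits small examples.
    """
    choices = [SYNONYMS[aa] for aa in amino_acids]
    total = 0
    hits = 0
    for assignment in product(*choices):
        total += 1
        if conserved_positions(assignment) >= observed:
            hits += 1
    return Fraction(hits, total)


# Two species that must both encode lysine (AAA or AAG).
print(prob_conservation_at_least(["K", "K"], 3))  # 1/2
```

Under this toy model, a small probability for the observed conservation suggests that the amino acid constraint alone does not explain it. A realistic treatment would also have to account for the evolutionary relationships among the species, which this sketch deliberately ignores.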

