ABSTRACT
We construct several score functions for locating unusually conserved regions in a genome-wide search of aligned DNA from two species, and we test them on regions of the human genome aligned to the mouse genome. The score functions are derived from properties of neutrally evolving sites in the mouse and human genomes and can be adjusted to the local background rate of conservation. Their aim is to identify regions of the human genome that are conserved by evolutionary selection, because they have an important function, rather than by chance. We use them to obtain a very rough estimate of the amount of DNA in the human genome that is under selection.
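The idea of scoring a window against a local background rate can be sketched with a simple binomial z-score. This is an illustration only and is not claimed to be any of the score functions constructed in the paper. The function name `conservation_z_score` and the parameter `neutral_identity` (the assumed probability that a neutrally evolving aligned column is identical in both species) are hypothetical, and the sketch assumes independent columns and ignores gaps.

```python
import math


def conservation_z_score(human: str, mouse: str, neutral_identity: float) -> float:
    """Z-score of the identity count in an aligned window.

    Illustrative assumption: each aligned column matches independently
    with probability `neutral_identity` under neutral evolution, so the
    number of matches is binomial. Columns with a gap or a non-ACGT
    symbol in either sequence are skipped.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have the same length")
    if not 0.0 < neutral_identity < 1.0:
        raise ValueError("neutral_identity must lie strictly between 0 and 1")

    # Keep only columns where both species have an unambiguous base.
    columns = [
        (h, m)
        for h, m in zip(human.upper(), mouse.upper())
        if h in "ACGT" and m in "ACGT"
    ]
    n = len(columns)
    if n == 0:
        raise ValueError("window contains no scorable columns")

    matches = sum(h == m for h, m in columns)
    expected = n * neutral_identity
    std_dev = math.sqrt(n * neutral_identity * (1.0 - neutral_identity))
    # Large positive values mean more identity than the background predicts.
    return (matches - expected) / std_dev
```

Passing a `neutral_identity` estimated from nearby neutral sites, rather than a single genome-wide value, is one way such a score could be adjusted to the local background rate of conservation.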
INDEX TERMS
Primary Classification: J. Computer Applications; J.3 LIFE AND MEDICAL SCIENCES
Subjects: Biology and genetics
Additional Classification: I. Computing Methodologies; I.5 PATTERN RECOGNITION; I.5.1 Models
Subjects: Statistical
General Terms: Algorithms, Design, Performance
Keywords: CpG effect, ancestral repeat, comparative genomics, context-dependent base substitutions, dinucleotide dependence, evolutionary models, fraction of human genome under selection, mouse-human alignments, mutual information, neutral evolution