ABSTRACT
Previous divide-and-conquer segmentation analyses of DNA sequences do not provide a satisfactory stopping criterion for the recursion. This paper proposes that segmentation be treated as a model selection process. Using the tools of model selection, a limit on the stopping criterion at its most relaxed end can be determined. The Bayesian information criterion, in particular, provides a much more stringent stopping criterion than those currently in use. Such a stringent criterion can be used to delineate larger DNA domains. A relationship between the stopping criterion and the average domain size is determined empirically, which may aid in locating isochore borders.
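To illustrate what treating each split as a model comparison could look like, here is a minimal sketch. It is not taken from the paper. It assumes each segment is modelled as independent draws from a four-letter multinomial, that the sequence contains only A, C, G and T, and that the two-segment model carries four extra free parameters (three base frequencies plus one breakpoint). The names `log_likelihood`, `best_split`, `segment` and the `extra_params` default are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumption: the sequence contains only these four symbols


def log_likelihood(counts):
    """Maximized multinomial log-likelihood of a segment with these base counts."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def best_split(seq):
    """Return (gain, position) for the split with the largest log-likelihood gain."""
    total = [seq.count(b) for b in ALPHABET]
    base_ll = log_likelihood(total)
    left = [0, 0, 0, 0]
    best_gain, best_pos = 0.0, None
    for i in range(1, len(seq)):
        left[ALPHABET.index(seq[i - 1])] += 1
        right = [t - l for t, l in zip(total, left)]
        gain = log_likelihood(left) + log_likelihood(right) - base_ll
        if gain > best_gain:
            best_gain, best_pos = gain, i
    return best_gain, best_pos


def segment(seq, offset=0, extra_params=4):
    """Recursively split seq; return the breakpoints accepted by a BIC comparison.

    extra_params=4 is an assumed parameter-count difference between the
    two-segment and one-segment models, not a value stated in the abstract.
    """
    n = len(seq)
    if n < 2:
        return []
    gain, pos = best_split(seq)
    # BIC prefers the two-segment model only if twice the log-likelihood
    # gain exceeds the penalty extra_params * log(n); otherwise stop recursing.
    if pos is None or 2 * gain <= extra_params * math.log(n):
        return []
    return (segment(seq[:pos], offset, extra_params)
            + [offset + pos]
            + segment(seq[pos:], offset + pos, extra_params))


if __name__ == "__main__":
    print(segment("A" * 200 + "G" * 200))
```

Because the penalty grows with the logarithm of the segment length, the recursion stops earlier on long, weakly heterogeneous stretches, which is consistent with the abstract's point that a more stringent criterion yields larger domains.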
REFERENCES
1. MD Adams, et al. (2000), "The genome sequence of Drosophila melanogaster", Science, 287:2185-2195.
2. H Akaike (1974), "A new look at the statistical model identification", IEEE Transactions on Automatic Control, 19:716-723.
3. JO Berger, DA Berry (1988), "Analyzing data: is objectivity possible?", American Scientist, 76:159-165.
4. P Bernaola-Galvan, R Roman-Roldan, JL Oliver (1996), "Compositional segmentation and long-range fractal correlations in DNA sequences", Physical Review E, 53(5):5181-5189.
5. P Bernaola-Galvan, P Carpena, R Roman-Roldan, JL Oliver (1999), "Decomposition of DNA sequence complexity", Physical Review Letters, 83(16):3336-3339.
6. P Bernaola-Galvan, I Grosse, P Carpena, JL Oliver, R Roman-Roldan, HE Stanley (2000), "Finding borders between coding and noncoding DNA regions by an entropic segmentation method", Physical Review Letters, 85:1342-1345.
7. G Bernardi (1989), "The isochore organization of the human genome", Annual Review of Genetics, 23:637-661.
8. G Bernardi (1995), "The human genome: organization and evolutionary history", Annual Review of Genetics, 29:445-476.
9. JV Braun, HG Müller (1998), "Statistical methods for DNA sequence segmentation", Statistical Science, 13(2):142-162.
10. L Breiman, JH Friedman, RA Olshen, CJ Stone (1984), Classification and Regression Trees (Wadsworth).
11. BE Brodsky, BS Darkhovsky (1993), Nonparametric Methods in Change Point Problems (Kluwer Academic).
12. KP Burnham, DR Anderson (1998), Model Selection and Inference (Springer).
13. E Carlstein, HG Müller, D Siegmund (1994), eds. Change-Point Problems (IMS).
14. J Chen, AK Gupta (2000), Parametric Statistical Change Point Analysis (Birkhäuser).
15. GA Churchill (1989), "Stochastic models for heterogeneous DNA sequences", Bulletin of Mathematical Biology, 51:79-94.
16. DR Cox, DV Hinkley (1974), Theoretical Statistics (Chapman & Hall).
17. AWF Edwards (1972), Likelihood (Cambridge Univ Press).
18. RA Elton (1974), "Theoretical models for heterogeneity of base composition in DNA", Journal of Theoretical Biology, 45:533-553.
19. CM Hurvich, CL Tsai (1989), "Regression and time series model selection in small samples", Biometrika, 76:297-307.
20. S Kullback, RA Leibler (1951), "On information and sufficiency", Annals of Mathematical Statistics, 22:79-86.
22. W Li (1997), "The study of correlation structures of DNA sequences - a critical review", Computers & Chemistry, 21(4):257-272.
24. W Li (2001), "New stopping criteria for segmenting DNA sequences", preprint.
26. J Lin (1991), "Divergence measures based on the Shannon entropy", IEEE Transactions on Information Theory, 37(1):145-151.
27. MHC sequencing consortium (1999), "Complete sequence and gene map of a human major histocompatibility complex", Nature, 401:921-923.
28. JL Oliver, R Roman-Roldan, J Perez, P Bernaola-Galvan (1999), "SEGMENT: identifying compositional domains in DNA sequence", Bioinformatics, 15(12):974-979.
29. E Parzen, K Tanabe, G Kitagawa (1998), eds. Selected Papers of Hirotugu Akaike (Springer).
32. AE Raftery (1995), "Bayesian model selection in social research", in Sociological Methodology, ed. PV Marsden (Blackwells), pp.185-195.
33. VE Ramensky, V Ju Markeev, MA Roytberg, VG Tumanyan (2000), "DNA segmentation through the Bayesian approach", Journal of Computational Biology, 7(1-2):215-231.
34. R Roman-Roldan, P Bernaola-Galvan, JL Oliver (1998), "Sequence compositional complexity of DNA through an entropic segmentation method", Physical Review Letters, 80(6):1344-1347.
35. F Sanger, et al. (1982), "Nucleotide sequence of bacteriophage λ DNA", Journal of Molecular Biology, 162:729-773.
36. G Schwarz (1978), "Estimating the dimension of a model", Annals of Statistics, 6:461-464.
37. N Sugiura (1978), "Further analysis of the data by Akaike's information criterion and the finite corrections", Communications in Statistics, Theory and Methods, A7:13-26.
38. H Zhang, B Singer (1999), Recursive Partitioning in the Health Sciences (Springer).
CITED BY 7
Alessandro Perina, Marco Cristani, Luciano Xumerle, Vittorio Murino, Pier Franco Pignatti, Giovanni Malerba, Fully non-homogeneous hidden Markov model double net: A generative model for haplotype reconstruction and block discovery, Artificial Intelligence in Medicine, v.45 n.2-3, p.135-150, February, 2009