DNA segmentation as a model selection process
Source: Annual Conference on Research in Computational Molecular Biology
Proceedings of the fifth annual international conference on Computational biology
Montreal, Quebec, Canada
Pages: 204 - 210  
Year of Publication: 2001
ISBN:1-58113-353-7
Author
Wentian Li  Laboratory of Statistical Genetics, The Rockefeller University, Box 192, New York, NY
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/369133.369202

ABSTRACT

Previous divide-and-conquer segmentation analyses of DNA sequences do not provide a satisfactory stopping criterion for the recursion. This paper proposes that segmentation be treated as a model selection process. Tools from model selection determine a limit for the stopping criterion at its most relaxed end. The Bayesian information criterion, in particular, provides a much more stringent stopping criterion than those currently in use, and such a stringent criterion can be used to delineate larger DNA domains. A relationship between the stopping criterion and the average domain size is determined empirically, which may aid in locating isochore borders.
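
The idea in the abstract can be illustrated with a minimal sketch. This record does not reproduce the paper's formulas, so everything below is an assumption made for illustration: each segment is modeled as independent draws from its own base composition, a split is scored by twice the log-likelihood ratio of the two-segment model over the one-segment model, and the recursion stops when that score does not exceed a BIC-style penalty of `log(N)` per extra parameter. The function names and the default of four extra parameters (three free base frequencies plus the breakpoint position) are hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(counts, n):
    """Maximized multinomial log-likelihood (natural log) for the given counts."""
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)


def best_split(seq):
    """Return (gain, position) for the split with the largest likelihood gain.

    gain is 2 * (log-likelihood of two segments - log-likelihood of one segment).
    position is None when no split improves on the single-segment model.
    """
    n = len(seq)
    total = Counter(seq)
    ll_one = log_likelihood(total, n)
    left = Counter()
    best_gain, best_pos = 0.0, None
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # counts of the symbols in seq[i:]
        ll_two = log_likelihood(left, i) + log_likelihood(right, n - i)
        gain = 2.0 * (ll_two - ll_one)
        if gain > best_gain:
            best_gain, best_pos = gain, i
    return best_gain, best_pos


def segment(seq, offset=0, extra_params=4):
    """Recursively split seq; return sorted breakpoint positions.

    Assumed stopping rule: accept a split only if the likelihood gain exceeds
    extra_params * log(N), i.e. only if the two-segment model has the lower BIC.
    """
    n = len(seq)
    if n < 2:
        return []
    gain, pos = best_split(seq)
    if pos is None or gain <= extra_params * math.log(n):
        return []  # the one-segment model is preferred: stop the recursion
    return (
        segment(seq[:pos], offset, extra_params)
        + [offset + pos]
        + segment(seq[pos:], offset + pos, extra_params)
    )


if __name__ == "__main__":
    # A toy sequence with one sharp change in composition.
    print(segment("A" * 200 + "C" * 200))
```

Because the penalty grows with `log(N)`, a threshold of this kind becomes harder to pass on long sequences, which fits the abstract's remark that a stringent criterion delineates larger domains. Lowering `extra_params` in this sketch relaxes the rule and yields more, smaller segments.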

