SpeedHap: An Accurate Heuristic for the Single Individual SNP Haplotyping Problem with Many Gaps, High Reading Error Rate and Low Coverage
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Volume 5, Issue 4 (October 2008)
Pages 492-502
Year of Publication: 2008
ISSN:1545-5963
Authors
Loredana M. Genovese, IIT-CNR, Pisa
Filippo Geraci, IIT-CNR, Pisa
Marco Pellegrini, IIT-CNR, Pisa
Publisher
IEEE Computer Society Press, Los Alamitos, CA, USA
DOI Bookmark: 10.1109/TCBB.2008.67

ABSTRACT

Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine, including diagnostics and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in the literature, and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are considered, no exact method of polynomial complexity is known. The problem is also hard to approximate with guarantees. Therefore, fast heuristics have been proposed. In this paper we describe SpeedHap, a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have a high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.


