Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes
Source
Conference on Information and Knowledge Management
Proceeding of the 2nd international workshop on Data and text mining in bioinformatics
Napa Valley, California, USA
SESSION: Short papers
Pages 69-72
Year of Publication: 2008
ISBN: 978-1-60558-251-1
Authors
Jorge Amigo, University of Santiago de Compostela, Santiago de Compostela, Spain
Christopher Phillips, University of Santiago de Compostela, Santiago de Compostela, Spain
Antonio Salas, University of Santiago de Compostela, Santiago de Compostela, Spain
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458449.1458465

ABSTRACT

Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available to researchers interested in medical or population genetics applications. Although many of these SNP repositories provide data retrieval tools for general-purpose mining, these tools alone cannot cover the broad range of needs of most medical and population genetics studies. To address this limitation, we propose building customized in-house data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes, we propose a set of data processing scripts that take raw data from the major SNP variation databases (e.g. HapMap, Perlegen), split it into single genotypes, and then group those genotypes into populations. This allows in-house standardization and normalization of genotyping data retrieved from different repositories. It also allows the calculation of statistical indices, from simple allele frequency estimates up to elaborate genetic differentiation tests within populations, and makes it possible to combine population samples from different databases. This article describes the viability of implementing scripts that handle extensive SNP genotype datasets at low computational cost, and shows that updating these data marts is straightforward, so that new external data and new statistical indices are easy to add.
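As a rough illustration of the kind of processing script the abstract describes, the following Python sketch splits a raw genotype table into single genotypes, groups them by population, and estimates allele frequencies. It is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace-delimited layout (one SNP per row, one sample per column), the "NN" missing-data code, the sample and population labels, and the function names split_genotypes and allele_frequencies are all hypothetical and do not reproduce the actual HapMap or Perlegen file formats.

```python
from collections import Counter, defaultdict


def split_genotypes(lines):
    """Yield (snp_id, sample_id, genotype) from a simplified raw table.

    Assumed layout (hypothetical): a header row with a label followed by
    sample IDs, then one row per SNP with one two-letter genotype per sample.
    """
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)
    samples = header[1:]
    for fields in rows:
        snp_id, calls = fields[0], fields[1:]
        for sample_id, genotype in zip(samples, calls):
            yield snp_id, sample_id, genotype


def allele_frequencies(records, sample_to_pop, missing="NN"):
    """Return {(snp_id, population): {allele: frequency}}.

    Samples without a population label and missing genotypes are skipped.
    """
    counts = defaultdict(Counter)
    for snp_id, sample_id, genotype in records:
        pop = sample_to_pop.get(sample_id)
        if pop is None or genotype == missing:
            continue
        # Each genotype such as "AG" contributes one count per allele.
        counts[(snp_id, pop)].update(genotype)
    freqs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        freqs[key] = {allele: n / total for allele, n in counter.items()}
    return freqs


# Toy input: invented SNP IDs, samples and populations for illustration only.
raw = """\
snp s1 s2 s3 s4
rs1 AA AG GG NN
rs2 CT CC TT CT
""".splitlines()
sample_to_pop = {"s1": "POP_A", "s2": "POP_A", "s3": "POP_B", "s4": "POP_B"}

freqs = allele_frequencies(split_genotypes(raw), sample_to_pop)
for (snp_id, pop), table in sorted(freqs.items()):
    print(snp_id, pop, table)
```

Because the parsing step and the statistics step are separate functions, a parser for another repository's format could feed the same frequency calculation, which is one way to read the abstract's point about combining population samples from different databases.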


