Creating regular expressions as mRNA motifs with GP to predict human exon splitting
Source
Genetic And Evolutionary Computation Conference
Proceedings of the 11th Annual conference on Genetic and evolutionary computation
Montreal, Québec, Canada
POSTER SESSION: Track 3: bioinformatics and computational biology
Pages: 1789-1790
Year of Publication: 2009
ISBN: 978-1-60558-325-9
ABSTRACT
RNAnet [3] (http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet/) allows the user to calculate correlations of gene expression, both between genes and between components within genes. We investigate all of Ensembl (http://www.ensembl.org) and find all the Homo sapiens exons for which there are sufficient robust Affymetrix HG-U133 Plus 2 GeneChip probes. Calculating the correlation between mRNA probe measurements for the same exon shows many exons whose components are consistently up-regulated and down-regulated. However, we identify other Ensembl exons in which sub-regions are self-consistent but these transcript blocks are not well correlated with other blocks in the same exon. We suggest that many current Ensembl exon definitions are incomplete. Secondly, having identified exons with substructure, we use machine learning to try to identify patterns in the DNA sequence lying between blocks of high correlation which might yield biological or technological explanations. A Backus-Naur form (BNF) context-free grammar constrains strongly typed genetic programming (STGP) to evolve biological motifs in the form of regular expressions (RE) (e.g. TCTTT) which distinguish gene exons with potential alternative mRNA expression from those without. We show that biological patterns can be data mined from NCBI's GEO (http://www.ncbi.nlm.nih.gov/geo/) database by a GP written in gawk and using egrep. The automatically produced DNA motifs suggest that alternative polyadenylation is not responsible. (Full version in TR-09-02 [7].) Blocky exons can be found in http://bioinformatics.essex.ac.uk/users/wlangdon/tr-09-02.tar.gz
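The idea of a grammar-constrained regular expression used as a classifier can be illustrated with a minimal Python sketch. It is not the authors' gawk and egrep implementation: Python's `re` module stands in for egrep, the toy grammar in the comments is an assumption rather than the BNF from TR-09-02, the functions `random_motif` and `fitness` are hypothetical names, and the balanced-accuracy score is one plausible fitness measure, not necessarily the one used in the paper. Any sequences passed in are the caller's own data.

```python
import random
import re

BASES = "ACGT"

# Toy grammar (an assumption, not the paper's BNF):
#   <re>   ::= <elem> | <elem> <re>
#   <elem> ::= <base> | "[" <base> <base> "]" | <base> "+" | "."
#   <base> ::= "A" | "C" | "G" | "T"
def random_motif(rng, max_elems=5):
    """Draw one random regular expression that the toy grammar can derive."""
    parts = []
    for _ in range(rng.randint(1, max_elems)):
        kind = rng.choice(["base", "class", "plus", "any"])
        if kind == "base":
            parts.append(rng.choice(BASES))
        elif kind == "class":
            a, b = rng.sample(list(BASES), 2)
            parts.append("[" + a + b + "]")
        elif kind == "plus":
            parts.append(rng.choice(BASES) + "+")
        else:
            parts.append(".")
    return "".join(parts)

def fitness(motif, blocky, plain):
    """Balanced accuracy of 'motif occurs in sequence' as a predictor.

    blocky: sequences labelled as showing substructure (positives)
    plain:  sequences labelled as showing none (negatives)
    """
    pattern = re.compile(motif)
    true_pos = sum(1 for seq in blocky if pattern.search(seq))
    true_neg = sum(1 for seq in plain if not pattern.search(seq))
    return 0.5 * (true_pos / len(blocky) + true_neg / len(plain))
```

A full genetic programming run would keep a population of such motifs and apply selection, crossover and mutation that respect the grammar; the sketch covers only motif generation and scoring. Because every motif is derived from the grammar, it is always a syntactically valid regular expression, which is the main benefit of the grammar constraint.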
REFERENCES
[1]
[2] Langdon, W. B. Evolving GeneChip correlation predictors on parallel graphics hardware. In 2008 IEEE World Congress on Computational Intelligence (Hong Kong, 1-6 June 2008), J. Wang, Ed., IEEE Computational Intelligence Society, IEEE Press, pp. 4152-4157.
[3] Langdon, W. B. A map of human gene expression. Tech. Rep. CES-486, Departments of Mathematical, Biological Sciences and Computing and Electronic Systems, University of Essex, Colchester, CO4 3SQ, UK, July 2008.
[4] Langdon, W. B., and Harrison, A. P. Evolving DNA motifs to predict GeneChip probe performance. Algorithms in Molecular Biology. In press.
[5] Langdon, W. B., McKay, R. I., and Spector, L. Genetic programming. In Handbook of Metaheuristics, J.-Y. Potvin and M. Gendreau, Eds., second ed. Springer, ch. 7.
[6]
[7] Creating regular expressions as mRNA motifs with GP to predict human exon splitting. Tech. Rep. TR-09-02, Department of Computer Science, Crest Centre, King's College, London, Strand, London, WC2R 2LS, UK, 19 Mar. 2009.
[8] Langdon, W. B., Upton, G. J. G., da Silva Camargo, R., and Harrison, A. P. A survey of spatial defects in Homo Sapiens Affymetrix GeneChips. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2009). In press.
[9] Poli, R., Langdon, W. B., and McPhee, N. F. A field guide to genetic programming. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk, 2008. (With contributions by J. R. Koza).
[10] Retelska, D., et al. Similarities and differences of polyadenylation signals in human and fly. BMC Genomics 7, 1 (2006), 176.
[11] Sanchez-Graillet, O., Rowsell, J., Langdon, W. B., Stalteri, M. A., Arteaga Salas, J. M., Upton, G. J., and Harrison, A. P. Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips. Journal of Integrative Bioinformatics 5, 2 (2008), 98.
INDEX TERMS
Primary Classification: J. Computer Applications; J.3 LIFE AND MEDICAL SCIENCES
Subjects: Biology and genetics
General Terms: Experimentation
Keywords: HDONA, affymetrix genechip, alternative splicing, alternative splicing of homosapiens exons, bioinformatics, biological interpretation of computer generated motifs, gene expression and regulation, genetic algorithms, genetic programming, grammar, integration of genetic programming into bioinformatics, microarray analysis, regular expression, strongly typed genetic programming