Operator equalisation, bloat and overfitting: a study on human oral bioavailability prediction
Source
Genetic and Evolutionary Computation Conference
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
Montreal, Québec, Canada
Session: Track 10: Genetic Programming
Pages 1115-1122
Year of Publication: 2009
ISBN: 978-1-60558-325-9
Authors
Sara Silva, University of Coimbra, Coimbra, Portugal
Leonardo Vanneschi, University of Milano-Bicocca, Milan, Italy
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1569901.1570051

ABSTRACT

Operator equalisation was recently proposed as a new bloat control technique for genetic programming. By controlling the distribution of program lengths inside the population, it can bias the search towards smaller or larger programs. In this paper we propose a new implementation of operator equalisation and compare it to a previous version, using a hard real-world regression problem, the prediction of human oral bioavailability, where bloat and overfitting are major issues. The results show that both implementations of operator equalisation are completely bloat-free, producing smaller individuals than standard genetic programming without compromising generalization ability. We also show that the new implementation is more efficient and behaves more predictably and reliably than the previous version. Finally, we advance some debatable ideas about the relationship between bloat and overfitting, and support them with our results.
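The abstract describes operator equalisation only at a high level: the distribution of program lengths in the population is controlled, and the shape of that distribution biases the search towards smaller or larger programs. As an illustration only, the following Python sketch shows one way a histogram-based acceptance filter of this kind could work. It is not the implementation proposed in the paper nor the earlier version it is compared against; the binning scheme, the target counts, and every name in it (`bin_index`, `EqualisationFilter`, `try_accept`) are hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field


def bin_index(length: int, bin_width: int) -> int:
    """Map a program length (e.g. node count) to a histogram bin.

    Hypothetical scheme: lengths 1..w go to bin 0, w+1..2w to bin 1, and so on.
    """
    return (length - 1) // bin_width


@dataclass
class EqualisationFilter:
    """Accept offspring only while their length bin still has room.

    Assumption: `target` maps bin index -> desired number of individuals
    in the next generation. Shaping `target` (e.g. more slots for short
    bins) is what biases the search towards smaller or larger programs.
    """
    target: dict[int, int]
    bin_width: int = 5
    filled: Counter = field(default_factory=Counter)

    def try_accept(self, length: int) -> bool:
        b = bin_index(length, self.bin_width)
        if self.filled[b] < self.target.get(b, 0):
            self.filled[b] += 1  # reserve one slot in this bin
            return True
        return False  # bin full or not in the target: offspring rejected


# Hypothetical target favouring short programs: bins 0, 1, 2 allow 3, 2, 1 individuals.
filt = EqualisationFilter(target={0: 3, 1: 2, 2: 1}, bin_width=5)
candidate_lengths = [3, 4, 2, 1, 7, 8, 9, 12, 14, 30]
accepted = [n for n in candidate_lengths if filt.try_accept(n)]
print(accepted)  # → [3, 4, 2, 7, 8, 12]
```

The sketch deliberately leaves out how the target distribution is chosen or updated between generations, and any rule for admitting exceptional individuals into full or missing bins; the abstract does not say how the two implementations compared in the paper handle these points.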


