Large scale data mining using genetics-based machine learning
Source
Genetic And Evolutionary Computation Conference archive
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
Montreal, Québec, Canada
TUTORIAL SESSION: Tutorials
Pages 3381-3412
Year of Publication: 2009
ISBN: 978-1-60558-505-5
Authors
Jaume Bacardit  University of Nottingham, Nottingham, United Kingdom
Xavier Llorà  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1570256.1570424

ABSTRACT

We are living in the petabyte era. We have ever larger data sets to analyze, process, and transform into useful answers for domain experts. Robust data mining tools that can cope with petascale volumes and/or high dimensionality while producing human-understandable solutions are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, owing in part to recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need to be able to process these vast amounts of data, and to do so within a reasonable time. Moreover, massive computation cycles are getting cheaper every day, giving researchers access to unprecedented degrees of parallelism. Several topics are interlaced in these two requirements, among them: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give as much insight as possible back to the domain experts.

This tutorial will try to address these topics, following a roadmap that starts with the questions of what "large" means and why large is a challenge for data mining methods. Afterwards, we will discuss the different facets along which this challenge can be overcome: efficiency enhancement techniques, representations able to cope with spaces of large dimensionality, scalability of learning paradigms, hardware solutions, parallel models, and data-intensive computing. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of future directions.
