ABSTRACT
Datasets with a large number of attributes are a difficult challenge for evolutionary learning techniques. The recently proposed attribute list rule representation has been shown to significantly improve the overall performance (e.g. run-time, accuracy, rule set size) of the BioHEL Iterative Evolutionary Rule Learning system. In this paper we first extend the attribute list rule representation so that it can handle not only continuous domains, but also datasets with a very large number of mixed discrete-continuous attributes. Second, we benchmark the new representation on a diverse set of large-scale datasets. Third, we compare the new algorithms with several well-known machine learning methods. The experimental results show that the new representation is equal to or better than the state of the art in evolutionary rule representations, both in the accuracy obtained on the benchmark datasets and in the computational time needed to achieve that accuracy. The new attribute list representation puts BioHEL on an equal footing with other well-established machine learning techniques in terms of accuracy. We also analyse and discuss the weaknesses of the current representation and indicate potential avenues for correcting them.