Improving Markov chain classification using string transformations and evolutionary search
Source: Genetic and Evolutionary Computation Conference
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
Montreal, Québec, Canada
SESSION: Track 11: genetics-based machine learning
Pages 1259-1266
Year of Publication: 2009
ISBN: 978-1-60558-325-9
ABSTRACT
Markov chain classification, sometimes called n-gram modeling, is a common and powerful tool for problems that involve sequences of tokens drawn from a finite alphabet. It has been used in a wide range of tasks, including natural language modeling, author identification, protein similarity searches, and even bird-song recognition. An improvement in Markov chain classification would therefore have broad implications in many fields. Our new system, called SCS, improves upon Markov chain classification by introducing a preprocessing step in which an arbitrary set of transformation functions is applied to the input sequences. Since the space of possible transformations is unbounded, a genetic algorithm (GA) is used to search for functions that improve classification. We show that the GA consistently finds preprocessing functions that substantially improve the performance of the Markov chain model.
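The pipeline the abstract describes (transform the sequences, fit one Markov chain per class, and search for helpful transforms with a GA) can be illustrated with a minimal Python sketch. This is not the SCS implementation: the abstract does not specify the transformation functions, the model order, or the GA operators. The sketch assumes strings of single-character tokens, a first-order (bigram) chain with add-one smoothing, and transforms restricted to character-to-group mappings. All names (`MarkovChainClassifier`, `make_mapping_transform`, `accuracy`, `evolve_mapping`) and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


class MarkovChainClassifier:
    """First-order (bigram) Markov chain classifier with add-one smoothing."""

    def __init__(self, transform=None):
        # Assumption: a transform is any function str -> str applied before modeling.
        self.transform = transform or (lambda s: s)
        self.counts = {}       # label -> {(prev, cur): count}
        self.totals = {}       # label -> {prev: count}
        self.alphabet = set()  # tokens seen after transformation

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            seq = self.transform(seq)
            pair_counts = self.counts.setdefault(label, defaultdict(int))
            prev_counts = self.totals.setdefault(label, defaultdict(int))
            self.alphabet.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                pair_counts[(prev, cur)] += 1
                prev_counts[prev] += 1
        return self

    def log_likelihood(self, seq, label):
        seq = self.transform(seq)
        v = max(len(self.alphabet), 1)
        pair_counts, prev_counts = self.counts[label], self.totals[label]
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = pair_counts.get((prev, cur), 0) + 1
            den = prev_counts.get(prev, 0) + v
            total += math.log(num / den)
        return total

    def predict(self, seq):
        # Pick the class whose chain assigns the sequence the highest likelihood.
        return max(self.counts, key=lambda label: self.log_likelihood(seq, label))


def make_mapping_transform(mapping):
    """Build a transform that rewrites each character via `mapping`."""
    return lambda seq: "".join(mapping.get(ch, ch) for ch in seq)


def accuracy(transform, train, valid):
    """Fit on `train` with the given transform and score on `valid`."""
    clf = MarkovChainClassifier(transform).fit(*zip(*train))
    return sum(clf.predict(s) == y for s, y in valid) / len(valid)


def evolve_mapping(train, valid, alphabet, groups="01",
                   generations=20, pop_size=12, seed=0):
    """Toy GA over character-to-group mappings (hypothetical operators)."""
    rng = random.Random(seed)
    identity = {ch: ch for ch in alphabet}  # seeds the search with "no transform"
    population = [identity] + [
        {ch: rng.choice(groups) for ch in alphabet} for _ in range(pop_size - 1)
    ]

    def fitness(mapping):
        return accuracy(make_mapping_transform(mapping), train, valid)

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = {ch: rng.choice((a[ch], b[ch])) for ch in alphabet}  # uniform crossover
            child[rng.choice(sorted(alphabet))] = rng.choice(groups)     # point mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy data (invented for illustration): class "x" alternates, class "y" repeats.
train = [("abab", "x"), ("baba", "x"), ("aabb", "y"), ("bbaa", "y")]
valid = [("abab", "x"), ("aabb", "y")]
best = evolve_mapping(train, valid, alphabet="ab", generations=3, pop_size=4)
```

Because the identity mapping is in the initial population and the selected parents always survive into the next generation, the returned mapping never scores below the untransformed baseline on the validation set. A real system would need a separate held-out test set to check that the gain is not overfitting to the validation data.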