ABSTRACT
Selecting a machine learning technique requires sensitivity to the requirements of the problem. In particular, a problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited, using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that recurrent neural networks generally perform better than feed-forward networks on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
INDEX TERMS
Primary Classification:
F. Theory of Computation
F.1 COMPUTATION BY ABSTRACT DEVICES
F.1.1 Models of Computation
Subjects: Self-modifying machines (e.g., neural networks)
Additional Classification:
I. Computing Methodologies
I.5 PATTERN RECOGNITION
I.5.1 Models
Subjects: Neural nets
J. Computer Applications
J.3 LIFE AND MEDICAL SCIENCES
Subjects: Biology and genetics
General Terms: Algorithms, Design, Experimentation, Measurement, Performance, Theory
Keywords: machine learning, neural network architecture, recurrent neural network, bias, biological sequence analysis, motif, subcellular localization, pattern recognition, classifier design