ABSTRACT
We describe a new computer program, Trilogy, for the automated discovery of sequence-structure patterns in proteins. Trilogy implements a pattern discovery algorithm that begins with an exhaustive analysis of flexible three-residue patterns; a subset of these patterns is selected as seeds for an extension process that identifies longer patterns. A key feature of the method is its explicit treatment of both the sequence and structure components of these motifs: each Trilogy pattern is a pair consisting of a sequence pattern and a structure pattern. Matches to each of the two component patterns are identified independently, which allows the program to assign each sequence-structure pattern a significance score that assesses the degree of correlation between the corresponding sequence and structure motifs. Trilogy identifies several thousand high-scoring patterns that occur across protein families, including both previously identified and novel motifs. We expect these sequence-structure patterns to be useful in predicting protein structure from sequence, annotating newly determined protein structures, and identifying novel motifs of potential functional or structural significance.
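The abstract does not state how the significance score is computed. As an illustration only, the sketch below shows one common way to quantify the correlation between two independently identified match sets: an upper-tail hypergeometric probability for the size of their overlap. The hypergeometric model, the function name `overlap_pvalue`, and all of its parameters are assumptions made for this example and should not be read as Trilogy's actual scoring function.

```python
from math import comb


def overlap_pvalue(n_total, n_seq, n_struct, n_both):
    """Upper-tail hypergeometric probability P(X >= n_both).

    Hypothetical model (an assumption, not taken from the paper):
      n_total  - number of candidate positions examined
      n_seq    - positions matching the sequence pattern
      n_struct - positions matching the structure pattern
      n_both   - positions matching both patterns
    A small value means the two match sets overlap more often than
    expected if sequence and structure matches were independent.
    """
    upper = min(n_seq, n_struct)
    denom = comb(n_total, n_struct)
    # Sum the probability of every overlap size from n_both up to the maximum.
    tail = sum(
        comb(n_seq, k) * comb(n_total - n_seq, n_struct - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers chosen for illustration; they are not data from the paper.
    print(overlap_pvalue(10, 3, 3, 3))
```

Under this assumed model, a pattern whose sequence and structure matches coincide far more often than chance would receive a very small probability, which could then be turned into a score, for example by taking its negative logarithm.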