ABSTRACT
In protein sequence alignment algorithms, a substitution matrix of 20x20 alignment parameters is used to describe the rates of amino acid substitutions over time. Development and evaluation of most substitution matrices, including the BLOSUM family [1], were based almost entirely on fully structured proteins. Structurally disordered proteins (i.e. proteins that lack structure, either in part or as a whole), which have been shown to be very common in nature [2], have a significantly different amino acid composition from ordered (i.e. structured) proteins [3]. Furthermore, in proteins containing both structured and unstructured regions, the sequence evolution rate is higher in the unstructured regions than in the structured ones [4]. These results cast doubt on the appropriateness of the BLOSUM substitution matrices for alignment of structurally disordered proteins [5].
To address this problem, we take structural disorder into account by extending the alphabet for sequence representation from 20 to 2x20=40 symbols: 20 for amino acids in disordered regions and 20 for amino acids in ordered regions. Alignment of sequences represented in the extended alphabet requires a 40x40 substitution matrix. Such an expanded matrix contains 20x20 submatrices that correspond to matching ordered-ordered, ordered-disordered, and disordered-disordered pairs of residues. In this paper we describe an iterative procedure that we used to estimate such a 40x40 substitution matrix. The procedure converged, and its results were stable with respect to the choice of sequences in the dataset. In the resulting 40x40 matrix we found substantial differences between the 20x20 submatrices corresponding to ordered-ordered, ordered-disordered, and disordered-disordered matching. These differences provide evidence that, for alignment of protein sequences that contain disordered segments, the estimated substitution matrix is more appropriate than the BLOSUM substitution matrices. At the same time, the new substitution matrix is applicable to sequence alignment of fully ordered proteins, as its ordered-ordered submatrix is very similar to a BLOSUM matrix.
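The extended alphabet and the block structure of the 40x40 matrix can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the residue ordering, the index convention (0..19 for ordered residues, 20..39 for disordered ones), the hypothetical helper names `encode`, `submatrix`, and `ungapped_score`, and above all the toy matrix, whose values are placeholders and not the estimated parameters.

```python
# Illustrative sketch only; conventions and values are assumptions.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 standard residues, arbitrary order
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N = len(AMINO_ACIDS)  # 20


def encode(sequence, disordered):
    """Map residues to symbols 0..39 of the extended alphabet.

    Assumed convention: 0..19 = residue in an ordered region,
    20..39 = the same residue in a disordered region.
    `disordered` is a per-residue list of booleans.
    """
    if len(sequence) != len(disordered):
        raise ValueError("sequence and disorder mask must have equal length")
    return [AA_INDEX[aa] + (N if d else 0) for aa, d in zip(sequence, disordered)]


def submatrix(matrix40, rows_disordered, cols_disordered):
    """Extract one 20x20 block of a 40x40 matrix given as a list of lists."""
    r0 = N if rows_disordered else 0
    c0 = N if cols_disordered else 0
    return [row[c0:c0 + N] for row in matrix40[r0:r0 + N]]


def ungapped_score(codes_a, codes_b, matrix40):
    """Sum substitution scores position by position (no gaps)."""
    if len(codes_a) != len(codes_b):
        raise ValueError("encoded sequences must have equal length")
    return sum(matrix40[a][b] for a, b in zip(codes_a, codes_b))


def toy_score(i, j):
    """Placeholder scores: 2 for identical symbols, 1 for the same residue
    in different structural states, -1 otherwise."""
    if i == j:
        return 2
    if i % N == j % N:
        return 1
    return -1


# Symmetric 40x40 placeholder matrix (NOT the matrix estimated in the paper).
toy = [[toy_score(i, j) for j in range(2 * N)] for i in range(2 * N)]

seq_a = encode("ARN", [False, True, True])
seq_b = encode("ARN", [False, False, True])
score = ungapped_score(seq_a, seq_b, toy)
ordered_disordered = submatrix(toy, False, True)
```

Under this assumed layout, scoring two fully ordered sequences touches only the upper-left 20x20 block, which is the part of the matrix the abstract compares to BLOSUM.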
REFERENCES
[1] Henikoff, S., and Henikoff, J. G. 1992. Amino acid substitution matrices from protein blocks. PNAS 89: 10915--10919.
[2] Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M., Romero, P., Oh, J. S., Oldfield, C. J., Campen, A. M., Ratliff, C. M., Hipps, K. W., Ausio, J., Nissen, M. S., Reeves, R., Kang, C., Kissinger, C. R., Bailey, R. W., Griswold, M. D., Chiu, W., Garner, E. C., and Obradovic, Z. 2001. Intrinsically disordered protein. J Mol Graph Model 19: 26--59.
[3] Romero, P., Obradovic, Z., Li, X., Garner, E. C., Brown, C. J., and Dunker, A. K. 2001. Sequence complexity of disordered protein. Proteins 42: 38--48.
[4] Brown, C. J., Takayama, S., Campen, A., Vise, P., Marshall, T., Oldfield, C. J., and Dunker, A. K. 2002. Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol 55: 102--107.
[5] Radivojac, P., Obradovic, Z., Brown, C. J., and Dunker, A. K. 2002. Improving sequence alignments for intrinsically disordered proteins. Pac Symp Biocomput: 589--600.
[6] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990. Basic local alignment search tool. J Mol Biol 215 (3): 403--410.
[7] Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T. J., Higgins, D. G., and Thompson, J. D. 2003. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31 (13): 3497--3500.
[8] Dayhoff, M. O., Schwartz, R., and Orcutt, B. C. 1978. A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure (volume 5, supplement 3 ed.), Nat. Biomed. Res. Found., p. 345--358.
[9] Henikoff, J. G., and Henikoff, S. 1996. Blocks database and its applications. Methods Enzymol 266: 88--105.
[10] Dunker, A. K., Brown, C. J., Lawson, J. D., Iakoucheva, L. M., and Obradovic, Z. 2002. Intrinsic disorder and protein function. Biochemistry 41: 6573--6582.
[11] Peng, K., Radivojac, P., Vucetic, S., Dunker, A. K., and Obradovic, Z. 2006. Length-dependent prediction of protein intrinsic disorder. BMC Bioinformatics 7: 208.
[12] Needleman, S. B., and Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48 (3): 443--453.
[13] Smith, T. F., and Waterman, M. S. 1981. Identification of common molecular subsequences. J Mol Biol 147: 195--197.