ABSTRACT
Identifying residue coupling relationships within a protein family can provide important insights into the family's evolutionary record, and has significant applications in analyzing and optimizing sequence-structure-function relationships. We present the first algorithm to infer an undirected graphical model representing residue coupling in protein families. Such a model, which we call a residue coupling network, serves as a compact description of the joint amino acid distribution, focused on the independences among residues. This stands in contrast to current methods, which manipulate dense representations of co-variation and focus on assessing dependence, an approach that can conflate direct and indirect relationships. Our probabilistic model provides a sound basis for predictive reasoning (will this newly designed protein be folded and functional?), diagnostic reasoning (why is this protein not stable or functional?), and abductive reasoning (what if I attempt to graft features of one protein family onto another?). Further, our algorithm can readily incorporate, as priors, hypotheses regarding possible underlying mechanistic or energetic explanations for coupling. The resulting approach constitutes a powerful and discriminatory mechanism to identify residue coupling from protein sequences and structures. Results on the G protein-coupled receptor (GPCR) and PDZ domain families demonstrate the ability of our approach to effectively uncover and exploit models of residue coupling.
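The distinction between assessing dependence and identifying independences can be illustrated with a small sketch. The following is not the algorithm presented here; it is a toy example under stated assumptions: a hypothetical three-column alignment over a two-letter alphabet, built so that column 0 influences column 1 and column 1 influences column 2, and a hypothetical helper `mutual_information` that computes empirical (conditional) mutual information. Pairwise co-variation reports a dependence between columns 0 and 2, while conditioning on column 1 shows that the relationship is indirect.

```python
from collections import Counter
from math import log2


def mutual_information(seqs, i, j, given=None):
    """Empirical I(X_i; X_j) in bits, or I(X_i; X_j | X_given) if `given` is set.

    Hypothetical helper for illustration; uses raw counts with no
    pseudocounts or sequence weighting.
    """
    n = len(seqs)

    def z_of(s):
        # With no conditioning column, every sequence shares one dummy value.
        return s[given] if given is not None else None

    c_xyz = Counter((s[i], s[j], z_of(s)) for s in seqs)
    c_xz = Counter((s[i], z_of(s)) for s in seqs)
    c_yz = Counter((s[j], z_of(s)) for s in seqs)
    c_z = Counter(z_of(s) for s in seqs)

    total = 0.0
    for (x, y, z), c in c_xyz.items():
        total += (c / n) * log2(c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
    return total


def flip(residue):
    """Swap between the two letters of the toy alphabet."""
    return "K" if residue == "L" else "L"


# Toy alignment (an assumption, not real data): column 1 agrees with
# column 0 in 3 of 4 cases, and column 2 agrees with column 1 in 3 of 4.
alignment = []
for a in "LK":
    for b in [a, a, a, flip(a)]:
        for c in [b, b, b, flip(b)]:
            alignment.append(a + b + c)

mi_01 = mutual_information(alignment, 0, 1)            # direct coupling
mi_02 = mutual_information(alignment, 0, 2)            # indirect, yet nonzero
cmi_02_given_1 = mutual_information(alignment, 0, 2, given=1)

print(f"I(0;1)   = {mi_01:.4f} bits")
print(f"I(0;2)   = {mi_02:.4f} bits")
print(f"I(0;2|1) = {cmi_02_given_1:.4f} bits")
```

In this toy chain the pairwise score for columns 0 and 2 is positive, whereas the conditional score vanishes, so a graph built from conditional independences would omit the edge between them and keep only the two direct couplings.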