ACM Home Page
Please provide us with feedback. Feedback
On Subset Seeds for Protein Alignment
Full text PdfPdf (2.24 MB)
Source IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) archive
Volume 6 ,  Issue 3  (July 2009) table of contents
Pages 483-494  
Year of Publication: 2009
ISSN:1545-5963
Authors
Mikhail Roytberg  Institute of Mathematical Problems in Biology, Pushchino, Moscow
Anna Gambin  Warsaw University, Poland
Laurent Noe  LIFL/CNRS/INRIA, France
Slawomir Lasota  Warsaw University, Poland
Eugenia Furletova  Institute of Mathematical Problems in Biology, Pushchino, Moscow
Ewa Szczurek  Max Planck Institute for Molecular Genetics, Berlin
Gregory Kucherov  LIFL/CNRS/INRIA, France
Publisher
IEEE Computer Society Press  Los Alamitos, CA, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 44,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: 10.1109/TCBB.2009.4

ABSTRACT

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
G. Kucherov, L. Noé, and M. Roytberg, "A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds," J. Bioinformatics and Computational Biology, vol. 4, no. 2, pp. 553- 570, Apr. 2006 (preliminary version in WABI '05).
 
2
S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman, "Basic Local Alignment Search Tool," J. Molecular Biology, vol. 215, pp. 403-410, 1990.
 
3
S. Altschul, T. Madden, A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman, "Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389-3402, 1997.
 
4
 
5
B. Ma, J. Tromp, and M. Li, "PatternHunter: Faster and More Sensitive Homology Search," Bioinformatics, vol. 18, no. 3, pp. 440- 445, 2002.
 
6
W.J. Kent, "BLAT-The BLAST-Like Alignment Tool," Genome Research, vol. 12, pp. 656-664, 2002.
 
7
M. Li, B. Ma, D. Kisman, and J. Tromp, "PatternHunter II: Highly Sensitive and Fast Homology Search," J. Bioinformatics and Computational Biology, vol. 2, no. 3, pp. 417-439, 2004 (earlier version in GIW '03).
 
8
L. Noé and G. Kucherov, "YASS: Enhancing the Sensitivity of DNA Similarity Search," Nucleic Acid Research, vol. 33, pp. W540- W543, 2005.
 
9
 
10
 
11
12
 
13
G. Kucherov, L. Noé, and M. Roytberg, "Multi-Seed Lossless Filtration," Proc. 15th Ann. Combinatorial Pattern Matching Symp. (CPM '04), pp. 297-310, July 2004.
 
14
 
15
J. Xu, D. Brown, M. Li, and B. Ma, "Optimizing Multiple Spaced Seeds for Homology Search," Proc. 15th Symp. Combinatorial Pattern Matching, pp. 47-58, July 2004.
 
16
 
17
P. Peterlongo, L. Noé, D. Lavenier, G. Georges, J. Jacques, G. Kucherov, and M. Giraud, "Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware," Proc. Second Workshop Parallel Computational Biology, 2007.
 
18
V.H. Nguyen and D. Lavenier, "Speeding Up Subset Seed Algorithm for Intensive Protein Sequence Comparison," Proc. Sixth IEEE Int'l Conf. Research, Innovation & Vision for the Future (RIVF '08), pp. 57-63, 2008.
 
19
L. Noé and G. Kucherov, "Improved Hit Criteria for DNA Local Alignment," BMC Bioinformatics, vol. 5, no. 149, Oct. 2004.
 
20
L. Zhou, J. Stanton, and L. Florea, "Universal Seeds for cDNA-to-Genome Comparison," BMC Bioinformatics, vol. 9, no. 36, 2008.
 
21
B. Ma and H. Yao, "Seed Optimization is No Easier Than Optimal Golomb Ruler Design," Proc. Sixth Asia Pacific Bioinformatics Conf. (APBC '08), pp. 133-144, Jan. 2008.
 
22
 
23
T. Li, K. Fan, J. Wang, and W. Wang, "Reduction of Protein Sequence Complexity by Residue Grouping," J. Protein Eng., vol. 16, pp. 323-330, 2003.
 
24
L. Murphy, A. Wallqvist, and R. Levy, "Simplified Amino Acid Alphabets for Protein Fold Recognition and Implications for Folding," J. Protein Eng., vol. 13, pp. 149-152, 2000.
 
25
 
26
S. Henikoff and J. Henikoff, "Amino Acid Substitution Matrices from Protein Blocks," Proc. Nat'l Academy of Sciences USA, vol. 89, pp. 10915-10919, 1992.
 
27
S. Henikoff and J. Henikoff, "Automated Assembly of Protein Blocks for Database Searching," Nucleic Acids Research, vol. 19, no. 23, pp. 6565-6572, 1991.
28
 
29
L. Ilie and S. Ilie, "Long Spaced Seeds for Finding Similarities Between Biological Sequences," Proc. Second Int'l Conf. Bioinformatics & Computational Biology (BIOCOMP '07), pp. 3-8, 2007.
 
30
A. Bahr, J. Thompson, J. Thierry, and O. Poch, "BAliBASE (Benchmark Alignment dataBASE): Enhancements for Repeats, Transmembrane Sequences and Circular Permutations," Nucleic Acids Research, vol. 29, no. 1, pp. 323-326, 2001.
 
31
A. Stebbings and K. Mizuguchi, "HOMSTRAD: Recent Developments of the Homologous Protein Structure Alignment Database," Nucleic Acids Research, vol. 32, pp. D203-D207, 2004.
 
32
A.R. Subramanian, J. Weyer-Menkhoff, M. Kaufmann, and B. Morgenstern, "DIALIGN-T: An Improved Algorithm for Segment-Based Multiple Sequence Alignment," BMC Bioinformatics, vol. 6, no. 66, 2005.
 
33
G. Raghava, S. Searle, P. Audley, J. Barber, and G. Barton, "OXBench: A Benchmark for Evaluation of Protein Multiple Sequence Alignment Accuracy," BMC Bioinformatics, vol. 4, no. 47, 2003.
 
34
R. Finn, J. Mistry, B. Schuster-Bckler, S. Griffiths-Jones, V. Hollich, T. Lassmann, S. Moxon, M. Marshall, A. Khanna, R. Durbin, S. Eddy, E. Sonnhammer, and A. Bateman, "PFAM: Clans, Web Tools and Services," Nucleic Acids Research, vol. 34, pp. D247-D251, 2006.
 
35
R.C. Edgar, "MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput," Nucleic Acids Research, vol. 32, no. 5, pp. 1792-1797, 2004.
 
36
I. Letunic, R. Copley, B. Pils, S. Pinkert, J. Schultz, and P. Bork, "SMART 5: Domains in the Context of Genomes and Networks," Nucleic Acids Research, vol. 34, no. 1, pp. D257-D260, 2006.
 
37
R. Nunez Miguel, J. Shi, and K. Mizuguchi, "Protein Fold Recognition and Comparative Modeling using HOMSTRAD, JOY and FUGUE," Protein Structure Prediction: Bioinformatic Approach, pp. 143-169, Int'l University Line Publishers, 2001.
 
38
 
39
P. Peterlongo, L. Noé, D. Lavenier, N.V.H., G. Kucherov, and M. Giraud, "Optimal Neighborhood Indexing for Protein Similarity Search," BMC Bioinformatics, vol. 9, no. 534, 2008.

Collaborative Colleagues:
Mikhail Roytberg: colleagues
Anna Gambin: colleagues
Laurent Noe: colleagues
Slawomir Lasota: colleagues
Eugenia Furletova: colleagues
Ewa Szczurek: colleagues
Gregory Kucherov: colleagues