ABSTRACT
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA and the integration of these portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11 to 15 times faster than software BLASTP on a modern CPU while delivering results that are close to 99% identical.