ABSTRACT
The exponential growth of databases that contain biological information (such as protein and DNA data) demands great effort to improve the performance of computational platforms. In this work we investigate how bioinformatics applications benefit from parallel architectures that combine different ways of exploiting coarse- and fine-grain parallelism. As a case study we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm, a dynamic programming approach that explores the similarity between a pair of sequences. The large inherent parallelism of the algorithm makes it ideal for architectures that support multiple dimensions of parallelism (TLP, DLP and ILP). We study how this algorithm can take advantage of different parallel machines: the SGI Altix, IBM Power6, Cell BE machines and MareNostrum. Our results show that a shared-memory architecture like the PowerPC 970MP of MareNostrum can surpass a heterogeneous machine like the current Cell BE. Our quantitative analysis covers not only the scalability of performance in terms of speedup, but also the bottlenecks in the execution of the application. This analysis is carried out by studying the execution phases that the application presents.
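As an illustration of the dynamic programming recurrence behind Smith-Waterman, the following minimal Python sketch computes the best local alignment score for a pair of sequences. It is not the Ssearch implementation: the function name `smith_waterman_score`, the scoring values, and the linear gap penalty are assumptions chosen for brevity (production tools typically use substitution matrices and affine gap penalties).

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Hypothetical helper for illustration: simple match/mismatch scores
    and a linear gap penalty are assumed.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends only on its upper, left and upper-left
            # neighbours; a score never drops below zero (local alignment).
            curr[j] = max(0,
                          prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Because each cell depends only on three neighbours, all cells on the same anti-diagonal can be computed independently, which is one source of fine-grain parallelism. Comparisons of a query against different database sequences are independent of one another, which is a source of coarse-grain parallelism.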