Massively parallel genomic sequence search on the Blue Gene/P architecture
Source: Conference on High Performance Networking and Computing
Proceedings of the 2008 ACM/IEEE conference on Supercomputing - Volume 00
Austin, Texas
SECTION: Papers
Article No. 33  
Year of Publication: 2008
ISBN: 978-1-4244-2835-9
Authors
Heshan Lin  North Carolina State University
Pavan Balaji  Argonne National Laboratory
Ruth Poole  IBM
Carlos Sosa  University of Minnesota, Minneapolis, MN
Xiaosong Ma  North Carolina State University
Wu-chun Feng  Virginia Tech
Publisher
IEEE Press  Piscataway, NJ, USA

ABSTRACT

This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identified several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks onto such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem (searching a microbial genome database against itself to support the discovery of missing genes in genomes) in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.


