Integrating bioinformatics, distributed data management, and distributed computing for applied training in high performance computing
Source
Conference On Information Technology Education (formerly CITC)
Proceedings of the 8th ACM SIGITE conference on Information technology education
Destin, Florida, USA
SESSION: High performance computing in IT education
Pages 33-36
Year of Publication: 2007
ISBN:978-1-59593-920-3
ABSTRACT
The use of multi-core and multi-node parallel high performance computing (HPC) systems is growing rapidly to meet computational demands in scientific computing. For example, the exponential growth of genomic data has outpaced increases in single-CPU clock speeds by 15-fold over the last 20 years, placing great value on the use of parallel processing systems in bioinformatics. Fortunately, increased demand for multi-node architectures has reduced the cost of distributed computing components, making these architectures more affordable to organizations and institutions. As the demand for HPC architectures grows, so does the demand for professionals skilled in the implementation, utilization, and administration of these systems. With the goal of training undergraduate and graduate students to meet this demand, a model HPC training module has been developed and implemented that integrates bioinformatics, distributed data management, and distributed computing. In this HPC training module, bioinformatics provides exposure to applied scientific computing as well as the rationale for using multi-processor computing to overcome large computational problems. In addition, the parallelization of computing is explored from both the classic divide-and-conquer approach and the distributed data management perspective, which emphasizes network bandwidth and disk paging as detractors from HPC performance. Students participate in the HPC module through hands-on interactions with three different HPC cluster types: (1) Beowulf, (2) blade servers, and (3) multi-processor shared memory systems. The results of this training module include exploratory student projects to determine mathematical relationships between HPC performance and (1) processing nodes, (2) cluster type, (3) database size and segmentation methods, (4) bioinformatics application type, (5) RAM per node, and (6) network bandwidth. The outcome of this training module is hands-on training in HPC across multiple cluster types and across multiple computer and information technology perspectives.
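The divide-and-conquer approach with database segmentation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the module's actual coursework: the function names (`segment_database`, `search_segment`, `parallel_search`) are hypothetical, a plain substring match stands in for a real bioinformatics search application, and threads stand in for cluster nodes purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_database(records, num_segments):
    """Split records into num_segments contiguous chunks of near-equal size."""
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    base, extra = divmod(len(records), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        # The first `extra` segments each take one additional record.
        end = start + base + (1 if i < extra else 0)
        segments.append(records[start:end])
        start = end
    return segments


def search_segment(segment, query):
    """Toy stand-in for a sequence search: ids of records containing the query."""
    return [seq_id for seq_id, seq in segment if query in seq]


def parallel_search(records, query, num_workers):
    """Divide the database, search the segments concurrently, then merge the hits."""
    segments = segment_database(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Threads are an assumption standing in for cluster nodes.
        per_segment_hits = list(
            pool.map(search_segment, segments, [query] * len(segments))
        )
    # map() preserves segment order, so hits come back in database order.
    return [hit for hits in per_segment_hits for hit in hits]


if __name__ == "__main__":
    database = [
        ("seq1", "ACGTACGT"),
        ("seq2", "TTGACCA"),
        ("seq3", "GGACGTT"),
        ("seq4", "CCCCAAAA"),
        ("seq5", "TACGTA"),
    ]
    print(parallel_search(database, "ACGT", num_workers=3))
```

On a real cluster each segment would typically live on a separate node, so the segment count and segment size interact with RAM per node and network bandwidth, which are the variables the student projects are described as exploring.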