ABSTRACT
The performance of a typical chemical transport model is measured on two multicore processors: the heterogeneous Cell Broadband Engine and the homogeneous Intel Quad-Core Xeon shared-memory multiprocessor. Two problem decomposition techniques are discussed: dimension splitting, which promotes parallelization in chemical transport models, and time splitting, which reduces truncation error. Additionally, a scalable method is presented for accessing random rows or columns of a matrix of arbitrary size from the accelerator units of the Cell Broadband Engine. This access method increases chemical transport model efficiency by an average of 30% and significantly improves the scalability of dimension-splitting techniques on the Cell Broadband Engine. Experiments show that chemical transport models are 31% more efficient on the Cell Broadband Engine using only six accelerator units than on a shared-memory multiprocessor with eight executing cores. Our fully optimized models achieve an average 118% speedup on the Cell Broadband Engine and an average 87.5% speedup on a shared-memory multiprocessor with OpenMP.
CITED BY 2
Scott Schneider, Jae-Seung Yeom, Benjamin Rose, John C. Linford, Adrian Sandu, Dimitrios S. Nikolopoulos, A comparison of programming models for multiprocessors with explicitly managed memory hierarchies, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA