Implementing Wilson-Dirac operator on the cell broadband engine
Source
International Conference on Supercomputing
Proceedings of the 22nd Annual International Conference on Supercomputing
Island of Kos, Greece
Session: Algorithms & applications 1
Pages 4-14
Year of Publication: 2008
ISBN: 978-1-60558-158-3
Authors
Khaled Z. Ibrahim  IRISA/INRIA, Rennes, France
Francois Bodin  IRISA/INRIA, Rennes, France
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1375527.1375532

ABSTRACT

Computing the actions of the Wilson-Dirac operator accounts for most of the CPU time in the grand challenge problem of simulating Lattice Quantum Chromodynamics (Lattice QCD). This routine is challenging to implement in most computational environments because the same data is accessed in multiple patterns, making it difficult to align the data efficiently at compile time. Additionally, the low ratio of computation to memory access makes this computation bound by memory bandwidth and memory latency.

In this work, we present an implementation of this routine on the Cell Broadband Engine. We propose runtime data fusion, an approach that realigns at runtime the data that cannot be aligned optimally at compile time, thus improving the performance of SIMDized execution.

We also show a DMA optimization technique that reduces the impact of bandwidth limits on performance. Our implementation of this routine achieves 31.2 GFlops for single-precision computations and 8.75 GFlops for double-precision computations.

