| Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors |
| Full text |
Pdf
(435 KB)
|
| Source
|
International Conference on Supercomputing
archive
Proceedings of the 20th annual international conference on Supercomputing
table of contents
Cairns, Queensland, Australia
SESSION: High performance computing--supercomputing
table of contents
Pages: 145 - 155
Year of Publication: 2006
ISBN:1-59593-282-8
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 8, Downloads (12 Months): 69, Citation Count: 0
|
|
|
ABSTRACT
Efficient solution of partial differential equations require a match between the algorithm and the target architecture. Many recent chip multiprocessors, CMPs (a.k.a. multi-core), feature low intra-thread communication costs and smaller per-thread caches compared to previous shared memory multi-processor systems. From an algorithmic point of view this means that data locality issues become more important than communication overheads. A fact that may require a re-evaluation of many existing algorithms.We have investigated parallel implementations of multi-grid methods using a parallel temporally blocked, naturally ordered smoother. Compared to the standard multigrid solution based on a red-black ordering, we improve the data locality often as much as ten times, while our use of a fine-grained locking scheme keeps the parallel efficiency high.Our algorithm was initially inspired by CMPs and it was surprising to see that our OpenMP multigrid implementation ran up to 40 percent faster than the standard red-black algorithm on a contemporary 8-way SMP system. Thanks to the temporal blocking introduced, our smoother implementation often allowed us to apply the smoother two times at the same cost as a single application of a red-black smoother. By executing our smoother on a 32-thread UltraSPARC T1 (Niagara) SMT/CMP and a simulated 32-way CMP we demonstrate that such architectures can tolerate the increased communication costs implied by the tradeoffs made in our implementation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
L. M. Adams and J. M. Ortega. A Multi-Color SOR Method for Parallel Computation. In Proceedings of the International Conference on Parallel Processing, pages 53--58, 1982.
|
 |
2
|
Luiz André Barroso , Kourosh Gharachorloo , Robert McNamara , Andreas Nowatzyk , Shaz Qadeer , Barton Sano , Scott Smith , Robert Stets , Ben Verghese, Piranha: a scalable architecture based on single-chip multiprocessing, Proceedings of the 27th annual international symposium on Computer architecture, p.282-293, June 2000, Vancouver, British Columbia, Canada
|
| |
3
|
|
| |
4
|
E. Chow, R. Falgout, J. Hu, R. Tuminaro, and U. Yang. A Survey of Parallelization Techniques for Multigrid Solvers. Technical Report UCRL-BOOK-205864, LLNL, 2004.
|
| |
5
|
Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm , Dean M. Tullsen, Simultaneous Multithreading: A Platform for Next-Generation Processors, IEEE Micro, v.17 n.5, p.12-19, September 1997
[doi> 10.1109/40.621209]
|
| |
6
|
|
| |
7
|
|
| |
8
|
K. Krewell. Power5 Tops on Bandwidth. Microprocessor Report, December 22, 2003.
|
| |
9
|
K. Krewell. Double Your Opterons; Double Your Fun. Microprocessor Report, October 11, 2004.
|
| |
10
|
K. Krewell. Sparc Turns 90 nm. Microprocessor Report, October 25, 2004.
|
| |
11
|
Peter S. Magnusson , Magnus Christensson , Jesper Eskilson , Daniel Forsgren , Gustav Hållberg , Johan Högberg , Fredrik Larsson , Andreas Moestedt , Bengt Werner, Simics: A Full System Simulation Platform, Computer, v.35 n.2, p.50-58, February 2002
[doi> 10.1109/2.982916]
|
| |
12
|
J. McGregor. Ringside for 2006 Dual-Core Fights. Microprocessor Report, December 19, 2005.
|
| |
13
|
N. R. Patel and H. F. Jordan. A Parallel Point Rowwise Successive Over-Relaxation Method on a Multiprocessor. Parallel Computing, 1:207--222, 1984.
|
| |
14
|
|
| |
15
|
D. Wallin, H. Zeffer, M. Karlsson, and E. Hagersten. Vasa: A simulator infrastructure with adjustable fidelity. In Proceedings of the 17th IASTED International Conference on Parallel and Distributed Computing and Systems, 2005.
|
 |
16
|
Christian Weiß , Wolfgang Karl , Markus Kowarschik , Ulrich Rüde, Memory characteristics of iterative methods, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.31-es, November 14-19, 1999, Portland, Oregon, United States
[doi> 10.1145/331532.331563]
|
| |
17
|
P. Wesseling. An Introduction to Multigrid Methods. John Wiley and Sons Ltd., Chichester, 1992.
|
| |
18
|
D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971.
|
|