|
ABSTRACT
Applications with irregular data structures such as sparse matrices or finite element meshes account for a large fraction of engineering and scientific applications. Domain decomposition techniques are commonly used to partition these applications to reduce interprocessor communication on message passing parallel systems. Our work investigates the use of domain decomposition techniques on cache-coherent parallel systems.
Many good domain decomposition algorithms are now available. We show that further application improvements are attainable using data and program restructuring in conjunction with domain decomposition. We give techniques for data layout to reduce communication, blocking with subdomains to improve uniprocessor cache behavior, and insertion of prefetches to hide the latency of interprocessor communication.
This paper details our restructuring techniques and provides experimental results on the KSR1 multiprocessor for a sparse matrix application. The experimental results include counts of cache misses provided by the KSR PMON performance monitoring tool. Our data show that cache coherency traffic can be reduced by 30%–60% using our data layout scheme and that more than 53% of the remaining coherency cache misses can be eliminated using prefetch instructions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Anant Agarwal, David Kranz, and Venkat Natarajan. Automatic partitioning of parallel loops for cachecoherent multiprocessors. In Proceedings of the international Conference on Parallel Processing volume 1, pages 2-11, 1993.
|
| |
2
|
Stephen Barnard and Horst Simon. A fast multilevel implementation of recursive spectral bisections for partitioning unstructured problems. Technical Report RNR-92-33, NASA Ames Research Center, NAS Systems Division, April 1993.
|
| |
3
|
|
 |
4
|
Eric L. Boyd , John-David Wellman , Santosh G. Abraham , Edward S. Davidson, Evaluating the communication performance of MPPs using synthetic sparse matrix multiplication workloads, Proceedings of the 7th international conference on Supercomputing, p.240-250, July 19-23, 1993, Tokyo, Japan
[doi> 10.1145/165939.165974]
|
| |
5
|
Peter Brezany Michael Gerndt, Viera Sipkova, and Hans Zima. SUPERB support for irregular scientific computations. In Proceedings of the Scalable High Performance Computing Conference, pages 314-321, 1992.
|
 |
6
|
David Callahan , Ken Kennedy , Allan Porterfield, Software prefetching, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.40-52, April 08-11, 1991, Santa Clara, California, United States
|
| |
7
|
A. Chatterjee, J.M. Jin, and J.L. Volakis. Application of edge-based finite elements and ABCs to 3-D scattering. IEEE Transactions on Antennas and Propagation, 1992. To be published.
|
| |
8
|
Charbel Farhat and Michel Lesoinne. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics. International Journal for Numerical Methods in Engineering 36:745- 764, 1993.
|
| |
9
|
|
 |
10
|
|
| |
11
|
Reinhard v. Hanxleden and Ken Kennedy Relaxing SIMD control flow constraints using loop transformations. Technical Report Rice COMP TR92-181, Rice University Department of Computer Science, 1992.
|
| |
12
|
|
| |
13
|
Kendall Square Research Corporation, 170 Tracer Lane, Waltham, MA, 02154-1379. KSR1 Principles of Operation, 1991.
|
| |
14
|
|
 |
15
|
R. Mirchandaney , J. H. Saltz , R. M. Smith , D. M. Nico , K. Crowley, Principles of runtime support for parallel processors, Proceedings of the 2nd international conference on Supercomputing, p.140-152, June 1988, St. Malo, France
[doi> 10.1145/55364.55378]
|
| |
16
|
Ravi Ponnusamy Joel Saltz, Raja Das, Charles Koelbel, and Alok Choudhary A runtime data mapping scheme for irregular problems. In Proceedings of the Scalable High Performance Computing Conference, pages 216-219, 1992.
|
| |
17
|
Horst Simon. Partitioning of unstructured problems for parallel processing. Computing Systems in Engineering, 2(2/3):135-148, 1991.
|
| |
18
|
|
| |
19
|
Daniel Windheiser, Eric Boyd, Eric Hao, Santosh G. Abraham, and Edward Davidson. KSR1 multiprocessor: Analysis of latency hiding techniques in a sparse solver. In Proceedings of the International Parallel Processing Symposium, pages 454-461, 1993.
|
 |
20
|
|
CITED BY 2
|
|
|
Shubhendu S. Mukherjee , Shamik D. Sharma , Mark D. Hill , James R. Larus , Anne Rogers , Joel Saltz, Efficient support for irregular applications on distributed-memory machines, ACM SIGPLAN Notices, v.30 n.8, p.68-79, Aug. 1995
|
|