ACM Home Page
Please provide us with feedback. Feedback
Data and program restructuring of irregular applications for cache-coherent multiprocessor
Full text PdfPdf (1.33 MB)
Source International Conference on Supercomputing archive
Proceedings of the 8th international conference on Supercomputing table of contents
Manchester, England
Pages: 214 - 225  
Year of Publication: 1994
ISBN:0-89791-665-4
Authors
Karen A. Tomko  Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
Santosh G. Abraham  Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 11,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/181181.181351
What is a DOI?

ABSTRACT

Applications with irregular data structures such as sparse matrices or finite element meshes account for a large fraction of engineering and scientific applications. Domain decomposition techniques are commonly used to partition these applications to reduce interprocessor communication on message passing parallel systems. Our work investigates the use of domain decomposition techniques on cache-coherent parallel systems. Many good domain decomposition algorithms are now available. We show that further application improvements are attainable using data and program restructuring in conjunction with domain decomposition. We give techniques for data layout to reduce communication, blocking with subdomains to improve uniprocessor cache behavior, and insertion of prefetches to hide the latency of interprocessor communication. This paper details our restructuring techniques and provides experimental results on the KSR1 multiprocessor for a sparse matrix application. The experimental results include counts of cache misses provided by the KSR PMON performance monitoring tool. Our data show that cache coherency traffic can be reduced by 30%–60% using our data layout scheme and that more than 53% of the remaining coherency cache misses can be eliminated using prefetch instructions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Anant Agarwal, David Kranz, and Venkat Natarajan. Automatic partitioning of parallel loops for cachecoherent multiprocessors. In Proceedings of the international Conference on Parallel Processing volume 1, pages 2-11, 1993.
 
2
Stephen Barnard and Horst Simon. A fast multilevel implementation of recursive spectral bisections for partitioning unstructured problems. Technical Report RNR-92-33, NASA Ames Research Center, NAS Systems Division, April 1993.
 
3
4
 
5
Peter Brezany Michael Gerndt, Viera Sipkova, and Hans Zima. SUPERB support for irregular scientific computations. In Proceedings of the Scalable High Performance Computing Conference, pages 314-321, 1992.
6
 
7
A. Chatterjee, J.M. Jin, and J.L. Volakis. Application of edge-based finite elements and ABCs to 3-D scattering. IEEE Transactions on Antennas and Propagation, 1992. To be published.
 
8
Charbel Farhat and Michel Lesoinne. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics. International Journal for Numerical Methods in Engineering 36:745- 764, 1993.
 
9
10
 
11
Reinhard v. Hanxleden and Ken Kennedy Relaxing SIMD control flow constraints using loop transformations. Technical Report Rice COMP TR92-181, Rice University Department of Computer Science, 1992.
 
12
 
13
Kendall Square Research Corporation, 170 Tracer Lane, Waltham, MA, 02154-1379. KSR1 Principles of Operation, 1991.
 
14
15
 
16
Ravi Ponnusamy Joel Saltz, Raja Das, Charles Koelbel, and Alok Choudhary A runtime data mapping scheme for irregular problems. In Proceedings of the Scalable High Performance Computing Conference, pages 216-219, 1992.
 
17
Horst Simon. Partitioning of unstructured problems for parallel processing. Computing Systems in Engineering, 2(2/3):135-148, 1991.
 
18
 
19
Daniel Windheiser, Eric Boyd, Eric Hao, Santosh G. Abraham, and Edward Davidson. KSR1 multiprocessor: Analysis of latency hiding techniques in a sparse solver. In Proceedings of the International Parallel Processing Symposium, pages 454-461, 1993.
20


Collaborative Colleagues:
Karen A. Tomko: colleagues
Santosh G. Abraham: colleagues