ACM Home Page
Please provide us with feedback. Feedback
An architecture for optimal all-to-all personalized communication
Full text PdfPdf (1.16 MB)
Source ACM Symposium on Parallel Algorithms and Architectures archive
Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures table of contents
Cape May, New Jersey, United States
Pages: 310 - 319  
Year of Publication: 1994
ISBN:0-89791-671-9
Authors
Susan Hinrichs  School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Corey Kosak  School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
David R. O'Hallaron  School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Thomas M. Stricker  School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Riichiro Take  Fujitsu Laboratories Ltd., 1015 Kamikodanaka, Nakahara-ku, Kawasaki, 211, Japan and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
European Comp Soc : European Computer Society
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 30,   Citation Count: 20
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/181014.181427
What is a DOI?

ABSTRACT

In all-to-all personalized communication (AAPC), every node of a parallel system sends a potentially unique packet to every other node. AAPC is an important primitive operation for modern parallel compilers, since it is used to redistribute data structures during parallel computations. As an extremely dense communication pattern, AAPC causes congestion in many types of networks and therefore executes very poorly on general purpose, asynchronous message passsing routers. We present and evaluate a network architecture that executes all-to-all communication optimally on a two-dimensional torus. The router combines optimal partitions of the AAPC step with a self-synchronizing switching mechanism integrated into a conventional wormhole router. Optimality is achieved by routing along shortest paths while fully utilizing all links. A simple hardware addition for synchronized message switching can guarantee optimal AAPC routing in many existing network architectures. The flexible communication agent of the iWarp VLSI component allowed us to implement an efficient prototype for the evaluation of the hardware complexity as well as possible software overheads. The measured performance on an 8 × 8 torus exceeded 2 GigaBytes/sec or 80% of the limit set by the raw speed of the interconnects. We make a quantitative comparison of the AAPC router with a conventional message passing system. The potential gain of such a router for larger parallel programs is illustrated with the example of a two-dimensional Fast Fourier Transform.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
ADAMS, D. Cray T3D System Architecture Overview. Tech. rep., Cray Research Inc., September 1993. Revision 1.C.
2
3
 
4
BOKHARI, S. Multiphase complete exchange on a circuit switched hypercube. Tech. Rep. 91-5, ICASE, January 1991.
 
5
BOKHARI, S. H., AND BERRYMAN, H. Complete Exchange on a Circuit Switched Mesh. In Proc. Scalable High Performance Computing Conference (Williamsburg, VA, April 1992), pp. 300--306.
 
6
7
8
9
 
10
HIGH PERFORMANCE FORTRAN FORUM. High Performance Fortran Language Specification Version 1.0., May 1993.
 
11
HINRICHS, S. Compiler Resource Management for Connection-Based Communication. Internal document, 1994.
 
12
 
13
HORIE, T., AND HAYASHI, K. All-to-All Personalized Communication on a Wrap-around Mesh. In Proceedings of CAP Workshop (Canberra, Austrailia, November 1991 ).
 
14
 
15
INTEL CORP. Paragon X/PS Product Overview, March 1991.
 
16
17
 
18
THE MESSAGE PASSING INTERFACE FORUM. Draft Document for a Standard Message Passing Interface, November 1993.
 
19
SCOTT, D. S. Efficient AII-to-AH Communication Patterns in Hypercube and Mesh Topologies. In The Sixth Distributed Memory Computing Conference Proceedings (1991 ), pp. 398- 403.
20
 
21
 
22
STRICKER, T. Message Routing on Irregular 2D-Meshes and Tori. In Proceedings of the 6th Distributed Memory Computing Conference (Portland, OR, Apr. 1991 ), pp, 170-177. Also appeared as Technical Report CMU-CS-91-109, Carnegie Mellon School of Computer Science.
 
23
STmCKER, T., STICHNOTH, J., O'HALLARON, D., HtNmCttS, S., AND GROSS, T. Decoupling Communication Services of Compiled Parallel Programs. Tech. Rep. CMU-CS-94-139, Carnegie Mellon University, School of Computer Science, 1994.
24
 
25
 
26
TAKE, R. A Routing Method for All-to-all Burst on Hypercube Networks. In Proc. 35th National Conference of Information Processing SocietyofJapan(1987), pp. 151-152. In Japanese.
 
27
 
28
VARVARIGOS, E. A., AND BERTSEKAS, D. P. Communication algorithms for isotropic tasks in hypercubes and wraparound meshes. Parallel Computing 18 (1992), 1233-1257.

CITED BY  20

Collaborative Colleagues:
Susan Hinrichs: colleagues
Corey Kosak: colleagues
David R. O'Hallaron: colleagues
Thomas M. Stricker: colleagues
Riichiro Take: colleagues