| An architecture for optimal all-to-all personalized communication |
| Full text |
Pdf
(1.16 MB)
|
| Source
|
ACM Symposium on Parallel Algorithms and Architectures
archive
Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
table of contents
Cape May, New Jersey, United States
Pages: 310 - 319
Year of Publication: 1994
ISBN:0-89791-671-9
|
|
Authors
|
|
Susan Hinrichs
|
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
|
|
Corey Kosak
|
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
|
|
David R. O'Hallaron
|
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
|
|
Thomas M. Stricker
|
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
|
|
Riichiro Take
|
Fujitsu Laboratories Ltd., 1015 Kamikodanaka, Nakahara-ku, Kawasaki, 211, Japan and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 30, Citation Count: 20
|
|
|
ABSTRACT
In all-to-all personalized communication (AAPC), every node of a parallel system sends a potentially unique packet to every other node. AAPC is an important primitive operation for modern parallel compilers, since it is used to redistribute data structures during parallel computations. As an extremely dense communication pattern, AAPC causes congestion in many types of networks and therefore executes very poorly on general purpose, asynchronous message passsing routers.
We present and evaluate a network architecture that executes all-to-all communication optimally on a two-dimensional torus. The router combines optimal partitions of the AAPC step with a self-synchronizing switching mechanism integrated into a conventional wormhole router. Optimality is achieved by routing along shortest paths while fully utilizing all links. A simple hardware addition for synchronized message switching can guarantee optimal AAPC routing in many existing network architectures.
The flexible communication agent of the iWarp VLSI component allowed us to implement an efficient prototype for the evaluation of the hardware complexity as well as possible software overheads. The measured performance on an 8 × 8 torus exceeded 2 GigaBytes/sec or 80% of the limit set by the raw speed of the interconnects. We make a quantitative comparison of the AAPC router with a conventional message passing system. The potential gain of such a router for larger parallel programs is illustrated with the example of a two-dimensional Fast Fourier Transform.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
ADAMS, D. Cray T3D System Architecture Overview. Tech. rep., Cray Research Inc., September 1993. Revision 1.C.
|
 |
2
|
|
 |
3
|
Pablo E. Berman , Luis Gravano , Gustavo D. Pifarré , Jorge L. C. Sanz, Adaptive deadlock- and livelock-free routing with all minimal paths in Torus networks, Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, p.3-12, June 29-July 01, 1992, San Diego, California, United States
[doi> 10.1145/140901.140902]
|
| |
4
|
BOKHARI, S. Multiphase complete exchange on a circuit switched hypercube. Tech. Rep. 91-5, ICASE, January 1991.
|
| |
5
|
BOKHARI, S. H., AND BERRYMAN, H. Complete Exchange on a Circuit Switched Mesh. In Proc. Scalable High Performance Computing Conference (Williamsburg, VA, April 1992), pp. 300--306.
|
| |
6
|
S. Borkar , R. Cohn , G. Cox , S. Gleason , T. Gross, Warp: an integrated solution of high-speed parallel computing, Proceedings of the 1988 ACM/IEEE conference on Supercomputing, p.330-339, November 12-17, 1988, Orlando, Florida, United States
|
 |
7
|
Shekhar Borkar , Robert Cohn , George Cox , Thomas Gross , H. T. Kung , Monica Lam , Margie Levine , Brian Moore , Wire Moore , Craig Peterson , Jim Susman , Jim Sutton , John Urbanski , Jon Webb, Supporting systolic and memory communication in iWarp, Proceedings of the 17th annual international symposium on Computer Architecture, p.70-81, May 28-31, 1990, Seattle, Washington, United States
|
 |
8
|
David Culler , Richard Karp , David Patterson , Abhijit Sahay , Klaus Erik Schauser , Eunice Santos , Ramesh Subramonian , Thorsten von Eicken, LogP: towards a realistic model of parallel computation, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.1-12, May 19-22, 1993, San Diego, California, United States
|
 |
9
|
|
| |
10
|
HIGH PERFORMANCE FORTRAN FORUM. High Performance Fortran Language Specification Version 1.0., May 1993.
|
| |
11
|
HINRICHS, S. Compiler Resource Management for Connection-Based Communication. Internal document, 1994.
|
| |
12
|
|
| |
13
|
HORIE, T., AND HAYASHI, K. All-to-All Personalized Communication on a Wrap-around Mesh. In Proceedings of CAP Workshop (Canberra, Austrailia, November 1991 ).
|
| |
14
|
Takeshi Horie , Hiroaki Ishihata , Morio Ikesaka, Design and Implementation of an Interconnection Network for the AP1000, Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1, p.555-561, September 07-11, 1992
|
| |
15
|
INTEL CORP. Paragon X/PS Product Overview, March 1991.
|
| |
16
|
|
 |
17
|
Charles E. Leiserson , Zahi S. Abuhamdeh , David C. Douglas , Carl R. Feynman , Mahesh N. Ganmukhi , Jeffrey V. Hill , Daniel Hillis , Bradley C. Kuszmaul , Margaret A. St. Pierre , David S. Wells , Monica C. Wong , Shaw-Wen Yang , Robert Zak, The network architecture of the Connection Machine CM-5 (extended abstract), Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, p.272-285, June 29-July 01, 1992, San Diego, California, United States
[doi> 10.1145/140901.141883]
|
| |
18
|
THE MESSAGE PASSING INTERFACE FORUM. Draft Document for a Standard Message Passing Interface, November 1993.
|
| |
19
|
SCOTT, D. S. Efficient AII-to-AH Communication Patterns in Hypercube and Mesh Topologies. In The Sixth Distributed Memory Computing Conference Proceedings (1991 ), pp. 398- 403.
|
 |
20
|
|
| |
21
|
|
| |
22
|
STRICKER, T. Message Routing on Irregular 2D-Meshes and Tori. In Proceedings of the 6th Distributed Memory Computing Conference (Portland, OR, Apr. 1991 ), pp, 170-177. Also appeared as Technical Report CMU-CS-91-109, Carnegie Mellon School of Computer Science.
|
| |
23
|
STmCKER, T., STICHNOTH, J., O'HALLARON, D., HtNmCttS, S., AND GROSS, T. Decoupling Communication Services of Compiled Parallel Programs. Tech. Rep. CMU-CS-94-139, Carnegie Mellon University, School of Computer Science, 1994.
|
 |
24
|
Jaspal Subhlok , James M. Stichnoth , David R. O'Hallaron , Thomas Gross, Exploiting task and data parallelism on a multicomputer, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.13-22, May 19-22, 1993, San Diego, California, United States
|
| |
25
|
|
| |
26
|
TAKE, R. A Routing Method for All-to-all Burst on Hypercube Networks. In Proc. 35th National Conference of Information Processing SocietyofJapan(1987), pp. 151-152. In Japanese.
|
| |
27
|
|
| |
28
|
VARVARIGOS, E. A., AND BERTSEKAS, D. P. Communication algorithms for isotropic tasks in hypercubes and wraparound meshes. Parallel Computing 18 (1992), 1233-1257.
|
CITED BY 20
|
|
|
|
|
Micah Adler , Phillip B. Gibbons , Vijaya Ramachandran , Yossi Matias, Modeling parallel bandwidth: local vs. global restrictions, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.94-105, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Micah Adler , Ramesh K. Sitaraman , Arnold L. Rosenberg , Walter Unger, Scheduling time-constrained communication in linear networks, Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, p.269-278, June 28-July 02, 1998, Puerto Vallarta, Mexico
|
|
|
|
|
|
Elliot Waingold , Michael Taylor , Devabhaktuni Srikrishna , Vivek Sarkar , Walter Lee , Victor Lee , Jang Kim , Matthew Frank , Peter Finch , Rajeev Barua , Jonathan Babb , Saman Amarasinghe , Anant Agarwal, Baring It All to Software: Raw Machines, Computer, v.30 n.9, p.86-93, September 1997
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xin Yuan , R. Melhem , R. Gupta, Compiled communication for all-optical TDM networks, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.25-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
|
|
|
|
|
|
Hongzhang Shan , Erich Strohmaier , Ji Qiang , David H. Bailey , Kathy Yelick, Particles and contiuum---Performance modeling and optimization of a high energy colliding beam simulation code, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
|
|
|
|
|