ABSTRACT
The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of 8 HP PA-RISC 7100 microprocessors employ an internal cross-bar switch and a directory-based cache coherence scheme to provide a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a 4-ring network incorporating a full hardware implementation of the SCI protocol, for a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies using both synthetic test codes and full applications for the Earth and space sciences to characterize the performance properties of this new architecture. It is shown that the overhead and latencies of global primitive mechanisms, while low in absolute time, are significantly more costly than similar functions local to an individual processor ensemble.
CITED BY 83
Yuanyuan Zhou, Liviu Iftode, Jaswinder Pal Singh, Kai Li, Brian R. Toonen, Ioannis Schoinas, Mark D. Hill, David A. Wood, Relaxed consistency and coherence granularity in DSM systems: a performance evaluation, ACM SIGPLAN Notices, v.32 n.7, p.193-205, July 1997
Yanyong Zhang, Anand Sivasubramaniam, Jose Moreira, Hubertus Franke, A simulation-based study of scheduling mechanisms for a dynamic cluster environment, Proceedings of the 14th international conference on Supercomputing, p.100-109, May 08-11, 2000, Santa Fe, New Mexico, United States
Cheng Liao, Margaret Martonosi, Douglas W. Clark, Performance monitoring in a Myrinet-connected SHRIMP cluster, Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, p.21-29, August 03-04, 1998, Welches, Oregon, United States
Henri E. Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim Rühl, M. Frans Kaashoek, Performance evaluation of the Orca shared-object system, ACM Transactions on Computer Systems (TOCS), v.16 n.1, p.1-40, Feb. 1998
Shinji Sumimoto, Hiroshi Tezuka, Atsushi Hori, Hiroshi Harada, Toshiyuki Takahashi, Yutaka Ishikawa, The design and evaluation of high performance communication using a Gigabit Ethernet, Proceedings of the 13th international conference on Supercomputing, p.260-267, June 20-25, 1999, Rhodes, Greece
Soichiro Araki, Angelos Bilas, Cezary Dubnicki, Jan Edler, Koichi Konishi, James Philbin, User-space communication: a quantitative study, Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), p.1-16, November 07-13, 1998, San Jose, CA
Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson, Effects of communication latency, overhead, and bandwidth in a cluster architecture, ACM SIGARCH Computer Architecture News, v.25 n.2, p.85-97, May 1997
Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, David Patterson, Kathy Yelick, Cluster I/O with River: making the fast case common, Proceedings of the sixth workshop on I/O in parallel and distributed systems, p.10-22, May 05-05, 1999, Atlanta, Georgia, United States
Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, Andy White, References, Sourcebook of parallel computing, Morgan Kaufmann Publishers Inc., San Francisco, CA, 2003
David S. Greenberg, Ron Brightwell, Lee Ann Fisk, Arthur Maccabe, Rolf Riesen, A system software architecture for high-end computing, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), p.1-15, November 15-21, 1997, San Jose, CA
Ian Foster, Jonathan Geisler, Carl Kesselman, Steven Tuecke, Multimethod communication for high-performance metacomputing applications, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.41-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
Jon Beecroft, David Addison, David Hewson, Moray McLaren, Duncan Roweth, Fabrizio Petrini, Jarek Nieplocha, QsNetII: Defining High-Performance Network Design, IEEE Micro, v.25 n.4, p.34-47, July 2005
A. Chien, M. Lauria, R. Pennington, M. Showerman, G. Iannello, M. Buchanan, K. Connelly, L. Giannini, G. Koenig, S. Krishnamurthy, Q. Liu, S. Pakin, G. Sampemane, Design and Evaluation of an HPVM-Based Windows NT Supercomputer, International Journal of High Performance Computing Applications, v.13 n.3, p.201-219, August 1999
Osamu Tatebe, Umpei Nagashima, Satoshi Sekiguchi, Hisayoshi Kitabayashi, Yoshiyuki Hayashida, Design and implementation of FMPL, a fast message-passing library for remote memory operations, Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), p.15-15, November 10-16, 2001, Denver, Colorado
Shailabh Nagar, Ajit Banerjee, Anand Sivasubramaniam, Chita R. Das, A closer look at coscheduling approaches for a network of workstations, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.96-105, June 27-30, 1999, Saint Malo, France
M. Farreras, T. Cortes, J. Labarta, G. Almasi, Scaling MPI to short-memory MPPs such as BG/L, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
Wei Huang, Jiuxing Liu, Matthew Koop, Bulent Abali, Dhabaleswar Panda, Nomad: migrating OS-bypass networks in virtual machines, Proceedings of the 3rd international conference on Virtual execution environments, June 13-15, 2007, San Diego, California, USA
Chi-Chao Chang, Grzegorz Czajkowski, Chris Hawblitzel, Deyu Hu, Thorsten von Eicken, Security versus performance tradeoffs in RPC implementations for safe language systems, Proceedings of the 8th ACM SIGOPS European workshop on Support for composing distributed applications, p.158-161, September 1998, Sintra, Portugal
Konosuke Watanabe, Tomohiro Otsuka, Junichiro Tsuchiya, Hiroaki Nishi, Junji Yamamoto, Noboru Tanabe, Tomohiro Kudoh, Hideharu Amano, Martini: A Network Interface Controller Chip for High Performance Computing with Distributed PCs, IEEE Transactions on Parallel and Distributed Systems, v.18 n.9, p.1282-1295, September 2007