ABSTRACT
We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss some of their important characteristics, and explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time.
CITED BY 180
Leonidas I. Kontothanassis , Michael L. Scott , Ricardo Bianchini, Lazy release consistency for hardware-coherent multiprocessors, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.61-es, December 04-08, 1995, San Diego, California, United States
Robert Stets , Sandhya Dwarkadas , Nikolaos Hardavellas , Galen Hunt , Leonidas Kontothanassis , Srinivasan Parthasarathy , Michael Scott, Cashmere-2L: software coherent shared memory on a clustered remote-write network, ACM SIGOPS Operating Systems Review, v.31 n.5, p.170-183, Dec. 1997
Alain Kägi , Nagi Aboulenein , Douglas C. Burger , James R. Goodman, Techniques for reducing overheads of shared-memory multiprocessing, Proceedings of the 9th international conference on Supercomputing, p.11-20, July 03-07, 1995, Barcelona, Spain
Chi-Chao Chang , Grzegorz Czajkowski , Thorsten von Eicken , Carl Kesselman, Evaluating the performance limitations of MPMD communication, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), p.1-10, November 15-21, 1997, San Jose, CA
Babak Falsafi , Alvin R. Lebeck , Steven K. Reinhardt , Ioannis Schoinas , Mark D. Hill , James R. Larus , Anne Rogers , David A. Wood, Application-specific protocols for user-level shared memory, Proceedings of the 1994 conference on Supercomputing, p.380-389, December 1994, Washington, D.C., United States
David A. Wood , Satish Chandra , Babak Falsafi , Mark D. Hill , James R. Larus , Alvin R. Lebeck , James C. Lewis , Shubhendu S. Mukherjee , Subbarao Palacharla , Steven K. Reinhardt, Mechanisms for cooperative shared memory, ACM SIGARCH Computer Architecture News, v.21 n.2, p.156-167, May 1993
Andrew Erlichson , Basem A. Nayfeh , Jaswinder P. Singh , Kunle Olukotun, The benefits of clustering in shared address space multiprocessors: an applications-driven investigation, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.60-es, December 04-08, 1995, San Diego, California, United States
Anant Agarwal , Ricardo Bianchini , David Chaiken , Kirk L. Johnson , David Kranz , J. Kubiatowicz , B.-H. Lim , K. Mackenzie , D. Yeung, The MIT Alewife machine: architecture and performance, 25 years of the international symposia on Computer architecture (selected papers), p.509-520, June 27-July 02, 1998, Barcelona, Spain
Anant Agarwal , Ricardo Bianchini , David Chaiken , Kirk L. Johnson , David Kranz , John Kubiatowicz , Beng-Hong Lim , Kenneth Mackenzie , Donald Yeung, The MIT Alewife machine: architecture and performance, ACM SIGARCH Computer Architecture News, v.23 n.2, p.2-13, May 1995
Honghui Lu , Sandhya Dwarkadas , Alan L. Cox , Willy Zwaenepoel, Message passing versus distributed shared memory on networks of workstations, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.37-es, December 04-08, 1995, San Diego, California, United States
Manuel E. Acacio , José González , José M. García , José Duato, Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-12, November 16, 2002, Baltimore, Maryland
J. P. Singh , T. Joe , J. L. Hennessy , A. Gupta, An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors, Proceedings of the 1993 ACM/IEEE conference on Supercomputing, p.214-225, December 1993, Portland, Oregon, United States
Vijayaraghavan Soundararajan , Mark Heinrich , Ben Verghese , Kourosh Gharachorloo , Anoop Gupta , John Hennessy, Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors, ACM SIGARCH Computer Architecture News, v.26 n.3, p.342-355, June 1998
Liviu Iftode , Matthias Blumrich , Cezary Dubnicki , David L. Oppenheimer , Jaswinder Pal Singh , Kai Li, Shared virtual memory with automatic update support, Proceedings of the 13th international conference on Supercomputing, p.175-183, June 20-25, 1999, Rhodes, Greece
Douglas C. Burger , Rahmat S. Hyder , Barton P. Miller , David A. Wood, Paging tradeoffs in distributed-shared-memory multiprocessors, Proceedings of the 1994 conference on Supercomputing, p.590-599, December 1994, Washington, D.C., United States
John G. Holm , John A. Chandy , Steven Parkes , Sumit Roy , Venkatram Krishnaswamy , Gagan Hasteer , Prithviraj Banerjee, Performance evaluation of message-driven parallel VLSI CAD applications on general purpose multiprocessors, Proceedings of the 11th international conference on Supercomputing, p.172-179, July 07-11, 1997, Vienna, Austria
Steven K. Reinhardt , Mark D. Hill , James R. Larus , Alvin R. Lebeck , James C. Lewis , David A. Wood, The Wisconsin Wind Tunnel: virtual prototyping of parallel computers, ACM SIGMETRICS Performance Evaluation Review, v.21 n.1, p.48-60, June 1993
Y. Charlie Hu , Alan Cox , Willy Zwaenepoel, Improving fine-grained irregular shared-memory benchmarks by data reordering, Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p.33-es, November 04-10, 2000, Dallas, Texas, United States
Henri E. Bal , Raoul Bhoedjang , Rutger Hofman , Ceriel Jacobs , Koen Langendoen , Tim Rühl , M. Frans Kaashoek, Performance evaluation of the Orca shared-object system, ACM Transactions on Computer Systems (TOCS), v.16 n.1, p.1-40, Feb. 1998
V. Puente , J. A. Gregorio , C. Izu , R. Beivide , F. Vallejo, Low-level router design and its impact on supercomputer system performance, Proceedings of the 13th international conference on Supercomputing, p.193-201, June 20-25, 1999, Rhodes, Greece
Parthasarathy Ranganathan , Vijay S. Pai , Sarita V. Adve, Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.199-210, June 23-25, 1997, Newport, Rhode Island, United States
J. Kuskin , D. Ofelt , M. Heinrich , J. Heinlein , R. Simoni , K. Gharachorloo , J. Chapin , D. Nakahira , J. Baxter , M. Horowitz , A. Gupta , M. Rosenblum , J. Hennessy, The Stanford FLASH multiprocessor, ACM SIGARCH Computer Architecture News, v.22 n.2, p.302-313, April 1994
Jeffrey Kuskin , David Ofelt , Mark Heinrich , John Heinlein , Richard Simoni , K. Gharachorloo , J. Chapin , D. Nakahira , J. Baxter , M. Horowitz , A. Gupta , M. Rosenblum , J. Hennessy, The Stanford FLASH multiprocessor, 25 years of the international symposia on Computer architecture (selected papers), p.485-496, June 27-July 02, 1998, Barcelona, Spain
Rohit Chandra , Kourosh Gharachorloo , Vijayaraghavan Soundararajan , Anoop Gupta, Performance evaluation of hybrid hardware and software distributed shared memory protocols, Proceedings of the 8th international conference on Supercomputing, p.274-288, July 11-15, 1994, Manchester, England
Ioannis Schoinas , Babak Falsafi , Alvin R. Lebeck , Steven K. Reinhardt , James R. Larus , David A. Wood, Fine-grain access control for distributed shared memory, ACM SIGPLAN Notices, v.29 n.11, p.297-306, Nov. 1994
Mark Heinrich , Jeffrey Kuskin , David Ofelt , John Heinlein , Joel Baxter , Jaswinder Pal Singh , Richard Simoni , Kourosh Gharachorloo , David Nakahira , Mark Horowitz , Anoop Gupta , Mendel Rosenblum , John Hennessy, The performance impact of flexibility in the Stanford FLASH multiprocessor, ACM SIGPLAN Notices, v.29 n.11, p.274-285, Nov. 1994
Wilson C. Hsieh , M. Frans Kaashoek , William E. Weihl, Dynamic computation migration in DSM systems, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.44-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
Leonidas Kontothanassis , Galen Hunt , Robert Stets , Nikolaos Hardavellas , Michał Cierniak , Srinivasan Parthasarathy , Wagner Meira, Jr. , Sandhya Dwarkadas , Michael Scott, VM-based shared memory on low-latency, remote-memory-access networks, ACM SIGARCH Computer Architecture News, v.25 n.2, p.157-169, May 1997
Marco Galluzzi , Valentín Puente , Adrián Cristal , Ramón Beivide , José-Ángel Gregorio , Mateo Valero, A first glance at Kilo-instruction based multiprocessors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
Babak Falsafi , Alvin R. Lebeck , Steven K. Reinhardt , Ioannis Schoinas , Mark D. Hill , James R. Larus , Anne Rogers , David A. Wood, Application-specific protocols for user-level shared memory, Proceedings of the 1994 ACM/IEEE conference on Supercomputing, November 14-18, 1994, Washington, D.C.
Evan Torrie , Chau-Wen Tseng , Margaret Martonosi , Mary W. Hall, Evaluating the impact of advanced memory systems on compiler-parallelized codes, Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, p.204-213, June 27-29, 1995, Limassol, Cyprus
Milo M. K. Martin , Daniel J. Sorin , Harold W. Cain , Mark D. Hill , Mikko H. Lipasti, Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
Mirko Loghi , Martin Letis , Luca Benini , Massimo Poncino, Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors, Proceedings of the 15th ACM Great Lakes symposium on VLSI, April 17-19, 2005, Chicago, Illinois, USA
Leonidas Kontothanassis , Robert Stets , Galen Hunt , Umit Rencuzogullari , Gautam Altekar , Sandhya Dwarkadas , Michael L. Scott, Shared memory computing on clusters with symmetric multiprocessors and system area networks, ACM Transactions on Computer Systems (TOCS), v.23 n.3, p.301-335, August 2005
Takashi Nakamura , Toshiyuki Iwamiya , Masahiro Yoshida , Yuichi Matsuo , Masahiro Fukuda, Simulation of the 3 dimensional cascade flow with numerical wind tunnel (NWT), Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.47-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
Christof Krick , Friedhelm Meyer auf der Heide , Harald Räcke , Berthold Vöcking , Matthias Westermann, Data management in networks: experimental evaluation of a provably good strategy, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.165-174, June 27-30, 1999, Saint Malo, France
B. Brock , G. Carpenter , E. Chiprout , E. Elnozahy , M. Dean , D. Glasco , J. Peterson , R. Rajamony , F. Rawson , R. Rockhold , A. Zimmerman, Windows NT in a ccNUMA system, Proceedings of the 3rd conference on USENIX Windows NT Symposium, p.7-7, July 12-15, 1999, Seattle, Washington
Austen McDonald , JaeWoong Chung , Brian D. Carlstrom , Chi Cao Minh , Hassan Chafi , Christos Kozyrakis , Kunle Olukotun, Architectural Semantics for Practical Transactional Memory, ACM SIGARCH Computer Architecture News, v.34 n.2, p.53-65, May 2006
Valentina Salapura , Matthias Blumrich , Alan Gara, Improving the accuracy of snoop filtering using stream registers, Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, p.25-32, September 16-16, 2007, Brasov, Romania
Ping Zhou , Bo Zhao , Yu Du , Yi Xu , Youtao Zhang , Jun Yang , Li Zhao, Frequent value compression in packet-based NoC architectures, Proceedings of the 2009 Conference on Asia and South Pacific Design Automation, January 19-22, 2009, Yokohama, Japan
Cesare Ferri , Ruth Iris Bahar , Mirko Loghi , Massimo Poncino, Energy-optimal synchronization primitives for single-chip multi-processors, Proceedings of the 19th ACM Great Lakes symposium on VLSI, May 10-12, 2009, Boston Area, MA, USA