Understanding application performance on shared virtual memory systems
Source
International Symposium on Computer Architecture
Proceedings of the 23rd annual international symposium on Computer architecture
Philadelphia, Pennsylvania, United States
Pages: 122 - 133
Year of Publication: 1996
ISBN: 0-89791-786-3
Authors
Liviu Iftode, Department of Computer Science, Princeton University, Princeton, NJ
Jaswinder Pal Singh, Department of Computer Science, Princeton University, Princeton, NJ
Kai Li, Department of Computer Science, Princeton University, Princeton, NJ
Bibliometrics
Downloads (6 Weeks): 19, Downloads (12 Months): 48, Citation Count: 18
ABSTRACT
Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches, Lazy Release Consistency (LRC) and Automatic Update Release Consistency (AURC), with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.
REFERENCES
2. John K. Bennett, John B. Carter, Willy Zwaenepoel, Adaptive software cache management for distributed shared memory architectures, Proceedings of the 17th annual international symposium on Computer Architecture, p.125-134, May 28-31, 1990, Seattle, Washington, United States
3. B.N. Bershad, M.J. Zekauskas, and W.A. Sawdon. The Midway Distributed Shared Memory System. In Proceedings of the IEEE COMPCON '93 Conference, February 1993.
4. M. A. Blumrich, K. Li, R. Alpert, C. Dubnicki, E. W. Felten, J. Sandberg, Virtual memory mapped network interface for the SHRIMP multicomputer, Proceedings of the 21st annual international symposium on Computer architecture, p.142-153, April 18-21, 1994, Chicago, Illinois, United States
5. John B. Carter, John K. Bennett, Willy Zwaenepoel, Implementation and performance of Munin, Proceedings of the thirteenth ACM symposium on Operating systems principles, p.152-164, October 13-16, 1991, Pacific Grove, California, United States
6. A. L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Zwaenepoel, Software versus hardware shared-memory implementation: a case study, Proceedings of the 21st annual international symposium on Computer architecture, p.106-117, April 18-21, 1994, Chicago, Illinois, United States
7. Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, John Hennessy, Memory consistency and event ordering in scalable shared-memory multiprocessors, Proceedings of the 17th annual international symposium on Computer Architecture, p.15-26, May 28-31, 1990, Seattle, Washington, United States
8. Chris Holt, Mark Heinrich, Jaswinder P Singh, Edward Rothberg, John Hennessy, The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors, Stanford University, Stanford, CA, 1995
10. L. Iftode, J.P. Singh, and K. Li. Irregular Applications under Software Shared Memory. Technical Report TR-514-96, Princeton, NJ, February 1996.
11. L. Iftode, J.P. Singh, and K. Li. Scope Consistency: a Bridge between Release Consistency and Entry Consistency. Technical Report TR-509-96, Princeton, NJ, January 1996.
12. P. Keleher, A.L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. In Proceedings of the Winter USENIX Conference, pages 115-132, January 1994.
17. Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
CITED BY 19
Yuanyuan Zhou, Liviu Iftode, Jaswinder Pal Singh, Kai Li, Brian R. Toonen, Ioannis Schoinas, Mark D. Hill, David A. Wood, Relaxed consistency and coherence granularity in DSM systems: a performance evaluation, ACM SIGPLAN Notices, v.32 n.7, p.193-205, July 1997
Mark W. Goudreau, Kevin Lang, Girija Narlikar, Satish B. Rao, BOS is boss: a case for bulk-synchronous object systems, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.115-125, June 27-30, 1999, Saint Malo, France
Cheng Liao, Dongming Jiang, Liviu Iftode, Margaret Martonosi, Douglas W. Clark, Monitoring shared virtual memory performance on a Myrinet-based PC cluster, Proceedings of the 12th international conference on Supercomputing, p.251-258, July 1998, Melbourne, Australia
Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev Kumar, Angelos Bilas, Jaswinder Pal Singh, Application scaling under shared virtual memory on a cluster of SMPs, Proceedings of the 13th international conference on Supercomputing, p.165-174, June 20-25, 1999, Rhodes, Greece
Liviu Iftode, Matthias Blumrich, Cezary Dubnicki, David L. Oppenheimer, Jaswinder Pal Singh, Kai Li, Shared virtual memory with automatic update support, Proceedings of the 13th international conference on Supercomputing, p.175-183, June 20-25, 1999, Rhodes, Greece