ABSTRACT
The goal of this work is to explore architectural mechanisms for supporting explicit communication in cache-coherent shared-memory multiprocessors. The motivation stems from the observation that applications differ widely in their sharing characteristics and hence impose different communication requirements on the system. Explicit communication mechanisms allow coherence management to be tailored under software control to match these differing needs, with the aim of closely approximating a zero-overhead machine from the application's perspective. Toward these goals, we first analyze the sharing characteristics observed in specific applications and then use them to synthesize explicit communication primitives. The proposed primitives allow a processor to selectively update a set of processors or to request a stream of data ahead of its intended use. They are essentially generalizations of prefetch and poststore, with the ability to specify the sharer set for poststore either statically or dynamically. The primitives are intended to be used in conjunction with an underlying invalidation-based protocol. Used in this manner, the resulting memory system can dynamically adapt to performing either invalidations or updates to match the application's communication needs. Through an application-driven performance study, we show the utility of these mechanisms in reducing and tolerating communication latencies.
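To make the idea of the primitives concrete, the following is a minimal toy sketch in Python, not the design evaluated in this work. It models a write-invalidate memory system and layers two hypothetical operations on top of it: `poststore`, which pushes a value to a sharer set given either statically (an explicit set) or dynamically (here assumed to be the processors invalidated by the most recent write), and `prefetch_stream`, which fetches a run of addresses ahead of use. All class, method, and parameter names are illustrative assumptions, and the model ignores timing, cache capacity, and network behavior.

```python
class ToyMemorySystem:
    """Toy model (assumption): write-invalidate caches plus two
    hypothetical explicit-communication primitives."""

    def __init__(self, num_procs):
        self.memory = {}                                  # addr -> value
        self.caches = [dict() for _ in range(num_procs)]  # valid copies per processor
        self.last_readers = {}                            # addr -> procs invalidated by a write
        self.misses = [0] * num_procs                     # demand read misses per processor

    def read(self, proc, addr):
        if addr not in self.caches[proc]:
            self.misses[proc] += 1                        # demand miss: fetch from memory
            self.caches[proc][addr] = self.memory.get(addr, 0)
        return self.caches[proc][addr]

    def write(self, proc, addr, value):
        # Baseline invalidation protocol: remove every other cached copy.
        invalidated = set()
        for p, cache in enumerate(self.caches):
            if p != proc and addr in cache:
                del cache[addr]
                invalidated.add(p)
        if invalidated:
            self.last_readers[addr] = invalidated         # remembered for dynamic poststore
        self.memory[addr] = value                         # toy simplification: write-through
        self.caches[proc][addr] = value

    def poststore(self, proc, addr, sharers=None):
        # Selective update. An explicit `sharers` set is the static case;
        # None selects the dynamic case (processors that lost a copy
        # to a recent write of this address).
        if sharers is None:
            sharers = self.last_readers.get(addr, set())
        value = self.memory.get(addr, 0)
        for p in sharers:
            if p != proc:
                self.caches[p][addr] = value

    def prefetch_stream(self, proc, start, count):
        # Fetch `count` consecutive addresses ahead of their use;
        # these fetches are not counted as demand misses.
        for addr in range(start, start + count):
            if addr not in self.caches[proc]:
                self.caches[proc][addr] = self.memory.get(addr, 0)


def run(mode):
    """One producer (processor 0) and two consumers; returns total demand misses."""
    m = ToyMemorySystem(num_procs=3)
    for step in range(4):
        m.write(0, 100, step)
        if mode == "static":
            m.poststore(0, 100, sharers={1, 2})
        elif mode == "dynamic":
            m.poststore(0, 100)
        assert m.read(1, 100) == step
        assert m.read(2, 100) == step
    return sum(m.misses)


if __name__ == "__main__":
    for mode in ("none", "static", "dynamic"):
        print(mode, run(mode))
```

In this toy producer-consumer loop, the invalidation-only run takes a demand miss on every consumer read, while the poststore runs turn most or all of those reads into hits. The real trade-offs (update traffic, stale sharer sets, consistency-model interactions) are outside what this sketch captures.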