ABSTRACT
Applications with widely shared data perform poorly on cc-NUMA multiprocessors because of the hot spots they create in the system. In this paper, we address this problem by enhancing the memory controller with a forwarding mechanism capable of hiding the read latency of widely shared data while potentially decreasing memory and network contention. Based on the influx of requests, the memory anticipates the next read references and forwards the data to the processors in advance. To identify the set of processors to which the data should be forwarded, we use a heuristic based on the spatial locality of memory blocks. To increase forwarding effectiveness and minimize the number of messages, we incorporate simple filters combined with a feedback mechanism. We also show that further improvements are possible with a combined software-prefetching/hardware-forwarding approach. Experimental results obtained with a detailed execution-driven simulator with ILP processors show significant improvements in execution time (up to 37%).
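The abstract does not spell out the heuristic or the filters, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's mechanism. It assumes that the processors which read block b-1 are the ones predicted to read block b, and that a single threshold filter suppresses forwarding when too few sharers are predicted. The class name `ForwardingDirectory`, the parameter `min_sharers`, and the policy itself are hypothetical; the feedback mechanism is omitted.

```python
from collections import defaultdict


class ForwardingDirectory:
    """Toy home-node directory. Names and policy are illustrative assumptions."""

    def __init__(self, min_sharers=2):
        # block number -> set of processors known to hold a read copy
        self.readers = defaultdict(set)
        # hypothetical filter: forward only when enough sharers are predicted
        self.min_sharers = min_sharers

    def read(self, proc, block):
        """Handle a read request and return the processors to forward `block` to."""
        first_request = not self.readers[block]
        self.readers[block].add(proc)
        if not first_request:
            return set()
        # Assumed spatial-locality heuristic: processors that read the
        # previous block are predicted to read this block as well.
        predicted = self.readers.get(block - 1, set()) - {proc}
        if len(predicted) < self.min_sharers:
            return set()
        # Forwarded copies are recorded as readers of this block.
        self.readers[block] |= predicted
        return set(predicted)


directory = ForwardingDirectory(min_sharers=2)
for p in (0, 1, 2):
    directory.read(p, 10)            # three processors read block 10
targets = directory.read(1, 11)      # first read of block 11 triggers forwarding
print(sorted(targets))               # prints [0, 2]
```

In this sketch, the first request for block 11 causes the data to be pushed to processors 0 and 2, so their later reads of block 11 would hit locally and never reach the home memory. A real design would also need the filters and feedback described above to limit useless forwards.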