ABSTRACT
ILP processors can execute a large number of instructions at the same time, so it becomes increasingly difficult to support traps efficiently. On the other hand, a current trend in architecture is to support various memory functions in software rather than hardware, usually by trapping the execution processor on a cache miss, a TLB miss, or a failed access to a local or remote memory. These late memory traps block the faulting instruction at the top of the active list, backing up the pipeline. Moreover, the support for late memory traps may also affect the performance of non-faulting memory instructions.

In this paper we analyze the overhead caused by late memory traps in ILP processors and define several measures for this overhead. To tolerate late memory traps, we propose hardware prefetching of exception conditions and a tagged Store buffer to implement deferred traps on Stores. We show that, with these hardware optimizations, the overhead added by the lateness of traps is significantly reduced relative to the overhead of early traps. Because of caching effects, the frequency of late memory traps usually decreases as they are taken deeper in the memory hierarchy, and their overall impact on the execution time becomes negligible.