Boosting trace cache performance with nonhead miss speculation
International Conference on Supercomputing
Proceedings of the 16th international conference on Supercomputing
New York, New York, USA
SESSION: Memory-wall
Pages: 179 - 188
Year of Publication: 2002
ISBN:1-58113-483-5
ABSTRACT
Trace caches are used to help dynamic branch prediction make multiple predictions in a cycle by embedding some of the predictions in the trace. In this work, we evaluate a trace cache that is capable of delivering a trace consisting of a variable number of instructions via a linked list mechanism. We evaluate several schemes in the context of an x86 processor model that stores decoded instructions. By developing a new classification for trace cache accesses, we are able to target those misses that cause the largest performance loss. We have proposed a hardware speculation technique, called NonHead Miss Speculation, which removes much of the penalty associated with nonhead misses in the eight applications we studied. Performance improvements ranged from 2% to 20%, with an average speedup of around 10% across our application suite.
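As a rough illustration of the linked-list delivery and the access classification mentioned in the abstract, the sketch below models a trace as a chain of cache lines and labels each access as a hit, a head miss, or a nonhead miss. It is only one possible reading of the abstract: the names `Segment`, `TraceCache`, `next_addr`, and `max_segments` are hypothetical, the classification is an assumption rather than the paper's definition, and the speculation technique itself is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """One trace cache line of decoded instructions (hypothetical layout)."""
    instructions: List[str]
    next_addr: Optional[int] = None  # link to the next segment; None ends the trace


class TraceCache:
    """Toy model: a trace is a linked list of segments keyed by address."""

    def __init__(self) -> None:
        self.lines: Dict[int, Segment] = {}

    def insert(self, addr: int, segment: Segment) -> None:
        self.lines[addr] = segment

    def fetch(self, head_addr: int, max_segments: int = 8) -> Tuple[str, List[str]]:
        """Walk the chain from head_addr and classify the access.

        Assumed classification (not taken from the paper):
          "head_miss"    : the first segment of the trace is absent
          "nonhead_miss" : the head is present but a linked segment is absent
          "hit"          : every segment walked was present
        """
        segment = self.lines.get(head_addr)
        if segment is None:
            return "head_miss", []

        delivered = list(segment.instructions)
        walked = 1
        # max_segments bounds the walk so a cyclic chain cannot loop forever.
        while segment.next_addr is not None and walked < max_segments:
            nxt = self.lines.get(segment.next_addr)
            if nxt is None:
                # Instructions fetched so far are still delivered.
                return "nonhead_miss", delivered
            delivered.extend(nxt.instructions)
            segment = nxt
            walked += 1
        return "hit", delivered


cache = TraceCache()
cache.insert(0x100, Segment(["a", "b"], next_addr=0x200))
cache.insert(0x200, Segment(["c"], next_addr=0x300))  # 0x300 is not cached yet
print(cache.fetch(0x100))
print(cache.fetch(0x400))
```

In this toy model a nonhead miss still delivers the instructions from the segments that were found, which is the kind of partial progress a speculation scheme could build on.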