Trace preconstruction
Source
International Symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
Vancouver, British Columbia, Canada
Pages: 37 - 46
Year of Publication: 2000
ISBN:1-58113-232-8
Authors
Quinn Jacobson, Sun Microsystems, 901 San Antonio Road, Palo Alto, CA
James E. Smith, University of Wisconsin, Madison, Department of Electrical & Computer Engineering, Madison, WI
ABSTRACT
Trace caches enable high bandwidth, low latency instruction supply, but have a high miss penalty and relatively large working sets. Consequently, their performance may suffer due to capacity and compulsory misses. Trace preconstruction augments a trace cache by performing a function analogous to prefetching. The trace preconstruction mechanism observes the processor's instruction dispatch stream to detect opportunities for jumping ahead of the processor. After doing so, the preconstruction mechanism fetches static instructions from the predicted future region of the program, and constructs a set of traces in advance of when they are needed.
Trace preconstruction can significantly increase both the performance of the trace cache and the robustness of the trace cache to varying workloads. All but one of the SPECint95 benchmarks see a notable reduction in trace cache miss rates from preconstruction. The three benchmarks that have the largest working set (gcc, go and vortex) see a 30% to 80% reduction in trace cache misses. We also consider the integration of preconstruction with another trace-specific mechanism (preprocessing) to produce a high performance frontend. When combined, preconstruction and trace preprocessing produce an average speedup of 14% for the SPECint95 benchmarks.
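The mechanism outlined in the abstract (jump ahead to a predicted future region, fetch static instructions, and build traces before they are needed) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the program encoding, the trace limits `MAX_TRACE_LEN` and `MAX_BRANCHES`, the `budget` parameter, and the function names `traces_from` and `preconstruct` are all hypothetical and chosen for illustration.

```python
from collections import deque

# Illustrative limits, not taken from the paper.
MAX_TRACE_LEN = 4   # maximum instructions per trace
MAX_BRANCHES = 2    # maximum conditional branches per trace


def traces_from(program, start):
    """Yield (outcomes, addrs, next_pc) for every static path from `start`
    that fits in one trace. `program` maps address -> (op, target), where
    op is "alu" or "br" (hypothetical encoding)."""
    stack = [(start, (), ())]
    while stack:
        pc, addrs, outcomes = stack.pop()
        # A trace ends when it is full or runs off the known code.
        if pc not in program or len(addrs) == MAX_TRACE_LEN:
            if addrs:
                yield outcomes, addrs, pc
            continue
        op, target = program[pc]
        addrs = addrs + (pc,)
        if op == "br":
            taken, not_taken = outcomes + (True,), outcomes + (False,)
            if len(outcomes) + 1 == MAX_BRANCHES:
                # Branch limit reached: the trace ends at this branch.
                yield taken, addrs, target
                yield not_taken, addrs, pc + 1
            else:
                # Explore both directions, since the outcome is unknown.
                stack.append((target, addrs, taken))
                stack.append((pc + 1, addrs, not_taken))
        else:
            stack.append((pc + 1, addrs, outcomes))


def preconstruct(program, region_start, trace_cache, budget=8):
    """Fill `trace_cache` with up to `budget` new traces reachable from
    `region_start`, keyed by (start address, branch outcomes)."""
    work, seen = deque([region_start]), {region_start}
    while work and budget > 0:
        start = work.popleft()
        for outcomes, addrs, next_pc in traces_from(program, start):
            if budget == 0:
                break
            key = (start, outcomes)
            if key not in trace_cache:
                trace_cache[key] = addrs
                budget -= 1
            # The end of one trace is a candidate start for the next.
            if next_pc in program and next_pc not in seen:
                seen.add(next_pc)
                work.append(next_pc)
    return trace_cache


# Toy region: a forward branch at address 1 that skips addresses 2 and 3.
program = {0: ("alu", None), 1: ("br", 4), 2: ("alu", None),
           3: ("alu", None), 4: ("alu", None), 5: ("alu", None)}
cache = preconstruct(program, region_start=0, trace_cache={})
for key, addrs in sorted(cache.items()):
    print(key, addrs)
```

In this model both directions of each branch are explored because the preconstruction engine runs ahead of the processor and does not know the eventual outcomes; the `budget` stands in for whatever limit real hardware would place on preconstruction work.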
CITED BY 7
Roni Rosner, Micha Moffie, Yiannakis Sazeides, Ronny Ronen, Selecting long atomic traces for high coverage, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
Yoav Almog, Roni Rosner, Naftali Schwartz, Ari Schmorak, Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.137, March 20-24, 2004, Palo Alto, California
Juan C. Moure, Domingo Benítez, Dolores I. Rexachs, Emilio Luque, Wide and efficient trace prediction using the local trace predictor, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia