ABSTRACT
The ever-increasing computational power of contemporary microprocessors significantly reduces the execution time spent on arithmetic computations (i.e., computations that do not involve slow memory operations such as cache misses). Therefore, for memory-intensive workloads, it becomes more important to overlap multiple cache misses with one another than to overlap slow memory operations with other computations. In this paper, we propose a novel technique to parallelize sequential cache misses, thereby increasing memory-level parallelism (MLP). Our idea is based on value prediction, which was originally proposed as an instruction-level parallelism (ILP) optimization to break true data dependencies. We advocate value prediction for its ability to enhance MLP rather than ILP. We propose to use value prediction and value-speculative execution only for prefetching, so that the complex prediction validation and misprediction recovery mechanisms are avoided and only minor changes to the microarchitecture are needed. The same hardware modifications also enable aggressive memory disambiguation for prefetching. The experimental results show that our technique enhances MLP effectively and achieves significant speedups even with a simple stride value predictor.
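As a rough illustration of the idea, the sketch below models a stride value predictor whose predictions are used only to issue prefetches. It is a simplified software model under stated assumptions, not the hardware design evaluated in the paper: the names `StrideValuePredictor` and `count_covered_loads`, the confidence threshold, and the single-load pointer-chasing trace are all hypothetical. Each loaded value is treated as the address of the next dependent load, so a confident prediction lets the next miss be started before the current one returns.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_value: int = 0   # most recent value produced by the load
    stride: int = 0       # difference between the last two values
    confidence: int = 0   # saturating counter of repeated strides


class StrideValuePredictor:
    """Hypothetical per-PC stride value predictor (illustrative only)."""

    def __init__(self, threshold=2, max_confidence=3):
        self.table = {}
        self.threshold = threshold
        self.max_confidence = max_confidence

    def predict(self, pc):
        # Predict only once the same stride has repeated often enough.
        entry = self.table.get(pc)
        if entry is None or entry.confidence < self.threshold:
            return None
        return entry.last_value + entry.stride

    def update(self, pc, actual):
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_value=actual)
            return
        new_stride = actual - entry.last_value
        if new_stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, self.max_confidence)
        else:
            entry.stride = new_stride
            entry.confidence = 0
        entry.last_value = actual


def count_covered_loads(loaded_values, pc=0x400):
    """Count dependent loads whose address was prefetched from a prediction.

    Each element of loaded_values is the result of one load and also the
    address used by the next dependent load (pointer chasing).
    """
    predictor = StrideValuePredictor()
    prefetched = set()
    covered = 0
    for value in loaded_values:
        predicted = predictor.predict(pc)
        if predicted is not None:
            # The prediction is used only as a prefetch address. A wrong
            # prediction costs a useless prefetch; no recovery is needed.
            prefetched.add(predicted)
        if value in prefetched:
            covered += 1
        predictor.update(pc, value)
    return covered
```

In this model, a chain of nodes laid out 64 bytes apart is covered once the stride has been observed a few times, while an irregular chain produces no confident predictions and therefore no prefetches.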