| Execution-based prediction using speculative slices |
| Full text |
Pdf
(1.03 MB)
|
| Source
|
International Symposium on Computer Architecture
archive
Proceedings of the 28th annual international symposium on Computer architecture
table of contents
Göteborg, Sweden
Pages: 2 - 13
Year of Publication: 2001
ISBN:0-7695-1162-7
Also published in ...
|
|
Authors
|
|
Craig Zilles
|
Computer Sciences Dept. University of Wisconsin - Madison, 1210 West Dayton Street, Madison, WI
|
|
Gurindar Sohi
|
Computer Sciences Dept. University of Wisconsin - Madison, 1210 West Dayton Street, Madison, WI
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 29, Citation Count: 55
|
|
|
ABSTRACT
A relatively small set of static instructions has significant leverage on program execution performance. These problem instructions contribute a disproportionate number of cache misses and branch mispredictions because their behavior cannot be accurately anticipated using existing prefetching or branch prediction mechanisms.
The behavior of many problem instructions can be predicted by executing a small code fragment called a speculative slice. If a speculative slice is executed before the corresponding problem instructions are fetched, then the problem instructions can move smoothly through the pipeline because the slice has tolerated the latency of the memory hierarchy (for loads) or the pipeline (for branches). This technique results in speedups up to 43 percent over an aggressive baseline machine.
To benefit from branch predictions generated by speculative slices, the predictions must be bound to specific dynamic branch instances. We present a technique that invalidates predictions when it can be determined (by monitoring the program's execution path) that they will not be used. This enables the remaining predictions to be correctly correlated.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Santosh G. Abraham , Rabin A. Sugumar , Daniel Windheiser , B. R. Rau , Rajiv Gupta, Predictability of load/store instruction latencies, Proceedings of the 26th annual international symposium on Microarchitecture, p.139-152, December 01-03, 1993, Austin, Texas, United States
|
| |
2
|
D. Burger and T. Austin. The SimpleScalar Tool Set, Version 2.0. Technical Report CS-TR-97-1342, University of Wisconsin-Madison, Jun. 1997.
|
 |
3
|
Robert S. Chappell , Jared Stark , Sangwook P. Kim , Steven K. Reinhardt , Yale N. Patt, Simultaneous subordinate microthreading (SSMT), Proceedings of the 26th annual international symposium on Computer architecture, p.186-195, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
Alexandre Farcy , Olivier Temam , Roger Espasa , Toni Juan, Dataflow analysis of branch mispredictions and its application to early resolution of branch outcomes, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.59-68, November 1998, Dallas, Texas, United States
|
| |
8
|
|
| |
9
|
|
 |
10
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
 |
11
|
|
 |
12
|
|
| |
13
|
|
| |
14
|
Y. Song and M. Dubois. Assisted Execution. Technical Report #CENG 98-25, Department of EE-Systems, University of Southern California, Oct. 1998.
|
 |
15
|
|
 |
16
|
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
 |
17
|
|
| |
18
|
M Weiser. Program Slicing. IEEE Transactions on Software Engineering, 10(4):352-357, 1984.
|
 |
19
|
|
CITED BY 55
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tor M. Aamodt , Pedro Marcuello , Paul Chow , Antonio González , Per Hammarlund , Hong Wang , John P. Shen, A framework for modeling and optimization of prescient instruction prefetch, ACM SIGMETRICS Performance Evaluation Review, v.31 n.1, June 2003
|
|
|
|
|
|
|
|
|
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform, ACM SIGPLAN Notices, v.39 n.11, November 2004
|
|
|
Marco Galluzzi , Valentín Puente , Adrián Cristal , Ramón Beivide , José-Ángel Gregorio , Mateo Valero, A first glance at Kilo-instruction based multiprocessors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mingju Li , Elizabeth Varki , Swapnil Bhatia , Arif Merchant, TaP: table-based prefetching for storage caches, Proceedings of the 6th USENIX Conference on File and Storage Technologies, p.1-16, February 26-29, 2008, San Jose, California
|
|
|
Dongkeun Kim , Steve Shih-wei Liao , Perry H. Wang , Juan del Cuvillo , Xinmin Tian , Xiang Zou , Hong Wang , Donald Yeung , Milind Girkar , John P. Shen, Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.27, March 20-24, 2004, Palo Alto, California
|
|
|
|
|
|
|
|
|
|
|
|
Gregorio Bernabé , Ricardo Fernández , Jose M. García , Manuel E. Acacio , José González, An efficient implementation of a 3D wavelet transform based encoder on hyper-threading technology, Parallel Computing, v.33 n.1, p.54-72, February, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yong Chen , Surendra Byna , Xian-He Sun , Rajeev Thakur , William Gropp, Hiding I/O latency with pre-execution prefetching for parallel applications, Proceedings of the 2008 ACM/IEEE conference on Supercomputing, November 15-21, 2008, Austin, Texas
|
|
|
|
|
|
|
|
|
|
|
|
Carlos Madriles , Pedro López , Josep M. Codina , Enric Gibert , Fernando Latorre , Alejandro Martinez , Raúl Martinez , Antonio Gonzalez, Boosting single-thread performance in multi-core systems through fine-grain multi-threading, ACM SIGARCH Computer Architecture News, v.37 n.3, June 2009
|
|