|
ABSTRACT
Although some instructions hurt performance more than others, current processors typically apply scheduling and speculation as if each instruction was equally costly. Instruction cost can be naturally expressed through the critical path: if we could predict it at run-time, egalitarian policies could be replaced with cost-sensitive strategies that will grow increasingly effective as processors become more parallel.
This paper introduces a hardware predictor of instruction criticality and uses it to improve performance. The predictor is both effective and simple in its hardware implementation. The effectiveness at improving performance stems from using a dependence-graph model of the microarchitectural critical path that identifies execution bottlenecks by incorporating both data and machine-specific dependences. The simplicity stems from a token-passing algorithm that computes the critical path without actually building the dependence graph.
By focusing processor policies on critical instructions, our predictor enables a large class of optimizations. It can (i) give priority to critical instructions for scarce resources (functional units, ports, predictor entries); and (ii) suppress speculation on non-critical instructions, thus reducing “useless” misspeculations. We present two case studies that illustrate the potential of the two types of optimization, we show that (i) critical-path-based dynamic instruction scheduling and steering in a clustered architecture improves performance by as much as 21% (10% on average); and (ii) focusing value prediction only on critical instructions improves performance by as much as 5%, due to removing nearly half of the misspeculations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Vikas Agarwal , M. S. Hrishikesh , Stephen W. Keckler , Doug Burger, Clock rate versus IPC: the end of the road for conventional microarchitectures, Proceedings of the 27th annual international symposium on Computer architecture, p.248-259, June 2000, Vancouver, British Columbia, Canada
|
 |
2
|
Amirali Baniasadi , Andreas Moshovos, Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.337-347, December 2000, Monterey, California, United States
[doi> 10.1145/360128.360165]
|
 |
3
|
Paul Barford , Mark Crovella, Critical path analysis of TCP transactions, Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, p.127-138, August 28-September 01, 2000, Stockholm, Sweden
|
| |
4
|
D. C. Burger and T. M. Austin. The simplescalar tool set, version 2.0. Technical Report CS-TR-1997-1342, University of Wisconsin, Madison, June 1997.
|
 |
5
|
Brad Calder , Glenn Reinman , Dean M. Tullsen, Selective value prediction, Proceedings of the 26th annual international symposium on Computer architecture, p.64-74, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
6
|
Keith I. Farkas , Paul Chow , Norman P. Jouppi , Zvonko Vranesic, The multicluster architecture: reducing cycle time through partitioning, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.149-159, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
7
|
|
| |
8
|
L. Gwennap. Digital 21264 sets new standard. Microprocessor Report, 10:9-15, October 1996.
|
| |
9
|
|
| |
10
|
G. Kemp and M. Franklin. PEWs: A decentralized dynamic scheduling algorithm for ILP processing. In International Conference on Parallel Processing, pages 239-246, Aug 1996.
|
| |
11
|
|
 |
12
|
Subbarao Palacharla , Norman P. Jouppi , J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th annual international symposium on Computer architecture, p.206-218, June 01-04, 1997, Denver, Colorado, United States
|
| |
13
|
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
14
|
|
 |
15
|
Michael Schlansker , Vinod Kathail , Sadun Anik, Height reduction of control recurrences for ILP processors, Proceedings of the 27th annual international symposium on Microarchitecture, p.40-51, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192729]
|
 |
16
|
Michael Schlansker , Scott Mahlke , Richard Johnson, Control CPR: a branch height reduction optimization for EPIC architectures, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, p.155-168, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
17
|
Srikanth T. Srinivasan , Roy Dz-ching Ju , Alvin R. Lebeck , Chris Wilkerson, Locality vs. criticality, Proceedings of the 28th annual international symposium on Computer architecture, p.132-143, June 30-July 04, 2001, Göteborg, Sweden
|
| |
18
|
|
| |
19
|
Jared Stark , Paul Racunas , Yale N. Patt, Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.34-43, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
20
|
D. Tullsen and B. Calder. Computing along the critical path. Technical report, University of California, San Diego, Oct 1998.
|
| |
21
|
|
| |
22
|
|
 |
23
|
Adi Yoaz , Mattan Erez , Ronny Ronen , Stephan Jourdan, Speculation techniques for improving load related instruction scheduling, Proceedings of the 26th annual international symposium on Computer architecture, p.42-53, May 01-04, 1999, Atlanta, Georgia, United States
|
CITED BY 47
|
|
R. González , A. Cristal , M. Pericas , M. Valero , A. Veidenbaum, An asymmetric clustered processor based on value content, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Youfeng Wu , Ryan Rakvic , Li-Ling Chen , Chyi-Chang Miao , George Chrysos , Jesse Fang, Compiler managed micro-cache bypassing for high performance EPIC processors, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Robert McDonald , Rajagopalan Desikan , Saurabh Drolia , M. S. Govindan , Paul Gratz , Divya Gulati , Heather Hanson , Changkyu Kim , Haiming Liu , Nitya Ranganathan , Simha Sethumadhavan , Sadia Sharif , Premkishore Shivakumar , Stephen W. Keckler , Doug Burger, Distributed Microarchitectural Protocols in the TRIPS Prototype Processor, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, p.480-491, December 09-13, 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fernando Latorre , Grigorios Magklis , José González , Pedro Chaparro , Antonio González, Building a large instruction window through ROB compression, Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, p.41-48, September 16-16, 2007, Brasov, Romania
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|