Exploring the limits of early register release: Exploiting compiler analysis
Source: ACM Transactions on Architecture and Code Optimization (TACO), Volume 6, Issue 3 (September 2009), Article No. 12
Year of Publication: 2009
ISSN: 1544-3566
Authors
Timothy M. Jones  University of Edinburgh, Edinburgh, UK
Michael F. P. O'Boyle  University of Edinburgh, Edinburgh, UK
Jaume Abella  Intel Labs Barcelona—UPC
Antonio González  Intel Labs Barcelona—UPC
Oğuz Ergin  TOBB University of Economics and Technology
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1582710.1582714

ABSTRACT

Register pressure in modern superscalar processors can be reduced by releasing registers early and copying their contents to cheap backup storage. This article quantifies the potential benefits of reducing register occupancy and shows that existing hardware-based schemes typically achieve only a small fraction of this potential. This is because they cannot accurately determine the last use of a register and must wait until the redefining instruction enters the pipeline. Compilers, on the other hand, have a global view of the program and, using simple dataflow analysis, can determine the last use. This article evaluates the extent to which compiler analysis can aid early releasing, explores the design space, and introduces commit-based and issue-based early-release schemes, quantifying their benefits. Using simple compiler analysis and microarchitecture changes, we achieve 70% of the potential register file occupancy reduction. With additional hardware support, this rises to 94%. Our schemes are compared against state-of-the-art approaches across a range of register file sizes and are shown to outperform these existing techniques.
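The abstract's key observation is that a compiler can find a register's last use with simple dataflow analysis. The sketch below illustrates that idea with textbook backward liveness analysis over a control-flow graph, followed by a pass that marks, for each instruction, the source registers whose current value is read for the last time there. It is not the authors' implementation: the `Instr` type, the dictionary layout of blocks and successors, and the example program are all hypothetical, and the analysis works on plain architectural register names rather than on any particular ISA or release-hint encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical IR (an assumption, not the article's): each instruction writes
# at most one architectural register (dst) and reads zero or more (srcs).
@dataclass(frozen=True)
class Instr:
    dst: Optional[str]
    srcs: Tuple[str, ...]

Blocks = Dict[str, List[Instr]]  # block name -> instructions in program order
Succs = Dict[str, List[str]]     # block name -> successor block names


def liveness(blocks: Blocks, succs: Succs):
    """Iterative backward liveness to a fixed point:
    live_out[b] = union of live_in[s] over successors s
    live_in[b]  = use[b] | (live_out[b] - def[b])"""
    live_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    live_out: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(list(blocks)):
            out = set().union(*(live_in[s] for s in succs[b]))
            live = set(out)
            for ins in reversed(blocks[b]):
                live.discard(ins.dst)  # a write kills the old value
                live.update(ins.srcs)  # a read makes the value live before it
            if out != live_out[b] or live != live_in[b]:
                live_out[b], live_in[b] = out, live
                changed = True
    return live_in, live_out


def last_uses(blocks: Blocks, live_out: Dict[str, Set[str]]):
    """For each instruction, the sources whose value dies there
    (candidates for an early-release hint in this hypothetical model)."""
    result: Dict[str, List[List[str]]] = {}
    for b, instrs in blocks.items():
        live = set(live_out[b])
        marks: List[List[str]] = []
        for ins in reversed(instrs):
            # A source's value survives this instruction only if it is live
            # afterwards and the instruction does not overwrite that register.
            surviving = live - {ins.dst}
            marks.append(sorted({r for r in ins.srcs if r not in surviving}))
            live = surviving | set(ins.srcs)
        result[b] = marks[::-1]
    return result


# Hypothetical example program: a loop followed by an exit block.
blocks: Blocks = {
    "entry": [Instr("r1", ()), Instr("r2", ())],
    "loop": [Instr("r3", ("r1", "r2")), Instr("r1", ("r1",)), Instr(None, ("r1",))],
    "exit": [Instr("r4", ("r3",)), Instr(None, ("r4",))],
}
succs: Succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

live_in, live_out = liveness(blocks, succs)
print(last_uses(blocks, live_out))
# {'entry': [[], []], 'loop': [[], ['r1'], []], 'exit': [['r3'], ['r4']]}
```

In the `exit` block the value of `r3` dies at its read even though no later instruction redefines `r3`; this is the situation the abstract describes, where hardware that waits for a redefining instruction cannot release the register. Values that die along a control-flow edge (here `r1` and `r2` on the loop exit) are not attached to any instruction by this simple sketch.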

