Checkpoint allocation and release
Source
ACM Transactions on Architecture and Code Optimization (TACO)
Volume 6, Issue 3 (September 2009)
Article No. 10
Year of Publication: 2009
ISSN: 1544-3566
Authors
Amit Golander, Tel-Aviv University, Tel-Aviv, Israel
Shlomo Weiss, Tel-Aviv University, Tel-Aviv, Israel
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1582710.1582712

ABSTRACT

Out-of-order speculative processors need a bookkeeping method to recover from incorrect speculation. In recent years, several microarchitectures that employ checkpoints have been proposed, either extending the reorder buffer or replacing it entirely. This work presents an in-depth study of checkpointing in checkpoint-based microarchitectures, ranging from the desired content of a checkpoint, through implementation trade-offs, to checkpoint allocation and release policies. A major contribution of the article is a novel adaptive checkpoint allocation policy that outperforms known policies. The adaptive policy controls checkpoint allocation according to dynamic events, such as second-level cache misses and rollback history. It achieves speedups of 6.8% and 2.2% for the integer and floating-point benchmarks, respectively, and does not require a branch confidence estimator. The results show that the proposed adaptive policy achieves most of the potential of an oracle policy, whose performance improvement is 9.8% and 3.9% for the integer and floating-point benchmarks, respectively. We also adapt known leakage-power-saving techniques and apply them to checkpoint-based microarchitectures. Combined, these techniques reduce the leakage power of the register file to about one half of its original value.
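The abstract does not spell out the adaptive policy's rules. As a rough illustration of what event-driven checkpoint allocation without a branch confidence estimator could look like, the Python sketch below decides, per fetched instruction, whether to take a checkpoint. Everything in it is an assumption made for illustration and is not the article's algorithm: the class name `AdaptiveCheckpointPolicy`, the buffer size, the interval cap, the rollback threshold, and the rule that every branch fetched while a second-level (L2) cache miss is outstanding receives a checkpoint.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AdaptiveCheckpointPolicy:
    """Hypothetical event-driven checkpoint-allocation heuristic.

    All parameters and rules are illustrative assumptions, not the
    policy evaluated in the article.
    """
    num_checkpoints: int = 8     # assumed checkpoint buffer size
    max_interval: int = 64       # assumed cap on instructions between checkpoints
    rollback_threshold: int = 2  # assumed rollbacks before a branch PC is "risky"
    in_use: int = 0              # checkpoints currently allocated
    since_last: int = 0          # instructions fetched since the last checkpoint
    l2_miss_pending: bool = False
    rollbacks: defaultdict = field(default_factory=lambda: defaultdict(int))

    def on_instruction(self, pc: int, is_branch: bool) -> bool:
        """Return True if a checkpoint is allocated at this instruction."""
        self.since_last += 1
        if self.in_use >= self.num_checkpoints:
            return False  # buffer full: continue without a new checkpoint
        risky = is_branch and (
            self.l2_miss_pending  # assumed trigger: branch fetched under an L2 miss
            or self.rollbacks[pc] >= self.rollback_threshold  # rollback history
        )
        if risky or self.since_last >= self.max_interval:
            self.in_use += 1
            self.since_last = 0
            return True
        return False

    def on_rollback(self, pc: int) -> None:
        """Record that the branch at `pc` triggered a rollback.

        A real design would also free the checkpoints younger than the
        restored one; that bookkeeping is omitted here.
        """
        self.rollbacks[pc] += 1

    def on_release(self) -> None:
        """Free the oldest checkpoint once all of its instructions commit."""
        if self.in_use > 0:
            self.in_use -= 1


if __name__ == "__main__":
    policy = AdaptiveCheckpointPolicy()
    policy.on_rollback(0x40)
    policy.on_rollback(0x40)
    print(policy.on_instruction(0x40, True))  # True: branch has rollback history
    print(policy.on_instruction(0x44, True))  # False: no event fires
    policy.l2_miss_pending = True
    print(policy.on_instruction(0x44, True))  # True: L2 miss outstanding
```

In this sketch, the per-PC rollback counter plays the role a confidence estimator would otherwise play: a branch earns a checkpoint only after it has actually caused rollbacks. The interval cap bounds how much work a single rollback can discard even when no event fires.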

