ABSTRACT
This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques to reduce energy consumption in general-purpose architectures. The solutions presented here leverage interlayer tradeoffs among the architecture, compiler, and operating system layers. Cool-Mem reduces power by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms. It combines the following: statically speculative cache access modes; a dynamic content-addressable-memory-based (CAM-based) Tag-Cache that serves as a backup for statically mispredicted accesses; several conventional multilevel associative cache organizations; protection checking embedded in all cache access mechanisms; and architectural organizations that reduce the power consumed by address translation in virtual memory. Our approach relies on speculative static information, a superset of the predictable program information available at compile time. As a result, the compiler analysis passes that extract this information do not have to be provably correct. This makes Cool-Mem highly practical: it applies to large and complex applications and is not limited by the complexity of our compiler passes or by the presence of precompiled static libraries. In an extensive evaluation on SPEC2000 and MediaBench applications, we obtain total processor energy savings of 6% to 19% for the applications studied. Over the same applications, performance ranges from a 1.5% degradation to a 6% improvement. We have also compared Cool-Mem with several prior approaches and found that it performs better in almost all cases.
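The interplay between a statically speculative access and its dynamic Tag-Cache backup can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions and is not the Cool-Mem design. Everything in it is hypothetical and introduced only for illustration: the `ToyCache` class, the per-way "predicted way" hint standing in for a compiler annotation, the dictionary standing in for the CAM-based Tag-Cache (unbounded here, whereas real hardware would be small), and the energy constants. An access first reads only the way the compiler predicted. On a misprediction it probes the Tag-Cache, and only if that also fails does it fall back to a conventional associative lookup of all ways.

```python
# Toy model (hypothetical, not the Cool-Mem hardware): static way speculation
# with a dynamic Tag-Cache backup and a conventional associative fallback.

# Hypothetical per-access energy costs in arbitrary units, for illustration only.
E_DIRECT_WAY = 1.0   # read the single way named by the static hint
E_TAG_CACHE = 0.5    # probe the small CAM-like Tag-Cache
E_FULL_ASSOC = 4.0   # read all ways of the set in parallel


class ToyCache:
    def __init__(self, num_sets, num_ways, line_bytes=32):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # tags[set][way] holds the tag stored in that way, or None if empty.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Stand-in for the Tag-Cache: (set, tag) -> way of a resolved access.
        # Unbounded here; a hardware CAM would have a fixed, small capacity.
        self.tag_cache = {}
        self.energy = 0.0

    def _split(self, addr):
        """Split a byte address into (set index, tag)."""
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def fill(self, addr, way):
        """Place the line containing addr into the given way."""
        s, t = self._split(addr)
        self.tags[s][way] = t

    def access(self, addr, predicted_way=None):
        """Return (way, mode) on a hit or (None, 'miss') on a miss."""
        s, t = self._split(addr)
        # 1. Statically speculative mode: read only the hinted way.
        if predicted_way is not None:
            self.energy += E_DIRECT_WAY
            if self.tags[s][predicted_way] == t:
                return predicted_way, "static"
        # 2. Dynamic backup: probe the Tag-Cache for a remembered way.
        self.energy += E_TAG_CACHE
        way = self.tag_cache.get((s, t))
        if way is not None and self.tags[s][way] == t:
            return way, "tag-cache"
        # 3. Conventional associative lookup; remember the way on a hit.
        self.energy += E_FULL_ASSOC
        for w, stored in enumerate(self.tags[s]):
            if stored == t:
                self.tag_cache[(s, t)] = w
                return w, "associative"
        return None, "miss"


if __name__ == "__main__":
    c = ToyCache(num_sets=64, num_ways=4)
    c.fill(0x1000, way=2)
    print(c.access(0x1000, predicted_way=2), c.energy)  # correct static hint
    print(c.access(0x1000, predicted_way=0), c.energy)  # misprediction, full lookup
    print(c.access(0x1000, predicted_way=0), c.energy)  # misprediction, Tag-Cache hit
```

In this toy model, a correct hint costs one way read. A repeated misprediction for the same line is caught by the Tag-Cache before the expensive all-ways lookup. This mirrors, in spirit only, why a cheap dynamic backup lets the compiler speculate without having to prove its predictions correct.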