A one-shot configurable-cache tuner for improved energy and performance
Source
Design, Automation, and Test in Europe
Proceedings of the conference on Design, automation and test in Europe
Nice, France
SESSION: Novel directions in architectural simulation and validation
Pages: 755 - 760
Year of Publication: 2007
ISBN: 978-3-9810801-2-4
Authors
Ann Gordon-Ross (University of California, Riverside)
Pablo Viana (Universidade Federal de Alagoas, Arapiraca-AL, Brazil and University of California, Irvine)
Frank Vahid (University of California, Riverside)
Walid Najjar (University of California, Riverside)
Edna Barros (Federal University of Pernambuco, Recife-PE, Brazil)
Publisher
EDA Consortium
San Jose, CA, USA
ABSTRACT
We introduce a new non-intrusive on-chip cache-tuning hardware module capable of accurately predicting the best configuration of a configurable cache for an executing application. Previous dynamic cache tuning approaches change the cache configuration several times as part of the tuning search process, executing the application using inferior configurations and temporarily causing energy and performance overhead. The introduced tuner uses a different approach, which non-intrusively collects data on addresses issued by the microprocessor, analyzes that data to predict the best cache configuration, and then updates the cache to the new best configuration in "one-shot," without ever having to examine inferior configurations. The result is less energy and less performance overhead, meaning that cache tuning can be applied more frequently. We show through experiments that the one-shot cache tuner can reduce memory-access-related energy for instructions by 35%, coming within 4% of a previous intrusive approach. Compared to that intrusive approach, it also yields 4.6 times less energy overhead and a 7.7 times speedup in tuning time, at the main expense of 12% larger size.
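The idea of choosing a configuration from observed addresses, rather than by trying configurations on the live cache, can be illustrated with a small software sketch. This is not the paper's hardware tuner: it simply replays a recorded address trace against each candidate configuration in a simulated LRU cache and picks the one with the lowest estimated energy. The function names, the candidate sizes, and especially the energy cost model are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict
from itertools import product


def count_misses(trace, size_bytes, ways, line_bytes):
    """Count misses for one set-associative LRU cache configuration."""
    num_sets = size_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)        # hit: refresh LRU position
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict least recently used block
            lines[block] = True
    return misses


def estimate_energy(accesses, misses, size_bytes, ways, line_bytes):
    """Hypothetical cost model (arbitrary units), NOT taken from the paper.

    Assumes each access costs more in larger or more associative caches,
    and each miss costs more when longer lines must be fetched.
    """
    access_cost = 1.0 + size_bytes / 8192 + 0.25 * ways
    miss_cost = 20.0 + line_bytes / 4
    return accesses * access_cost + misses * miss_cost


def predict_best_config(trace,
                        sizes=(2048, 4096, 8192),
                        ways_options=(1, 2, 4),
                        line_sizes=(16, 32, 64)):
    """Return ((size, ways, line), energy) with the lowest estimated energy.

    The running cache is never reconfigured during the search; only the
    recorded trace is analyzed.
    """
    best = None
    for size, ways, line in product(sizes, ways_options, line_sizes):
        if size // (line * ways) == 0:
            continue  # skip configurations with no complete set
        misses = count_misses(trace, size, ways, line)
        energy = estimate_energy(len(trace), misses, size, ways, line)
        if best is None or energy < best[1]:
            best = ((size, ways, line), energy)
    return best


if __name__ == "__main__":
    # Synthetic trace: a 1 KB loop of word-sized fetches repeated ten times.
    loop_trace = list(range(0, 1024, 4)) * 10
    config, energy = predict_best_config(loop_trace)
    print("predicted configuration (size, ways, line):", config)
    print("estimated energy (arbitrary units):", energy)
```

Replaying the trace once per candidate is the simplest way to show the principle; a practical tuner would need a far cheaper analysis than this exhaustive replay, and a cost model calibrated to the actual cache and memory.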
CITED BY 2
Pablo Viana, Ann Gordon-Ross, Edna Barros, Frank Vahid, A table-based method for single-pass cache optimization, Proceedings of the 18th ACM Great Lakes symposium on VLSI, May 04-06, 2008, Orlando, Florida, USA