| Configurable cache subsetting for fast cache tuning |
| Source
|
Annual ACM IEEE Design Automation Conference
Proceedings of the 43rd annual Design Automation Conference
San Francisco, CA, USA
SESSION: Session 39: parallelism and memory optimizations
Pages: 695 - 700
Year of Publication: 2006
ISBN: 1-59593-381-6
|
|
Authors
Pablo Viana, Federal University of Pernambuco, Recife-PE, Brazil
Ann Gordon-Ross, University of California, Riverside, Riverside-CA
Eamonn Keogh, University of California, Riverside, Riverside-CA
Edna Barros, Federal University of Pernambuco, Recife-PE, Brazil
Frank Vahid, University of California, Riverside, Riverside-CA
ABSTRACT
Numerous variations of configurable caches, with variable parameters such as total size, line size, and associativity, have been proposed in commercial microprocessors in recent years. Tuning a configurable cache to a target application has been shown to reduce memory-access power by over 50%. However, searching the configuration space for the best configuration can require considerable time or power, even when using recent cache tuning heuristics. We sought to determine, for a particular domain of applications, the smallest subset of cache configurations that would still enable effective tuning. For a suite of 34 benchmarks and a cache with 18 possible configurations, we determine, through an exhaustive search of all possible subsets, that only 3 or 4 candidate configurations are necessary to support tuning. We introduce a new heuristic, adapted from an efficient and effective heuristic developed for data mining, that quickly determines the best configurations for a subset of any size, with near-optimal results. We then consider a configurable cache with 17,640 possible configurations and improve our heuristic to include a pre-pruning step, yielding near-optimal tuning results. We conclude that only 3 or 4 possible cache configurations are needed to offer a near-optimal configuration for every benchmark in our suite, resulting in a 91% reduction in design space exploration time over a state-of-the-art cache tuning heuristic.
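The exhaustive subset search mentioned in the abstract can be illustrated with a minimal sketch. It assumes a table of per-benchmark energy values for each cache configuration; the table below is entirely hypothetical, and the names `subset_cost` and `best_subset` are illustrative rather than taken from the paper. The sketch shows only the brute-force baseline, not the data-mining-derived heuristic or the pre-pruning step, which the abstract does not specify.

```python
from itertools import combinations


def subset_cost(energy, subset):
    """Total energy when each benchmark uses its best configuration in `subset`."""
    return sum(min(row[c] for c in subset) for row in energy)


def best_subset(energy, k):
    """Exhaustively evaluate every size-k subset of configuration indices."""
    n_configs = len(energy[0])
    return min(combinations(range(n_configs), k),
               key=lambda s: subset_cost(energy, s))


# Hypothetical energy table (made-up numbers, not measured data):
# rows are benchmarks, columns are cache configurations.
energy = [
    [5.0, 3.0, 4.0, 6.0, 7.0],
    [2.0, 6.0, 5.0, 4.0, 3.0],
    [8.0, 7.0, 2.0, 5.0, 6.0],
    [4.0, 3.5, 6.0, 3.0, 5.0],
]

chosen = best_subset(energy, 2)                   # → (0, 2)
full_cost = subset_cost(energy, range(len(energy[0])))
overhead = subset_cost(energy, chosen) / full_cost - 1.0
print(chosen, f"{overhead:.0%} above the full configuration space")
```

For 18 configurations this brute-force search stays cheap (for example, 816 subsets of size 3), but the number of subsets grows combinatorially, which is presumably why a heuristic becomes necessary for the 17,640-configuration cache.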