ABSTRACT
Modern mobile devices need to be extremely energy efficient. As these devices grow more complex, energy-aware design exploration becomes increasingly important. Current exploration tools often do not support energy estimation, or they require a very detailed design before estimation is possible. Designers need early feedback on both performance and energy consumption during all phases of the design and at higher abstraction levels. This article presents a unified optimization and exploration framework that spans the design space from source-level transformations down to processor architecture. The proposed retargetable compiler and simulator framework can map applications onto a range of processor and memory configurations, simulate them, and report detailed performance and energy estimates. We introduce an accurate and consistent energy modeling approach that estimates the energy consumption of the processor and memories at the component level, which helps guide the design process. Fast energy-aware architecture exploration is illustrated by modeling both state-of-the-art processors and other architectures. Various design trade-offs are illustrated on academic and industrial benchmarks from the wireless communication and multimedia domains. We also perform design space exploration on different applications and show that there is a large trade-off space between application performance, energy consumption, and area. We show that the proposed framework is consistent and accurate, and that it covers a large design space, including various novel low-power extensions, within a single framework.
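To make "component-level energy estimation" and "trade-off space between performance, energy, and area" concrete, the sketch below shows a generic activity-based model: a simulator reports how often each component (register file, ALU, memory, and so on) is activated, and each count is multiplied by a pre-characterized energy per activation. It then keeps only the Pareto-optimal design points over cycles, energy, and area. This is an illustrative assumption, not the authors' exact model; the names DesignPoint, component_energy, total_energy, dominates, and pareto_front are hypothetical, and static (leakage) energy is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class DesignPoint:
    """One explored configuration (hypothetical structure, for illustration only)."""
    name: str
    cycles: int                          # simulated execution time in cycles
    activations: dict[str, int]          # component -> activation count from the simulator
    energy_per_access: dict[str, float]  # component -> energy per activation (assumed pre-characterized)
    area: float                          # silicon area, e.g. in mm^2


def component_energy(p: DesignPoint) -> dict[str, float]:
    """Dynamic energy per component: activation count times energy per activation.

    Assumption: leakage/static energy is ignored in this sketch.
    Raises KeyError if a component has no characterized energy value.
    """
    return {c: n * p.energy_per_access[c] for c, n in p.activations.items()}


def total_energy(p: DesignPoint) -> float:
    """Sum of all component energies for one design point."""
    return sum(component_energy(p).values())


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on cycles, energy and area, and strictly better on one."""
    ka = (a.cycles, total_energy(a), a.area)
    kb = (b.cycles, total_energy(b), b.area)
    return all(x <= y for x, y in zip(ka, kb)) and any(x < y for x, y in zip(ka, kb))


def pareto_front(points: list[DesignPoint]) -> list[DesignPoint]:
    """Keep the design points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]
```

Separating per-component energies from the total keeps the breakdown available, so a designer can see which component dominates a given configuration rather than only a single number.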
INDEX TERMS
Primary Classification:
B. Hardware
  B.3 MEMORY STRUCTURES
    B.3.3 Performance Analysis and Design Aids
Additional Classification:
C. Computer Systems Organization
  C.1 PROCESSOR ARCHITECTURES
    C.1.1 Single Data Stream Architectures
    C.1.2 Multiple Data Stream Architectures (Multiprocessors)
      Subjects: Single-instruction-stream, multiple-data-stream processors (SIMD)
D. Software
  D.3 PROGRAMMING LANGUAGES
    D.3.4 Processors
      Subjects: Compilers; Retargetable compilers
General Terms:
Performance
Keywords:
Energy, VLIW, architecture exploration, area, compiler-architecture interaction, design, embedded systems, loop transformations, power estimation, power-performance trade-off, processors