| Methodology for operation shuffling and L0 cluster generation for low energy heterogeneous VLIW processors |
Source
|
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Volume 12, Issue 4 (September 2007)
Article No. 41
Year of Publication: 2007
ISSN: 1084-4309
|
|
Authors
|
|
Yuki Kobayashi
|
Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
|
|
Murali Jayapala
|
IMEC vzw., Leuven, Belgium
|
|
Praveen Raghavan
|
IMEC vzw., Katholieke Universiteit Leuven, Leuven, Belgium
|
|
Francky Catthoor
|
IMEC vzw., Katholieke Universiteit Leuven, Leuven, Belgium
|
|
Masaharu Imai
|
Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
|
|
ABSTRACT
Clustering L0 buffers is effective for reducing energy in the instruction memory hierarchy of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. In heterogeneous or data-clustered VLIW processors especially, finding an energy-efficient schedule is more tightly constrained. This article proposes a realistic technique, supported by a tool flow, that explores operation shuffling to improve the generation of L0 clusters. The tool flow explores the assignment of operations in each cycle and generates various schedules, which makes it possible to reduce energy consumption across various processor architectures. However, the computational complexity is high because the exploration space is huge. We therefore also develop heuristics that reduce the size of the exploration space while keeping solution quality reasonable. Furthermore, we propose a technique to support VLIW processors with multiple data clusters, which is essential for applying the methodology to real-world processors. The experimental results indicate potential gains of up to 27.6% in L0 buffer energy through operation shuffling, for heterogeneous processor architectures as well as a homogeneous architecture. The proposed heuristics reduce the exploration space by about 90%, while the results remain comparable to full search, with average differences of less than 1%. The results also indicate that the proposed methodology improves energy efficiency in most of the media benchmarks, with an average gain of around 10% compared with generating clusters without operation shuffling.