ABSTRACT
Lx is a scalable and customizable VLIW processor technology platform designed by Hewlett-Packard and STMicroelectronics that allows variations in instruction issue width, the number and capabilities of structures, and the processor instruction set. For Lx, we developed the architecture and software from the beginning to support both scalability (variable numbers of identical processing resources) and customizability (special-purpose resources).
In this paper we consider the following issues. When is customization or scaling beneficial? How can one determine the right degree of customization or scaling for a particular application domain? What architectural compromises were made in the Lx project to contain the complexity inherent in a customizable and scalable processor family?
The experiments described in the paper show that specialization for an application domain is effective, yielding large gains in price/performance ratio. We also show how scaling machine resources scales performance, although not uniformly across all applications. Finally, we show that customization on an application-by-application basis is still very dangerous today, and that much remains to be done before it becomes a viable solution.