ABSTRACT
Speech, gesture, and visual feature recognition are important interface capabilities for future embedded mobile systems. Unfortunately, the real-time performance requirements of complex perception applications cannot be met by current embedded processors, and they often exceed even the performance of high-performance microprocessors, whose energy consumption is far beyond embedded energy budgets. Custom ASICs can solve this problem, but they are inflexible and incur expensive, lengthy design cycles. This paper introduces a VLIW perception processor that combines clustered function units, compiler-controlled dataflow, and compiler-controlled clock gating with a scratch-pad memory system to achieve high performance for perceptual algorithms at low energy consumption. The architecture is evaluated using ten benchmark applications taken from the complex speech and visual feature recognition, security, and signal processing domains. The energy-delay product of a 0.13 μm implementation of this architecture is compared against ASICs and general-purpose processors. Using a combination of Spice simulations and real processor power measurements, we show that the cluster, running at a 1 GHz clock frequency, outperforms a 2.4 GHz Pentium 4 by a factor of 1.75 while simultaneously achieving a 159 times better energy-delay product than a low-power Intel XScale embedded processor.