ABSTRACT
The key to increasing performance without a commensurate increase in power consumption in modern processors lies in increasing both parallelism and core specialization. Core specialization has been employed in the embedded space and is likely to play an important role in future heterogeneous multi-core architectures as well. In this paper, the face recognition application domain is employed as a case study to showcase an architectural design methodology that generates a specialized core with high performance and very low power characteristics. Specifically, we create "ASIC-like" execution flows to sustain the high memory parallelism generated within the core. The price of this benefit is a significant increase in compilation complexity. The crux of the problem is the need to co-schedule data access, data movement, and computation, whose constraints often conflict. A modular compiler approach that employs integer linear programming (ILP) based "interconnect-aware" instruction and data scheduling techniques to solve this problem is then described. The resulting core running the compiled code delivers a 1.65x throughput improvement over a high-performance processor (Pentium 4) while simultaneously achieving an 80x energy-delay improvement over an energy-efficient processor (XScale), and it performs real-time face recognition at embedded power budgets.