ABSTRACT
Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper, we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture that integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions an application and compiles it into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system.

Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating that the RF must support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order-of-magnitude improvement in energy-delay over an aggressive superscalar core on single-threaded workloads.
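Two claims in the abstract are linked: results are reported as energy-delay (energy consumed multiplied by execution time, lower is better), and Tartan is most efficient when almost all of an application runs on the RF. The sketch below is a toy model under stated assumptions, not the paper's simulator or its data. It assumes a hypothetical RF that runs its share of the work at a fixed fraction of the core's energy and time, with core and RF running one after the other and no transfer overhead. The names `Cost`, `partitioned_cost`, `rf_energy_ratio` and `rf_time_ratio` are illustrative inventions, and the ratio values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    energy_j: float  # total energy consumed, in joules
    delay_s: float   # total execution time, in seconds

    @property
    def energy_delay(self) -> float:
        """Energy-delay product in joule-seconds; lower is better."""
        return self.energy_j * self.delay_s


def ed_improvement(baseline: Cost, candidate: Cost) -> float:
    """Factor by which the candidate's energy-delay is lower than the baseline's."""
    return baseline.energy_delay / candidate.energy_delay


def partitioned_cost(core: Cost, rf_energy_ratio: float,
                     rf_time_ratio: float, fraction_on_rf: float) -> Cost:
    """Hypothetical toy model, not Tartan's simulator: a fraction of the
    core-only work moves to the RF, where it costs rf_energy_ratio times the
    core's energy and rf_time_ratio times the core's time. Core and RF are
    assumed to run sequentially, with no overlap and no transfer overhead."""
    if not 0.0 <= fraction_on_rf <= 1.0:
        raise ValueError("fraction_on_rf must be in [0, 1]")
    on_core = 1.0 - fraction_on_rf
    return Cost(
        energy_j=core.energy_j * (on_core + fraction_on_rf * rf_energy_ratio),
        delay_s=core.delay_s * (on_core + fraction_on_rf * rf_time_ratio),
    )


if __name__ == "__main__":
    core_only = Cost(energy_j=1.0, delay_s=1.0)  # normalized baseline
    # Hypothetical RF ratios chosen for illustration only.
    for f in (0.5, 0.9, 0.99):
        mixed = partitioned_cost(core_only, rf_energy_ratio=0.05,
                                 rf_time_ratio=0.8, fraction_on_rf=f)
        print(f"fraction on RF = {f:.2f}: "
              f"energy-delay improvement = {ed_improvement(core_only, mixed):.1f}x")
```

With these made-up ratios, the improvement is only about 2x when half the work is on the RF. It is roughly 8x at 90% and about 21x at 99%. The portion left on the core dominates both the energy and the delay terms, which is one simple way to see why mapping nearly the whole application to the RF matters.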