ABSTRACT
High-level synthesis (HLS) of memory-intensive applications has benefited from several enhancements to the basic memory organization and data layout. However, the growing performance and energy demands placed on application-specific integrated circuits (ASICs) are forcing designers to alter the fundamental architectural template of the HLS output, namely, a controller-datapath connected to a memory subsystem (monolithic, banked, etc.). We propose an architectural template for the HLS output that consists of a controller-datapath circuit connected to a memory subsystem into which computation units have been integrated. We call this enhanced memory subsystem a computation-unit integrated memory (CIM). A CIM gives the computation units located inside it higher memory bandwidth than is available over the system bus, and it reduces the overall communication between the memory subsystem and the controller-datapath. This makes it a template well suited to deriving efficient implementations of memory-intensive applications. This work addresses the challenge of providing an automatic synthesis framework for a CIM-based architecture. Our framework analyzes the trade-offs involved in selecting which operations in a behavior should execute in the CIM, and it generates a high-performance, low-overhead implementation. Experiments with several behaviors show an average performance improvement of 1.88× (maximum 2.63×) with very low area overhead. The energy-delay product improves by an average of 2.1× (maximum 3.4×).
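The operation-selection trade-off and the energy-delay product (EDP = energy × delay) metric can be illustrated with a deliberately simplified cost model. The sketch below is not the synthesis framework described in this work. The names `Operation`, `bus_words`, `cycles_saved`, `select_for_cim`, and `edp_improvement`, the bandwidth figures, and the greedy benefit-per-area heuristic are all hypothetical assumptions. Each candidate operation is credited with the transfer cycles saved by moving its data over a wider CIM-internal path instead of the system bus, and it is charged the area of the computation unit it needs inside the CIM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """A candidate operation in a hypothetical toy model (not the paper's)."""
    name: str
    bus_words: int   # operand/result words that would otherwise cross the system bus
    area: float      # area of the computation unit needed inside the CIM (must be > 0)


def cycles_saved(op: Operation, bus_words_per_cycle: int,
                 cim_words_per_cycle: int) -> float:
    """Transfer cycles saved by using the CIM-internal path instead of the system bus."""
    return op.bus_words / bus_words_per_cycle - op.bus_words / cim_words_per_cycle


def select_for_cim(ops, area_budget, bus_words_per_cycle=1, cim_words_per_cycle=4):
    """Greedy benefit-per-area selection under an area budget (a heuristic, not optimal)."""
    def density(op):
        return cycles_saved(op, bus_words_per_cycle, cim_words_per_cycle) / op.area

    chosen, used = [], 0.0
    # Consider the operations with the best savings per unit area first.
    for op in sorted(ops, key=density, reverse=True):
        gain = cycles_saved(op, bus_words_per_cycle, cim_words_per_cycle)
        # Only offload operations that actually save cycles and still fit the budget.
        if gain > 0 and used + op.area <= area_budget:
            chosen.append(op.name)
            used += op.area
    return chosen


def edp_improvement(e_base, d_base, e_cim, d_cim):
    """EDP improvement = (E_base * D_base) / (E_cim * D_cim)."""
    return (e_base * d_base) / (e_cim * d_cim)


if __name__ == "__main__":
    ops = [
        Operation("sum", 1000, 2.0),
        Operation("max", 400, 1.0),
        Operation("mul", 800, 5.0),
    ]
    print(select_for_cim(ops, area_budget=3.0))  # ['sum', 'max']
    print(edp_improvement(1.0, 1.0, 0.5, 0.8))   # 2.5
```

In this toy example, halving energy and cutting delay to 0.8 of the baseline (a 1.25× speedup) yields a 2.5× EDP improvement. This shows how the EDP gain combines the energy ratio and the speedup multiplicatively. The greedy ordering keeps the sketch short. A real framework would also have to account for scheduling, data layout, and controller overheads, which this model ignores.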