ABSTRACT
With the increasing cost of global on-chip communication, high-performance designs for data-intensive applications require architectures that distribute hardware resources (computing logic, memories, interconnect, etc.) throughout a chip, while restricting computations and communications to geographic proximities. In this paper, we present a methodology for high-level synthesis (HLS) of distributed logic-memory architectures, i.e., architectures that have logic and memory distributed across several partitions in a chip. Conventional HLS tools are capable of extracting parallelism from a behavior for architectures that assume a monolithic controller/datapath communicating with a memory or memory hierarchy. This work provides techniques to extend the synthesis frontier to more general architectures that can extract both coarse- and fine-grained parallelism from data accesses and computations in a synergistic manner. Our methodology selects many possible ways of organizing data and computations, carefully examines the trade-offs (i.e., communication overheads, synchronization costs, area overheads) in choosing one solution over another, and utilizes conventional HLS techniques for intermediate steps. We have evaluated the proposed framework on several benchmarks by generating register-transfer level (RTL) implementations using an existing commercial HLS tool with and without our enhancements, and by subjecting the resulting RTL circuits to logic synthesis and layout. The results show that circuits designed as distributed logic-memory architectures using our framework achieve significant performance improvements (up to 5.31X, 3.45X on average) over well-optimized conventional designs, with small area overheads (up to 19.3%, 15.1% on average).
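The trade-off named in the abstract, parallelism gained from partitioning versus communication and synchronization costs, can be illustrated with a toy latency estimate. This is only a sketch under stated assumptions and is not the paper's cost model: the functions `estimate_cycles` and `best_partitioning`, the one-element-per-cycle local memories, the serialized boundary exchanges, and the default cost values are all hypothetical.

```python
from math import ceil


def estimate_cycles(n_elems, n_parts, sync_cycles=4, comm_cycles=2):
    """Toy cycle estimate for a data-parallel loop split over n_parts partitions.

    Hypothetical model (not taken from the paper): each partition holds a
    contiguous block of the array in its own local memory and processes one
    element per cycle. Neighboring partitions exchange one boundary value
    each, and all partitions synchronize once at the end.
    """
    compute = ceil(n_elems / n_parts)  # blocks are processed in parallel
    if n_parts == 1:
        return compute  # monolithic design: no exchange, no synchronization
    boundaries = n_parts - 1
    # Assumption: boundary exchanges are serialized on one shared global link.
    return compute + comm_cycles * boundaries + sync_cycles


def best_partitioning(n_elems, max_parts, **costs):
    """Evaluate every partition count up to max_parts and return the cheapest."""
    candidates = {
        p: estimate_cycles(n_elems, p, **costs) for p in range(1, max_parts + 1)
    }
    best = min(candidates, key=candidates.get)
    return best, candidates


if __name__ == "__main__":
    best, candidates = best_partitioning(1024, 64)
    print("monolithic cycles:", candidates[1])
    print("4 partitions:", candidates[4])
    print("best partition count:", best, "cycles:", candidates[best])
```

Under these assumed costs, the estimate first falls as partitions are added and then rises again once the serialized boundary exchanges dominate, so the cheapest configuration lies between the monolithic design and the maximum partition count. A real flow would also weigh area overhead, which this sketch leaves out.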