| Operation chaining asynchronous pipelined circuits |
| Source |
International Conference on Computer Aided Design
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
San Jose, California
SESSION: High level synthesis
Pages 442-449
Year of Publication: 2007
ISSN: 1092-3152, ISBN: 1-4244-1382-6
| Publisher |
IEEE Press
Piscataway, NJ, USA
ABSTRACT
We define operation chaining (op-chaining) as an optimization problem to determine the optimal pipeline depth for balancing performance against energy demands in pipelined asynchronous designs. Since there are no clock period requirements, asynchronous pipeline stages can have non-uniform latencies. We exploit this fact to coalesce several stages, thereby saving power and area by eliminating control-path resources from the pipeline. The trade-off is potentially reduced pipeline parallelism. In this paper, we formally define this optimization as a graph covering problem, which finds sub-graphs that will each be synthesized as an op-chained pipeline stage. We then define the solution space for provably correct solutions and present an algorithm to efficiently search this space. The search technique partitions the graph based on post-dominator relationships to find sub-graphs that are potential op-chain candidates. We use knowledge of the Global Critical Path (GCP) [13] to evaluate the performance impact of accepting a candidate sub-graph and formulate a heuristic cost function to model this trade-off. The algorithm has quadratic time complexity in the size of the dataflow graph. We have implemented this algorithm within an automated asynchronous synthesis toolchain [12]. Experimental evidence from applying the algorithm to several media processing kernels reveals that the average energy-delay and energy-delay-area products improve by about 1.4x and 1.8x respectively, with maximum improvements of 5x and 18x.
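To make the post-dominator partitioning step more concrete, the following is a minimal Python sketch and not the paper's algorithm. It assumes an acyclic dataflow graph given as a successor map with a single sink, and that every other node has at least one successor. The names `post_dominators`, `immediate_post_dominator`, `region`, and `EXAMPLE` are hypothetical. It computes post-dominator sets by the textbook iterative method and then collects the nodes lying between a node and its immediate post-dominator, which is one plausible shape for a candidate sub-graph. The GCP-based cost function from the abstract is not modeled here.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]  # node -> list of successor nodes


def post_dominators(succs: Graph, sink: str) -> Dict[str, Set[str]]:
    """Compute, for every node, the set of nodes that post-dominate it.

    p post-dominates n if every path from n to `sink` passes through p.
    Assumes every non-sink node has at least one successor.
    """
    nodes = set(succs)
    pdom = {n: set(nodes) for n in nodes}  # start from "everything"
    pdom[sink] = {sink}
    changed = True
    while changed:
        changed = False
        for n in nodes - {sink}:
            # n post-dominates itself; any other post-dominator must
            # post-dominate every successor of n.
            new = {n} | set.intersection(*(pdom[s] for s in succs[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominator(pdom: Dict[str, Set[str]], n: str) -> Optional[str]:
    """Return the closest strict post-dominator of n (None for the sink)."""
    strict = pdom[n] - {n}
    for c in strict:
        # The closest one is post-dominated by exactly the strict set.
        if pdom[c] == strict:
            return c
    return None


def region(succs: Graph, entry: str, exit_node: str) -> Set[str]:
    """Nodes reachable from `entry` without passing through `exit_node`."""
    seen: Set[str] = set()
    stack = [entry]
    while stack:
        n = stack.pop()
        if n == exit_node or n in seen:
            continue
        seen.add(n)
        stack.extend(succs[n])
    return seen


# Hypothetical diamond-shaped dataflow graph: a fans out to b and c,
# which rejoin at d before reaching the sink e.
EXAMPLE: Graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}

pdom = post_dominators(EXAMPLE, "e")
exit_of_a = immediate_post_dominator(pdom, "a")
candidate = region(EXAMPLE, "a", exit_of_a)
```

In this example the region rooted at `a` is bounded by the join node `d`, so `a`, `b`, and `c` form one candidate that a cost function could then accept or reject.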
REFERENCES
[1] Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein, Spatial computation, Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
[2]
[3]
[4]
[5]
[6]
[7] Chunho Lee, Miodrag Potkonjak, William H. Mangione-Smith, MediaBench: a tool for evaluating and synthesizing multimedia and communications systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.330-335, December 01-03, 1997, Research Triangle Park, North Carolina, United States
[8]
[9]
[10]
[11] Rajeev K. Ranjan, Vigyan Singhal, Fabio Somenzi, Robert K. Brayton, On the optimization power of retiming and resynthesis transformations, Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design, p.402-407, November 08-12, 1998, San Jose, California, United States, doi: 10.1145/288548.289061
[12] G. Venkataramani, M. Budiu, et al., C to asynchronous dataflow circuits: An end-to-end toolflow, IWLS, June 2004
[13]
[14]
[15]
[16] Ying Yi, Ioannis Nousias, Mark Milward, Sami Khawam, Tughrul Arslan, Iain Lindsay, System-level scheduling on instruction cell based reconfigurable systems, Proceedings of the conference on Design, automation and test in Europe: Proceedings, March 06-10, 2006, Munich, Germany