ACM Home Page
Please provide us with feedback. Feedback
Operation chaining asynchronous pipelined circuits
Full text PdfPdf (267 KB)
Source International Conference on Computer Aided Design archive
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design table of contents
San Jose, California
SESSION: High level synthesis table of contents
Pages 442-449  
Year of Publication: 2007
ISBN ~ ISSN:1092-3152 , 1-4244-1382-6
Authors
Girish Venkataramani  Carnegie Mellon University, Pittsburgh, PA
Seth C. Goldstein  Carnegie Mellon University, Pittsburgh, PA
Sponsors
: IEEE CASS/CANDE
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CS\DATC : IEEE Computer Society
CEDA : Council on Electronic Design Automation
Publisher
IEEE Press  Piscataway, NJ, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 29,   Citation Count: 1
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We define operation chaining (op-chaining) as an optimization problem to determine the optimal pipeline depth for balancing performance against energy demands in pipelined asynchronous designs. Since there are no clock period requirements, asynchronous pipeline stages can have non-uniform latencies. We exploit this fact to coalesce several stages together thereby saving power and area due to the elimination of control-path resources from the pipeline. The trade-off is potentially reduced pipeline parallelism.

In this paper, we formally define this optimization as a graph covering problem, which finds sub-graphs that will be synthesized as an opchained pipeline stage. We then define the solution space for provably correct solutions and present an algorithm to efficiently search this space. The search technique partitions the graph based on post-dominator relationships to find sub-graphs that are potential op-chain candidates. We use knowledge of the Global Critical Path (GCP) [13] to evaluate the performance impact of accepting a candidate sub-graph and formulate a heuristic cost function to model this trade-off. The algorithm has a quadratic-time complexity in the size of the dataflow graph. We have implemented this algorithm within an automated asynchronous synthesis toolchain [12]. Experimental evidence from applying the algorithm on several media processing kernels reveals that the average energy-delay and energy-delay-area products improve by about 1.4x and 1.8x respectively, with a maximum improvement of 5x and 18x.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
5
 
6
 
7
 
8
 
9
10
11
 
12
G. Venkataramani, M. Budiu, et al. C to asynchronous dataflow circuits: An end-to-end toolflow. In IWLS, June 2004.
13
 
14
 
15
 
16

Collaborative Colleagues:
Girish Venkataramani: colleagues
Seth C. Goldstein: colleagues