Latency and bandwidth efficient communication through system customization for embedded multiprocessors
Source: Annual ACM IEEE Design Automation Conference
Proceedings of the 45th annual Design Automation Conference
Anaheim, California
SESSION: Multi-core design tools and architectures
Pages 766-771  
Year of Publication: 2008
ISSN: 0738-100X, ISBN: 978-1-60558-115-6
Authors
Chenjie Yu  University of Maryland College Park
Peter Petrov  University of Maryland College Park
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
IEEE/CASS/CANDE/CEDA
The EDA Consortium
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1391469.1391665

ABSTRACT

We present a cross-layer customization methodology for latency- and bandwidth-efficient inter-core communication in embedded multiprocessors. The methodology integrates compiler, operating system, and hardware support to achieve bandwidth-efficient, snoop-free shared-memory communication, free of coherence cache misses, between synchronized producer and consumer cores. We introduce a compiler-driven code transformation that relies on simple ISA support in the form of a special write-through store instruction. The transformation ensures that producer writes are propagated to the consumers with a single bus transaction per cache block, issued when the producer performs its last write to that cache line before exiting its synchronization region. Information about the shared buffers involved in the communication is captured by the OS and provided to the cores in order to filter bus traffic and perform remote updates when necessary. The end result is a single bus transaction per shared cache block and snoop-free communication between a producer and a set of consumers, with no intervening coherence misses in the consumer caches. Our experiments demonstrate significant reductions in both bus traffic and cache misses for a set of multiprocessor benchmarks.
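
As a rough illustration of the "last write per cache line" idea described in the abstract, the following minimal Python sketch marks which stores in a producer's synchronization region would use the special write-through store. It is a toy model only, not the authors' compiler pass: the function names `mark_last_writes` and `bus_transactions`, the 32-byte line size, and the flat list of store addresses are all assumptions made for this example.

```python
LINE_SIZE = 32  # bytes per cache line; an assumed value, not taken from the paper


def mark_last_writes(store_addresses, line_size=LINE_SIZE):
    """Return (address, is_write_through) pairs for one synchronization region.

    A store is flagged write-through only when it is the last store to its
    cache line within the region; all earlier stores stay ordinary stores.
    """
    last_index = {}
    for i, addr in enumerate(store_addresses):
        # Later stores to the same line overwrite the recorded index.
        last_index[addr // line_size] = i
    return [
        (addr, last_index[addr // line_size] == i)
        for i, addr in enumerate(store_addresses)
    ]


def bus_transactions(marked):
    """Count stores flagged write-through: one bus transaction per cache line."""
    return sum(1 for _, write_through in marked if write_through)


# Hypothetical producer filling a 64-byte shared buffer with 4-byte words.
stores = list(range(0, 64, 4))
marked = mark_last_writes(stores)
```

Under these assumptions the 16 stores touch two cache lines, so only the stores to addresses 28 and 60 are flagged, which corresponds to one bus transaction per shared cache block.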

