Efficiency and scalability of barrier synchronization on NoC based many-core architectures
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Atlanta, GA, USA
SESSION: Multiprocessors
Pages: 81-90  
Year of Publication: 2008
ISBN:978-1-60558-469-0
Authors
Oreste Villa  Pacific Northwest National Laboratory, Richland, WA, USA
Gianluca Palermo  Politecnico di Milano, Milano, Italy
Cristina Silvano  Politecnico di Milano, Milano, Italy
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1450095.1450110

ABSTRACT

Interconnects based on Networks-on-Chip (NoCs) are an appealing solution for future microprocessor designs in which, very likely, hundreds of cores will be connected on a single chip. Barrier primitives, which synchronize the execution of parallel processes, will play a fundamental role in highly parallel applications running on such many-core architectures. This paper analyzes the efficiency and scalability of different barrier implementations on NoC-based many-core architectures. Several message-passing barrier implementations based on four algorithms (all-to-all, master-slave, butterfly and tree) have been implemented and evaluated for a single-chip target architecture with a variable number of cores (from 4 to 128) and different network topologies (mesh, torus, ring, clustered-ring and fat-tree). Using a cycle-accurate simulator, we show the scalability of each barrier on every NoC topology, comparing theoretical with observed behavior. We observed that some barrier algorithms, when implemented in hardware or software, scale differently from what theory predicts. We evaluate the efficiency of each topology-barrier combination, demonstrating that, in many cases, simple network topologies can be more efficient than complex, highly connected ones.
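
To make the "theoretically expected" scaling concrete, the sketch below counts messages for textbook formulations of the four barrier algorithms named in the abstract and checks the butterfly exchange pattern. It is an illustration only, not the paper's implementation: the function names are hypothetical, the butterfly assumes a power-of-two core count, and the tree and master-slave variants are assumed to use one arrival message and one release message per non-root core.

```python
def butterfly_rounds(n):
    """Per-round (sender, receiver) pairs for a butterfly barrier on n = 2^k cores."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    # In round k, core i notifies the core whose index differs in bit k.
    return [[(i, i ^ (1 << k)) for i in range(n)]
            for k in range(n.bit_length() - 1)]


def butterfly_completes(n):
    """Check that after all rounds every core has (transitively) heard from all cores."""
    known = [{i} for i in range(n)]
    for pairs in butterfly_rounds(n):
        snapshot = [set(s) for s in known]  # messages within a round are simultaneous
        for src, dst in pairs:
            known[dst] |= snapshot[src]
    return all(len(s) == n for s in known)


def message_counts(n):
    """Idealized total message counts per barrier episode (assumed formulations)."""
    rounds = n.bit_length() - 1  # log2(n) for a power of two
    return {
        "all-to-all": n * (n - 1),    # every core notifies every other core
        "master-slave": 2 * (n - 1),  # arrive at master, then release
        "butterfly": n * rounds,      # n messages in each of log2(n) rounds
        "tree": 2 * (n - 1),          # arrive up the tree, release down
    }


if __name__ == "__main__":
    for cores in (4, 16, 128):
        print(cores, butterfly_completes(cores), message_counts(cores))
```

These counts ignore network topology, contention and hop distance, which is exactly the gap between theoretical and observed behavior that the paper examines with a cycle-accurate simulator.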

