Efficiency and scalability of barrier synchronization on NoC based many-core architectures
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Atlanta, GA, USA
SESSION: Multiprocessors
Pages 81-90
Year of Publication: 2008
ISBN:978-1-60558-469-0
ABSTRACT
Interconnects based on Networks-on-Chip (NoCs) are an appealing solution for future microprocessor designs in which, very likely, hundreds of cores will be connected on a single chip. Barrier primitives, used to synchronize the execution of parallel processes, will play a fundamental role in highly parallelized applications running on many-core architectures. This paper analyzes the efficiency and scalability of different barrier implementations in NoC-based many-core architectures. Several message-passing barrier implementations based on four algorithms (all-to-all, master-slave, butterfly and tree) have been implemented and evaluated for a single-chip target architecture composed of a variable number of cores (from 4 to 128) and different network topologies (mesh, torus, ring, clustered-ring and fat-tree). Using a cycle-accurate simulator, we show the scalability of each barrier for every NoC topology, analyzing and comparing theoretical and real behaviors. We observed that some barrier algorithms, when implemented in hardware or software, scale differently from what is theoretically expected. We evaluate the efficiency of each topology-barrier combination, demonstrating that, in many cases, simple network topologies can be more efficient than complex and highly connected topologies.
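To make the "theoretical" side of that comparison concrete, the sketch below tabulates textbook message counts and sequential step counts for the four barrier algorithms named in the abstract. It is not taken from the paper: the function names `barrier_cost` and `butterfly_partners` are hypothetical, the core count is assumed to be a power of two, the tree is assumed to have depth log2(n), and network contention, topology and serialization at a master node are all ignored.

```python
def barrier_cost(algorithm: str, n: int) -> tuple[int, int]:
    """Return (messages, sequential_steps) for an idealized barrier on n cores.

    Assumption (not from the paper): n is a power of two, every message
    takes one step, and a node can receive any number of messages per step.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    log_n = n.bit_length() - 1  # exact log2 for a power of two

    if algorithm == "all-to-all":
        # Every core notifies every other core in a single round.
        return n * (n - 1), 1
    if algorithm == "master-slave":
        # n-1 arrival messages to the master, then n-1 release messages.
        return 2 * (n - 1), 2
    if algorithm == "butterfly":
        # log2(n) rounds; in each round every core sends one message.
        return n * log_n, log_n
    if algorithm == "tree":
        # Gather up a tree of depth log2(n), then release back down.
        return 2 * (n - 1), 2 * log_n
    raise ValueError(f"unknown algorithm: {algorithm}")


def butterfly_partners(core: int, n: int) -> list[int]:
    """Partner of `core` in each butterfly round: flip bit k in round k."""
    return [core ^ (1 << k) for k in range(n.bit_length() - 1)]


if __name__ == "__main__":
    # The abstract's range of core counts is 4 to 128.
    for n in (4, 128):
        for alg in ("all-to-all", "master-slave", "butterfly", "tree"):
            messages, steps = barrier_cost(alg, n)
            print(f"n={n:3d} {alg:12s} messages={messages:6d} steps={steps}")
```

These idealized figures are only a baseline. On a real NoC the cost of each message depends on hop count and contention, which is one plausible reason measured scaling can diverge from such estimates.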