ABSTRACT
In a multiprocessor system-on-chip (MPSoC), private caches introduce the cache coherence problem. Here, we target heterogeneous MPSoCs with a network-on-chip (NoC). Existing hardware cache coherence protocols are poorly suited to such MPSoCs because many off-the-shelf processors used in them do not support these protocols. Furthermore, these protocols typically rely on global visibility and serialization of writes, which does not match the parallel point-to-point communication provided by a NoC. We therefore propose a software cache coherence protocol that can be applied in a heterogeneous MPSoC with a NoC. The protocol relies on explicit synchronization in the software. More specifically, caches are guaranteed to be coherent according to the Release Consistency model, on top of which we have implemented the standard Pthreads communication library. Heterogeneous MPSoCs with off-the-shelf processors are easily supported, because processors are only required to provide cache control operations, e.g., clean and invalidate. All cache coherence operations are interruptible and do not impact the execution of tasks on other processors, which makes the protocol suitable for predictable MPSoCs. Our software cache coherence protocol is implemented on an ARM926EJ-S MPSoC that is mapped onto an FPGA. From experiments we conclude that the protocol overhead is low for the applications taken from the SPLASH-2 benchmark set. For these applications we observed a speedup between 1.89 and 2.01 on the two-processor MPSoC.
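To illustrate how coherence can be tied to explicit synchronization under Release Consistency, the following is a minimal sketch in C, not the protocol implementation described above. It models one shared word, one private cache line per processor, and a Pthreads mutex. All names (`cache_clean`, `cache_invalidate`, `cached_read`, `cached_write`, `sw_acquire`, `sw_release`) are hypothetical, and the cache is simulated in software rather than driven through real ARM926EJ-S cache control operations. The sketch assumes that shared data is written only between an acquire and the matching release, so no dirty shared data exists when a line is invalidated.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_CPUS 2

/* Simulated shared memory word and one private cache line per processor. */
static int main_memory;
static struct { int data; bool valid; bool dirty; } cache[NUM_CPUS];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Clean: write a dirty line back to shared memory. */
void cache_clean(int cpu) {
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (cache[cpu].valid && cache[cpu].dirty) {
        main_memory = cache[cpu].data;
        cache[cpu].dirty = false;
    }
}

/* Invalidate: discard the private copy so the next read refetches it. */
void cache_invalidate(int cpu) {
    assert(cpu >= 0 && cpu < NUM_CPUS);
    cache[cpu].valid = false;
    cache[cpu].dirty = false;
}

/* Read through the private cache, filling it from memory on a miss. */
int cached_read(int cpu) {
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (!cache[cpu].valid) {
        cache[cpu].data = main_memory;
        cache[cpu].valid = true;
        cache[cpu].dirty = false;
    }
    return cache[cpu].data;
}

/* Write-back cache: the write stays private until the line is cleaned. */
void cached_write(int cpu, int value) {
    assert(cpu >= 0 && cpu < NUM_CPUS);
    cache[cpu].data = value;
    cache[cpu].valid = true;
    cache[cpu].dirty = true;
}

/* Acquire: take the lock, then drop possibly stale private copies. */
void sw_acquire(int cpu) {
    pthread_mutex_lock(&lock);
    cache_invalidate(cpu);
}

/* Release: publish local writes to shared memory, then drop the lock. */
void sw_release(int cpu) {
    cache_clean(cpu);
    pthread_mutex_unlock(&lock);
}
```

In this model, a processor that reads the shared word without first acquiring the lock may keep seeing a stale value, while a read after `sw_acquire` observes the value published by the previous `sw_release`. Only the processor performing the synchronization touches its own cache, which mirrors the claim that coherence operations do not affect tasks on other processors.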