Comparative evaluation of memory models for chip multiprocessors
Source
ACM Transactions on Architecture and Code Optimization (TACO)
Volume 5, Issue 3 (November 2008)
Article No. 12
Year of Publication: 2008
ISSN: 1544-3566
Authors
Jacob Leverich, Stanford University
Hideho Arakida, Stanford University
Alex Solomatnikov, Stanford University
Amin Firoozshahian, Stanford University
Mark Horowitz, Stanford University
Christos Kozyrakis, Stanford University
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1455650.1455651

ABSTRACT

There are two competing models for the on-chip memory in Chip Multiprocessor (CMP) systems: hardware-managed coherent caches and software-managed streaming memory. This paper performs a direct comparison of the two models under the same set of assumptions about technology, area, and computational capabilities. The goal is to quantify how and when they differ in terms of performance, energy consumption, bandwidth requirements, and latency tolerance for general-purpose CMPs. We demonstrate that for data-parallel applications on systems with up to 16 cores, the cache-based and streaming models perform and scale equally well. For certain applications with little data reuse, streaming scales better due to better bandwidth use and macroscopic software prefetching. However, the introduction of techniques such as hardware prefetching and nonallocating stores to the cache-based model eliminates the streaming advantage. Overall, our results indicate that there is not sufficient advantage in building streaming memory systems where all on-chip memory structures are explicitly managed. On the other hand, we show that streaming at the programming model level is particularly beneficial, even with the cache-based model, as it enhances locality and creates opportunities for bandwidth optimizations. Moreover, we observe that stream programming is actually easier with the cache-based model because the hardware guarantees correct, best-effort execution even when the programmer cannot fully regularize an application's code.


