| A memory system design framework: creating smart memories |
| Full text |
Pdf
(718 KB)
|
Source
|
International Symposium on Computer Architecture
archive
Proceedings of the 36th annual international symposium on Computer architecture
table of contents
Austin, TX, USA
SESSION: Memory system reconfiguration and acceleration
table of contents
Pages 406-417
Year of Publication: 2009
ISBN:978-1-60558-526-0
Also published in ...
|
|
Authors
|
|
Amin Firoozshahian
|
Hicamp Systems Inc., Menlo Park, CA, USA
|
|
Alex Solomatnikov
|
Hicamp Systems Inc., Menlo Park, CA, USA
|
|
Ofer Shacham
|
Stanford University, Stanford, CA, USA
|
|
Zain Asgar
|
Stanford University, Stanford, CA, USA
|
|
Stephen Richardson
|
Stanford University, Stanford, CA, USA
|
|
Christos Kozyrakis
|
Stanford University, Stanford, CA, USA
|
|
Mark Horowitz
|
Stanford University, Stanford, CA, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 120, Downloads (12 Months): 296, Citation Count: 0
|
|
|
ABSTRACT
As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for CMPs. Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines' request-tracking, state-manipulation, and data movement' which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system. To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models' cache coherent shared memory, streams and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
G. Grohoski, "Niagara-2: A Highly Threaded Server-on-a-Chip," 18th Hot Chips Symposium, August 2006.
|
| |
3
|
Brucek Khailany , William J. Dally , Ujval J. Kapasi , Peter Mattson , Jinyung Namkoong , John D. Owens , Brian Towles , Andrew Chang , Scott Rixner, Imagine: Media Processing with Streams, IEEE Micro, v.21 n.2, p.35-46, March 2001
[doi> 10.1109/40.918001]
|
| |
4
|
D. Pham, S. Asano, M. Bolliger, M.N. Day, H.P. Hofstee, C. Johns, J. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, M. Riley, D. Shippy, D. Stasiak, M. Suzuoki, M. Wang, J. Warnock, S. Weitzel, D. Wendel, T. Yamazaki, K. Yazawa, "The Design and Implementation of a First-Generation CELL Processor," Digest of Technical Papers, ISSCC, Vol. 1, pp. 184--185, February 2005.
|
| |
5
|
Lance Hammond , Benedict A. Hubbert , Michael Siu , Manohar K. Prabhu , Michael Chen , Kunle Olukotun, The Stanford Hydra CMP, IEEE Micro, v.20 n.2, p.71-84, March 2000
[doi> 10.1109/40.848474]
|
 |
6
|
|
 |
7
|
|
 |
8
|
Ken Mai , Tim Paaske , Nuwan Jayasena , Ron Ho , William J. Dally , Mark Horowitz, Smart Memories: a modular reconfigurable architecture, ACM SIGARCH Computer Architecture News, v.28 n.2, p.161-171, May 2000
|
 |
9
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Doug Burger , Stephen W. Keckler , Charles R. Moore, Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, Proceedings of the 30th annual international symposium on Computer architecture, June 09-11, 2003, San Diego, California
|
 |
10
|
Lance Hammond , Vicky Wong , Mike Chen , Brian D. Carlstrom , John D. Davis , Ben Hertzberg , Manohar K. Prabhu , Honggo Wijaya , Christos Kozyrakis , Kunle Olukotun, Transactional Memory Coherence and Consistency, Proceedings of the 31st annual international symposium on Computer architecture, p.102, June 19-23, 2004, München, Germany
|
| |
11
|
K.E. Moore, J. Bobba, M.J. Moravan, M.D. Hill, D.A. Wood, "LogTM: Log-Based Transactional Memory," HPCA-12, pp. 254--265, 2006.
|
 |
12
|
S. K. Reinhardt , J. R. Larus , D. A. Wood, Tempest and typhoon: user-level shared memory, Proceedings of the 21st annual international symposium on Computer architecture, p.325-336, April 18-21, 1994, Chicago, Illinois, United States
|
 |
13
|
J. Kuskin , D. Ofelt , M. Heinrich , J. Heinlein , R. Simoni , K. Gharachorloo , J. Chapin , D. Nakahira , J. Baxter , M. Horowitz , A. Gupta , M. Rosenblum , J. Hennessy, The Stanford FLASH multiprocessor, Proceedings of the 21st annual international symposium on Computer architecture, p.302-313, April 18-21, 1994, Chicago, Illinois, United States
|
| |
14
|
|
| |
15
|
|
| |
16
|
Tensilica, Webpage: http://www.tensilica.com/
|
| |
17
|
|
| |
18
|
J. Carter , W. Hsieh , L. Stoller , M. Swanson , L. Zhang , E. Brunvand , A. Davis , C.-C. Kuo , R. Kuramkote , M. Parker , L. Schaelicke , T. Tateyama, Impulse: Building a Smarter Memory Controller, Proceedings of the 5th International Symposium on High Performance Computer Architecture, p.70, January 09-12, 1999
|
| |
19
|
|
 |
20
|
Anant Agarwal , Ricardo Bianchini , David Chaiken , Kirk L. Johnson , David Kranz , John Kubiatowicz , Beng-Hong Lim , Kenneth Mackenzie , Donald Yeung, The MIT Alewife machine: architecture and performance, Proceedings of the 22nd annual international symposium on Computer architecture, p.2-13, June 22-24, 1995, S. Margherita Ligure, Italy
|
| |
21
|
S. Narayanasamy, B. Carneal, B. Calder, Patching Processor Design Errors, ICCD, pp. 491--498, October 2006.
|
| |
22
|
|
| |
23
|
I. Wagner, V. Bertacco, T. Austin, "Using Field-Repairable Control Logic to Correct Design Errors in Microprocessors", IEEE Transactions on Computer-Aided Design (TCAD), Vol. 27, Issue 2, pp. 380--393, February 2008.
|
| |
24
|
A.K. Nanda, A.-T. Nguyen, M.M. Michael, D.J. Joseph, "High-Throughput Coherence Controllers," HPCA-6, pp. 145--155, 2000.
|
| |
25
|
|
 |
26
|
Maged M. Michael , Ashwini K. Nanda , Beng-Hong Lim , Michael L. Scott, Coherence controller architectures for SMP-based CC-NUMA multiprocessors, Proceedings of the 24th annual international symposium on Computer architecture, p.219-228, June 01-04, 1997, Denver, Colorado, United States
|
| |
27
|
IBM tutorial, "Cell Broadband Engine solution, Software Development Kit v3.1: SPE configuration."
|
| |
28
|
|
| |
29
|
|
| |
30
|
Sun Microsystems, Inc., "OpenSPARC(tm) T1 Microarchitecture Specification," Part No. 819-6650-10, August 2006, Revision A. http://opensparc-t1.sunsource.net/ specs/OpenSPARCT1_Micro_Arch.pdf
|
|