|
ABSTRACT
Chip-multiprocessors are quickly gaining momentum in all segments of computing. However, the practical success of CMPs strongly depends on addressing the difficulty of multithreaded application development. To address this challenge, it is necessary to co-develop new CMP architecture with novel programming models. Currently, architecture research relies on software simulators which are too slow to facilitate interesting experiments with CMP software without using small datasets or significantly reducing the level of detail in the simulated models. An alternative to simulation is to exploit the rich capabilities of modern FPGAs to create FPGA-based platforms for novel CMP research. This paper presents ATLAS, the first prototype for CMPs with hardware support for Transactional Memory (TM), a technology aiming to simplify parallel programming. ATLAS uses the BEE2 multi-FPGA board to provide a system with 8 PowerPC cores that run at 100MHz and runs Linux. ATLAS provides significant benefits for CMP research such as 100x performance improvement over a software simulator and good visibility that helps with software tuning and architectural improvements. In addition to presenting and evaluating ATLAS, we share our observations about building a FPGA-based framework for CMP research. Specifically, we address issues such as overall performance, challenges of mapping ASIC-style CMP RTL on to FPGAs, software support, the selection criteria for the base processor, and the challenges of using pre-designed IP libraries.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
H. Sutter, "The free lunch is over: A fundamental turn toward concurrency in software," Dr. Dobb's Journal, vol. 30, March 2005.
|
| |
2
|
|
 |
3
|
|
 |
4
|
Lance Hammond , Vicky Wong , Mike Chen , Brian D. Carlstrom , John D. Davis , Ben Hertzberg , Manohar K. Prabhu , Honggo Wijaya , Christos Kozyrakis , Kunle Olukotun, Transactional Memory Coherence and Consistency, Proceedings of the 31st annual international symposium on Computer architecture, p.102, June 19-23, 2004, München, Germany
|
| |
5
|
|
 |
6
|
|
| |
7
|
K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood, "LogTM: Log-Based Transactional Memory," in 12th International Conference on High-Performance Computer Architecture, February 2006.
|
 |
8
|
|
 |
9
|
Maurice Herlihy , Victor Luchangco , Mark Moir , William N. Scherer, III, Software transactional memory for dynamic-sized data structures, Proceedings of the twenty-second annual symposium on Principles of distributed computing, p.92-101, July 13-16, 2003, Boston, Massachusetts
[doi> 10.1145/872035.872048]
|
 |
10
|
Tim Harris , Keir Fraser, Language support for lightweight transactions, Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications, October 26-30, 2003, Anaheim, California, USA
|
| |
11
|
A. Welc, S. Jagannathan, and A. L. Hosking, "Transactional monitors for concurrent objects," in Proceedings of the European Conference on Object-Oriented Programming (M. Odersky, ed.), vol. 3086 of Lecture Notes in Computer Science, pp. 519--542, Springer-Verlag, 2004.
|
 |
12
|
Bratin Saha , Ali-Reza Adl-Tabatabai , Richard L. Hudson , Chi Cao Minh , Benjamin Hertzberg, McRT-STM: a high performance software transactional memory system for a multi-core runtime, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
[doi> 10.1145/1122971.1123001]
|
 |
13
|
|
| |
14
|
V. J. Marathe, W. N. Scherer III, and M. L. Scott, "Adaptive Software Transactional Memory," in 19th International Symposium on Distributed Computing, September 2005.
|
| |
15
|
Arvind, K. Asanovic, D. Chiou, J. C. Hoe, C. Kozyrakis, S.-L. Lu, M. Oskin, D. Patterson, J. Rabaey, and J. Wawrzynek, "RAMP: Research accelerator for multiple processors - a community vision for a shared experimental parallel HW/SW platform," tech. rep., 2005.
|
| |
16
|
|
| |
17
|
J. Chung, H. Chafi, C. Cao Minh, A. McDonald, B. D. Carlstrom, C. Kozyrakis, and K. Olukotun, "The Common Case Transactional Behavior of Multithreaded Programs," in Proceedings of the 12th International Conference on High-Performance Computer Architecture, February 2006.
|
 |
18
|
JaeWoong Chung , Chi Cao Minh , Austen McDonald , Travis Skare , Hassan Chafi , Brian D. Carlstrom , Christos Kozyrakis , Kunle Olukotun, Tradeoffs in transactional memory virtualization, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, October 21-25, 2006, San Jose, California, USA
|
| |
19
|
Austen McDonald , JaeWoong Chung , Hassan Chafi , Chi Cao Minh , Brian D. Carlstrom , Lance Hammond , Christos Kozyrakis , Kunle Olukotun, Characterization of TCC on Chip-Multiprocessors, Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, p.63-74, September 17-21, 2005
[doi> 10.1109/PACT.2005.11]
|
 |
20
|
Koray Öner , Luiz A. Barroso , Sasan Iman , Jaeheon Jeong , Krishnan Ramamurthy , Michel Dubois, The design of RPM: an FPGA-based multiprocessor emulator, Proceedings of the 1995 ACM third international symposium on Field-programmable gate arrays, p.60-66, February 12-14, 1995, Monterey, California, United States
[doi> 10.1145/201310.201321]
|
 |
21
|
|
| |
22
|
D. Chiou, H. Sunjeliwala, D. Sunwoo, J. Xu, and N. Patil, "Fpga-based fast, cycle-accurate, full-system simulators," in 2nd Workshop on Architecture Research using FPGA Platforms, 12th International Symposium on High-Performance Computer Architecture (HPCA-12), February 2006.
|
| |
23
|
N. Dave, M. Pellauer, Arvind, and J. Emer, "Implementing a functional/timing partitioned microprocessor simulator with an fpga," in 2nd Workshop on Architecture Research using FPGA Platforms, 12th International Symposium on High-Performance Computer Architecture (HPCA-12), February 2006.
|
| |
24
|
D. A. Penry, D. Fay, D. Hodgdon, R. Wells, G. Schelle, D. I. August, and D. A. Connors, "Exploiting parallelism and structure to accelerate the simulation of chip multi-processors," in Proceedings of the 12th International Conference on High-Performance Computer Architecture, February 2006.
|
 |
25
|
Jumnit Hong , Eriko Nurvitadhi , Shih-Lien L. Lu, Design, implementation, and verification of active cache emulator (ACE), Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays, February 22-24, 2006, Monterey, California, USA
[doi> 10.1145/1117201.1117211]
|
| |
26
|
F. J. Mesa-Martinez et al., "SCOORE: Santa Cruz out-of-order RISC engine, FPGA design issues," in Workshop on Architectural Research Prototyping (WARP), held in conjunction with ISCA-33, 2006.
|
 |
27
|
Lance Hammond , Brian D. Carlstrom , Vicky Wong , Ben Hertzberg , Mike Chen , Christos Kozyrakis , Kunle Olukotun, Programming with transactional coherence and consistency (TCC), Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
|
 |
28
|
Richard A. Hankins , Gautham N. Chinya , Jamison D. Collins , Perry H. Wang , Ryan Rakvic , Hong Wang , John P. Shen, Multiple Instruction Stream Processor, Proceedings of the 33rd annual international symposium on Computer Architecture, p.114-127, June 17-21, 2006
|
 |
29
|
Hassan Chafi , Chi Cao Minh , Austen McDonald , Brian D. Carlstrom , JaeWoong Chung , Lance Hammond , Christos Kozyrakis , Kunle Olukotun, TAPE: a transactional application profiling environment, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
[doi> 10.1145/1088149.1088176]
|
| |
30
|
A. S. G. Gibeling and K. Asanovic, "The RAMP architecture & description language," tech. rep., 2005.
|
| |
31
|
J. D. Gilbert, S. H. Hunt, D. Gunadi, and G. Srinivasa, "TULSA, A Dual P4 Core Large Shared Cache Intel Xeon Processor for the MP Server Market Segment, Intel," in Conference Record of Hot Chips 18, 2006.
|
CITED BY 8
|
|
Njuguna Njoroge , Jared Casper , Sewook Wee , Yuriy Teslyar , Daxia Ge , Christos Kozyrakis , Kunle Olukotun, ATLAS: a chip-multiprocessor with transactional memory support, Proceedings of the conference on Design, automation and test in Europe, April 16-20, 2007, Nice, France
|
|
|
|
|
|
Eric S. Chung , Eriko Nurvitadhi , James C. Hoe , Babak Falsafi , Ken Mai, A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs, Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays, February 24-26, 2008, Monterey, California, USA
|
|
|
|
|
|
John Wawrzynek , David Patterson , Mark Oskin , Shih-Lien Lu , Christoforos Kozyrakis , James C. Hoe , Derek Chiou , Krste Asanovic, RAMP: Research Accelerator for Multiple Processors, IEEE Micro, v.27 n.2, p.46-57, March 2007
|
|
|
Eric S. Chung , Michael K. Papamichael , Eriko Nurvitadhi , James C. Hoe , Ken Mai , Babak Falsafi, ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs, ACM Transactions on Reconfigurable Technology and Systems (TRETS), v.2 n.2, p.1-32, June 2009
|
|
|
|
|
|
|
|