ABSTRACT
Hardware trends suggest that large-scale chip multiprocessor (CMP) architectures, with tens to hundreds of processing cores on a single piece of silicon, are imminent within the next decade. While existing CMP machines have traditionally been handled in the same way as symmetric multiprocessors (SMPs), this magnitude of parallelism introduces several fundamental challenges at the architectural level, which in turn translate into novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime designed to explore configurations of the software stack that enable performance and scalability on large-scale CMP platforms. We describe the architecture of McRT and discuss our experiences with the system, including an experimental evaluation that led to several interesting, non-intuitive findings and provided key insights into the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near-linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution is the use of McRT to explore non-traditional system configurations, such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive way to leverage heterogeneous computing units, as seen in today's CPU-GPU configurations.
CITED BY 5
Bratin Saha, Xiaocheng Zhou, Hu Chen, Ying Gao, Shoumeng Yan, Mohan Rajagopalan, Jesse Fang, Peinan Zhang, Ronny Ronen, Avi Mendelson, Programming model for a heterogeneous x86 platform, ACM SIGPLAN Notices, v.44 n.6, June 2009
INDEX TERMS
Primary Classification:
D. Software
D.1 PROGRAMMING TECHNIQUES
D.1.3 Concurrent Programming
Additional Classification:
D. Software
D.3 PROGRAMMING LANGUAGES
D.3.4 Processors
D.4 OPERATING SYSTEMS
D.4.0 General
General Terms: Design, Experimentation, Measurement, Performance
Keywords: memory management, multi-core processors, parallel programming, runtime design, scheduler design, sequestered mode, synchronization primitives, transactional memory