ABSTRACT
Recently, the number of cores on general-purpose processors has been increasing rapidly. Using conventional programming models, it is challenging to effectively exploit these cores for maximal performance. An interesting alternative candidate for programming multiple cores is the stream programming model, which provides a framework for writing programs in a sequential-style while greatly simplifying the task of automatic parallelization. It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style. In this paper, we investigate the potential to use the stream programming model to efficiently utilize commodity multicore general-purpose processors (e.g., Intel/AMD). Although several stream languages and stream compilers have recently been developed, they typically target special-purpose stream processors. In contrast, we propose a flexible software system, Streamware, which automatically maps stream programs onto a wide variety of general-purpose multicore processor configurations. We leverage existing compilation framework for stream processors and design a runtime environment which takes as input the output of these stream compilers in the form of machine-independent stream virtual machine code. The runtime environment assigns work to processor cores considering processor/cache configurations and adapts to workload variations. We evaluate this approach for a few general-purpose scientific applications on real hardware and a cycle-level simulator set-up to showcase scaling and contention issues. The results show that the stream programming model is a good choice for efficiently exploiting modern and future multicore CPUs for an important class of applications.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Intel Thread Building Blocks. osstbb.intel.com.
|
| |
2
|
MPI. www.open-mpi.org.
|
| |
3
|
NVidia G80. www.nvidia.com.
|
| |
4
|
OpenMP. www.openmp.org.
|
| |
5
|
RStream Compiler. www.reservoir.com.
|
| |
6
|
T. Barth. Simplified discontinuous Galerkin methods for systems of conservation laws with convex extension. In Discontinuous Galerkin Methods, volume 11 of Lecture Notes in Computational Science and Engineering. Springer-Verlag, Heidelberg, 1999.
|
| |
7
|
Y. Basar and M. Itskov. Constitutive model and finite element formulation for large strain elasto-plastic analysis of shells. In Journal of Computational Mechanics, Jun 1999.
|
| |
8
|
N. Binkert, E. Hallnor, and S. Reinhardt. Network-oriented full system simulation using M5. In CAECW, 2003.
|
 |
9
|
Bratin Saha , Ali-Reza Adl-Tabatabai , Richard L. Hudson , Chi Cao Minh , Benjamin Hertzberg, McRT-STM: a high performance software transactional memory system for a multi-core runtime, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
[doi> 10.1145/1122971.1123001]
|
 |
10
|
Bratin Saha , Ali-Reza Adl-Tabatabai , Anwar Ghuloum , Mohan Rajagopalan , Richard L. Hudson , Leaf Petersen , Vijay Menon , Brian Murphy , Tatiana Shpeisman , Eric Sprangle , Anwar Rohillah , Doug Carmean , Jesse Fang, Enabling scalability and performance in a large scale CMP environment, Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, March 21-23, 2007, Lisbon, Portugal
|
 |
11
|
Ian Buck , Tim Foley , Daniel Horn , Jeremy Sugerman , Kayvon Fatahalian , Mike Houston , Pat Hanrahan, Brook for GPUs: stream computing on graphics hardware, ACM SIGGRAPH 2004 Papers, August 08-12, 2004, Los Angeles, California
|
| |
12
|
|
| |
13
|
William J. Dally , Francois Labonte , Abhishek Das , Patrick Hanrahan , Jung-Ho Ahn , Jayanth Gummaraju , Mattan Erez , Nuwan Jayasena , Ian Buck , Timothy J. Knight , Ujval J. Kapasi, Merrimac: Supercomputing with Streams, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, p.35, November 15-21, 2003
|
 |
14
|
|
 |
15
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
 |
16
|
Michael I. Gordon , William Thies , Saman Amarasinghe, Exploiting coarse-grained task, data, and pipeline parallelism in stream programs, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, October 21-25, 2006, San Jose, California, USA
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
|
 |
21
|
Jacob Leverich , Hideho Arakida , Alex Solomatnikov , Amin Firoozshahian , Mark Horowitz , Christos Kozyrakis, Comparing memory systems for chip multiprocessors, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA
|
 |
22
|
Kayvon Fatahalian , Daniel Reiter Horn , Timothy J. Knight , Larkhoon Leem , Mike Houston , Ji Young Park , Mattan Erez , Manman Ren , Alex Aiken , William J. Dally , Pat Hanrahan, Sequoia: programming the memory hierarchy, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
[doi> 10.1145/1188455.1188543]
|
| |
23
|
K. Mahesh et al. Large eddy simulation of reacting turbulent flows in complex geometries. ASME J. of Applied Mechanics, May 2006.
|
| |
24
|
K. Yelick et al. Titanium: A high-performance Java dialect. In ACM Workshop on Java for High-Performance Network Computing, Feb 1998.
|
| |
25
|
U. Kapasi, W. Dally, S. Rixner, J. Owens, and B. Khailany. The Imagine stream processor. In ICCD, Sep 2002.
|
| |
26
|
Francois Labonte , Peter Mattson , William Thies , Ian Buck , Christos Kozyrakis , Mark Horowitz, The Stream Virtual Machine, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, p.267-277, September 29-October 03, 2004
[doi> 10.1109/PACT.2004.29]
|
| |
27
|
Michael Bedford Taylor , Jason Kim , Jason Miller , David Wentzlaff , Fae Ghodrat , Ben Greenwald , Henry Hoffman , Paul Johnson , Jae-Wook Lee , Walter Lee , Albert Ma , Arvind Saraf , Mark Seneski , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, v.22 n.2, p.25-35, March 2002
[doi> 10.1109/MM.2002.997877]
|
 |
28
|
Mattan Erez , Jung Ho Ahn , Jayanth Gummaraju , Mendel Rosenblum , William J. Dally, Executing irregular scientific applications on stream architectures, Proceedings of the 21st annual international conference on Supercomputing, June 17-21, 2007, Seattle, Washington
[doi> 10.1145/1274971.1274987]
|
 |
29
|
Michael I. Gordon , William Thies , Michal Karczmarek , Jasper Lin , Ali S. Meli , Andrew A. Lamb , Chris Leger , Jeremy Wong , Henry Hoffmann , David Maze , Saman Amarasinghe, A stream compiler for communication-exposed architectures, Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, October 05-09, 2002, San Jose, California
|
 |
30
|
Mike Houston , Ji-Young Park , Manman Ren , Timothy Knight , Kayvon Fatahalian , Alex Aiken , William Dally , Pat Hanrahan, A portable runtime interface for multi-level memory hierarchies, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
[doi> 10.1145/1345206.1345229]
|
 |
31
|
Michael Isard , Mihai Budiu , Yuan Yu , Andrew Birrell , Dennis Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, March 21-23, 2007, Lisbon, Portugal
|
| |
32
|
M. D. McCool. Data-parallel programming on Cell BE and the GPU using the Rapidmind development platform. In GSPx Multicore Applications Conference, 2006.
|
 |
33
|
Philippe Charles , Christian Grothoff , Vijay Saraswat , Christopher Donawa , Allan Kielstra , Kemal Ebcioglu , Christoph von Praun , Vivek Sarkar, X10: an object-oriented approach to non-uniform cluster computing, Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, October 16-20, 2005, San Diego, CA, USA
|
| |
34
|
T. Knight et al. Sequoia: Programming the Memory Hierarchy. In PPoPP, 2007.
|
 |
35
|
|
 |
36
|
|
| |
37
|
|
| |
38
|
Richard Vuduc , James W. Demmel , Katherine A. Yelick , Shoaib Kamil , Rajesh Nishtala , Benjamin Lee, Performance optimizations and bounds for sparse matrix-vector multiply, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-35, November 16, 2002, Baltimore, Maryland
|
 |
39
|
|
| |
40
|
D. Zhang, Q. Li, R. Rabbah, and S. Amarasinghe. A Lightweight Streaming Layer for Multicore Execution. In Workshop on Design, Architecture, and Simulation of Chip Multiprocessors, Dec 2007.
|
CITED BY 4
|
|
Scott Schneider , Jae-Seung Yeom , Benjamin Rose , John C. Linford , Adrian Sandu , Dimitrios S. Nikolopoulos, A comparison of programming models for multiprocessors with explicitly managed memory hierarchies, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
Jacob Leverich , Hideho Arakida , Alex Solomatnikov , Amin Firoozshahian , Mark Horowitz , Christos Kozyrakis, Comparative evaluation of memory models for chip multiprocessors, ACM Transactions on Architecture and Code Optimization (TACO), v.5 n.3, p.1-30, November 2008
|
|
|
Andreas Dahlin , Johan Ersfolk , Guyfu Yang , Haitham Habli , Johan Lilius, The canals language and its compiler, Proceedings of th 12th International Workshop on Software and Compilers for Embedded Systems, April 23-24, 2009, Nice, France
|
|
|
Padmanabhan S. Pillai , Lily B. Mummert , Steven W. Schlosser , Rahul Sukthankar , Casey J. Helfrich, SLIPstream: scalable low-latency interactive perception on streaming data, Proceedings of the 18th international workshop on Network and operating systems support for digital audio and video, June 03-05, 2009, Williamsburg, VA, USA
|
|