ABSTRACT
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we analyze the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
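To make the data-parallel idea concrete, SAXPY (one of the applications named above) computes a*x + y independently for every element. The following is a minimal sketch in plain C, not Brook syntax and not the authors' code. The names `saxpy_kernel` and `saxpy_stream` are hypothetical, and the serial loop stands in for the parallel per-element execution a GPU would provide.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element kernel: the body a streaming system would run once per
   element. It reads only its own inputs, so elements are independent. */
static float saxpy_kernel(float a, float x, float y) {
    return a * x + y;
}

/* Hypothetical helper: applies the kernel across whole arrays ("streams").
   Assumption: a serial loop is used here only for illustration; no
   iteration depends on another, so they could run in parallel. */
void saxpy_stream(size_t n, float a, const float *x, const float *y,
                  float *result) {
    assert(x != NULL && y != NULL && result != NULL);
    for (size_t i = 0; i < n; ++i) {
        result[i] = saxpy_kernel(a, x[i], y[i]);
    }
}
```

Calling `saxpy_stream(n, a, x, y, result)` on three arrays of length `n` fills `result` element by element. Separating the per-element kernel from the loop that maps it over the data is the design point: the kernel says what happens to one element, and the runtime is free to decide how the elements are scheduled.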