ABSTRACT
This paper presents a many-core visual computing architecture codenamed Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die second-level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely in software in Larrabee, rather than in fixed-function logic. The customizable software graphics rendering pipeline for this architecture uses binning to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.
INDEX TERMS
Primary Classification:
I. Computing Methodologies
I.3 COMPUTER GRAPHICS
I.3.1 Hardware architecture
Subjects: Graphics processors
Additional Classification:
I. Computing Methodologies
I.3 COMPUTER GRAPHICS
I.3.1 Hardware architecture
Subjects: Parallel processing
I.3.3 Picture/Image Generation
Subjects: Display algorithms
I.3.7 Three-Dimensional Graphics and Realism
Subjects: Color, shading, shadowing, and texture
Keywords: GPGPU, SIMD, graphics architecture, many-core computing, parallel processing, realtime graphics, software rendering, throughput computing, visual computing
REVIEW
"Hector Yee : Reviewer"
In the early years of computer graphics, software renderers were very popular on the personal computer. These renderers have been recently supplanted by graphics processing units (GPUs), which first took over fixed-function operations such as tria