Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Source
PACT: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Toronto, Ontario, Canada
SESSION: CMP architecture design
Pages 52-61
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Henry Wong, University of British Columbia, Vancouver, BC, Canada
Anne Bracy, Intel Corporation, Santa Clara, CA, USA
Ethan Schuchman, Intel Corporation, Santa Clara, CA, USA
Tor M. Aamodt, University of British Columbia, Vancouver, BC, Canada
Jamison D. Collins, Intel Corporation, Santa Clara, CA, USA
Perry H. Wang, Intel Corporation, Santa Clara, CA, USA
Gautham Chinya, Intel Corporation, Santa Clara, CA, USA
Ankur Khandelwal Groen, Intel Corporation, Santa Clara, CA, USA
Hong Jiang, Intel Corporation, Santa Clara, CA, USA
Hong Wang, Intel Corporation, Santa Clara, CA, USA
ABSTRACT
Moore's Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically "fuses" existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, where the hardware budget dedicated to 3D-specific graphics processing is used to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared-memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully functional synthesizable RTL based on the production-quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware occupies the area of 9 GPU cores and has the total power consumption of 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on an FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose non-graphics workloads demonstrates speedups of up to 8.8x.