CUBA: an architecture for efficient CPU/co-processor data communication
Source
International Conference on Supercomputing
Proceedings of the 22nd annual international conference on Supercomputing
Island of Kos, Greece
SESSION: Memory management
Pages 299-308  
Year of Publication: 2008
ISBN:978-1-60558-158-3
Authors
Isaac Gelado  Universitat Politecnica de Catalunya, Barcelona, Spain
John H. Kelm  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Shane Ryoo  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Steven S. Lumetta  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Nacho Navarro  Universitat Politecnica de Catalunya, Barcelona, Spain
Wen-mei W. Hwu  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 216,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1375527.1375571

ABSTRACT

Data-parallel co-processors have the potential to improve performance in highly parallel regions of code when coupled to a general-purpose CPU. However, applications often have to be modified in non-intuitive and complicated ways to mitigate the cost of data marshalling between the CPU and the co-processor. In some applications the overheads cannot be amortized and co-processors are unable to provide benefit. The additional effort and complexity of incorporating co-processors make it difficult, if not impossible, to effectively utilize co-processors in large applications.

This paper presents CUBA, an architecture model where co-processors encapsulated as function calls can efficiently access their input and output data structures through pointer parameters. The key idea is to map the data structures required by the co-processor to the co-processor local memory as opposed to the CPU's main memory. The mapping in CUBA preserves the original layout of the shared data structures hosted in the co-processor local memory. The mapping renders the data marshalling process unnecessary and reduces the need for code changes in order to use the co-processors. CUBA allows the CPU to cache hosted data structures with a selective write-through cache policy, allowing the CPU to access hosted data structures while supporting efficient communication with the co-processors. Benchmark simulation results show that a CUBA-based system can approach optimal transfer rates while requiring few changes to the code that executes on the CPU.
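The contrast the abstract draws between marshalling and hosting data in co-processor local memory can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: `local_mem`, `hosted_alloc_floats`, `coproc_scale`, and the byte counter are hypothetical names invented for illustration, the "co-processor" is an ordinary C function, and the selective write-through cache policy is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for co-processor local memory. In real hardware this
   would be a memory region mapped into the CPU's address space. */
#define LOCAL_MEM_FLOATS 1024
#define N 8

float local_mem[LOCAL_MEM_FLOATS];
size_t local_used = 0;        /* floats handed out so far */
size_t bytes_marshalled = 0;  /* bytes copied between CPU and local memory */

/* Hypothetical bump allocator: returns space inside the local memory. */
float *hosted_alloc_floats(size_t n) {
    if (n > LOCAL_MEM_FLOATS - local_used) return NULL;
    float *p = &local_mem[local_used];
    local_used += n;
    return p;
}

/* The co-processor encapsulated as a function call with pointer parameters. */
void coproc_scale(const float *in, float *out, size_t n, float k) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] * k;
}

/* Conventional path: data lives in CPU memory, so it is copied into local
   memory before the call and copied back afterwards. */
void scale_marshalled(const float *in, float *out, size_t n, float k) {
    size_t saved = local_used;
    float *dev_in = hosted_alloc_floats(n);
    float *dev_out = hosted_alloc_floats(n);
    assert(dev_in != NULL && dev_out != NULL);
    memcpy(dev_in, in, n * sizeof *in);
    bytes_marshalled += n * sizeof *in;
    coproc_scale(dev_in, dev_out, n, k);
    memcpy(out, dev_out, n * sizeof *out);
    bytes_marshalled += n * sizeof *out;
    local_used = saved;  /* release the staging buffers */
}

/* CPU-resident arrays, marshalled around the call. */
float run_marshalled(void) {
    float in[N], out[N];
    local_used = 0;
    bytes_marshalled = 0;
    for (size_t i = 0; i < N; i++) in[i] = (float)i;
    scale_marshalled(in, out, N, 2.0f);
    return out[N - 1];
}

/* Hosted arrays: allocated in local memory from the start, so the same
   pointers are used by the CPU code and passed straight to the call. */
float run_hosted(void) {
    local_used = 0;
    bytes_marshalled = 0;
    float *in = hosted_alloc_floats(N);
    float *out = hosted_alloc_floats(N);
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < N; i++) in[i] = (float)i;
    coproc_scale(in, out, N, 2.0f);  /* no copies: layout is preserved */
    return out[N - 1];
}
```

In this model both paths compute the same result, but only the marshalled path increments `bytes_marshalled`. The hosted path differs from ordinary CPU code only in where the arrays are allocated, which loosely mirrors the abstract's claim that few changes are needed on the CPU side.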


