Efficient computation of sum-products on GPUs through software-managed cache
Source
International Conference on Supercomputing archive
Proceedings of the 22nd Annual International Conference on Supercomputing
Island of Kos, Greece
Session: Memory management
Pages 309-318  
Year of Publication: 2008
ISBN: 978-1-60558-158-3
Authors
Mark Silberstein  Technion - Israel Institute of Technology, Haifa, Israel
Assaf Schuster  Technion - Israel Institute of Technology, Haifa, Israel
Dan Geiger  Technion - Israel Institute of Technology, Haifa, Israel
Anjul Patney  University of California, Davis, CA, USA
John D. Owens  University of California, Davis, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 30,   Downloads (12 Months): 338,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1375527.1375572

ABSTRACT

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms.

We apply this technique to a GPU-based solver for the sum-product, or marginalize a product of functions (MPF), problem, which arises in a wide variety of real-life applications in artificial intelligence, statistics, image processing, and digital communications. Our motivation for accelerating MPF originated in the analysis of genetic diseases, which in some cases requires years to complete on modern CPUs. Computing MPF is similar to computing the chain matrix product of multi-dimensional matrices, but is more difficult due to a complex data-dependent access pattern, high data reuse, and a low compute-to-memory access ratio. Our GPU-based MPF solver achieves up to a 2700-fold speedup on random data and a 270-fold speedup on real-life genetic analysis datasets on an NVIDIA GeForce 8800GTX GPU, compared with an optimized CPU version on a 2.4GHz Intel Core 2 with a 4MB L2 cache.
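
To make the comparison with matrix products concrete, the following is a minimal sketch of marginalizing a product of two functions over one shared variable. It is an illustration only and not the paper's GPU solver: the function name `marginalize_product`, the variable names `x`, `y`, `z`, and the sample values are all assumptions chosen for this example. With two functions and one shared variable, the result coincides with an ordinary matrix product; general MPF instances involve more functions over more variables.

```python
from itertools import product


def marginalize_product(f, g, card):
    """Sum out the shared variable y from f(x, y) * g(y, z).

    f and g are dicts keyed by variable assignments; card maps each
    (hypothetical) variable name to its number of states.
    """
    result = {}
    for x, z in product(range(card["x"]), range(card["z"])):
        # Each output entry reuses a full row of f and a full column of g,
        # which is where the high data reuse comes from.
        result[(x, z)] = sum(
            f[(x, y)] * g[(y, z)] for y in range(card["y"])
        )
    return result


# Illustrative inputs (not taken from the paper).
card = {"x": 2, "y": 3, "z": 2}
f = {(x, y): x + y + 1 for x in range(2) for y in range(3)}
g = {(y, z): (y + 1) * (z + 1) for y in range(3) for z in range(2)}

h = marginalize_product(f, g, card)
print(h)
```

Each value of `f` and `g` is read once for every output entry that depends on it, so the work per memory access is small unless the reused values are kept in fast memory, which is the situation the abstract describes.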



