Algorithmic foundations for a parallel vector access memory system
Source: ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Bar Harbor, Maine, United States
Pages: 156-165
Year of Publication: 2000
ISBN:1-58113-185-2
Authors
Binu K. Mathew  Department of Computer Science, University of Utah, Salt Lake City, UT
Sally A. McKee  Department of Computer Science, University of Utah, Salt Lake City, UT
John B. Carter  Department of Computer Science, University of Utah, Salt Lake City, UT
Al Davis  Department of Computer Science, University of Utah, Salt Lake City, UT
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/341800.341819

ABSTRACT

This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access patterns. The Parallel Vector Access (PVA) unit exploits the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system. The PVA unit performs scatter/gather operations so that only the elements accessed by the application are transmitted across the system bus. Vector operations are broadcast in parallel to all memory banks, each of which implements an efficient algorithm to determine which vector elements it holds. Earlier performance evaluations have demonstrated that our PVA implementation loads elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting the performance of normal cache-line fills. Here we present the underlying PVA algorithms for both word-interleaved and cache-line-interleaved memory systems.
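The idea that each bank decides on its own which elements of a strided vector it holds can be illustrated with a small sketch. It is not the paper's algorithm; it is a plain modular-arithmetic illustration under stated assumptions: memory is word-interleaved, so word address a lives in bank a mod num_banks; the vector is described by a base word address, a non-negative stride in words, and a length in elements. All function and parameter names are hypothetical.

```python
from math import gcd


def elements_in_bank_naive(base, stride, length, num_banks, bank):
    """Reference version: test every element index one by one."""
    return [i for i in range(length)
            if (base + i * stride) % num_banks == bank]


def elements_in_bank(base, stride, length, num_banks, bank):
    """Indices i in [0, length) whose address base + i*stride falls in `bank`.

    Solves the congruence  base + i*stride == bank  (mod num_banks)
    directly, so a bank never has to walk the whole vector.
    Assumes word-interleaved banks and a non-negative stride.
    """
    g = gcd(stride, num_banks)
    delta = (bank - base) % num_banks
    if delta % g != 0:
        return []                      # this bank holds no element of the vector
    period = num_banks // g            # hits repeat every `period` elements
    if period == 1:
        first = 0                      # every element lands in this bank
    else:
        # Modular inverse via three-argument pow (Python 3.8+).
        inv = pow(stride // g, -1, period)
        first = (delta // g) * inv % period
    return list(range(first, length, period))


if __name__ == "__main__":
    # Vector at word address 3, stride 6, 10 elements, 8 banks.
    for b in range(8):
        print(b, elements_in_bank(3, 6, 10, 8, b))
```

Because every bank can evaluate this from the same broadcast (base, stride, length) triple, no bank needs to communicate with another to find its share of the vector. A cache-line-interleaved layout would need a different address-to-bank mapping and is not covered by this sketch.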



