Algorithmic foundations for a parallel vector access memory system
Source
ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Bar Harbor, Maine, United States
Pages: 156-165
Year of Publication: 2000
ISBN: 1-58113-185-2
Authors
Binu K. Mathew, Department of Computer Science, University of Utah, Salt Lake City, UT
Sally A. McKee, Department of Computer Science, University of Utah, Salt Lake City, UT
John B. Carter, Department of Computer Science, University of Utah, Salt Lake City, UT
Al Davis, Department of Computer Science, University of Utah, Salt Lake City, UT
ABSTRACT
This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access patterns. The Parallel Vector Access (PVA) unit exploits the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system. The PVA unit performs scatter/gather operations so that only the elements accessed by the application are transmitted across the system bus. Vector operations are broadcast in parallel to all memory banks, each of which implements an efficient algorithm to determine which vector elements it holds. Earlier performance evaluations have demonstrated that our PVA implementation loads elements up to 32.8 times faster than a conventional memory system and 3.3 times faster than a pipelined vector unit, without hurting the performance of normal cache-line fills. Here we present the underlying PVA algorithms for both word-interleaved and cache-line-interleaved memory systems.
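The per-bank question raised in the abstract (which elements of a strided vector does a given bank hold?) can be illustrated with a small sketch. This is not the paper's algorithm, which this page does not reproduce. It assumes a word-interleaved system in which word address `a` resides in bank `a mod M`, and all function and parameter names are hypothetical. The sketch uses the standard observation that the indices held by one bank form an arithmetic progression with period `M / gcd(stride, M)`, so a bank only needs to find its first matching element.

```python
from math import gcd


def bank_of(word_addr: int, num_banks: int) -> int:
    """Assumed word-interleaved mapping: consecutive words map to consecutive banks."""
    return word_addr % num_banks


def elements_in_bank_naive(base, stride, length, num_banks, bank):
    """Reference version: test every element of the vector against the bank."""
    return [i for i in range(length)
            if bank_of(base + i * stride, num_banks) == bank]


def elements_in_bank(base, stride, length, num_banks, bank):
    """Indices i in [0, length) with (base + i*stride) mod num_banks == bank.

    The search for the first hit is bounded by the period (at most
    num_banks steps), so the work does not scale with the vector length.
    """
    g = gcd(stride, num_banks)
    delta = (bank - base) % num_banks
    if delta % g != 0:
        return []  # this bank holds no element of the vector
    period = num_banks // g  # spacing between successive hits in this bank
    # A solution is guaranteed to exist within the first `period` indices.
    first = next(i for i in range(period)
                 if bank_of(base + i * stride, num_banks) == bank)
    return list(range(first, length, period))
```

For example, with base word address 5, stride 3, vector length 10, and 4 banks, `elements_in_bank(5, 3, 10, 4, 0)` yields indices 1, 5, and 9. Every bank can evaluate this independently once the vector command (base, stride, length) has been broadcast, which is the property the abstract relies on. A hardware design would likely replace the bounded search with a table lookup or closed-form computation.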