ABSTRACT
Vector processors offer good performance, cost, and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses this memory bandwidth problem. We propose a novel mechanism, suitable for 2-dimensional vector architectures, that provides high effective bandwidth for SIMD memory instructions. The mechanism extends the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file. By fetching long blocks of data and by reusing 2-dimensional memory streams in this second-level register file, we obtain a significant increase in effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide high robustness to memory latency and significantly reduce cache activity, thus reducing power and energy requirements. At the cost of 50% more area than a regular SIMD register file, we have measured an average speed-up of 13% and potential power savings of 30% in the L2 cache.