ABSTRACT
Memory access latency and memory-related operations are often the performance bottleneck in parallel applications. In this paper, we present the concept of an active memory operation: an on-chip network transaction whose behavior is defined by microcode supplied by the software designer. Using active memory operations, we can replace multiple memory-access transactions over the on-chip network, together with the related computation on the local processing element, with a smaller number of high-level transactions and near-memory computation. We implemented a processor, called the active memory processor, which is located near the memory and executes the active memory operations. In our case studies, we applied the concept to three real-world applications (parallelized JPEG, FFT, and text indexing for data mining) running on a 36-tile architecture with 32 cores and 4 memories, and found that the programmable transaction approach can improve performance by 34.3% to 618% at the cost of additional design effort.
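As a rough illustration of the transaction-count argument, the following toy model compares word-by-word remote reads with a single programmable transaction that reduces the same range next to the memory. It is only a sketch under stated assumptions: the `Memory` class, the `active_op` method, the `sum_range` "microcode", and the cost of two network messages (request and reply) per transaction are all hypothetical and are not taken from the paper's hardware or programming interface.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Toy remote memory that counts on-chip network messages (assumed cost model)."""
    words: list
    messages: int = 0

    def read(self, addr):
        # Assumption: a plain word read costs one request and one reply.
        self.messages += 2
        return self.words[addr]

    def active_op(self, microcode, *args):
        # Assumption: one request and one reply; the computation runs near memory.
        self.messages += 2
        return microcode(self.words, *args)


def sum_range(words, start, count):
    """Hypothetical 'microcode': reduce a range of words next to the memory."""
    return sum(words[start:start + count])


mem = Memory(words=list(range(64)))

# Baseline: the local processing element fetches 16 words one by one and sums them.
baseline = sum(mem.read(addr) for addr in range(8, 24))
baseline_messages = mem.messages

# Active memory operation: a single high-level transaction returns the sum.
mem.messages = 0
active = mem.active_op(sum_range, 8, 16)
active_messages = mem.messages

print(baseline, baseline_messages, active, active_messages)
```

In this model both approaches compute the same result, while the number of network messages drops from two per word to two in total. Actual gains depend on the network, the memory, and the application.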