Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency
Source: Annual ACM IEEE Design Automation Conference
Proceedings of the 46th Annual Design Automation Conference
San Francisco, California
SESSION: Network-on-chip advances for power, reliability and the memory bottleneck
Pages 806-811
Year of Publication: 2009
ISBN: 978-1-60558-497-3
Authors
Junhee Yoo, Seoul National University
Sungjoo Yoo, POSTECH
Kiyoung Choi, Seoul National University
Sponsors
EDAC : Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CAS : Circuits & Systems
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629911.1630118

ABSTRACT

Memory access latency and memory-related operations are often the performance bottleneck in parallel applications. In this paper, we present the concept of an active memory operation: an on-chip network transaction whose behavior is defined by microcode provided by the software designer. Using active memory operations, we can replace multiple memory-access transactions over the on-chip network, together with the related computation on the local processing element, with a smaller number of high-level transactions and near-memory computation. We implemented a processor, called the active memory processor, which is located near the memory and executes the active memory operations. In our case studies, we applied the concept to three real-world applications (parallelized JPEG, FFT, and text indexing for data mining) running on a 36-tile architecture with 32 cores and 4 memories, and found that the programmable transaction approach can improve performance by 34.3% to 618% at the cost of additional design effort.
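
The transaction-count argument in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the `MemoryTile` class, the `amo` method, the `add_at` routine, and the counter-update workload are all hypothetical names chosen for illustration, and the model counts only network transactions, not latency. It assumes a remote counter update costs one read and one write transaction conventionally, versus a single transaction when a designer-supplied routine runs next to the memory.

```python
class MemoryTile:
    """Hypothetical memory tile that counts incoming network transactions."""

    def __init__(self, size):
        self.data = [0] * size
        self.transactions = 0   # network transactions received by this tile
        self.microcode = {}     # operation name -> routine executed near memory

    # Conventional accesses: each one is a separate network transaction.
    def read(self, addr):
        self.transactions += 1
        return self.data[addr]

    def write(self, addr, value):
        self.transactions += 1
        self.data[addr] = value

    # Assumption: installing a routine is one-time setup, not counted as traffic.
    def register(self, name, routine):
        self.microcode[name] = routine

    # Active memory operation: one transaction, computation happens at the tile.
    def amo(self, name, *args):
        self.transactions += 1
        return self.microcode[name](self.data, *args)


def add_at(data, addr, delta):
    """Designer-supplied routine: read-modify-write performed near memory."""
    data[addr] += delta
    return data[addr]


def count_conventional(mem, keys):
    # The processing element fetches the counter, increments it locally,
    # and writes it back: two transactions per key.
    for k in keys:
        mem.write(k, mem.read(k) + 1)


def count_active(mem, keys):
    # One high-level transaction per key; the increment runs at the memory.
    for k in keys:
        mem.amo("add_at", k, 1)


keys = [3, 1, 3, 0, 3, 1]

conv = MemoryTile(4)
count_conventional(conv, keys)

act = MemoryTile(4)
act.register("add_at", add_at)
count_active(act, keys)

print(conv.data, conv.transactions)  # [1, 2, 0, 3] 12
print(act.data, act.transactions)    # [1, 2, 0, 3] 6
```

Both versions produce the same counters, but the active version issues half as many transactions in this toy setting. The actual savings reported in the abstract depend on the application and the architecture, which this sketch does not model.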

