ABSTRACT
In the past decade, there has been much literature describing cache organizations that exploit general programming idiosyncrasies to maximize hit rate (the probability that a requested datum is already resident in the cache). Little, if anything, has been presented that exploits (1) the inherent dual-input nature of the cache and (2) central processor instructions that reference many data. No matter how high the hit rate is, a cache miss may impose a penalty on subsequent cache references: they must wait until the missed datum is received from central memory and, possibly, until the cache is updated. In both cases above, the references that follow a miss do not need the missing datum, yet they are still penalized in this way. This paper presents a cache organization that essentially eliminates this penalty. The feature has been incorporated into the design of a cache/memory interface subsystem, and that design has been implemented and prototyped. Simulations verified the advantage before prototyping, and an existing machine with a simple instruction set has confirmed it; future machines with larger and more sophisticated instruction sets may be able to exploit the feature further.
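To make the miss penalty concrete, the following toy timing model compares a cache that stalls every reference behind a miss with one that lets later references proceed when they do not need the missing datum. It is a sketch under stated assumptions, not the organization described in this paper: the fixed miss latency (`MISS_LATENCY`), one reference issued per cycle, an unlimited number of outstanding misses, and a cache of individual data with no capacity limit are all hypothetical, as are the function names and the `pending` table.

```python
# Toy timing model (hypothetical parameters, not the paper's design).
MISS_LATENCY = 10  # assumed cycles to fetch a missed datum from central memory


def blocking_finish(requests, resident):
    """Cycle at which the last reference completes when a miss stalls all later references."""
    cache = set(resident)
    t = 0
    for addr in requests:
        t += 1  # one cycle to issue and look up the reference
        if addr not in cache:
            t += MISS_LATENCY  # nothing else proceeds until the datum arrives
            cache.add(addr)  # cache updated with the fetched datum
    return t


def nonblocking_finish(requests, resident):
    """Same workload, but references after a miss proceed if they do not need the missing datum."""
    cache = set(resident)
    pending = {}  # hypothetical table: missing address -> cycle its datum arrives
    finish = 0
    for issue, addr in enumerate(requests, start=1):  # one reference issued per cycle
        # Move any data that has already arrived from memory into the cache.
        for a, ready in list(pending.items()):
            if ready <= issue:
                cache.add(a)
                del pending[a]
        if addr in cache:
            done = issue  # hit: completes at once, even with a miss outstanding
        elif addr in pending:
            done = pending[addr]  # needs the missing datum: waits for the same fetch
        else:
            pending[addr] = issue + MISS_LATENCY  # new miss: start the fetch, do not stall
            done = pending[addr]
        finish = max(finish, done)
    return finish


if __name__ == "__main__":
    resident = {1, 2, 3}
    requests = [9, 1, 2, 3]  # one miss followed by references that do not depend on it
    print(blocking_finish(requests, resident))  # prints 14
    print(nonblocking_finish(requests, resident))  # prints 11
```

With one miss followed by three hits, the blocking model finishes at cycle 14 and the non-blocking model at cycle 11, because the three hits complete while the fetch is in flight. A real design would also bound the number of outstanding misses and arbitrate between processor lookups and incoming memory data for the cache, both of which this sketch ignores.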
CITED BY 136
Jude A. Rivers , Edward S. Tam , Gary S. Tyson , Edward S. Davidson , Matt Farrens, Utilizing reuse information in data cache management, Proceedings of the 12th international conference on Supercomputing, p.449-456, July 1998, Melbourne, Australia
K. Nakazawa , H. Nakamura , H. Imori , S. Kawabe, Pseudo vector processor based on register-windowed superscalar pipeline, Proceedings of the 1992 ACM/IEEE conference on Supercomputing, p.642-651, November 16-20, 1992, Minneapolis, Minnesota, United States
Mikko H. Lipasti , William J. Schmidt , Steven R. Kunkel , Robert R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, Proceedings of the 28th annual international symposium on Microarchitecture, p.231-236, November 29-December 01, 1995, Ann Arbor, Michigan, United States
Nigel Topham , Antonio González , José González, The design and performance of a conflict-avoiding cache, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.71-80, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Anant Agarwal , Ricardo Bianchini , David Chaiken , Kirk L. Johnson , David Kranz , John Kubiatowicz , Beng-Hong Lim , Kenneth Mackenzie , Donald Yeung, The MIT Alewife machine: architecture and performance, ACM SIGARCH Computer Architecture News, v.23 n.2, p.2-13, May 1995
R. L. Lee , P. C. Yew , D. H. Lawrie, Multiprocessor cache design considerations, Proceedings of the 14th annual international symposium on Computer architecture, p.253-262, June 02-05, 1987, Pittsburgh, Pennsylvania, United States
Anant Agarwal , Ricardo Bianchini , David Chaiken , Kirk L. Johnson , David Kranz , J. Kubiatowicz , B.-H. Lim , K. Mackenzie , D. Yeung, The MIT Alewife machine: architecture and performance, 25 years of the international symposia on Computer architecture (selected papers), p.509-520, June 27-July 02, 1998, Barcelona, Spain
Parthasarathy Ranganathan , Vijay S. Pai , Sarita V. Adve, Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.199-210, June 23-25, 1997, Newport, Rhode Island, United States
Tim Stanley , Michael Upton , Patrick Sherhart , Trevor Mudge , Richard Brown, A microarchitectural performance evaluation of a 3.2 Gbyte/s microprocessor bus, Proceedings of the 26th annual international symposium on Microarchitecture, p.31-40, December 01-03, 1993, Austin, Texas, United States
Michel Dubois , Jin Chin Wang , Luiz A. Barroso , Kangwoo Lee , Yung-Syau Chen, Delayed consistency and its effects on the miss rate of parallel programs, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, p.197-206, November 18-22, 1991, Albuquerque, New Mexico, United States
William Y. Chen , Scott A. Mahlke , Pohua P. Chang , Wen-mei W. Hwu, Data access microarchitectures for superscalar processors with compiler-assisted data prefetching, Proceedings of the 24th annual international symposium on Microarchitecture, p.69-73, September 1991, Albuquerque, New Mexico, Puerto Rico
Toshihiro Ozawa , Yasunori Kimura , Shin'ichiro Nishizaki, Cache miss heuristics and preloading techniques for general-purpose programs, Proceedings of the 28th annual international symposium on Microarchitecture, p.243-248, November 29-December 01, 1995, Ann Arbor, Michigan, United States
Hiroshi Nakamura , Taisuke Boku , Hideo Wada , Hiromitsu Imori , Ikuo Nakata , Yasuhiro Inagami , Kisaburo Nakazawa , Yoshiyuki Yamashita, A scalar architecture for pseudo vector processing based on slide-windowed registers, Proceedings of the 7th international conference on Supercomputing, p.298-307, July 19-23, 1993, Tokyo, Japan
Kevin Skadron , Pritpal S. Ahuja , Margaret Martonosi , Douglas W. Clark, Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques, IEEE Transactions on Computers, v.48 n.11, p.1260-1281, November 1999
Christopher Batten , Ronny Krashinsky , Steve Gerding , Krste Asanovic, Cache Refill/Access Decoupling for Vector Machines, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.331-342, December 04-08, 2004, Portland, Oregon
Anahita Shayesteh , Glenn Reinman , Norm Jouppi , Tim Sherwood , Suleyman Sair, Improving the performance and power efficiency of shared helpers in CMPs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
Jan Edler , Allan Gottlieb , Clyde P. Kruskal , Kevin P. McAuliffe , Larry Rudolph , Marc Snir , Patricia J. Teller , James Wilson, Issues related to MIMD shared-memory computers: the NYU ultracomputer approach, ACM SIGARCH Computer Architecture News, v.13 n.3, p.126-135, June 1985
Haitham Akkary , Komal Jothi , Renjith Retnamma , Satyanarayana Nekkalapu , Doug Hall , Shahrokh Shahidzadeh, On the potential of latency tolerant execution in speculative multithreading, Proceedings of the 1st international forum on Next-generation multicore/manycore technologies, November 24-25, 2008, Cairo, Egypt
Akihiro Musa , Yoshiei Sato , Takashi Soga , Koki Okabe , Ryusuke Egawa , Hiroyuki Takizawa , Hiroaki Kobayashi, A shared cache for a chip multi vector processor, Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture, p.24-29, October 26, 2008, Toronto, Canada
Takashi Soga , Akihiro Musa , Youichi Shimomura , Ryusuke Egawa , Ken'ichi Itakura , Hiroyuki Takizawa , Koki Okabe , Hiroaki Kobayashi, Performance evaluation of NEC SX-9 using real science and engineering applications, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, November 14-20, 2009, Portland, Oregon