Memory bandwidth limitations of future microprocessors
Source International Symposium on Computer Architecture archive
Proceedings of the 23rd annual international symposium on Computer architecture table of contents
Philadelphia, Pennsylvania, United States
Pages: 78 - 89  
Year of Publication: 1996
ISBN:0-89791-786-3
Authors
Doug Burger  Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin
James R. Goodman  Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin
Alain Kägi  Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin
Sponsors
IEEE-CS\TCCA : TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 140,   Citation Count: 56
DOI: http://doi.acm.org/10.1145/232973.232983

ABSTRACT

This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for modern processors that employ aggressive memory latency tolerance techniques, wasted cycles due to insufficient bandwidth generally exceed those due to raw memory latencies. Given the importance of maximizing memory bandwidth, we calculate effective pin bandwidth, then estimate optimal effective pin bandwidth. We measure these quantities by determining the amount by which both caches and minimal-traffic caches filter accesses to the lower levels of the memory hierarchy. We see that there is a gap that can exceed two orders of magnitude between the total memory traffic generated by caches and the minimal-traffic caches, implying that the potential exists to increase effective pin bandwidth substantially. We decompose this traffic gap into four factors, and show they contribute quite differently to traffic reduction for different benchmarks. We conclude that, in the short term, pin bandwidth limitations will make more complex on-chip caches cost-effective. For example, flexible caches may allow individual applications to choose from a range of caching policies. In the long term, we predict that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips.
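
The idea of a cache "filtering" accesses, and of a traffic gap between a real cache and a minimal-traffic cache, can be illustrated with a small sketch. It is not taken from the paper. It assumes that a traffic ratio is the bytes moved below a cache divided by the bytes the processor requested, and that effective pin bandwidth is the physical pin bandwidth divided by that ratio. All function names and numbers are hypothetical.

```python
def traffic_ratio(bytes_below_cache: float, bytes_requested: float) -> float:
    """Fraction of processor-requested traffic that reaches the level below the cache.

    Assumed definition for illustration; values below 1.0 mean the cache filters traffic.
    """
    if bytes_requested <= 0:
        raise ValueError("bytes_requested must be positive")
    return bytes_below_cache / bytes_requested


def effective_pin_bandwidth(physical_bandwidth: float, ratio: float) -> float:
    """Bandwidth the processor effectively sees, assuming it scales as 1 / traffic ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return physical_bandwidth / ratio


def traffic_gap(cache_traffic: float, minimal_traffic: float) -> float:
    """How many times more traffic a real cache generates than a minimal-traffic cache."""
    if minimal_traffic <= 0:
        raise ValueError("minimal_traffic must be positive")
    return cache_traffic / minimal_traffic


# Hypothetical numbers: the processor requests 200 MB, the real cache passes
# 50 MB off chip, and an idealized minimal-traffic cache would pass 0.5 MB.
ratio = traffic_ratio(50.0, 200.0)
effective = effective_pin_bandwidth(800.0, ratio)  # 800 MB/s physical pins (made up)
gap = traffic_gap(50.0, 0.5)
print(ratio, effective, gap)
```

With these made-up inputs the gap is a factor of 100, which matches the scale of the "two orders of magnitude" the abstract mentions. The figures themselves carry no empirical weight.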



