TLC: Transmission Line Caches
Source: International Symposium on Microarchitecture
Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture
Page: 43  
Year of Publication: 2003
ISBN: 0-7695-2043-X
Authors
Bradford M. Beckmann  Computer Sciences Department, University of Wisconsin-Madison
David A. Wood  Computer Sciences Department, University of Wisconsin-Madison
Sponsor
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 21,   Citation Count: 11

ABSTRACT

It is widely accepted that the disproportionate scaling of transistor and conventional on-chip interconnect performance presents a major barrier to future high performance systems. Previous research has focused on wire-centric designs that use parallelism, locality, and on-chip wiring bandwidth to compensate for long wire latency. An alternative approach to this problem is to exploit newly-emerging on-chip transmission line technology to reduce communication latency. Compared to conventional RC wires, transmission lines can reduce delay by up to a factor of 30 for global wires, while eliminating the need for repeaters. However, this latency reduction comes at the cost of a comparable reduction in bandwidth. In this paper, we investigate using transmission lines to access large level-2 on-chip caches. We propose a family of Transmission Line Cache (TLC) designs that represent different points in the latency/bandwidth spectrum. Compared to the recently-proposed Dynamic Non-Uniform Cache Architecture (DNUCA) design, the base TLC design reduces the required cache area by 18% and reduces the interconnection network's dynamic power consumption by an average of 61%. The optimized TLC designs attain similar performance using fewer transmission lines but with some additional complexity. Simulation results using full-system simulation show that TLC provides more consistent performance than the DNUCA design across a wide variety of workloads. TLC caches are logically simpler than DNUCA designs, but require greater circuit and manufacturing complexity.

