|
ABSTRACT
Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control. We describe new algorithms for routing without using buffers in router input/output ports. We analyze the advantages and disadvantages of bufferless routing and discuss how router latency can be reduced by taking advantage of the fact that input/output buffers do not exist. Our evaluations show that routing without buffers significantly reduces the energy consumption of the on-chip cache/processor-to-cache network, while providing similar performance to that of existing buffered routing algorithms at low network utilization (i.e., on most real applications). We conclude that bufferless routing can be an attractive and energy-efficient design option for on-chip cache/processor-to-cache networks where network utilization is low.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Robert Alverson , David Callahan , Daniel Cummings , Brian Koblenz , Allan Porterfield , Burton Smith, The Tera computer system, Proceedings of the 4th international conference on Supercomputing, p.1-6, June 11-15, 1990, Amsterdam, The Netherlands
|
| |
2
|
P. Baran. On distributed communications networks. IEEE Trans. on Communications, Mar. 1964.
|
| |
3
|
|
 |
4
|
Sanjay Bhansali , Wen-Ke Chen , Stuart de Jong , Andrew Edwards , Ron Murray , Milenko Drinić , Darek Mihočka , Joe Chau, Framework for instruction-level tracing and analysis of program executions, Proceedings of the 2nd international conference on Virtual execution environments, June 14-16, 2006, Ottawa, Ontario, Canada
[doi> 10.1145/1134760.1220164]
|
| |
5
|
D. Boggs et al. The microarchitecture of the Intel Pentium 4 processor on 90nm technology. Intel Technology Journal, 8(1), Feb. 2004.
|
 |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
W. J. Dally and C. L. Seitz. The torus routing chip. Distributed Computing, 1:187--196, 1986.
|
| |
12
|
|
| |
13
|
|
| |
14
|
U. Feige and P. Raghavan. Exact analysis of hot-potato routing. In STOC, 1992.
|
| |
15
|
J. M. Frailong, W. Jalby, and J. Lenfant. XOR-Schemes: A flexible data organization in parallel memories. In ICPP, 1985.
|
| |
16
|
|
| |
17
|
|
| |
18
|
C. Gomez, M. E. Gomez, P. Lopez, and J. Duato. A bufferless switching technique for NoCs. In Wina, 2008.
|
| |
19
|
|
 |
20
|
Michael K. Gowan , Larry L. Biro , Daniel B. Jackson, Power considerations in the design of the Alpha 21264 microprocessor, Proceedings of the 35th annual Design Automation Conference, p.726-731, June 15-19, 1998, San Francisco, California, United States
[doi> 10.1145/277044.277226]
|
| |
21
|
P. Gratz, B. Grot, and S. W. Keckler. Regional congestion awareness for load balance in networks-on-chip. In HPCA-14, 2008.
|
| |
22
|
P. Gratz, C. Kim, R. McDonald, S. W. Keckler, and D. Burger. Implementation and evaluation of on-chip network architectures. In ICCD, 2006.
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
| |
27
|
|
 |
28
|
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
 |
32
|
Chi-Keung Luk , Robert Cohn , Robert Muth , Harish Patil , Artur Klauser , Geoff Lowney , Steven Wallace , Vijay Janapa Reddi , Kim Hazelwood, Pin: building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, June 12-15, 2005, Chicago, IL, USA
|
| |
33
|
K. Luo, J. Gummaraju, and M. Franklin. Balancing throughput and fairness in SMT processors. In ISPASS, 2001.
|
 |
34
|
Milo M. K. Martin , Daniel J. Sorin , Anatassia Ailamaki , Alaa R. Alameldeen , Ross M. Dickson , Carl J. Mauer , Kevin E. Moore , Manoj Plakal , Mark D. Hill , David A. Wood, Timestamp snooping: an approach for extending SMPs, Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, p.25-36, November 2000, Cambridge, Massachusetts, United States
|
| |
35
|
G. Michelogiannakis, J. Balfour, and W. J. Dally. Elastic-buffer flow control for on-chip networks. In HPCA-15, 2009.
|
| |
36
|
|
| |
37
|
Micron. 1Gb DDR2 SDRAM Component: MT47H128M8HQ-25, May 2007. http://download.micron.com/pdf/datasheets/dram/ddr2/1GbDDR2.pdf.
|
| |
38
|
|
 |
39
|
|
| |
40
|
|
 |
41
|
|
| |
42
|
John D. Owens , William J. Dally , Ron Ho , D. N. (Jay) Jayasimha , Stephen W. Keckler , Li-Shiuan Peh, Research Challenges for On-Chip Interconnection Networks, IEEE Micro, v.27 n.5, p.96-108, September 2007
[doi> 10.1109/MM.2007.91]
|
| |
43
|
Harish Patil , Robert Cohn , Mark Charney , Rajiv Kapoor , Andrew Sun , Anand Karunanidhi, Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.81-92, December 04-08, 2004, Portland, Oregon
[doi> 10.1109/MICRO.2004.28]
|
| |
44
|
|
 |
45
|
|
| |
46
|
B. J. Smith. A pipelined shared resource MIMD computer. In ICPP, 1978.
|
| |
47
|
B. J. Smith. Architecture and applications of the HEP multiprocessor computer system. In Proc. of SPIE, 1981.
|
| |
48
|
B. J. Smith, Apr. 2008. Personal communication.
|
 |
49
|
|
 |
50
|
Michael Bedford Taylor , Walter Lee , Jason Miller , David Wentzlaff , Ian Bratt , Ben Greenwald , Henry Hoffmann , Paul Johnson , Jason Kim , James Psota , Arvind Saraf , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, Proceedings of the 31st annual international symposium on Computer architecture, p.2, June 19-23, 2004, München, Germany
|
| |
51
|
|
| |
52
|
X. Wang, A. Morikawa, and T. Aoyama. Burst optical deflection routing protocol for wavelength routing WDM networks. In SPIE/IEEE Opticom, 2004.
|
| |
53
|
David Wentzlaff , Patrick Griffin , Henry Hoffmann , Liewei Bao , Bruce Edwards , Carl Ramey , Matthew Mattina , Chyi-Chang Miao , John F. Brown III , Anant Agarwal, On-Chip Interconnection Architecture of the Tile Processor, IEEE Micro, v.27 n.5, p.15-31, September 2007
[doi> 10.1109/MM.2007.89]
|
|