ABSTRACT
Programming multi-processor application-specific instruction-set processors (ASIPs), such as network processors, remains an art because architectures vary widely and there is little support for exploring different implementation alternatives. We present a study that implements an IP forwarding router application on two different network processors to better understand the main challenges in programming such multi-processor ASIPs. The goal of this study is to identify the elements central to a successful deployment of such systems, based on detailed profiling of the two architectures. Our results show that inefficient partitioning can affect throughput by more than 30%, that better arbitration of resources increases throughput by at least 10%, and that localizing computation with respect to the memories can increase the available bandwidth on internal buses by a factor of two. The main observation of our study is a critical lack of tools and methods that support an integrated approach to partitioning, scheduling and arbitration, and data transfer management for such system implementations.