|
ABSTRACT
Network processors are heterogeneous system-on-chip multiprocessors that are optimized to perform packet forwarding and processing tasks at Gigabit data rates. To meet the performance demands of increasing link speeds and complex network applications, network processors are implemented with several dozen embedded processor cores and hardware accelerators that run multiple packet processing applications in parallel. The parallel nature of the processing system makes it increasingly difficult for application developers to understand and manage resources and map processing tasks to the hardware. To address this problem, we present a methodology for profiling and analyzing network processor applications, mapping processing tasks to a generalized network processor architecture, and analytically determining the expected throughput performance. The key novelty of this work is not only the adaptation of application analysis and mapping algorithms to heterogeneous network processors, but also that the entire process can be automated and hidden from the application developer. Starting with the analysis of a uniprocessor implementation of the application, the process yields a mapping of the partitioned application that shows best performance for a given network processor system. The simplicity of the proposed randomized mapping algorithm allows the use of this methodology in network processor runtime systems where dynamic reallocation of tasks is necessary but processing power is limited. We present results that show the effectiveness of the analysis and mapping methodology as well as its application to design space exploration.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Austin, T. M. and Sohi, G. S. 1993. Tetra: evaluation of serial program performance on fine-grain parallel processors. Tech. rep. 1163, Computer Science Department, University of Wisconsin, Madison.
|
| |
3
|
|
| |
4
|
|
| |
5
|
Daemen, J. and Rijmen, V. 2000. The block cipher Rijndael. Lecture Notes in Computer Science. Vol. 1820. Springer-Verlag, Berlin, Germany, 288--296.
|
 |
6
|
|
| |
7
|
|
| |
8
|
Franklin, M. A. and Wolf, T. 2002. A network processor performance and design model with benchmark parameterization. In Proceedings of the 1st Network Processor Workshop (NP-1) in Conjunction with the 8th International Symposium on High-Performance Computer Architecture (HPCA-8). ACM, New York, 63--74.
|
| |
9
|
Franklin, M. A. and Wolf, T. 2003. Power considerations in network processor design. In Proceedings of the 2nd Network Processor Workshop (NP-2) in Conjunction with 9th International Symposium on High-Performance Computer Architecture (HPCA-9). ACM, New York, 10--22.
|
| |
10
|
Goglin, S. D., Hooper, D., Kumar, A., and Yavatkar, R. 2003. Advanced software framework, tools, and languages for the IXP family. Intel Tech. J. 7, 4, 64--76.
|
| |
11
|
Grasso et al., P. A. 1984. Memory interference in multimicroprocessor systems with a time-shared bus. Proc. IEEE 131, 10.
|
| |
12
|
Gries, M., Kulkarni, C., Sauer, C., and Keutzer, K. 2003. Exploring trade-offs in performance and programmability of processing element topologies for network processors. In Proceedings of the 2nd Network Processor Workshop (NP-2) in Conjunction with 9th International Symposium on High-Performance Computer Architecture (HPCA-9). ACM, New York, 75--87.
|
| |
13
|
|
| |
14
|
Intel Corporation 2003. Intel IXA software developers Kit 2.01.
|
| |
15
|
Ujval J. Kapasi , Scott Rixner , William J. Dally , Brucek Khailany , Jung Ho Ahn , Peter Mattson , John D. Owens, Programmable Stream Processors, Computer, v.36 n.8, p.54-62, August 2003
[doi> 10.1109/MC.2003.1220582]
|
| |
16
|
|
 |
17
|
|
| |
18
|
Kokku, R., Riché, T., Kunze, A., Mudigonda, J., Jason, J., and Vin, H. 2003. A case for run-time adaptation in packet processing systems. In Proceedings of the 2nd Workshop on Hot Topics in Networks (HOTNETSII). Cambridge, MA.
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
Nilsson, S. and Karlsson, G. 1999. IP-address lookup using LC-tries. IEEE J. Sel. Areas Comm. 17, 6, 1083--1092.
|
| |
24
|
Ramaswamy, R., Weng, N., and Wolf, T. 2004. Application analysis and resource mapping for heterogeneous network processor architectures. In Proceedings of the 3rd Workshop on Network Processors and Applications (NP-3) in Conjunction with the 10th International Symposium on High Performance Computer Architecture (HPCA-10). ACM, New York, 103--119.
|
| |
25
|
|
| |
26
|
Ramaswamy, R. and Wolf, T. 2003. PacketBench: A tool for workload characterization of network processing. In Proceedings of the IEEE 6th Annual Workshop on Workload Characterization (WWC-6). IEEE, Los Alamitos, CA, 42--50.
|
| |
27
|
|
| |
28
|
Shah, N., Plishker, W., and Keutzer, K. 2003. NP-Click: A programming model for the intel IXP1200. In Proceedings of the 2nd Network Processor Workshop (NP-2) in Conjunction with 9th International Symposium on High-Performance Computer Architecture (HPCA-9). ACM, New York, 100--111.
|
| |
29
|
Michael Bedford Taylor , Jason Kim , Jason Miller , David Wentzlaff , Fae Ghodrat , Ben Greenwald , Henry Hoffman , Paul Johnson , Jae-Wook Lee , Walter Lee , Albert Ma , Arvind Saraf , Mark Seneski , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, v.22 n.2, p.25-35, March 2002
[doi> 10.1109/MM.2002.997877]
|
| |
30
|
Teja Technologies. 2003. TejaNP datasheet. Teja Technologies. http://www.teja.com.
|
| |
31
|
Thiele, L., Chakraborty, S., Gries, M., and Künzli, S. 2002. Design space exploration of network processor architectures. In Proceedings of the 1st Network Processor Workshop (NP-1) in Conjunction with the 8th International Symposium on High-Performance Computer Architecture (HPCA-8). ACM, New York, 30--41.
|
 |
32
|
|
| |
33
|
Wei, Y.-C. and Cheng, C.-K. 1991. Ratio cut partitioning for hierarchical designs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 10, 7, 911--921.
|
| |
34
|
|
 |
35
|
|
|