ABSTRACT
Profilers play an important role in software/hardware design, optimization, and verification, and various approaches have been proposed to implement them. The most widespread approach in the embedded domain is Instruction Set Simulation (ISS) based profiling, which provides uncompromised accuracy but limited execution speed. Source code profilers, by contrast, are fast but less accurate. This paper introduces TotalProf, a fast and accurate source code cross profiler that estimates the performance of an application from three aspects. First, code optimization and a novel virtual compiler backend are employed to mimic the course of target compilation. Second, an optimistic static scheduler is introduced to estimate the behavior of the target processor's datapath. Third, dynamic events, such as cache misses, bus contention, and branch prediction failures, are simulated at runtime. Using an abstract architecture description, the tool can be easily retargeted, in a manner oriented toward performance characteristics, to estimate different processor architectures, including DSPs and VLIW machines. Multiple instances of TotalProf can be integrated with SystemC to support heterogeneous Multi-Processor System-on-Chip (MPSoC) profiling. The tool achieves an execution speed of more than one Giga-Instruction-Per-Second (GIPS) while introducing an error of only about 5 to 15% in the major performance metrics, such as cycle count, memory accesses, and cache misses.