|
ABSTRACT
Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but usually slow them down. Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs. This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring-buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. The framework implements exhaustive and sampling event processing and is analysis-neutral. We evaluate the framework with five popular and diverse analyses, and show performance improvements even for lightweight, low-overhead analyses. Efficient inter-core communication is central to high performance parallel systems and we believe the CAB design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and other parallel software.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
B. Alpern, D. Attanasio, J. J. Barton, M. G. Burke, P.Cheng, J.-D. Choi, A. Cocchi, S. J. Fink, D. Grove, M. Hind, S. F. Hummel, D. Lieber, V. Litvinov, M. Mergen, T. Ngo, J. R. Russell, V. Sarkar, M. J. Serrano, J. Shepherd, S. Smith, V. C. Sreedhar, H. Srinivasan, and J. Whaley. The Jalapeno virtual machine. IBM System Journal, 39(1):211--238, Feb. 2000.
|
| |
2
|
M. Arnold and D. Grove. Collecting and exploiting high-accuracy call graph profiles in virtual machines. In International Symposium on Code Generation and Optimization, pages 51--62, San Jose, CA, Mar. 2005.
|
| |
3
|
M. Arnold and B. G. Ryder. A framework for reducing the cost of instrumented code. In ACM Conference on Programming Language Design and Implementation, pages 168--179, Snowbird, UT, June 2001.
|
| |
4
|
T. Ball and J. R. Larus. Efficient path profiling. In ACM/IEEE International Symposium on Microarchitecture, pages 46--57, Paris, France, Dec. 1996.
|
| |
5
|
S. M. Blackburn and K. S. McKinley. Immix: A mark-region garbage collector with space efficiency, fast collection, and mutator locality. In ACM Conference on Programming Language Design and Implementation, pages 22--32, Tuscon, AZ, June 2008.
|
| |
6
|
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 83--89, Portland, OR, Oct. 2006.
|
| |
7
|
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java benchmarking development and analysis (extended version). Technical Report TR-CS-06-01, Dept. of Computer Science, Australian National University, 2006. http://www.dacapobench.org.
|
| |
8
|
M. D. Bond and K. S. McKinley. Continuous path and edge profiling. In ACM/IEEE International Symposium on Microarchitecture, pages 130--140, Barcelona, Spain, Nov. 2005.
|
| |
9
|
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing, pages 1--13, Article 42, 2000.
|
| |
10
|
D. Bruening. Efficient, Transparent, and Comprehensive Runtime Code Manipulation. PhD thesis, Massachusetts Institute of Technology, 2004.
|
| |
11
|
J. Chow, T. Garfinkel, and P. M. Chen. Decoupling dynamic program analysis from execution in virtual environments. In USENIX Annual Technical Conference, pages 1--14, Boston, MA, 2008.
|
| |
12
|
K. Gharachorloo and P. B. Gibbons. Detecting violations of sequential consistency. In ACM Symposium on Parallel Algorithms and Architectures, pages 316--326, Hilton Head, SC, 1991.
|
| |
13
|
J. Giacomoni, T. Moseley, and M. Vachharajani. FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue. In ACM Symposium on Principles and Practice of Parallel Programming, pages 43--52, Salt Lake City, UT, 2008.
|
| |
14
|
J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan, D. E. Porter, D. L. Chen, and E.Witchel. Improved error reporting for software that uses black-box components. In ACM Conference on Programming Language Design and Implementation, pages 101--111, San Diego, CA, 2007.
|
| |
15
|
J. Ha, M. Arnold, S. M. Blackburn, and K. S. McKinley. A concurrent dynamic analysis framework for multicore hardware. Technical Report TR-09-24, The University of Texas at Austin, 2009.
|
| |
16
|
S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. In International Conference on Software Engineering, pages 291--301, Orlando, FL, 2002.
|
| |
17
|
M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Language Systems, 13(1):124--149, 1991.
|
| |
18
|
M. Hirzel and T. Chilimbi. Bursty tracing: A framework for lowoverhead temporal profiling. In ACM Workshop on Feedback-Directed and Dynamic Optimization, pages 117--126, December 2001.
|
| |
19
|
M. S. Lam, M. Martin, B. Livshits, and J. Whaley. Securing web applications with static and dynamic information flow tracking. In ACM Workshop Partial Evaluation and Semantics-Based Program Manipulation, pages 3--12, San Francisco, CA, 2008.
|
| |
20
|
L. Lamport. Specifying concurrent program modules. ACM Transactions on Programming Language Systems, 5(2):190--222, 1983.
|
| |
21
|
W. Lin, S. K. Reinhardt, and D. Burger. Reducing DRAM latencies with an integrated memory hierarchy design. In IEEE International Symposium on High Performance Computer Architecture, pages 302--312, Nuevo Leone, Mexico, Jan. 2001.
|
| |
22
|
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S.Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In ACM Conference on Programming Language Design and Implementation, pages 190--200, Chicago, IL, 2005.
|
| |
23
|
M. Martin, B. Livshits, and M. S. Lam. Finding application errors and security flaws using PQL: a Program Query Language. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 365--383, San Diego, CA, 2005.
|
| |
24
|
T. Moseley, A. Shye, V. J. Reddi, D. Grunwald, and R. Peri. Shadow Profiling: Hiding instrumentation costs with parallelism. In International Symposium on Code Generation and Optimization, pages 198--208, Washington, DC, 2007.
|
| |
25
|
N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM Conference on Programming Language Design and Implementation, pages 89--100, San Diego, CA, 2007.
|
| |
26
|
M. Paleczny, C. Vick, and C. Click. The Java HotSpot server compiler. In Java Virtual Machine Research and Technology Symposium, Monterey, CA, April 2001. Sun Microsystems.
|
| |
27
|
M. Pettersson. Linux Intel/x86 performance counters, 2003. http://user.it.uu.se/mikpe/linux/perfctr/.
|
| |
28
|
R. Shetty, M. Kharbutli, Y. Solihin, and M. Prvulovic. Heap-Mon: a helper-thread approach to programmable, automatic, and lowoverhead memory bug detection. IBM Journal of Research and Development, 50(2/3):261--275, 2006.
|
| |
29
|
SPECjvm98 Documentation. Standard Performance Evaluation Corporation, release 1.03 edition, March 1999.
|
| |
30
|
S. Wallace and K. Hazelwood. SuperPin: Parallelizing dynamic instrumentation for real-time performance. In International Symposium on Code Generation and Optimization, pages 209--220, Washington, DC, 2007.
|
| |
31
|
Z. Wang, K. S. McKinley, A. Rosenberg, and C. C. Weems. Using the compiler to improve cache replacement decisions. In International Conference on Parallel Architectures and Compilation Techniques, pages 199--208, Charlottesville, VA, Sept. 2002.
|
| |
32
|
C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, and W.-Y. Ma. Automated known problem diagnosis with event traces. In ACM European Conference on Computer Systems, pages 375--388, Leuven, Belgium, 2006.
|
| |
33
|
Q. Zhao, I. Cutcutache, andW.-F.Wong. PiPA: Pipelined profiling and analysis on multi-core systems. In International Symposium on Code Generation and Optimization, pages 185--194, Boston, MA, 2008.
|
| |
34
|
P. Zhou, F. Qin, W. Liu, Y. Zhou, and J. Torrellas. iWatcher: Efficient Architectural Support for Software Debugging. In ACM/IEEE International Symposium on Computer Architecture, pages 224--235, Munchen, Germany, June 2004.
|
|