ABSTRACT
Characterizing the performance of scientific applications is essential for effective code optimization, both by compilers and by high-level adaptive numerical algorithms. While maximizing power efficiency is becoming increasingly important in current high-performance architectures, little or no hardware or software support exists for detailed power measurements. Hardware counter-based power models are a promising method for guiding software-based techniques for reducing power. We present a component-based infrastructure for performance and power modeling of parallel scientific applications. The power model leverages on-chip performance hardware counters and is designed to model power consumption for modern multiprocessor and multicore systems. Our tool infrastructure includes application components as well as performance and power measurement and analysis components. We collect performance data using the TAU performance component and apply the power model in the performance and power analysis of a PETSc-based parallel fluid dynamics application by using the PerfExplorer component.
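As an illustration of what a hardware counter-based power model can look like, the sketch below estimates power as a linear combination of per-cycle counter rates. It is not the model used in this work: the counter fields, the linear form, and every coefficient are hypothetical placeholders, and a real model would be fitted against measured power on the target machine.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hardware counter totals for one measured interval (hypothetical fields)."""
    cycles: int
    instructions: int
    l2_misses: int
    fp_ops: int
    seconds: float


# Hypothetical coefficients in watts; real values would be fitted per machine.
IDLE_WATTS = 40.0
WATTS_PER_IPC = 10.0
WATTS_PER_L2_MISS_PER_CYCLE = 200.0
WATTS_PER_FP_OP_PER_CYCLE = 8.0


def estimate_power(sample: CounterSample) -> float:
    """Estimate average power (watts) as idle power plus weighted counter rates."""
    if sample.cycles <= 0:
        raise ValueError("cycles must be positive")
    ipc = sample.instructions / sample.cycles
    l2_miss_rate = sample.l2_misses / sample.cycles
    fp_rate = sample.fp_ops / sample.cycles
    return (IDLE_WATTS
            + WATTS_PER_IPC * ipc
            + WATTS_PER_L2_MISS_PER_CYCLE * l2_miss_rate
            + WATTS_PER_FP_OP_PER_CYCLE * fp_rate)


def estimate_energy(sample: CounterSample) -> float:
    """Estimate energy (joules) for the interval: average power times elapsed time."""
    return estimate_power(sample) * sample.seconds
```

Normalizing each counter by cycles keeps the terms comparable across intervals of different length, which is one common reason to model rates rather than raw counts.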
CITED BY
Kevin A. Huck, Oscar Hernandez, Van Bui, Sunita Chandrasekaran, Barbara Chapman, Allen D. Malony, Lois Curfman McInnes, Boyana Norris, Capturing performance knowledge for automated analysis, Proceedings of the 2008 ACM/IEEE conference on Supercomputing, November 15-21, 2008, Austin, Texas