Prediction models for multi-dimensional power-performance optimization on many cores
Source: PACT, Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Toronto, Ontario, Canada
SESSION: Middleware and runtime systems
Pages 250-259
Year of Publication: 2008
ISBN:978-1-60558-282-5
Authors
Matthew Curtis-Maury (Virginia Tech, Blacksburg, VA, USA)
Ankur Shah (Virginia Tech, Blacksburg, VA, USA)
Filip Blagojevic (Virginia Tech, Blacksburg, VA, USA)
Dimitrios S. Nikolopoulos (Virginia Tech, Blacksburg, VA, USA)
Bronis R. de Supinski (Lawrence Livermore National Laboratory, Livermore, CA, USA)
Martin Schulz (Lawrence Livermore National Laboratory, Livermore, CA, USA)
ABSTRACT
Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance predictor, which we deploy to address the problem of simultaneous runtime optimization of DVFS and DCT on multi-core systems. We present results from an implementation of the predictor in a runtime library linked to the Intel OpenMP environment and running on an actual dual-processor quad-core system. We show that our predictor derives near-optimal settings of the power-aware program adaptation knobs that we consider. Our overall framework achieves significant reductions in energy (19% mean) and ED2, the energy-delay-squared product (40% mean), through simultaneous power savings (6% mean) and performance improvements (14% mean). We also find that our framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably with, and typically improve upon, heuristic search-based approaches in both performance and energy savings.
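The energy and ED2 figures in the abstract can be related to the power and performance figures with a short calculation. The sketch below is illustrative only and is not the authors' predictor: it assumes energy is average power times execution time, that ED2 is energy times execution time squared, and that the 14% "performance improvement" means a 14% reduction in execution time. The `Config` type, the candidate values, and all function names are hypothetical. Because the abstract reports means across benchmarks, the composed numbers should be expected to agree only approximately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical (frequency, thread count) setting with predicted behavior."""
    freq_ghz: float
    threads: int
    pred_time_s: float   # predicted execution time in seconds (made-up values)
    pred_power_w: float  # predicted average power in watts (made-up values)


def energy(c: Config) -> float:
    """Energy in joules, assuming energy = average power * time."""
    return c.pred_power_w * c.pred_time_s


def ed2(c: Config) -> float:
    """Energy-delay-squared product: energy * time^2."""
    return energy(c) * c.pred_time_s ** 2


def best_by_ed2(candidates):
    """Pick the candidate with the lowest predicted ED2."""
    return min(candidates, key=ed2)


def relative_energy(power_saving: float, time_saving: float) -> float:
    """Energy relative to baseline, given fractional power and time savings."""
    return (1.0 - power_saving) * (1.0 - time_saving)


def relative_ed2(power_saving: float, time_saving: float) -> float:
    """ED2 relative to baseline, given fractional power and time savings."""
    t = 1.0 - time_saving
    return relative_energy(power_saving, time_saving) * t * t


# Consistency check against the abstract's means (6% power, 14% time).
energy_reduction = 1.0 - relative_energy(0.06, 0.14)
ed2_reduction = 1.0 - relative_ed2(0.06, 0.14)
print(f"energy reduction: {energy_reduction:.1%}")  # 19.2%
print(f"ED2 reduction: {ed2_reduction:.1%}")        # 40.2%

# Selecting a setting from hypothetical predictions.
candidates = [
    Config(freq_ghz=2.4, threads=8, pred_time_s=10.0, pred_power_w=200.0),
    Config(freq_ghz=2.4, threads=4, pred_time_s=9.0, pred_power_w=170.0),
    Config(freq_ghz=1.6, threads=4, pred_time_s=12.0, pred_power_w=130.0),
]
best = best_by_ed2(candidates)
print(best)
```

Under these assumptions, a 6% power saving combined with a 14% shorter run time gives roughly a 19% energy reduction and a 40% ED2 reduction, which lines up with the reported means. The selection step also shows why fewer threads can win: in the made-up data, the four-thread setting at full frequency is both faster and lower power, so it has the lowest ED2.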