ABSTRACT
One general avenue to obtaining optimized performance on large and complex systems is to approach optimization from a global perspective of the complete system, in a manner customized for each application, i.e., application-centric optimization. Recently, there have been encouraging developments in reconfigurable operating systems and hardware that will enable such customized optimization. For example, machines built with PIMs and FPGAs can be quickly reconfigured to better fit a particular application, and operating systems such as IBM's K42 can have their services customized to fit the needs and characteristics of an application. While progress in operating systems and hardware has made reconfiguration possible, we still need strategies and techniques to exploit it for improved application performance. In this paper, we describe the approach we are using in our smart application (SMARTAPPS) project. In the SMARTAPP executable, the compiler embeds most run-time system services and a feedback loop that monitors performance and triggers run-time adaptations. At run time, after incorporating the code's input and determining the system's state, the SMARTAPP performs an instance-specific optimization. During execution, the application continually monitors its performance and the available resources to determine whether restructuring should occur. The framework includes mechanisms for performing the actual restructuring at several levels, including algorithmic adaptation, tuning of reconfigurable OS services (scheduling policy, page size, etc.), and system configuration (e.g., number of processors). This paper concentrates on techniques for providing customized system services for communication, thread scheduling, memory management, and performance monitoring and modeling.
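The monitor-and-adapt feedback loop described above can be illustrated with a minimal sketch. This is not the SMARTAPPS implementation; the class `AdaptiveRunner`, its `check_every` and `clock` parameters, and the exponential-moving-average cost model are all hypothetical choices made for illustration. The loop runs the currently selected algorithm variant, records the elapsed time, and every few calls re-evaluates which variant to use: any variant not yet measured is tried first, after which the cheapest observed variant is selected (a simple form of algorithmic adaptation).

```python
import time
from typing import Callable, Dict, Optional


class AdaptiveRunner:
    """Hypothetical monitor-and-adapt loop (illustrative only, not the
    SMARTAPPS framework): run the current variant, record its cost, and
    periodically switch to the variant with the lowest observed cost."""

    def __init__(self, variants: Dict[str, Callable], check_every: int = 4,
                 clock: Callable[[], float] = time.perf_counter):
        self.variants = variants
        self.check_every = check_every        # adaptation period, in calls
        self.clock = clock                    # injectable for deterministic tests
        self.current = next(iter(variants))   # start with the first variant
        self.avg_cost: Dict[str, Optional[float]] = {n: None for n in variants}
        self.calls = 0

    def _record(self, name: str, cost: float, alpha: float = 0.5) -> None:
        # Monitoring: exponential moving average of observed cost per variant.
        prev = self.avg_cost[name]
        self.avg_cost[name] = cost if prev is None else alpha * cost + (1 - alpha) * prev

    def _adapt(self) -> None:
        # Explore: try any variant that has never been measured.
        unmeasured = [n for n, c in self.avg_cost.items() if c is None]
        if unmeasured:
            self.current = unmeasured[0]
            return
        # Exploit: switch to the cheapest measured variant.
        self.current = min(self.avg_cost, key=lambda n: self.avg_cost[n])

    def run(self, *args):
        start = self.clock()
        result = self.variants[self.current](*args)
        self._record(self.current, self.clock() - start)
        self.calls += 1
        if self.calls % self.check_every == 0:
            self._adapt()                     # feedback: possibly restructure
        return result
```

Injecting the clock keeps the measurement policy separate from the adaptation policy; in a real system the measured quantity could equally be a hardware counter or an OS-reported resource level rather than wall-clock time.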