|
ABSTRACT
Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the run-time environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100% of the cases studied, depending on the operation and the system.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: A standard template adaptive parallel C++ library. In Proc. of the Intern. Workshop on Advanced Compiler Technology for High Performance and Embedded Processors (IWACT), Bucharest, Romania, July 2001.
|
| |
2
|
P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: An adaptive, generic parallel programming library for C++. In Workshop on Languages and Compilers for Parallel Computing (LCPC), Cumberland Falls, KY, August 2001.
|
 |
3
|
Andrea C. Arpaci-Dusseau , Remzi H. Arpaci-Dusseau , David E. Culler , Joseph M. Hellerstein , David A. Patterson, High-performance sorting on networks of workstations, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.243-254, May 11-15, 1997, Tucson, Arizona, United States
|
 |
4
|
Guy E. Blelloch , Charles E. Leiserson , Bruce M. Maggs , C. Greg Plaxton , Stephen J. Smith , Marco Zagha, A comparison of sorting algorithms for the connection machine CM-2, Proceedings of the third annual ACM symposium on Parallel algorithms and architectures, p.3-16, July 21-24, 1991, Hilton Head, South Carolina, United States
[doi> 10.1145/113379.113380]
|
 |
5
|
Guy E. Blelloch , Jonathan C. Hardwick , Siddhartha Chatterjee , Jay Sipelstein , Marco Zagha, Implementation of a portable nested data-parallel language, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.102-111, May 19-22, 1993, San Diego, California, United States
|
 |
6
|
|
| |
7
|
W.H. Burge. Sorting, trees, and measures of order. Information and Control, 1(3):181--197, 1958.
|
 |
8
|
|
| |
9
|
Jaeyoung Choi , Jack J. Dongarra , L. Susan Ostrouchov , Antoine P. Petitet , David W. Walker , R. Clint Whaley, Design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines, Scientific Programming, v.5 n.3, p.173-184, Fall 1996
|
| |
10
|
Jaeyoung Choi, Jack J. Dongarra, and David W. Walker. PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers. Concurrency: Practice and Experience, 6(7):543--570, 1994.
|
 |
11
|
|
| |
12
|
Edsger W. Dijkstra. Smoothsort, an alternative for sorting in situ. Science of Computer Programming, 1(3):223--233, 1982.
|
| |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
G. C. Fox and S. W. Otto. Matrix algorithms on a hypercube I: matrix multiplication.
|
| |
17
|
R. A. Van De Geijn and J. Watts. SUMMA: scalable universal matrix multiplication algorithm. Concurrency: Practice and Experience, 9(4):255--274, 1997.
|
 |
18
|
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
Marek Olszewski and Michael Voss. Proc. of the international conference on parallel and distributed processing techniques and applications, pdpta '04, june 21-24, 2004, las vegas, nevada, usa, volume 1. In Hamid R. Arabnia, editor, PDPTA. CSREA Press, 2004.
|
| |
26
|
|
| |
27
|
Of Signal Processing. SPIRAL: A generator for platform-adapted libraries.
|
| |
28
|
|
| |
29
|
|
| |
30
|
J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65--118, 1976.
|
| |
31
|
D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. IEEE Transactions on Parallel and Distributed Systems: Explorations in the Microstructure of Cognition, 1, 1986.
|
 |
32
|
|
| |
33
|
Steven Saunders, Nathan Thomas, Nancy Amato, and Lawrence Rauchwerger. Adaptive parallel sorting in the STAPL library. Technical Report TR01-005, Parasol Laboratory, Texas A&M University, November 2001.
|
| |
34
|
|
| |
35
|
|
| |
36
|
R. Clint Whaley, Antoine Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3--35, January 2001.
|
 |
37
|
|
| |
38
|
|
CITED BY 10
|
|
|
|
|
|
|
|
|
|
|
Gabriel Tanase , Mauro Bianco , Nancy M. Amato , Lawrence Rauchwerger, The STAPL pArray, Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, p.73-80, September 16-16, 2007, Brasov, Romania
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jason Ansel , Cy Chan , Yee Lok Wong , Marek Olszewski , Qin Zhao , Alan Edelman , Saman Amarasinghe, PetaBricks: a language and compiler for algorithmic choice, ACM SIGPLAN Notices, v.44 n.6, June 2009
|
|
|
|
|