ABSTRACT
Collective communication operations can dominate the cost of large-scale parallel algorithms. Image compositing in parallel scientific visualization is a reduction operation where this is the case. We present a new algorithm called Radix-k that in many cases performs better than existing compositing algorithms. It does so through a set of configurable parameters, the radices, that determine the number of communication partners in each message round. The algorithm embodies and unifies binary swap and direct-send, two of the best-known compositing methods, and enables numerous other configurations through appropriate choices of radices. While the algorithm is not tied to a particular computing architecture or network topology, the selection of radices allows Radix-k to take advantage of new supercomputer interconnect features such as multiporting. We show scalability across image size and system size, including both power-of-two and non-power-of-two process counts.
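
As a rough illustration of how the radices shape each message round, the sketch below assumes that the process count equals the product of the radices and that, in round i, a process exchanges with the k_i - 1 processes whose rank differs from its own only in digit i of a mixed-radix representation. That grouping rule and the function name `radix_k_partners` are assumptions made for illustration; they are not details given in the abstract.

```python
def radix_k_partners(rank, radices):
    """Return, for each round, the ranks that `rank` exchanges with.

    Hypothetical helper: assumes the number of processes is the product
    of `radices` and that round i groups ranks differing only in
    mixed-radix digit i.
    """
    rounds = []
    stride = 1  # product of the radices of all earlier rounds
    for k in radices:
        digit = (rank // stride) % k      # this rank's digit for the round
        base = rank - digit * stride      # group member whose digit is 0
        group = [base + d * stride for d in range(k)]
        rounds.append([r for r in group if r != rank])
        stride *= k
    return rounds


if __name__ == "__main__":
    # 8 processes, viewed from rank 5, under three choices of radices.
    for radices in ([2, 2, 2], [8], [4, 2]):
        print(radices, radix_k_partners(5, radices))
```

Under these assumptions, radices [2, 2, 2] on 8 processes give one partner per round, as in binary swap, while the single radix [8] gives one round with all seven other processes, as in direct-send. A mixed choice such as [4, 2] falls between the two.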