ACM Home Page
Please provide us with feedback. Feedback
Balancing processor loads and exploiting data locality in N-body simulations
Full text HtmlHtml (5 KB),  PdfPdf (222 KB)
Source Conference on High Performance Networking and Computing archive
Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM) table of contents
San Diego, California, United States
Article No. 43  
Year of Publication: 1995
ISBN:0-89791-816-9
Authors
Ioana Banicescu  Polytechnic University, Six MetroTech Center, Brooklyn, NY
Susan Flynn Hummel  Polytechnic University and IBM T. J. Watson Research Center
Sponsors
IEEE-CS : Computer Society
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 1,   Downloads (12 Months): 14,   Citation Count: 12
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/224170.224306
What is a DOI?

ABSTRACT

Although N-body simulation algorithms are amenable to parallelization, performance gains from execution on parallel machines are difficult to obtain due to load imbalances caused by irregular distributions of bodies. In general, there is a tension between balancing processor loads and maintaining locality, as the dynamic re-assignment of work necessitates access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals. Fractiling is based on a probabilistic analysis, and thus, accommodates load imbalances caused by predictable phenomena, such as irregular data, and unpredictable phenomena, such as data-access latencies. In experiments on a KSR1, performance of N-body simulation codes were improved by as much as 53% by fractiling. Performance improvements were obtained on uniform and nonuniform distributions of bodies, underscoring the need for a scheduling scheme that accommodates system induced variance. As the fractiling scheme is orthogonal to the N-body algorithm, we could use simple codes that discretize space into equal-size subrectangles (2-d) or subcubes (3-d) as the base algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. W. Appel, An Efficient Program for Many-Body Simulations, SIAM Journal of Computing 6, 1985.
 
2
J. Barnes and P. Hutt, A Hierarchical O(N log N) Force Calculation Algorithm, Nature 324, 1986.
 
3
 
4
J. A. Board, KSR implementation of Greengard's PFMA, email, Oct. 1994.
 
5
J. A. Board, J. W. Causey, J. F. Leathrum, A. Windemuth and K. Schulten, Accelerated Molecular Dynamic Simulations with the Parallel Fast Multipole Algorithm, Chem. Phys. Let. 198, 1992.
 
6
J. A. Board, Z. H. Halura, W. D. Elliot and W. T. Rakin, Scalable Variants of Multipole-based Algorithms for Molecular Dynamics Applications, Proc. of SIAM Conf. on Parallel Processing for Scientific Computing pp 295-300, Feb. 1995.
 
7
M.D. Durand, T. Montaut, L. Kervella and W. Jalby, Impact of Memory Contention on Task Duration in Self-Scheduled Programs, Int. Conf. on Parallel Processing I258-I262, Aug. 1993.
 
8
L. E. Flynn and S. Flynn Hummel, The Mathematical Foundations of the Factoring Scheduling Method, IBM Research Report RC18462, Oct. 1992.
9
 
10
 
11
S. Flynn Hummel, D. Kimelman and E. Schonberg, Experience with Program Visualization in Tuning Parallel Loop Scheduling, Proc. of HICSS- 25 pp. II275-II286, Jan. 1992.
 
12
S. Flynn Hummel, Fractiling: A Method for Scheduling Parallel Loops on NUMA Machines, IBM Research Report RC18958, June 1993.
 
13
S. Flynn Hummel, C. Wang, and J. Wein, Simulations of Fractiling in the logP Model, unpublished manuscript, 1994.
 
14
S. Flynn Hummel, I. Banicescu, C. Wang, and J. Wein, On the Scalability of Dynamic Scheduling, to appear in Proc. Third Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, May 1995.
 
15
 
16
D. C. Gray, Load Balancing the Parallel Fast Multipole Algorithm, Duke University Tech Report 94-003, Apr. 1994.
 
17
L. Greengard, The Rapid Evaluation of Potential Fields in Particle Systems, ACM Press, 1987.
 
18
 
19
S. Frank, H. Burkhardt and J. Rothnie, The KSR1: Bridging the Gap between Shared Memory and MMPs, Proc. Compcon '93.
 
20
H. Li, S. Tandri, M. Stumm and K. C. Sevcik, Locality and Loop Scheduling on NUMA Machines, Int. Conf. on Parallel Processing pp. II140--II147, 1993.
 
21
 
22
G. M. Morton, A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing, IBM, Ottowa, Canada, 1966.
 
23
C. Polychronopoulos, Loop Coalescing: A Compiler Transformation for Parallel Machines, Int. Conf. on Parallel Processing pp. 235--242, 1987.
 
24
 
25
K. E. Schmidt and M. A. Lee, Implementing the Fast Multiple Method in Parallel, J. Stat. Phy. 63:1120, 1991.
26
 
27
S. Talla, C implementation of Greengard's FMA, email, June 1994.
28
 
29
30

CITED BY  12

Collaborative Colleagues:
Ioana Banicescu: colleagues
Susan Flynn Hummel: colleagues