|
ABSTRACT
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. it is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
A. Aggarwal , A. K. Chandra , M. Snir, On communication latency in PRAM computations, Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, p.11-21, June 18-21, 1989, Santa Fe, New Mexico, United States
[doi> 10.1145/72935.72937]
|
| |
2
|
|
| |
3
|
B. Alpem, L. Carter, E. Feig, and T. Selker. The Uniform Memory Hierarchy Model of Computation. Algorithmica, 1993. (to appear).
|
 |
4
|
|
 |
5
|
|
| |
6
|
G.E. Blelloch. Scans as Primitive Parallel Operations. In Proceedings of lnternational Conference on Parallel Processing, pages 355-362, 1987.
|
 |
7
|
|
| |
8
|
J.M. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp, 19:297- 301, 1965.
|
| |
9
|
David E. Culler , Richard M. Karp , David A. Patterson , Abhijit Sahay , Klaus E Schauser , Eunice Santos , Ramesh Subramonian , Thorsten von Eicken, LogP: Towards a Realistic Model of Parallel Computation, University of California at Berkeley, Berkeley, CA, 1992
|
| |
10
|
W. et al Dally. The J-Machine: A Fine-Grain Concurrent Computer. In IFIP Congress, 1989.
|
| |
11
|
|
| |
12
|
Daniel Lenoski , James Laudon , Kourosh Gharachorloo , Wolf-Dietrich Weber , Anoop Gupta , John Hennessy , Mark Horowitz , Monica S. Lam, The Stanford Dash Multiprocessor, Computer, v.25 n.3, p.63-79, March 1992
[doi> 10.1109/2.121510]
|
 |
13
|
|
 |
14
|
|
| |
15
|
J. L. Hennessy. MIPS R4000 Overview. In Proc. of the 19th lnt'l Symposium on Computer Architecture, Gold Coast, Australia, May 1992.
|
| |
16
|
|
| |
17
|
IEEE. Symposium Record-- Hot Chips IV, August 1992.
|
 |
18
|
|
 |
19
|
Richard M. Karp , Michael Luby , Friedhelm Meyer auf der Heide, Efficient PRAM simulation on a distributed memory machine, Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, p.318-326, May 04-06, 1992, Victoria, British Columbia, Canada
[doi> 10.1145/129712.129743]
|
| |
20
|
|
 |
21
|
Z. M. Kedem , K. V. Palem , P. G. Spirakis, Efficient robust parallel computations, Proceedings of the twenty-second annual ACM symposium on Theory of computing, p.138-148, May 13-17, 1990, Baltimore, Maryland, United States
[doi> 10.1145/100216.100231]
|
| |
22
|
C.U.Martel andA. Raghunathan.Asynchronous PRAMs with memory latency. Technical report, University of California, Davis, Division of Computer Science, 1991.
|
| |
23
|
|
 |
24
|
|
 |
25
|
|
| |
26
|
A.G. Ranade. How to emulate shared memory.in Proceedings of the 28th IEEE Annual Symposium on Foundations of Computer Science, pages 185-194, 1987.
|
| |
27
|
|
| |
28
|
|
| |
29
|
R. Subramonian. The influence of limited bandwidth on algorithm design and implementation. In Dartmouth Institute for Advanced Graduate Studies in Parallel Computation (DAGS/PC), June 1992.
|
 |
30
|
|
 |
31
|
Thorsten von Eicken , David E. Culler , Seth Copen Goldstein , Klaus Erik Schauser, Active messages: a mechanism for integrated communication and computation, Proceedings of the 19th annual international symposium on Computer architecture, p.256-266, May 19-21, 1992, Queensland, Australia
|
CITED BY 224
|
|
|
|
|
|
|
|
Sathish S. Vadhiyar , Graham E. Fagg , Jack Dongarra, Automatically tuned collective communications, Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p.3-es, November 04-10, 2000, Dallas, Texas, United States
|
|
|
Remzi H. Arpaci , Andrea C. Dusseau , Amin M. Vahdat , Lok T. Liu , Thomas E. Anderson , David A. Patterson, The interaction of parallel and sequential workloads on a network of workstations, ACM SIGMETRICS Performance Evaluation Review, v.23 n.1, p.267-278, May 1995
|
|
|
David E. Culler , Richard M. Karp , David Patterson , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser , Ramesh Subramonian , Thorsten von Eicken, LogP: a practical model of parallel computation, Communications of the ACM, v.39 n.11, p.78-85, Nov. 1996
|
|
|
|
|
|
|
|
|
Frank Dehne , Xiaotie Deng , Patrick Dymond , Andreas Fabri , Ashfaq A. Khokhar, A randomized parallel 3D convex hull algorithm for coarse grained multicomputers, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.27-33, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
|
|
|
|
|
|
Mark W. Goudreau , Kevin Lang , Girija Narlikar , Satish B. Rao, BOS is boss: a case for bulk-synchronous object systems, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.115-125, June 27-30, 1999, Saint Malo, France
|
|
|
Micah Adler , Phillip B. Gibbons , Vijaya Ramachandran , Yossi Matias, Modeling parallel bandwidth: local vs. global restrictions, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.94-105, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
Arvind Krishnamurthy , Klaus E. Schauser , Chris J. Scheiman , Randolph Y. Wang , David E. Culler , Katherine Yelick, Evaluation of architectural support for global address-based communication in large-scale parallel machines, ACM SIGOPS Operating Systems Review, v.30 n.5, p.37-48, Dec. 1996
|
|
|
|
|
|
Micah Adler , Wolfgang Dittrich , Ben Juurlink , Mirosław Kutyłowski , Ingo Rieping, Communication-optimal parallel minimum spanning tree algorithms (extended abstract), Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, p.27-36, June 28-July 02, 1998, Puerto Vallarta, Mexico
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vijaya Ramachandran , Brian Grayson , Michael Dahlin, Emulations between QSM, BSP, and LogP: a framework for general-purpose parallel algorithm design, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.957-958, January 17-19, 1999, Baltimore, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
Phillip B. Gibbons , Yossi Matias , Vijaya Ramachandran, Can shared-memory model serve as a bridging model for parallel computation?, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.72-83, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xiaotie Deng , Nian Gu , Tim Brecht , KaiCheng Lu, Preemptive scheduling of parallel jobs on multiprocessors, Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, p.159-167, January 28-30, 1996, Atlanta, Georgia, United States
|
|
|
|
|
|
Y. Cho , M. Winslett , M. Subramaniam , Y. Chen , S. Kuo , K. E. Seamons, Exploiting local data in parallel array I/O on a practical network of workstations, Proceedings of the fifth workshop on I/O in parallel and distributed systems, p.1-13, November 17-17, 1997, San Jose, California, United States
|
|
|
|
|
|
Gianfranco Bilardi , Kieran T. Herley , Andrea Pietracaprina , Geppino Pucci , Paul Spirakis, BSP vs LogP, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.25-32, June 24-26, 1996, Padua, Italy
|
|
|
Albert Alexandrov , Mihai F. Ionescu , Klaus E. Schauser , Chris Scheiman, LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.95-105, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
|
|
|
|
|
|
Jaswinder Pal Singh , Edward Rothberg , Anoop Gupta, Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.189-199, June 27-29, 1994, Cape May, New Jersey, United States
|
|
|
Jehoshua Bruck , Ching-Tien Ho , Shlomo Kipnis , Derrick Weathersby, Efficient algorithms for all-to-all communications in multi-port message-passing systems, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.298-309, June 27-29, 1994, Cape May, New Jersey, United States
|
|
|
Mark Goudreau , Kevin Lang , Satish Rao , Torsten Suel , Thanasis Tsantilas, Towards efficiency and portability: programming with the BSP model, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.1-12, June 24-26, 1996, Padua, Italy
|
|
|
|
|
|
Bjoern Haake , Klaus E. Schauser , Chris Scheiman, Profiling a parallel language based on fine-grained communication, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.17-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
Andrew Sohn , Yuetsu Kodama , Jui Ku , Mitsuhisa Sato , Hirofumi Sakane , Hayato Yamana , Shuichi Sakai , Yoshinori Yamaguchi, Fine-grain multithreading with the EM-X multiprocessor, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.189-198, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
Richard P. Martin , Amin M. Vahdat , David E. Culler , Thomas E. Anderson, Effects of communication latency, overhead, and bandwidth in a cluster architecture, ACM SIGARCH Computer Architecture News, v.25 n.2, p.85-97, May 1997
|
|
|
Yishay Mansour , Noam Nisan , Uzi Vishkin, Trade-offs between communication throughput and parallel time, Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, p.372-381, May 23-25, 1994, Montreal, Quebec, Canada
|
|
|
Richard M. Karp , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser, Optimal broadcast and summation in the LogP model, Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures, p.142-153, June 30-July 02, 1993, Velen, Germany
|
|
|
|
|
|
|
|
|
Susan Hinrichs , Corey Kosak , David R. O'Hallaron , Thomas M. Stricker , Riichiro Take, An architecture for optimal all-to-all personalized communication, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.310-319, June 27-29, 1994, Cape May, New Jersey, United States
|
|
|
|
|
|
Bogdan S. Chlebus , Stefan Dobrev , Dariusz R. Kowalski , Grzegorz Malewicz , Alex Shvartsman , Imrich Vrto, Towards practical deteministic write-all algorithms, Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, p.271-280, July 2001, Crete Island, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Phillip B. Gibbons , Yossi Matias , Vijaya Ramachandran, The QRQW PRAM: accounting for contention in parallel algorithms, Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, p.638-648, January 23-25, 1994, Arlington, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
Det Buaklee , Gregory F. Tracy , Mary K. Vernon , Stephen J. Wright, Near-optimal adaptive control of a large grid application, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qi Ning , Vincent Van Dongen , Guang R. Gao, Automatic decomposition in EPPP compiler, Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research, p.49, October 31-November 03, 1994, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
B. A. Vianna , A. A. Fonseca , N. T. Moura , L. T. Menezes , H. A. Mendes , J. A. Silva , C. Boeres , V. E. F. Rebello, A tool for the design and evaluation of hybrid scheduling algorithms for computational grids, Proceedings of the 2nd workshop on Middleware for grid computing, p.41-46, October 18-22, 2004, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Micah Adler , John W. Byers , Richard M. Karp, Parallel sorting with limited bandwidth, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.129-136, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
Guy E. Blelloch , Phillip B. Gibbons , Yossi Matias , Marco Zagha, Accounting for memory bank contention and delay in high-bandwidth multiprocessors, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.84-94, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
Cynthia Dwork , Maurice Herlihy , Orli Waarts, Contention in shared memory algorithms, Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, p.174-183, May 16-18, 1993, San Diego, California, United States
|
|
|
|
|
|
|
|
|
Gilles Hurteau , Vincent Van Dongen , Guang R. Gao, EPPP - an integrated environment for portable parallel programming, Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research, p.31, October 31-November 03, 1994, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vikram S. Adve , Rajive Bagrodia , James C. Browne , Ewa Deelman , Aditya Dube , Elias N. Houstis , John R. Rice , Rizos Sakellariou , David J. Sundaram-Stukel , Patricia J. Teller , Mary K. Vernon, POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems, IEEE Transactions on Software Engineering, v.26 n.11, p.1027-1048, November 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vasanth Bala , Jehoshua Bruck , Robert Cypher , Pablo Elustando , Alex Ho , Ching-Tien Ho , Shlomo Kipnis , Marc Snir, CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers, IEEE Transactions on Parallel and Distributed Systems, v.6 n.2, p.154-164, February 1995
|
|
|
|
|
|
G. R. Nudd , D. J. Kerbyson , E. Papaefstathiou , S. C. Perry , J. S. Harper , D. V. Wilcox, Pace--A Toolset for the Performance Prediction of Parallel and Distributed Systems, International Journal of High Performance Computing Applications, v.14 n.3, p.228-251, August 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mainak Chaudhuri , Mark Heinrich , Chris Holt , Jaswinder Pal Singh , Edward Rothberg , John Hennessy, Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation, IEEE Transactions on Computers, v.52 n.7, p.862-880, July 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jiuxing Liu , Jiesheng Wu , Sushmitha P. Kini , Pete Wyckoff , Dhabaleswar K. Panda, High performance RDMA-based MPI implementation over InfiniBand, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
|
|
|
|
|
|
|
|
|
Gail A. Alverson , William G. Griswold , Calvin Lin , David Notkin , Lawrence Snyder, Abstractions for Portable, Scalable Parallel Programming, IEEE Transactions on Parallel and Distributed Systems, v.9 n.1, p.71-86, January 1998
|
|
|
|
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Modeling instruction placement on a spatial architecture, Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, July 30-August 02, 2006, Cambridge, Massachusetts, USA
|
|
|
|
|
|
|
|
|
Christian Engelmann , Stephen L. Scott , David E. Bernholdt , Narasimha R. Gottumukkala , Chokchai Leangsuksun , Jyothish Varma , Chao Wang , Frank Mueller , Aniruddha G. Shet , P. Sadayappan, MOLAR: adaptive runtime support for high-end computing operating and runtime systems, ACM SIGOPS Operating Systems Review, v.40 n.2, April 2006
|
|
|
|
|
|
J. Dongarra , G. Bosilca , Z. Chen , V. Eijkhout , G. E. Fagg , E. Fuentes , J. Langou , P. Luszczek , J. Pjesivac-Grbovic , K. Seymour , H. You , S. S. Vadhiyar, Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, v.50 n.2/3, p.223-238, March 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jyothish Varma , Chao Wang , Frank Mueller , Christian Engelmann , Stephen L. Scott, Scalable, fault tolerant membership for MPI tasks on HPC systems, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
|
|
|
Lúcia M. A. Drummond , Eduardo Uchoa , Alexandre D. Gonçalves , Juliana M. N. Silva , Marcelo C. P. Santos , Maria Clícia S. de Castro, A grid-enabled distributed branch-and-bound algorithm with application on the Steiner problem in graphs, Parallel Computing, v.32 n.9, p.629-642, October 2006
|
|
|
Hu Chen , Wenguang Chen , Jian Huang , Bob Robert , H. Kuhn, MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
|
|
|
|
|
|
|
|
|
Hongzhang Shan , Erich Strohmaier , Ji Qiang , David H. Bailey , Kathy Yelick, Particles and contiuum---Performance modeling and optimization of a high energy colliding beam simulation code, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
Rod Fatoohi , Ken Kardys , Sumy Koshy , Soundarya Sivaramakrishnan , Jeffrey S. Vetter, Performance evaluation of high-speed interconnects using dense communication patterns, Parallel Computing, v.32 n.11-12, p.794-807, December, 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jelena Pješivac-Grbović , Thara Angskun , George Bosilca , Graham E. Fagg , Edgar Gabriel , Jack J. Dongarra, Performance analysis of MPI collective operations, Cluster Computing, v.10 n.2, p.127-143, June 2007
|
|
|
|
|
|
|
|
|
|
|
|
Jiuxing Liu , Balasubramanian Chandrasekaran , Jiesheng Wu , Weihang Jiang , Sushmitha Kini , Weikuan Yu , Darius Buntinas , Peter Wyckoff , D K. Panda, Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, p.58, November 15-21, 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Brian Holland , Karthik Nagarajan , Chris Conger , Adam Jacobs , Alan D. George, RAT: a methodology for predicting performance in application design migration to FPGAs, Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07, November 11-11, 2007, Reno, Nevada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rodrigo Fernandes De Mello , Rudinei Goularte , Evgueni Dodonov , Laurence T. Yang , Jong Hyuk Park , Tai-Hoon Kim, On modeling and evaluating multicomputer transcoding architectures for live-video streams, Multimedia Tools and Applications, v.43 n.2, p.109-129, June 2009
|
|
|
|
|
|
S. D. Hammond , G. R. Mudalige , J. A. Smith , S. A. Jarvis , J. A. Herdman , A. Vadgama, WARPP: a toolkit for simulating high-performance parallel scientific codes, Proceedings of the 2nd International Conference on Simulation Tools and Techniques, March 02-06, 2009, Rome, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|