|
ABSTRACT
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. it is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
A. Aggarwal , A. K. Chandra , M. Snir, On communication latency in PRAM computations, Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, p.11-21, June 18-21, 1989, Santa Fe, New Mexico, United States
[doi> 10.1145/72935.72937]
|
| |
2
|
|
| |
3
|
B. Alpem, L. Carter, E. Feig, and T. Selker. The Uniform Memory Hierarchy Model of Computation. Algorithmica, 1993. (to appear).
|
 |
4
|
|
 |
5
|
|
| |
6
|
G.E. Blelloch. Scans as Primitive Parallel Operations. In Proceedings of lnternational Conference on Parallel Processing, pages 355-362, 1987.
|
 |
7
|
|
| |
8
|
J.M. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp, 19:297- 301, 1965.
|
| |
9
|
David E. Culler , Richard M. Karp , David A. Patterson , Abhijit Sahay , Klaus E Schauser , Eunice Santos , Ramesh Subramonian , Thorsten von Eicken, LogP: Towards a Realistic Model of Parallel Computation, University of California at Berkeley, Berkeley, CA, 1992
|
| |
10
|
W. et al Dally. The J-Machine: A Fine-Grain Concurrent Computer. In IFIP Congress, 1989.
|
| |
11
|
|
| |
12
|
Daniel Lenoski , James Laudon , Kourosh Gharachorloo , Wolf-Dietrich Weber , Anoop Gupta , John Hennessy , Mark Horowitz , Monica S. Lam, The Stanford Dash Multiprocessor, Computer, v.25 n.3, p.63-79, March 1992
[doi> 10.1109/2.121510]
|
 |
13
|
|
 |
14
|
|
| |
15
|
J. L. Hennessy. MIPS R4000 Overview. In Proc. of the 19th lnt'l Symposium on Computer Architecture, Gold Coast, Australia, May 1992.
|
| |
16
|
|
| |
17
|
IEEE. Symposium Record-- Hot Chips IV, August 1992.
|
 |
18
|
|
 |
19
|
Richard M. Karp , Michael Luby , Friedhelm Meyer auf der Heide, Efficient PRAM simulation on a distributed memory machine, Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, p.318-326, May 04-06, 1992, Victoria, British Columbia, Canada
[doi> 10.1145/129712.129743]
|
| |
20
|
|
 |
21
|
Z. M. Kedem , K. V. Palem , P. G. Spirakis, Efficient robust parallel computations, Proceedings of the twenty-second annual ACM symposium on Theory of computing, p.138-148, May 13-17, 1990, Baltimore, Maryland, United States
[doi> 10.1145/100216.100231]
|
| |
22
|
C.U.Martel andA. Raghunathan.Asynchronous PRAMs with memory latency. Technical report, University of California, Davis, Division of Computer Science, 1991.
|
| |
23
|
|
 |
24
|
|
 |
25
|
|
| |
26
|
A.G. Ranade. How to emulate shared memory.in Proceedings of the 28th IEEE Annual Symposium on Foundations of Computer Science, pages 185-194, 1987.
|
| |
27
|
|
| |
28
|
|
| |
29
|
R. Subramonian. The influence of limited bandwidth on algorithm design and implementation. In Dartmouth Institute for Advanced Graduate Studies in Parallel Computation (DAGS/PC), June 1992.
|
 |
30
|
|
 |
31
|
Thorsten von Eicken , David E. Culler , Seth Copen Goldstein , Klaus Erik Schauser, Active messages: a mechanism for integrated communication and computation, Proceedings of the 19th annual international symposium on Computer architecture, p.256-266, May 19-21, 1992, Queensland, Australia
|
CITED BY 217
|
|
|
|
Xiaotie Deng , Nian Gu , Tim Brecht , KaiCheng Lu, Preemptive scheduling of parallel jobs on multiprocessors, Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, p.159-167, January 28-30, 1996, Atlanta, Georgia, United States
|
|
|
|
|
|
|
|
|
|
|
Vijaya Ramachandran , Brian Grayson , Michael Dahlin, Emulations between QSM, BSP, and LogP: a framework for general-purpose parallel algorithm design, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.957-958, January 17-19, 1999, Baltimore, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sathish S. Vadhiyar , Graham E. Fagg , Jack Dongarra, Automatically tuned collective communications, Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p.3-es, November 04-10, 2000, Dallas, Texas, United States
|
|
|
|
|
|
|
G. R. Nudd , D. J. Kerbyson , E. Papaefstathiou , S. C. Perry , J. S. Harper , D. V. Wilcox, Pace--A Toolset for the Performance Prediction of Parallel and Distributed Systems, International Journal of High Performance Computing Applications, v.14 n.3, p.228-251, August 2000
|
|
|
|
|
Christian Engelmann , Stephen L. Scott , David E. Bernholdt , Narasimha R. Gottumukkala , Chokchai Leangsuksun , Jyothish Varma , Chao Wang , Frank Mueller , Aniruddha G. Shet , P. Sadayappan, MOLAR: adaptive runtime support for high-end computing operating and runtime systems, ACM SIGOPS Operating Systems Review, v.40 n.2, April 2006
|
|
Hu Chen , Wenguang Chen , Jian Huang , Bob Robert , H. Kuhn, MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
Rod Fatoohi , Ken Kardys , Sumy Koshy , Soundarya Sivaramakrishnan , Jeffrey S. Vetter, Performance evaluation of high-speed interconnects using dense communication patterns, Parallel Computing, v.32 n.11-12, p.794-807, December, 2006
|
|
|
Jiuxing Liu , Balasubramanian Chandrasekaran , Jiesheng Wu , Weihang Jiang , Sushmitha Kini , Weikuan Yu , Darius Buntinas , Peter Wyckoff , D K. Panda, Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, p.58, November 15-21, 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Bjoern Haake , Klaus E. Schauser , Chris Scheiman, Profiling a parallel language based on fine-grained communication, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.17-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
|
|
|
|
|
|
Qi Ning , Vincent Van Dongen , Guang R. Gao, Automatic decomposition in EPPP compiler, Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research, p.49, October 31-November 03, 1994, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jiuxing Liu , Jiesheng Wu , Sushmitha P. Kini , Pete Wyckoff , Dhabaleswar K. Panda, High performance RDMA-based MPI implementation over InfiniBand, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
|
|
|
Lúcia M. A. Drummond , Eduardo Uchoa , Alexandre D. Gonçalves , Juliana M. N. Silva , Marcelo C. P. Santos , Maria Clícia S. de Castro, A grid-enabled distributed branch-and-bound algorithm with application on the Steiner problem in graphs, Parallel Computing, v.32 n.9, p.629-642, October 2006
|
|
|
Jelena Pješivac-Grbović , Thara Angskun , George Bosilca , Graham E. Fagg , Edgar Gabriel , Jack J. Dongarra, Performance analysis of MPI collective operations, Cluster Computing, v.10 n.2, p.127-143, June 2007
|
|
|
|
|
|
Rodrigo Fernandes De Mello , Rudinei Goularte , Evgueni Dodonov , Laurence T. Yang , Jong Hyuk Park , Tai-Hoon Kim, On modeling and evaluating multicomputer transcoding architectures for live-video streams, Multimedia Tools and Applications, v.43 n.2, p.109-129, June 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
B. A. Vianna , A. A. Fonseca , N. T. Moura , L. T. Menezes , H. A. Mendes , J. A. Silva , C. Boeres , V. E. F. Rebello, A tool for the design and evaluation of hybrid scheduling algorithms for computational grids, Proceedings of the 2nd workshop on Middleware for grid computing, p.41-46, October 18-22, 2004, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David E. Culler , Richard M. Karp , David Patterson , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser , Ramesh Subramonian , Thorsten von Eicken, LogP: a practical model of parallel computation, Communications of the ACM, v.39 n.11, p.78-85, Nov. 1996
|
|
|
|
|
|
|
|
|
|
Jaswinder Pal Singh , Edward Rothberg , Anoop Gupta, Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.189-199, June 27-29, 1994, Cape May, New Jersey, United States
|
|
|
|
|
Phillip B. Gibbons , Yossi Matias , Vijaya Ramachandran, The QRQW PRAM: accounting for contention in parallel algorithms, Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, p.638-648, January 23-25, 1994, Arlington, Virginia, United States
|
|
|
|
|
|
Gilles Hurteau , Vincent Van Dongen , Guang R. Gao, EPPP - an integrated environment for portable parallel programming, Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research, p.31, October 31-November 03, 1994, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
Mainak Chaudhuri , Mark Heinrich , Chris Holt , Jaswinder Pal Singh , Edward Rothberg , John Hennessy, Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation, IEEE Transactions on Computers, v.52 n.7, p.862-880, July 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
S. D. Hammond , G. R. Mudalige , J. A. Smith , S. A. Jarvis , J. A. Herdman , A. Vadgama, WARPP: a toolkit for simulating high-performance parallel scientific codes, Proceedings of the 2nd International Conference on Simulation Tools and Techniques, March 02-06, 2009, Rome, Italy
|
|
|
|
|
|
|
|
Y. Cho , M. Winslett , M. Subramaniam , Y. Chen , S. Kuo , K. E. Seamons, Exploiting local data in parallel array I/O on a practical network of workstations, Proceedings of the fifth workshop on I/O in parallel and distributed systems, p.1-13, November 17-17, 1997, San Jose, California, United States
|
|
|
|
|
Yishay Mansour , Noam Nisan , Uzi Vishkin, Trade-offs between communication throughput and parallel time, Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, p.372-381, May 23-25, 1994, Montreal, Quebec, Canada
|
|
Det Buaklee , Gregory F. Tracy , Mary K. Vernon , Stephen J. Wright, Near-optimal adaptive control of a large grid application, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
|
|
Guy E. Blelloch , Phillip B. Gibbons , Yossi Matias , Marco Zagha, Accounting for memory bank contention and delay in high-bandwidth multiprocessors, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.84-94, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Brian Holland , Karthik Nagarajan , Chris Conger , Adam Jacobs , Alan D. George, RAT: a methodology for predicting performance in application design migration to FPGAs, Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07, November 11-11, 2007, Reno, Nevada
|
|
|
|
Jehoshua Bruck , Ching-Tien Ho , Shlomo Kipnis , Derrick Weathersby, Efficient algorithms for all-to-all communications in multi-port message-passing systems, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.298-309, June 27-29, 1994, Cape May, New Jersey, United States
|
|
|
|
|
|
Andrew Sohn , Yuetsu Kodama , Jui Ku , Mitsuhisa Sato , Hirofumi Sakane , Hayato Yamana , Shuichi Sakai , Yoshinori Yamaguchi, Fine-grain multithreading with the EM-X multiprocessor, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.189-198, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
Susan Hinrichs , Corey Kosak , David R. O'Hallaron , Thomas M. Stricker , Riichiro Take, An architecture for optimal all-to-all personalized communication, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.310-319, June 27-29, 1994, Cape May, New Jersey, United States
|
|
Bogdan S. Chlebus , Stefan Dobrev , Dariusz R. Kowalski , Grzegorz Malewicz , Alex Shvartsman , Imrich Vrto, Towards practical deteministic write-all algorithms, Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, p.271-280, July 2001, Crete Island, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gail A. Alverson , William G. Griswold , Calvin Lin , David Notkin , Lawrence Snyder, Abstractions for Portable, Scalable Parallel Programming, IEEE Transactions on Parallel and Distributed Systems, v.9 n.1, p.71-86, January 1998
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Modeling instruction placement on a spatial architecture, Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, July 30-August 02, 2006, Cambridge, Massachusetts, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Richard M. Karp , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser, Optimal broadcast and summation in the LogP model, Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures, p.142-153, June 30-July 02, 1993, Velen, Germany
|
|
|
|
|
|
|
Micah Adler , John W. Byers , Richard M. Karp, Parallel sorting with limited bandwidth, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.129-136, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Phillip B. Gibbons , Yossi Matias , Vijaya Ramachandran, Can shared-memory model serve as a bridging model for parallel computation?, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.72-83, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Arvind Krishnamurthy , Klaus E. Schauser , Chris J. Scheiman , Randolph Y. Wang , David E. Culler , Katherine Yelick, Evaluation of architectural support for global address-based communication in large-scale parallel machines, ACM SIGOPS Operating Systems Review, v.30 n.5, p.37-48, Dec. 1996
|
|
Micah Adler , Wolfgang Dittrich , Ben Juurlink , Mirosław Kutyłowski , Ingo Rieping, Communication-optimal parallel minimum spanning tree algorithms (extended abstract), Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, p.27-36, June 28-July 02, 1998, Puerto Vallarta, Mexico
|
|
Mark Goudreau , Kevin Lang , Satish Rao , Torsten Suel , Thanasis Tsantilas, Towards efficiency and portability: programming with the BSP model, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.1-12, June 24-26, 1996, Padua, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gianfranco Bilardi , Kieran T. Herley , Andrea Pietracaprina , Geppino Pucci , Paul Spirakis, BSP vs LogP, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.25-32, June 24-26, 1996, Padua, Italy
|
|
|
|
|
|
|
|
|
Hongzhang Shan , Erich Strohmaier , Ji Qiang , David H. Bailey , Kathy Yelick, Particles and contiuum---Performance modeling and optimization of a high energy colliding beam simulation code, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
|
|
|
|
Remzi H. Arpaci , Andrea C. Dusseau , Amin M. Vahdat , Lok T. Liu , Thomas E. Anderson , David A. Patterson, The interaction of parallel and sequential workloads on a network of workstations, ACM SIGMETRICS Performance Evaluation Review, v.23 n.1, p.267-278, May 1995
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Frank Dehne , Xiaotie Deng , Patrick Dymond , Andreas Fabri , Ashfaq A. Khokhar, A randomized parallel 3D convex hull algorithm for coarse grained multicomputers, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.27-33, June 24-26, 1995, Santa Barbara, California, United States
|
|
Richard P. Martin , Amin M. Vahdat , David E. Culler , Thomas E. Anderson, Effects of communication latency, overhead, and bandwidth in a cluster architecture, ACM SIGARCH Computer Architecture News, v.25 n.2, p.85-97, May 1997
|
|
Cynthia Dwork , Maurice Herlihy , Orli Waarts, Contention in shared memory algorithms, Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, p.174-183, May 16-18, 1993, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
Albert Alexandrov , Mihai F. Ionescu , Klaus E. Schauser , Chris Scheiman, LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.95-105, June 24-26, 1995, Santa Barbara, California, United States
|
|
|
|
|
|
Vikram S. Adve , Rajive Bagrodia , James C. Browne , Ewa Deelman , Aditya Dube , Elias N. Houstis , John R. Rice , Rizos Sakellariou , David J. Sundaram-Stukel , Patricia J. Teller , Mary K. Vernon, POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems, IEEE Transactions on Software Engineering, v.26 n.11, p.1027-1048, November 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Micah Adler , Phillip B. Gibbons , Vijaya Ramachandran , Yossi Matias, Modeling parallel bandwidth: local vs. global restrictions, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.94-105, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
Jyothish Varma , Chao Wang , Frank Mueller , Christian Engelmann , Stephen L. Scott, Scalable, fault tolerant membership for MPI tasks on HPC systems, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
Vasanth Bala , Jehoshua Bruck , Robert Cypher , Pablo Elustando , Alex Ho , Ching-Tien Ho , Shlomo Kipnis , Marc Snir, CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers, IEEE Transactions on Parallel and Distributed Systems, v.6 n.2, p.154-164, February 1995
|
|
|
|
|
|
|
|
|
|
|
|
|
Mark W. Goudreau , Kevin Lang , Girija Narlikar , Satish B. Rao, BOS is boss: a case for bulk-synchronous object systems, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.115-125, June 27-30, 1999, Saint Malo, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
J. Dongarra , G. Bosilca , Z. Chen , V. Eijkhout , G. E. Fagg , E. Fuentes , J. Langou , P. Luszczek , J. Pjesivac-Grbovic , K. Seymour , H. You , S. S. Vadhiyar, Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, v.50 n.2/3, p.223-238, March 2006
|
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE Design Automation Conference on
Gwo-Dong Chen
, Daniel D. Gajski
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
|