|
ABSTRACT
Cilk (pronounced “silk”) is a C-based runtime system for multi-threaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance. Consequently, a Cilk programmer can focus on reducing the work and critical path of his computation, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.The Cilk runtime system currently runs on the Connection Machine CM5 MPP, the Intel Paragon MPP, the Silicon Graphics Power Challenge SMP, and the MIT Phish network of workstations. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and the *Socrates chess program, which won third prize in the 1994 ACM International Computer Chess Championship.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Thomas E. Anderson , Brian N. Bershad , Edward D. Lazowska , Henry M. Levy, Scheduler activations: effective kernel support for the user-level management of parallelism, Proceedings of the thirteenth ACM symposium on Operating systems principles, p.95-109, October 13-16, 1991, Pacific Grove, California, United States
|
| |
2
|
Guy E. BIelloch. Programming parallel algorithms. In Proceedings of the 1992 Dartmouth Institute for Advanced Graduate Studies (DAGS) Symposium on Parallel Computation, pages 11- 18, Hanover, New Hampshire, June 1992.
|
| |
3
|
Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pages 356-368, Santa Fe, New Mexico, November 1994.
|
| |
4
|
Robert D. Blumofe and David S. Park. Scheduling large-scale parallel computations on networks of workstations. In Proceedings of the Third International Symposium on High Performance Distributed Computing, pages 96-105, San Francisco, California, August 1994.
|
 |
5
|
|
| |
6
|
Eric A. Brewer and Robert Blumofe. Strata: A multi-layer communications library. Technical Report to appear, MIT Laboratory for Computer Science. Available as ftp: / / ftp. ics .mit. edu/pub/supertech/strata /strata. tar. Z.
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
Eric C. Cooper and Richard R Draves. C Threads. Technical Report CMU-CS-88-154, School of Computer Science, Carnegie- Mellon University, June 1988.
|
 |
12
|
David E. Culler , Anurag Sah , Klaus E. Schauser , Thorsten von Eicken , John Wawrzynek, Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.164-175, April 08-11, 1991, Santa Clara, California, United States
|
 |
13
|
Ranier Feldmann , Peter Mysliwiete , Burkhard Monien, Studying overheads in massively parallel MIN/MAX-tree evaluation, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.94-103, June 27-29, 1994, Cape May, New Jersey, United States
[doi> 10.1145/181014.192325]
|
 |
14
|
|
| |
15
|
Vincent W. Freeh, David K. Lowenthal, and Gregory R. Andrews. Distributed Filaments: Efficient fine-grain parallelism on a cluster of workstations. In Proceedings of the First Symposium on Operating Systems Design and Implementation, pages 201-213, Monterey, California, November 1994.
|
| |
16
|
R. L. Graham Bounds for certain multiprocessing anomalies. The Bell System Technical Journal, 45:1563-1581, November 1966.
|
| |
17
|
R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416-429, March 1969.
|
| |
18
|
Michael Halbherr, Yuli Zhou, and Chris E Joerg. MIMD-style parallel programming with continuation-passing threads. In Proceedings of the 2nd International Workshop on Massive Parallelism: Hardware, Software, and Applications, Capn, Italy, September 1994.
|
 |
19
|
|
 |
20
|
|
 |
21
|
Wilson C. Hsieh , Paul Wang , William E. Weihl, Computation migration: enhancing locality for distributed-memory parallel systems, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.239-248, May 19-22, 1993, San Diego, California, United States
|
 |
22
|
|
| |
23
|
Chris Joerg and Bradley C. Kuszmaul. Masswely parallel chess In Proceedings of the Third DIMACS Parallel Implementation Challenge, Rutgers University, New Jersey, October 1994. Available as ftp://theory.lcs.mit.edu /pub/cilk/dimacs 9 4. ps. Z.
|
| |
24
|
L. V. Kal6. The Chare kernel parallel programming system. In Proceedings of the 1990 International Conference on Parallel Processing, Volume II: Software, pages 17-25, August 1990.
|
 |
25
|
|
| |
26
|
|
 |
27
|
|
 |
28
|
D. A. Kranz , R. H. Halstead, Jr. , E. Mohr, Mul-T: a high-performance parallel Lisp, Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation, p.81-90, June 19-23, 1989, Portland, Oregon, United States
|
| |
29
|
|
 |
30
|
Charles E. Leiserson , Zahi S. Abuhamdeh , David C. Douglas , Carl R. Feynman , Mahesh N. Ganmukhi , Jeffrey V. Hill , Daniel Hillis , Bradley C. Kuszmaul , Margaret A. St. Pierre , David S. Wells , Monica C. Wong , Shaw-Wen Yang , Robert Zak, The network architecture of the Connection Machine CM-5 (extended abstract), Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, p.272-285, June 29-July 01, 1992, San Diego, California, United States
[doi> 10.1145/140901.141883]
|
 |
31
|
|
| |
32
|
|
| |
33
|
|
| |
34
|
|
| |
35
|
Vijay S. Pande, Christopher E Joerg, Alexander Yu Grosberg, and Toyolchi Tanaka. Enumerations of the hamiltonian walks on a cubic sublattice. Journal of Physics A. 27, 1994.
|
| |
36
|
|
 |
37
|
Larry Rudolph , Miriam Slivkin-Allalouf , Eli Upfal, A simple load balancing scheme for task allocation in parallel machines, Proceedings of the third annual ACM symposium on Parallel algorithms and architectures, p.237-245, July 21-24, 1991, Hilton Head, South Carolina, United States
[doi> 10.1145/113379.113401]
|
| |
38
|
|
| |
39
|
Andrew S. Tanenbaum, Henri E, Bal, and M. Frans Kaashoek. Programming a distributed system using shared objects. In Proceedings of the Second International Symposium on High Performance Distributed Computing, pages 5-12, Spokane, Washington, July 1993.
|
| |
40
|
|
| |
41
|
|
CITED BY 99
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gary M. Zoppetti , Gagan Agrawal , Lori Pollock , Jose Nelson Amaral , Xinan Tang , Guang Gao, Automatic compiler techniques for thread coarsening for multithreaded architectures, Proceedings of the 14th international conference on Supercomputing, p.306-315, May 08-11, 2000, Santa Fe, New Mexico, United States
|
|
|
|
|
|
|
|
|
Chi-Chao Chang , Grzegorz Czajkowski , Thorsten von Eicken , Carl Kesselman, Evaluating the performance limitations of MPMD communication, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), p.1-10, November 15-21, 1997, San Jose, CA
|
|
|
|
|
|
Jason Hill , Robert Szewczyk , Alec Woo , Seth Hollar , David Culler , Kristofer Pister, System architecture directions for networked sensors, ACM SIGPLAN Notices, v.35 n.11, p.93-104, Nov. 2000
|
|
|
Michael O. Neary , Sean P. Brydon , Paul Kmiec , Sami Rollins , Peter Cappello, Javelin++: scalability issues in global computing, Proceedings of the ACM 1999 conference on Java Grande, p.171-180, June 12-14, 1999, San Francisco, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
|
|
|
|
|
|
|
|
|
Jason Hill , Robert Szewczyk , Alec Woo , Seth Hollar , David Culler , Kristofer Pister, System architecture directions for networked sensors, ACM SIGOPS Operating Systems Review, v.34 n.5, p.93-104, Dec. 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
John Plevyak , Vijay Karamcheti , Xingbin Zhang , Andrew A. Chien, A hybrid execution model for fine-grained languages on distributed memory multicomputers, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.41-es, December 04-08, 1995, San Diego, California, United States
|
|
|
Suvas Vajracharya , Steve Karmesin , Peter Beckman , James Crotinger , Allen Malony , Sameer Shende , Rod Oldehoeft , Stephen Smith, SMARTS: exploiting temporal locality and parallelism through vertical execution, Proceedings of the 13th international conference on Supercomputing, p.302-310, June 20-25, 1999, Rhodes, Greece
|
|
|
|
|
|
|
|
|
Guy E. Blelloch , Phillip B. Gibbons , Girija J. Narlikar , Yossi Matias, Space-efficient scheduling of parallelism with synchronization variables, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.12-23, June 23-25, 1997, Newport, Rhode Island, United States
|
|
|
|
|
|
Guang-Ien Cheng , Mingdong Feng , Charles E. Leiserson , Keith H. Randall , Andrew F. Stark, Detecting data races in Cilk programs that use locks, Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, p.298-309, June 28-July 02, 1998, Puerto Vallarta, Mexico
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Remzi H. Arpaci-Dusseau , Eric Anderson , Noah Treuhaft , David E. Culler , Joseph M. Hellerstein , David Patterson , Kathy Yelick, Cluster I/O with River: making the fast case common, Proceedings of the sixth workshop on I/O in parallel and distributed systems, p.10-22, May 05-05, 1999, Atlanta, Georgia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Robert D. Blumofe , Matteo Frigo , Christopher F. Joerg , Charles E. Leiserson , Keith H. Randall, An analysis of dag-consistent distributed shared-memory algorithms, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.297-308, June 24-26, 1996, Padua, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kunal Agrawal , Yuxiong He , Wen Jing Hsu , Charles E. Leiserson, Adaptive scheduling with parallelism feedback, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Arun Kejariwal , Alexandru Nicolau , Hideki Saito , Xinmin Tian , Milind Girkar , Utpal Banerjee , Constantine D. Polychronopoulos, A general approach for partitioning N-dimensional parallel nested loops with conditionals, Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, July 30-August 02, 2006, Cambridge, Massachusetts, USA
|
|
|
|
|
|
|
|
|
Kayvon Fatahalian , Daniel Reiter Horn , Timothy J. Knight , Larkhoon Leem , Mike Houston , Ji Young Park , Mattan Erez , Manman Ren , Alex Aiken , William J. Dally , Pat Hanrahan, Memory---Sequoia: programming the memory hierarchy, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
Nikos Chrisochoides , Andriy Fedorov , Andriy Kot , Neculai Archip , Peter Black , Olivier Clatz , Alexandra Golby , Ron Kikinis , Simon K. Warfield, Imaging and visual analysis---Toward real-time image guided neurosurgery using distributed and grid computing, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
Bratin Saha , Ali-Reza Adl-Tabatabai , Anwar Ghuloum , Mohan Rajagopalan , Richard L. Hudson , Leaf Petersen , Vijay Menon , Brian Murphy , Tatiana Shpeisman , Eric Sprangle , Anwar Rohillah , Doug Carmean , Jesse Fang, Enabling scalability and performance in a large scale CMP environment, ACM SIGOPS Operating Systems Review, v.41 n.3, June 2007
|
|
|
Wei Lu , Dennis Gannon, Parallel XML processing by work stealing, Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches, p.31-38, June 25-25, 2007, Monterey, California, USA
|
|
|
|
|
|
|
|
|
Shimin Chen , Phillip B. Gibbons , Michael Kozuch , Vasileios Liaskovitis , Anastassia Ailamaki , Guy E. Blelloch , Babak Falsafi , Limor Fix , Nikos Hardavellas , Todd C. Mowry , Chris Wilkerson, Scheduling threads for constructive cache sharing on CMPs, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, June 09-11, 2007, San Diego, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mike Houston , Ji-Young Park , Manman Ren , Timothy Knight , Kayvon Fatahalian , Alex Aiken , William Dally , Pat Hanrahan, A portable runtime interface for multi-level memory hierarchies, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ganesh Bikshandi , Jose G. Castanos , Sreedhar B. Kodali , V. Krishna Nandivada , Igor Peshansky , Vijay A. Saraswat , Sayantan Sur , Pradeep Varma , Tong Wen, Efficient, portable implementation of asynchronous multi-place programs, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
|
|
|
Angela C. Sodan , Garima Gupta , Lin Han , Lun Liu , Benjamin Lafreniere, Time and space adaptation for computational grids with the ATOP-Grid middleware, Future Generation Computer Systems, v.24 n.6, p.561-581, June, 2008
|
|
|
Xavier Teruel , Priya Unnikrishnan , Xavier Martorell , Eduard Ayguadé , Raul Silvera , Guansong Zhang , Ettore Tiotto, OpenMP tasks in IBM XL compilers, Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds, October 27-30, 2008, Ontario, Canada
|
|
|
Olivier Certner , Zheng Li , Pierre Palatin , Olivier Temam , Frederic Arzel , Nathalie Drach, A practical approach for reconciling high and predictable performance in non-regular parallel programs, Proceedings of the conference on Design, automation and test in Europe, March 10-14, 2008, Munich, Germany
|
|
|
|
|
|
|
|
|
Milind Kulkarni , Martin Burtscher , Rajeshkar Inkulu , Keshav Pingali , Calin Casçaval, How much parallelism is there in irregular applications?, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
|
|
|
|
|
|
Aydin Buluç , Jeremy T. Fineman , Matteo Frigo , John R. Gilbert , Charles E. Leiserson, Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks, Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, August 11-13, 2009, Calgary, AB, Canada
|
|
|
Angeles Navarro , Rafael Asenjo , Siham Tabik , Cǎlin Caşcaval, Load balancing using work-stealing for pipeline parallelism in emerging applications, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|
|
Jun Shirako , Jisheng M. Zhao , V. Krishna Nandivada , Vivek N. Sarkar, Chunking parallel loops in the presence of synchronization, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|
|
Mohammad Zalfany Urfianto , Tsuyoshi Isshiki , Arif Ullah Khan , Dongju Li , Hiroaki Kunieda, Decomposition of Task-Level Concurrency on C Programs Applied to the Design of Multiprocessor SoC, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, v.E91-A n.7, p.1748-1756, July 2008
|
|
|
Daniel Spoonhower , Guy E. Blelloch , Phillip B. Gibbons , Robert Harper, Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures, Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, August 11-13, 2009, Calgary, AB, Canada
|
|
|
|
|