Efficient synchronization: let them eat QOLB
Source: International Symposium on Computer Architecture
Proceedings of the 24th Annual International Symposium on Computer Architecture
Denver, Colorado, United States
Pages: 170 - 180  
Year of Publication: 1997
ISBN:0-89791-901-7
Authors
Alain Kägi  Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin
Doug Burger  Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin
James R. Goodman  Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/264107.264166

ABSTRACT

Efficient synchronization primitives are essential for achieving high performance in fine-grain, shared-memory parallel programs. One function of synchronization primitives is to enable exclusive access to shared data and critical sections of code. This paper makes three contributions. (1) We enumerate the five sources of overhead that locking synchronization primitives can incur. (2) We describe four mechanisms (local spinning, queue-based locking, collocation, and synchronized prefetch) that reduce these synchronization overheads. (3) With detailed simulations, we show the extent to which these four mechanisms can improve the performance of shared-memory programs. We evaluate the space of these mechanisms using seventeen synchronization constructs, which are formed from six base types of locks (TEST&SET, TEST&TEST&SET, MCS, LH, M, and QOLB). We show that large performance gains (speedups of more than 1.5 for three of five benchmarks) can be achieved if at least three optimizing mechanisms are used simultaneously. We find that QOLB, which incorporates all four mechanisms, outperforms all other primitives (including reactive synchronization) in all cases. Finally, we demonstrate the superior performance of a low-cost implementation of QOLB, which runs on an unmodified cluster of commodity workstations.
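
The first two mechanisms named in the abstract, local spinning and queue-based locking, can be illustrated in software. The following is a minimal C++ sketch, not the paper's implementation: a TEST&TEST&SET-style lock, where waiters spin on a plain read and attempt the atomic operation only when the lock looks free, and an MCS-style queue lock, where each waiter spins on a flag in its own queue node. The class and function names (`TtasLock`, `McsLock`, `count_with_ttas`, `count_with_mcs`), the memory orderings, and the `yield()` calls in the spin loops are assumptions made for this illustration. Collocation, synchronized prefetch, and QOLB itself are not shown.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// TEST&TEST&SET-style lock (illustrative): waiters spin on an ordinary read
// and issue the atomic exchange only after observing the lock as free.
class TtasLock {
    std::atomic<bool> held{false};
public:
    void lock() {
        for (;;) {
            // Local spinning: read-only wait, no write traffic on the lock word.
            while (held.load(std::memory_order_relaxed)) std::this_thread::yield();
            // The lock looked free; try to take it with one atomic exchange.
            if (!held.exchange(true, std::memory_order_acquire)) return;
        }
    }
    void unlock() { held.store(false, std::memory_order_release); }
};

// MCS-style queue lock (illustrative): waiters form a FIFO queue and each
// one spins on the flag in its own node until its predecessor hands over.
class McsLock {
public:
    struct Node {
        std::atomic<Node*> next{nullptr};
        std::atomic<bool> waiting{false};
    };
    void lock(Node& me) {
        me.next.store(nullptr, std::memory_order_relaxed);
        me.waiting.store(true, std::memory_order_relaxed);
        // Append this node to the tail of the queue.
        Node* prev = tail.exchange(&me, std::memory_order_acq_rel);
        if (prev != nullptr) {
            prev->next.store(&me, std::memory_order_release);
            // Spin on our own flag only.
            while (me.waiting.load(std::memory_order_acquire)) std::this_thread::yield();
        }
    }
    void unlock(Node& me) {
        Node* succ = me.next.load(std::memory_order_acquire);
        if (succ == nullptr) {
            // No visible successor: try to mark the queue empty.
            Node* expected = &me;
            if (tail.compare_exchange_strong(expected, nullptr, std::memory_order_acq_rel)) return;
            // A successor is enqueueing; wait until it links itself in.
            while ((succ = me.next.load(std::memory_order_acquire)) == nullptr) std::this_thread::yield();
        }
        succ->waiting.store(false, std::memory_order_release);  // hand the lock over
    }
private:
    std::atomic<Node*> tail{nullptr};
};

// Hypothetical helpers: several threads increment a shared counter under a lock.
long count_with_ttas(int threads, int iters) {
    TtasLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) { lock.lock(); ++counter; lock.unlock(); }
        });
    for (auto& th : pool) th.join();
    return counter;
}

long count_with_mcs(int threads, int iters) {
    McsLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            McsLock::Node node;  // one queue node per thread
            for (int i = 0; i < iters; ++i) { lock.lock(node); ++counter; lock.unlock(node); }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

With either lock, `count_with_ttas(4, 10000)` and `count_with_mcs(4, 10000)` should each return 40000, since every increment happens inside the critical section. The sketch demonstrates mutual exclusion only; it says nothing about the relative performance the paper measures.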

