ABSTRACT
In general, the hardware memory consistency model of a multiprocessor system is not identical to the memory model at the programming language level. Consequently, the programming language memory model must be mapped onto the hardware memory model. The compiler can accomplish this mapping by inserting memory fence instructions where needed. We have developed and implemented several fence insertion and optimization algorithms in our Pensieve compiler project. We present the fence insertion and optimization techniques used in this system to guarantee sequential consistency at the language level, and compare them using performance data. Our techniques target two relaxed hardware memory consistency models, those provided by SMPs based on the IBM Power 3 and the Intel Pentium 4. Our fence insertion optimization yields average performance improvements of up to 17.2% and 32.7% on the IBM PowerPC and Intel Pentium 4 (Xeon) multiprocessors, respectively.
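To make the idea of fence insertion optimization concrete, the following minimal Python sketch places fences in a single straight-line sequence of memory accesses. It is an illustration under stated assumptions, not the algorithm used in Pensieve: the function name `place_fences`, the representation of ordering constraints as index pairs, and the greedy strategy are all hypothetical choices made for this example. Each constraint `(i, j)` says that access `i` must complete before access `j` is issued, so a fence is needed somewhere between them. A naive compiler would emit one fence per constraint, while the greedy pass below lets one fence satisfy several overlapping constraints.

```python
from typing import List, Optional, Tuple


def place_fences(orderings: List[Tuple[int, int]]) -> List[int]:
    """Return fence slots for a straight-line sequence of memory accesses.

    Slot k means "between access k and access k + 1".
    An ordering (i, j) with i < j is satisfied by any fence in a slot k
    with i <= k < j. (Hypothetical model, assumed for illustration.)
    """
    fences: List[int] = []
    last: Optional[int] = None  # most recently placed fence slot

    # Handle constraints in order of their latest usable slot (j - 1).
    for i, j in sorted(orderings, key=lambda pair: pair[1]):
        if i >= j:
            raise ValueError("expected i < j in program order")
        # The last fence sits at or before slot j - 1 by construction,
        # so it covers this constraint exactly when it is at or after i.
        if last is None or last < i:
            last = j - 1  # place the fence as late as possible
            fences.append(last)
    return fences


# Three ordering constraints over accesses 0..4 need only two fences.
print(place_fences([(0, 2), (1, 3), (3, 4)]))
```

Placing each fence as late as its constraint allows is the classic greedy choice for covering intervals with the fewest points, which is why the first two constraints in the usage line share the fence in slot 1. Real programs with branches, loops, and multiple threads need considerably more analysis than this single-block model captures.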