|
ABSTRACT
The problem of writing software for multicore processors is greatly simplified if we could automatically parallelize sequential programs. Although auto-parallelization has been studied for many decades, it has succeeded only in a few application areas such as dense matrix computations. In particular, auto-parallelization of irregular programs, which are organized around large, pointer-based data structures like graphs, has seemed intractable. The Galois project is taking a fresh look at autoparallelization. Rather than attempt to parallelize all programs no matter how obscurely they are written, we are designing programming abstractions that permit programmers to highlight opportunities for exploiting parallelism in sequential programs, and building a runtime system that uses these hints to execute the program in parallel. In this paper, we describe the design and implementation of a system based on these ideas. Experimental results for two real-world irregular applications, a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering, demonstrate that this approach is promising.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Burke, M., Carini, P., Choi, J.-D. Interprocedural Pointer Alias Analysis. Technical Report IBM RC 21055, IBM Yorktown Heights, 1997.
|
| |
2
|
Chew, L.P. Guaranteed-quality mesh generation for curved surfaces. In SCG'93: Proceedings of the 9th Annual Symposium on Computational Geometry (1993), 274--280.
|
| |
3
|
de Galas, J. The quest for more processing power: is the single core CPU doomed? http://www.anandtech.com/cpuchipsets/showdoc.aspx?I=2377, February 2005.
|
| |
4
|
Diniz, P.C., Rinard, M.C. Commutativity analysis: a new analysis technique for parallelizing compilers. ACM Trans. Prog. Lang. Syst. 19, 6 (1997), 942--991.
|
| |
5
|
Ghiya, R., Hendren, L. Is it a tree, a dag, or a cyclic graph? A shape analysis for heap-directed pointers c. In POPL, 1996.
|
| |
6
|
Herlihy, M., Koskinen, E. Transactional boosting: a methodology for highlyconcurrent transactional objects. In Principles and Practices of Parallel Programming (PPoPP), 2008.
|
| |
7
|
Herlihy, M., Moss, J.E.B. Transactional memory: architectural support for lock-free data structures. In ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture (1993).
|
| |
8
|
Hudson, B., Miller, G.L., Phillips, T. Sparse parallel Delaunay mesh refinement. In SPAA (2007).
|
| |
9
|
Jefferson, D.R. Virtual time. ACM Trans. Prog. Lang. Syst. 7, 3 (1985), 404--425.
|
| |
10
|
Kennedy, K., Allen, J., editors. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, 2001.
|
| |
11
|
Kulkarni, M., Burtscher, M., Inkulu, R., Pingali, K., Cascaval, C. How much parallelism is there in irregular applications? In Principles and Practices of Parallel Programming (PPoPP), 2009.
|
| |
12
|
Kulkarni, M., Carribault, P., Pingali, K., Ramanarayanan, G., Walter, B., Bala, K., Chew, L.P. Scheduling strategies for optimistic parallel execution of irregular programs. In Symposium on Parallel Architectures and Algorithms (SPAA) (2008).
|
| |
13
|
Kulkarni M., Pingali, K., Ramanarayanan, G., Walter, B., Bala, K., Chew, L.P. Optimistic parallelism benefits from data partitioning. SIGARCH Comput. Archit. News 36, 1 (2008), 233--243.
|
| |
14
|
Kulkarni, M., Pingali, K., Walter, B., Ramanarayanan, G., Bala, K., Chew, L.P. Optimistic parallelism requires abstractions. SIGPLAN Not (Proceedings of PLDI 2007) 42, 6 (2007), 211--222.
|
| |
15
|
Larus, J., Rajwar, R. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007.
|
| |
16
|
Ni, Y., Menon, V., Adl-Tabatabai, A.-R., Hosking, A.L., Hudson, R., Moss, J.E.B., Saha, B., Shpeisman, T. Open nesting in software transactional memory. In Principles and Practices of Parallel Programming (PPoPP), 2007.
|
| |
17
|
Pang-Ning Tan, M.S., Kumar, V., editors. Introduction to Data Mining. Pearson Addison Wesley, 2005.
|
| |
18
|
Ponnusamy, R., Saltz, J., Choudhary, A. Runtime compilation techniques for data partitioning and communication schedule reuse. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (1993).
|
| |
19
|
Rauchwerger, L., Padua, D.A. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst. 10, 2 (1999), 160--180.
|
| |
20
|
Sagiv, M., Reps, T., Wilhelm, R. Solving shape-analysis problems in languages with destructive updating. In Proceedings of the 23rd Annual ACM Symposium on the Principles of Programming Languages (st. Petersburg Beach, FL, January 1996).
|
| |
21
|
Shewchuk, J.R. Triangle: Engineering a 2D quality mesh generator and Delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science. May 1996, 203--222.
|
| |
22
|
Steffan, J.G., Colohan, C.B., Zhai, A., Mowry, T.C. A scalable approach to thread-level speculation. In ISCA '00: Proceedings of the 27th Annual International Symposium on Computer Architecture (2000).
|
| |
23
|
Walter, B., Fernandez, S., Arbree, A., Bala, K., Donikian, M., Greenberg, D. Lightcuts: a scalable approach to illumination. ACM Trans. Graphics (SIGGRAPH) 24, 3 (July 2005), 1098--1107.
|
| |
24
|
Zhan, L.R.Y., Torrellas, J. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors. In HPCA '98: Proceedings of the 4th International Symposium on High-Performance Computer Architecture (1998).
|
|