|
ABSTRACT
Irregular applications, which manipulate large, pointer-based data structures like graphs, are difficult to parallelize manually. Automatic tools and techniques such as restructuring compilers and run-time speculative execution have failed to uncover much parallelism in these applications, in spite of a lot of effort by the research community. These difficulties have even led some researchers to wonder if there is any coarse-grain parallelism worth exploiting in irregular applications. In this paper, we describe two real-world irregular applications: a Delaunay mesh refinement application and a graphics application thatperforms agglomerative clustering. By studying the algorithms and data structures used in theseapplications, we show that there is substantial coarse-grain, data parallelism in these applications, but that this parallelism is very dependent on the input data and therefore cannot be uncoveredby compiler analysis. In principle, optimistic techniques such asthread-level speculation can be used to uncover this parallelism, but we argue that current implementations cannot accomplish thisbecause they do not use the proper abstractions for the data structuresin these programs. These insights have informed our design of the Galois system, an object-based optimistic parallelization system for irregular applications. There are three main aspects to Galois: (1) a small number of syntactic constructs for packaging optimistic parallelism as iteration over ordered and unordered sets, (2)assertions about methods in class libraries, and (3) a runtime scheme for detecting and recovering from potentially unsafe accesses to shared memory made by an optimistic computation. We show that Delaunay mesh generation and agglomerative clustering can be parallelized in a straight-forward way using the Galois approach, and we present experimental measurements to show that this approach is practical. These results suggest that Galois is a practical approach to exploiting data parallelismin irregular programs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Christos D. Antonopoulos , Xiaoning Ding , Andrey Chernikov , Filip Blagojevic , Dimitrios S. Nikolopoulos , Nikos Chrisochoides, Multigrain parallel Delaunay Mesh generation: challenges and opportunities for multithreaded architectures, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
[doi> 10.1145/1088149.1088198]
|
 |
3
|
|
| |
4
|
A. Bernstein. Analysis of programs for parallel processing. IEEE Transactions on Electronic Computers, 1966.
|
| |
5
|
Michael Burke, Paul Carini, and Jong-Deok Choi. Interprocedural pointer alias analysis. Technical Report IBM RC 21055, IBM Yorktown Heights, 1997.
|
 |
6
|
Brian D. Carlstrom , Austen McDonald , Michael Carbin , Christos Kozyrakis , Kunle Olukotun, Transactional collection classes, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, March 14-17, 2007, San Jose, California, USA
[doi> 10.1145/1229428.1229441]
|
| |
7
|
C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Transactions on Computers, 21(12):1411--1415, 1972.
|
 |
8
|
|
| |
9
|
Johan de Galas. The quest for more processing power: is the single core CPU doomed? http://www.anandtech.com/cpuchipsets/ showdoc.aspx?i=2377, February 2005.
|
 |
10
|
|
 |
11
|
|
| |
12
|
Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun. Transactional memory coherence and consistency. ISCA 2004, 00:102, 2004.
|
 |
13
|
Tim Harris , Keir Fraser, Language support for lightweight transactions, Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications, October 26-30, 2003, Anaheim, California, USA
|
| |
14
|
|
 |
15
|
|
 |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood. Logtm: Log-based transactional memory. In HPCA '06: Proceedings of the 12th International Symposium on High Performance Computer Architecture, 2006.
|
| |
22
|
J. Eliot B. Moss and Antony L. Hosking. Nested transactional memory: Model and preliminary architectural sketches. In SCOOL '05: Sychronization and Concurrency in Object-Oriented Languages, 2005.
|
| |
23
|
J. B. C Neto, P. A. Wawrzynek, M. T. M. Carvalho, L. F. Martha, and A. R. Ingraffea. An algorithm for three-dimensional mesh generation for arbitrary regions with cracks. Engineering with Computers, 17:75--91, 2001.
|
 |
24
|
Yang Ni , Vijay S. Menon , Ali-Reza Adl-Tabatabai , Antony L. Hosking , Richard L. Hudson , J. Eliot B. Moss , Bratin Saha , Tatiana Shpeisman, Open nesting in software transactional memory, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, March 14-17, 2007, San Jose, California, USA
[doi> 10.1145/1229428.1229442]
|
| |
25
|
Openmp: A proposed industry standard api for shared memory programming. See www.openmp.org, October 28, 1997.
|
| |
26
|
Michael Steinbach Pang-Ning Tan and Vipin Kumar, editors. Introduction to Data Mining. Pearson Addison Wesley, 2005.
|
 |
27
|
|
 |
28
|
Hany E. Ramadan , Christopher J. Rossbach , Donald E. Porter , Owen S. Hofmann , Aditya Bhandari , Emmett Witchel, MetaTM/TxLinux: transactional memory for an operating system, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA
|
| |
29
|
|
| |
30
|
|
 |
31
|
Mooly Sagiv , Thomas Reps , Reinhard Wilhelm, Solving shape-analysis problems in languages with destructive updating, Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.16-31, January 21-24, 1996, St. Petersburg Beach, Florida, United States
[doi> 10.1145/237721.237725]
|
 |
32
|
Maged M. Michael , Michael L. Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, p.267-275, May 23-26, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/248052.248106]
|
| |
33
|
B. Selman, H. Levesque, and D. Mitchell. A new method for solving hard satisfiability problems. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 440--446, 1992.
|
 |
34
|
|
| |
35
|
|
 |
36
|
J. Greggory Steffan , Christopher B. Colohan , Antonia Zhai , Todd C. Mowry, A scalable approach to thread-level speculation, Proceedings of the 27th annual international symposium on Computer architecture, p.1-12, June 2000, Vancouver, British Columbia, Canada
|
| |
37
|
Robert Tomasulo. An algorithm for exploiting multiple arithmetic units. IBM Journal, 11(1):25--33, 1967.
|
 |
38
|
|
 |
39
|
|
| |
40
|
|
| |
41
|
|
| |
42
|
|
| |
43
|
|
CITED BY 24
|
|
Milind Kulkarni , Patrick Carribault , Keshav Pingali , Ganesh Ramanarayanan , Bruce Walter , Kavita Bala , L. Paul Chew, Scheduling strategies for optimistic parallel execution of irregular programs, Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, June 14-16, 2008, Munich, Germany
|
|
|
|
|
|
|
|
|
Lukasz Ziarek , Suresh Jagannathan , Matthew Fluet , Umut A. Acar, Speculative N-Way barriers, Proceedings of the 4th workshop on Declarative aspects of multicore programming, January 20-20, 2009, Savannah, GA, USA
|
|
|
|
|
|
J. Ceng , J. Castrillon , W. Sheng , H. Scharwächter , R. Leupers , G. Ascheid , H. Meyr , T. Isshiki , H. Kunieda, MAPS: an integrated framework for MPSoC application parallelization, Proceedings of the 45th annual conference on Design automation, June 08-13, 2008, Anaheim, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Calin Cascaval , Colin Blundell , Maged Michael , Harold W. Cain , Peng Wu , Stefanie Chiras , Siddhartha Chatterjee, Software Transactional Memory: Why is it Only a Research Toy?, Queue, v.6 n.5, September 2008
|
|
|
Calin Cascaval , Colin Blundell , Maged Michael , Harold W. Cain , Peng Wu , Stefanie Chiras , Siddhartha Chatterjee, Software transactional memory: why is it only a research toy?, Communications of the ACM, v.51 n.11, November 2008
|
|
|
|
|
|
|
|
|
Paruj Ratanaworabhan , Martin Burtscher , Darko Kirovski , Benjamin Zorn , Rahul Nagpal , Karthik Pattabiraman, Detecting and tolerating asymmetric races, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
Milind Kulkarni , Martin Burtscher , Rajeshkar Inkulu , Keshav Pingali , Calin Casçaval, How much parallelism is there in irregular applications?, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
Hany E. Ramadan , Indrajit Roy , Maurice Herlihy , Emmett Witchel, Committing conflicting transactions in an STM, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
|
|
|
|
|
|
|
|
|
Albert Sidelnik , I-Jui Sung , Wanmin Wu , María Jesús Garzarán , Wen-mei Hwu , Klara Nahrstedt , David Padua , Sanjay J. Patel, Optimization of tele-immersion codes, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, p.85-93, March 08-08, 2009, Washington, D.C.
|
|
|
Vladimir Gajinov , Ferad Zyulkyarov , Osman S. Unsal , Adrian Cristal , Eduard Ayguade , Tim Harris , Mateo Valero, QuakeTM: parallelizing a complex sequential application using transactional memory, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA
|
|
|
|
|