| Fundamental parallel algorithms for private-cache chip multiprocessors |
| Full text |
Pdf
(385 KB)
|
Source
|
ACM Symposium on Parallel Algorithms and Architectures
archive
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
table of contents
Munich, Germany
SESSION: Special track -- multicore algorithms
table of contents
Pages 197-206
Year of Publication: 2008
ISBN:978-1-59593-973-9
|
|
Authors
|
|
Lars Arge
|
University of Aarhus, Aarhus, Denmark
|
|
Michael T. Goodrich
|
University of California, Irvine, Irvine, CA, USA
|
|
Michael Nelson
|
University of California, Irvine, Irvine, CA, USA
|
|
Nodari Sitchinava
|
University of California, Irvine, Irvine, CA, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 14, Downloads (12 Months): 173, Citation Count: 1
|
|
|
ABSTRACT
In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Michael A. Bender , Jeremy T. Fineman , Seth Gilbert , Bradley C. Kuszmaul, Concurrent cache-oblivious b-trees, Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures, July 18-20, 2005, Las Vegas, Nevada, USA
[doi> 10.1145/1073970.1074009]
|
| |
3
|
Guy E. Blelloch , Rezaul A. Chowdhury , Phillip B. Gibbons , Vijaya Ramachandran , Shimin Chen , Michael Kozuch, Provably good multicore cache performance for divide-and-conquer algorithms, Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, p.501-510, January 20-22, 2008, San Francisco, California
|
 |
4
|
Shimin Chen , Phillip B. Gibbons , Michael Kozuch , Vasileios Liaskovitis , Anastassia Ailamaki , Guy E. Blelloch , Babak Falsafi , Limor Fix , Nikos Hardavellas , Todd C. Mowry , Chris Wilkerson, Scheduling threads for constructive cache sharing on CMPs, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, June 09-11, 2007, San Diego, California, USA
[doi> 10.1145/1248377.1248396]
|
| |
5
|
Yi-Jen Chiang , Michael T. Goodrich , Edward F. Grove , Roberto Tamassia , Darren Erik Vengroff , Jeffrey Scott Vitter, External-memory graph algorithms, Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, p.139-149, January 22-24, 1995, San Francisco, California, United States
|
| |
6
|
|
| |
7
|
|
 |
8
|
|
 |
9
|
David Culler , Richard Karp , David Patterson , Abhijit Sahay , Klaus Erik Schauser , Eunice Santos , Ramesh Subramonian , Thorsten von Eicken, LogP: towards a realistic model of parallel computation, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.1-12, May 19-22, 1993, San Diego, California, United States
|
| |
10
|
|
| |
11
|
F. Dehne, W. Dittrich, D. Hutchinson, and A. Maheshwari. Bulk synchronous parallel algorithms for the external memory model. Theory of Computing Systems, 35(6):567--598, 2002.
|
| |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
Richard M. Karp , Abhijit Sahay , Eunice E. Santos , Klaus Erik Schauser, Optimal broadcast and summation in the LogP model, Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures, p.142-153, June 30-July 02, 1993, Velen, Germany
[doi> 10.1145/165231.165250]
|
| |
20
|
G. Lowney. Why Intel is designing multi-core processors. https://conferences.umiacs.umd.edu/paa/lowney.pdf.
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
|
| |
26
|
|
| |
27
|
|
| |
28
|
|
 |
29
|
|
| |
30
|
J. S. Vitter and E. A. M. Shriver. Algorithms for parallel memory I: Two-level memories. Algorithmica, 12(2--3):110--147, 1994.
|
|