ACM Home Page
Please provide us with feedback. Feedback
Provably good multicore cache performance for divide-and-conquer algorithms
Full text PdfPdf (422 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
San Francisco, California
Pages 501-510  
Year of Publication: 2008
Authors
Guy E. Blelloch  Carnegie Mellon University
Rezaul A. Chowdhury  University of Texas, Austin
Phillip B. Gibbons  Intel Research Pittsburgh
Vijaya Ramachandran  University of Texas, Austin
Shimin Chen  Intel Research Pittsburgh
Michael Kozuch  Intel Research Pittsburgh
Sponsors
: SIAM Activity Group on Discrete Mathematics
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 21,   Downloads (12 Months): 232,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-and-conquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L1 and L2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptotically-optimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
www.sun.com/processors/UltraSPARC-T1/, 2007.
 
2
www.tilera.com, 2007.
 
3
Intel shows off 80-core processor. www.news.com/2100-1006_3-6158181.html, 2007.
 
4
U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002. Springer.
5
6
 
7
 
8
B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierachy model of computation. Algorth-mica, 12(2/3), 1994. Springer.
9
10
11
12
13
14
15
16
17
18
19
 
20
 
21
22
 
23
M. T. Goodrich, M. Nelson, and N. Sitchinava. Sorting in parallel external-memory multicores. Technical report, U.C. Irvine, 2007.
 
24
 
25
 
26
R. J. Lipton and R. E. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36(2), 1979.
 
27
G. J. Narlikar. Scheduling threads for low space requirement and good locality. Theory of Computing Systems, 35(2), 2002. Springer.
28
 
29
V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4), 1969. Springer.
30


Collaborative Colleagues:
Guy E. Blelloch: colleagues
Rezaul A. Chowdhury: colleagues
Phillip B. Gibbons: colleagues
Vijaya Ramachandran: colleagues
Shimin Chen: colleagues
Michael Kozuch: colleagues