ACM Home Page
Please provide us with feedback. Feedback
Effectively sharing a cache among threads
Full text PdfPdf (224 KB)
Source ACM Symposium on Parallel Algorithms and Architectures archive
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures table of contents
Barcelona, Spain
SESSION: Caching II table of contents
Pages: 235 - 244  
Year of Publication: 2004
ISBN:1-58113-840-7
Authors
Guy E. Blelloch  Carnegie Mellon University
Phillip B. Gibbons  Intel Research Pittsburgh
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 25,   Downloads (12 Months): 111,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1007912.1007948
What is a DOI?

ABSTRACT

We compare the number of cache misses M1 for running a computation on a single processor with cache size C1 to the total number of misses Mp for the same computation when using p processors or threads and a shared cache of size Cp. We show that for any computation, and with an appropriate (greedy) parallel schedule, if CpC1 + pd then MpM1. The depth d of the computation is the length of the critical path of dependences. This gives the perhaps surprising result that for sufficiently parallel computations the shared cache need only be an additive size larger than the single-processor cache, and gives some theoretical justification for designing machines with shared caches.We model a computation as a DAG and the sequential execution as a depth first schedule of the DAG. The parallel schedule we study is a parallel depth-first schedule (PDF schedule) based on the sequential one. The schedule is greedy and therefore work-efficient. Our main results assume the Ideal Cache model, but we also present results for other more realistic cache models.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3):321--347, 2002.
2
3
 
4
L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese.
5
 
6
 
7
L. A. Belady. A study of replacment algorithms for virtual storage computers. IBM Systems Journal, 5(2):78--101, 1966.
8
9
10
11
 
12
Y.-Y. Chen, J.-K. Peir, and C.-T. King. Performance of shared caches on multithreaded architectures. Journal of Information Science and Engineering, 14(2):499--514, 1998.
13
 
14
 
15
 
16
 
17
18
 
19
R. Kalla, B. Sinharoy, and J. Tendler. Simultaneous multi-threading implementation in POWER5. In 15th IEEE Hot Chips, Aug. 2003.
 
20
D. T. Marr, F. Binns, D. L. Hill, G. Hinton, D. A. Koufaty, J. A. Miller, and M. Upton. Hyper-threading technology architecture and microarchitecture, white paper. Intel Technical Journal, 6(1), Feb. 2002.
21
 
22
G. J. Narlikar. Scheduling threads for low space requirement and good locality. Theory of Computing Systems, 35(2):151--187, 2002.
23
24
25
26
 
27
 
28
J. M. Tendler, S. Dodson, S. Fields, H. Le, and B. Sinharoy. Power4 system microarchitecture, technical white paper. Technical Report 20, IBM Server Group, Oct. 2001.
29
 
30
 
31
32

CITED BY  9

Collaborative Colleagues:
Guy E. Blelloch: colleagues
Phillip B. Gibbons: colleagues