|
ABSTRACT
We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our cache model, an extension of Aggarwal and Vitter's I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-efficient algorithms in the single-level cache model for fundamental problems like sorting, FFT, and an important subclass of permutations. We also analyze the average-case cache behavior of mergesort, show that ignoring associativity concerns could lead to inferior performance, and present supporting experimental evidence.We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and for dealing with the hitherto unresolved problem of limited associativity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
A. Aggarwal , B. Alpern , A. Chandra , M. Snir, A model for hierarchical memory, Proceedings of the nineteenth annual ACM conference on Theory of computing, p.305-314, January 1987, New York, New York, United States
[doi> 10.1145/28395.28428]
|
| |
3
|
Aggarwal, A., Chandra, A., and Snir, M. 1987b. Hierarchical memory with block transfer. In Proceedings of IEEE Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., 204--216.
|
 |
4
|
|
| |
5
|
Alpern, B., Carter, L., Feig, E., and Selker, T. 1994. The uniform memory hierarchy model of computation. Algorithmica 12, 2, 72--109.
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
 |
9
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
[doi> 10.1145/305138.305231]
|
 |
10
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
[doi> 10.1145/305619.305645]
|
| |
11
|
Chatterjee, S., and Sen, S. 2000. Cache-efficient matrix transposition. In Proceedings of HPCA-6 (Toulouse, France). 195--205.
|
| |
12
|
Yi-Jen Chiang , Michael T. Goodrich , Edward F. Grove , Roberto Tamassia , Darren Erik Vengroff , Jeffrey Scott Vitter, External-memory graph algorithms, Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, p.139-149, January 22-24, 1995, San Francisco, California, United States
|
| |
13
|
|
| |
14
|
Floyd, R. 1972. Permuting information in idealized two-level storage. In Complexity of Computer Computations, R. E. Miller and J. W. Thatcher, Eds. Plenum Press, New York, N.Y., 105--109.
|
 |
15
|
|
| |
16
|
Frigo, M., and Johnson, S. G. 1998. FFTW: An adaptive software architecture for the FFT. In Proceedings of ICASSP'98, vol. 3 (Seattle, Wash.). IEEE Computer Society Press, Los Alamitos, Calif., 1381.
|
| |
17
|
|
| |
18
|
Goodrich, M., Tsay, J., Vengroff, D., and Vitter, J. 1993. External memory computational geometry. In Proceeding of IEEE Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., 714--723.
|
| |
19
|
|
 |
20
|
|
| |
21
|
Kamath, A., Motwani, R., Palem, K., and Spirakis, P. 1994. Tail bounds for occupancy and the satisfiability threshold conjecture. In Proceeding of IEEE Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., 592--603.
|
| |
22
|
Richard E. Ladner , James D. Fix , Anthony LaMarca, Cache performance analysis of traversals and random accesses, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.613-622, January 17-19, 1999, Baltimore, Maryland, United States
|
 |
23
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
24
|
|
| |
25
|
|
| |
26
|
Mehlhorn, K., and Sanders, P. 2000. Scanning multiple sequences via cache memory. citeseen.nj.nec.com/506957.html.
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
 |
30
|
|
| |
31
|
|
| |
32
|
|
| |
33
|
Vitter, J., and Shriver, E. 1994. Algorithms for parallel memory. I: Two-level memories. Algorithmica 12, 2, 110--147.
|
CITED BY 8
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David Kasik , Andreas Dietrich , Enrico Gobbetti , Fabio Marton , Dinesh Manocha , Philipp Slusallek , Abe Stephens , Sung-Eui Yoon, Massive model visualization techniques: course notes, ACM SIGGRAPH 2008 classes, August 11-15, 2008, Los Angeles, California
|
|
|
|
|
|
|
|
|
|
|