ABSTRACT
As memory access times grow larger relative to processor cycle
times, the cache performance of algorithms has an increasingly
large impact on overall performance. Unfortunately, most commonly
used algorithms were not designed with cache performance in mind.
This paper investigates the cache performance of implicit heaps. We
present optimizations which significantly reduce the cache misses
that heaps incur and improve their overall performance. We present
an analytical model called collective analysis that allows cache
performance to be predicted as a function of both cache
configuration and algorithm configuration. As part of our
investigation, we perform an approximate analysis of the cache
performance of both traditional heaps and our improved heaps in our
model. In addition empirical data is given for five architectures
to show the impact our optimizations have on overall performance.
We also revisit a priority queue study originally performed by
Jones [25]. Due to the increases in cache miss penalties, the
relative performance results we obtain on today's machines differ
greatly from the machines of only ten years ago. We compare the
performance of implicit heaps, skew heaps and splay trees and
discuss the difference between our results and Jones's.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
A. Aggarwal , B. Alpern , A. Chandra , M. Snir, A model for hierarchical memory, Proceedings of the nineteenth annual ACM conference on Theory of computing, p.305-314, January 1987, New York, New York, United States
[doi> 10.1145/28395.28428]
|
| |
4
|
|
| |
5
|
{5} B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. <i>Algorithmica</i>, 12:2-3:72-109, 1994.
|
 |
6
|
|
| |
7
|
|
 |
8
|
Steve Carr , Kathryn S. McKinley , Chau-Wen Tseng, Compiler optimizations for improving data locality, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.252-262, October 05-07, 1994, San Jose, California, United States
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
{14} E. Doberkat. Inserting a new element into a heap. <i>BIT</i>, 21:225-269, 1981.
|
| |
15
|
{15} E. Doberkat. Deleting the root of a heap. <i>Acta Informatica</i>, 17:245-265, 1982.
|
| |
16
|
J. J. Dongarra , Orlie Brewer , James Arthur Kohl , Samuel Fineberg, A tool to aid in the design, implementation, and understanding of matrix algorithms for parallel processors, Journal of Parallel and Distributed Computing, v.9 n.2, p.185-202, June 1990
[doi> 10.1016/0743-7315(90)90045-Q]
|
 |
17
|
M. Farrens , G. Tyson , A. R. Pleszkun, A study of single-chip processor/cache organizations for large numbers of transistors, Proceedings of the 21ST annual international symposium on Computer architecture, p.338-347, April 18-21, 1994, Chicago, Illinois, United States
|
| |
18
|
|
| |
19
|
{19} Robert W. Floyd. Treesort 3. <i>Communications of the ACM</i>, 7:12:701, 1964.
|
| |
20
|
|
| |
21
|
|
 |
22
|
Dirk Grunwald , Benjamin Zorn , Robert Henderson, Improving the cache locality of memory allocation, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, p.177-186, June 21-25, 1993, Albuquerque, New Mexico, United States
|
| |
23
|
|
| |
24
|
{24} D.B. Johnson. Priority queues with update and finding minimum spanning trees. <i>Information Processing Letters</i>, 4, 1975.
|
 |
25
|
|
 |
26
|
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
 |
31
|
Margaret Martonosi , Anoop Gupta , Thomas Anderson, MemSpy: analyzing memory system bottlenecks in programs, Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, p.1-12, June 01-05, 1992, Newport, Rhode Island, United States
|
| |
32
|
|
 |
33
|
|
| |
34
|
|
| |
35
|
|
 |
36
|
|
 |
37
|
|
 |
38
|
O. Temam , C. Fricker , W. Jalby, Cache interference phenomena, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.261-271, May 16-20, 1994, Nashville, Tennessee, United States
|
 |
39
|
Richard Uhlig , David Nagle , Tim Stanley , Trevor Mudge , Stuart Sechrest , Richard Brown, Design tradeoffs for software-managed TLBs, ACM Transactions on Computer Systems (TOCS), v.12 n.3, p.175-205, Aug. 1994
[doi> 10.1145/185514.185515]
|
| |
40
|
|
 |
41
|
|
| |
42
|
{42} J. W. Williams. Heapsort. <i>Communications of the ACM</i>, 7:6:347-348, 1964.
|
 |
43
|
|
CITED BY 28
|
|
Richard E. Ladner , James D. Fix , Anthony LaMarca, Cache performance analysis of traversals and random accesses, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.613-622, January 17-19, 1999, Baltimore, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Christopher Barton , Peng Zhao , Robert Niewiadomski , José Nelson Amaral, Identifying opportunities for automatic remote field cloning, Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research, p.124-134, October 04-07, 2004, Markham, Ontario, Canada
|
|
|
|
|
|
Lars Arge , Michael A. Bender , Erik D. Demaine , Bryan Holland-Minkley , J. Ian Munro, Cache-oblivious priority queue and graph algorithm applications, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|