ACM Home Page
Please provide us with feedback. Feedback
The cache behaviour of large lazy functional programs on stock hardware
Full text PdfPdf (1.26 MB)
Source ACM SIGPLAN Notices archive
Volume 38 ,  Issue 2 supplement  (February 2003) table of contents
MSP 2002 and ISMM 2002
Pages: 44 - 55  
Year of Publication: 2003
ISSN:0362-1340
Also published in ...
Authors
Nicholas Nethercote  Cambridge University, United Kingdom
Alan Mycroft  Cambridge University, United Kingdom
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 21,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/773039.773044
What is a DOI?

ABSTRACT

Lazy functional programs behave differently from imperative programs and these differences extend to cache behaviour. We use hardware counters and a simple yet accurate execution cost model to analyse some large Haskell programs on the x86 architecture. The programs do not interact well with modern processors---L2 cache data miss stalls and branch misprediction stalls account for up to 60% and 32% of execution time respectively. Moreover, the program code exhibits little exploitable instruction-level parallelism.We then use simulation to pinpoint cache misses at the instruction level. With this information we apply prefetching to minimise the cost of write misses, speeding up Haskell programs by up to 22%. We conclude with more ideas for changing the Glasgow Haskell Compiler and its garbage collector to improve the cache performance of large programs.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Advanced Micro Devices, Inc. AMD Athlon processor x86 code optimization guide, July 2001. http://www.amd.com.
 
2
H. G. Baker. Optimizing allocation and garbage collection of spaces. In Winston and Brown, editors, Artificial Intelligence, An MIT Perspective, volume 2, pages 391--396. MIT Press, 1979.
3
4
5
6
 
7
The Glasgow Haskell Compiler. http://www.haskell,org/ghc.
8
 
9
P. H. Hartel, et al. Benchmarking implementations of functional languages with "Pseudoknot" a float-intensive benchmark. Journal of Functional Programming, 6(4):621--655, 1996.
 
10
D. Heller. Rabbit: A performance counters library for Intel/AMD processors and Linux. http://www.scl.ameslab.gov/Projects/Rabbit/.
 
11
Intel. IA-32 Intel architecture software developer's manual, 2001. Order number 245472. http://www.intel.com.
12
 
13
S. Manegold and P. Boncz. Cache-memory and TLB calibration tool. http://www.cwi.nl/~manegold/Calibrator/.
 
14
G. C. Necula and L. George. Accounting for the performance of Standard ML on the DEC Alpha. Technical report, AT&T Bell Labs, Murray Hill, New Jersey, USA, Sept. 1994.
 
15
 
16
S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the spineless tagless G-machine. Journal of Functional Programming, 2(2):127--202, Apr. 1992.
 
17
 
18
J. Seward. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://developer.kde.org/~jseward/.
19


Collaborative Colleagues:
Nicholas Nethercote: colleagues
Alan Mycroft: colleagues