|
ABSTRACT
Lazy functional programs behave differently from imperative programs and these differences extend to cache behaviour. We use hardware counters and a simple yet accurate execution cost model to analyse some large Haskell programs on the x86 architecture. The programs do not interact well with modern processors---L2 cache data miss stalls and branch misprediction stalls account for up to 60% and 32% of execution time respectively. Moreover, the program code exhibits little exploitable instruction-level parallelism.We then use simulation to pinpoint cache misses at the instruction level. With this information we apply prefetching to minimise the cost of write misses, speeding up Haskell programs by up to 22%. We conclude with more ideas for changing the Glasgow Haskell Compiler and its garbage collector to improve the cache performance of large programs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Advanced Micro Devices, Inc. AMD Athlon processor x86 code optimization guide, July 2001. http://www.amd.com.
|
| |
2
|
H. G. Baker. Optimizing allocation and garbage collection of spaces. In Winston and Brown, editors, Artificial Intelligence, An MIT Perspective, volume 2, pages 391--396. MIT Press, 1979.
|
 |
3
|
Trishul M. Chilimbi , Bob Davidson , James R. Larus, Cache-conscious structure definition, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, p.13-24, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
4
|
|
 |
5
|
|
 |
6
|
|
| |
7
|
The Glasgow Haskell Compiler. http://www.haskell,org/ghc.
|
 |
8
|
|
| |
9
|
P. H. Hartel, et al. Benchmarking implementations of functional languages with "Pseudoknot" a float-intensive benchmark. Journal of Functional Programming, 6(4):621--655, 1996.
|
| |
10
|
D. Heller. Rabbit: A performance counters library for Intel/AMD processors and Linux. http://www.scl.ameslab.gov/Projects/Rabbit/.
|
| |
11
|
Intel. IA-32 Intel architecture software developer's manual, 2001. Order number 245472. http://www.intel.com.
|
 |
12
|
|
| |
13
|
S. Manegold and P. Boncz. Cache-memory and TLB calibration tool. http://www.cwi.nl/~manegold/Calibrator/.
|
| |
14
|
G. C. Necula and L. George. Accounting for the performance of Standard ML on the DEC Alpha. Technical report, AT&T Bell Labs, Murray Hill, New Jersey, USA, Sept. 1994.
|
| |
15
|
|
| |
16
|
S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the spineless tagless G-machine. Journal of Functional Programming, 2(2):127--202, Apr. 1992.
|
| |
17
|
|
| |
18
|
J. Seward. Valgrind, an open-source memory debugger for x86-GNU/Linux. http://developer.kde.org/~jseward/.
|
 |
19
|
Paul R. Wilson , Michael S. Lam , Thomas G. Moher, Caching considerations for generational garbage collection, Proceedings of the 1992 ACM conference on LISP and functional programming, p.32-42, June 22-24, 1992, San Francisco, California, United States
|
CITED BY
|
|
Cristian Perfumo , Nehir Sönmez , Srdjan Stipic , Osman Unsal , Adrián Cristal , Tim Harris , Mateo Valero, The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment, Proceedings of the 2008 conference on Computing frontiers, May 05-07, 2008, Ischia, Italy
|
|