|
ABSTRACT
On modern computer systems, the memory performance of an application depends on its locality. For a single execution, locality-correlated measures like average miss rate or working-set size have long been analyzed using reuse distance—the number of distinct locations accessed between consecutive accesses to a given location. This article addresses the analysis problem at the program level, where the size of data and the locality of execution may change significantly depending on the input. The article presents two techniques that predict how the locality of a program changes with its input. The first is approximate reuse-distance measurement, which is asymptotically faster than exact methods while providing a guaranteed precision. The second is statistical prediction of locality in all executions of a program based on the analysis of a few executions. The prediction process has three steps: dividing data accesses into groups, finding the access patterns in each group, and building parameterized models. The resulting prediction may be used on-line with the help of distance-based sampling. When evaluated on fifteen benchmark applications, the new techniques predicted program locality with good accuracy, even for test executions that are orders of magnitude larger than the training executions. The two techniques are among the first to enable quantitative analysis of whole-program locality in general sequential code. These findings form the basis for a unified understanding of program locality and its many facets. Concluding sections of the article present a taxonomy of related literature along five dimensions of locality and discuss the role of reuse distance in performance modeling, program optimization, cache and virtual memory management, and network traffic analysis.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
Virgílio Almeida , Azer Bestavros , Mark Crovella , Adriana de Oliveira, Characterizing reference locality in the WWW, Proceedings of the fourth international conference on on Parallel and distributed information systems, p.92-107, December 18-20, 1996, Miami Beach, Florida, United States
|
 |
5
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
6
|
|
| |
7
|
|
| |
8
|
Batson, A. P. and Madison, A. W. 1976. Measurements of major locality phases in symbolic reference strings. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems.
|
| |
9
|
Bennett, B. T. and Kruskal, V. J. 1975. LRU stack processing. IBM J. Resear. Devel. 353--357.
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
Beyls, K. and D'Hollander, E. 2006a. Discovery of locality-improving refactoring by reuse path analysis. In Proceedings of the High-Performance Computing and Communications Council. Springer. Lecture Notes in Computer Science, vol. 4208. 220--229.
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
Brad Calder , Chandra Krintz , Simmi John , Todd Austin, Cache-conscious data placement, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.139-149, October 02-07, 1998, San Jose, California, United States
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
 |
24
|
Siddhartha Chatterjee , Erin Parker , Philip J. Hanlon , Alvin R. Lebeck, Exact analysis of the cache behavior of nested loops, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.286-297, June 2001, Snowbird, Utah, United States
|
| |
25
|
|
| |
26
|
Cheng, R. and Ding, C. 2005. Measuring temporal locality variation across program inputs. Tech. rep. TR 875, Department of Computer Science, University of Rochester.
|
 |
27
|
|
| |
28
|
|
 |
29
|
Trishul M. Chilimbi , Mark D. Hill , James R. Larus, Cache-conscious structure layout, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, p.1-12, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
30
|
|
 |
31
|
|
| |
32
|
Cocke, J. and Kennedy, K. 1974. Profitability computations on program flow graphs. Tech. rep. RC 5123, IBM.
|
| |
33
|
|
| |
34
|
|
| |
35
|
|
 |
36
|
|
| |
37
|
|
 |
38
|
|
| |
39
|
|
| |
40
|
|
| |
41
|
|
| |
42
|
|
 |
43
|
|
 |
44
|
Xiaoming Gu , Ian Christopher , Tongxin Bai , Chengliang Zhang , Chen Ding, A component model of spatial locality, Proceedings of the 2009 international symposium on Memory management, June 19-20, 2009, Dublin, Ireland
[doi> 10.1145/1542431.1542446]
|
| |
45
|
|
| |
46
|
|
| |
47
|
|
| |
48
|
|
 |
49
|
|
 |
50
|
Yunlian Jiang , Xipeng Shen , Jie Chen , Rahul Tripathi, Analysis and approximation of optimal co-scheduling on chip multiprocessors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
[doi> 10.1145/1454115.1454146]
|
| |
51
|
|
| |
52
|
Kelly, T., Cohen, I., Goldszmidt, M., and Keeton, K. 2004. Inducing models of black-box storage arrays. Tech. rep. HPL-2004-108, HP Laboratories Palo Alto, CA.
|
| |
53
|
Wayne Kelly , Vadim Maslov , William Pugh , Evan Rosser , Tatiana Shpeisman , David Wonnacott, The Omega Library interface guide, University of Maryland at College Park, College Park, MD, 1995
|
| |
54
|
|
 |
55
|
Yul H. Kim , Mark D. Hill , David A. Wood, Implementing stack simulation for highly-associative memories, Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.212-213, May 21-24, 1991, San Diego, California, United States
|
| |
56
|
|
 |
57
|
|
| |
58
|
Knuth, D. 1971. An empirical study of FORTRAN programs. Softw. Pract. Exper. 1, 105--133.
|
 |
59
|
Induprakas Kodukula , Nawaaz Ahmed , Keshav Pingali, Data-centric multi-level blocking, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.346-357, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
60
|
|
| |
61
|
Liu, J., Chen, H., Yew, P.-C., and Hsu, W.-C. 2004. Design and implementation of a lightweight dynamic optimization system. J. Instruct.-Level Paral. 6.
|
 |
62
|
|
| |
63
|
Marin, G. and Mellor-Crummey, J. 2005. Scalable cross-architecture predictions of memory hierarchy response for scientific applications. In Proceedings of the Symposium of the Las Alamos Computer Science Institute.
|
| |
64
|
Mattson, R. L., Gecsei, J., Slutz, D., and Traiger, I. L. 1970. Evaluation techniques for storage hierarchies. IBM Syst. J. 9, 2, 78--117.
|
 |
65
|
|
| |
66
|
|
| |
67
|
Olken, F. 1981. Efficient methods for calculating the success function of fixed space replacement policies. Tech. rep. LBL-12370, Lawrence Berkeley Laboratory.
|
 |
68
|
|
| |
69
|
Porterfield, A. 1989. Software methods for improvement of cache performance. Ph.D. thesis, Department of Computer Science, Rice University.
|
| |
70
|
Rawlings, J. O. 1988. Applied Regression Analysis: A Research Tool. Wadsworth and Brooks.
|
 |
71
|
Edward Rothberg , Jaswinder Pal Singh , Anoop Gupta, Working sets, cache sizes, and node granularity issues for large-scale multiprocessors, Proceedings of the 20th annual international symposium on Computer architecture, p.14-26, May 16-19, 1993, San Diego, California, United States
|
 |
72
|
|
 |
73
|
|
 |
74
|
|
 |
75
|
Xipeng Shen , Michael L. Scott , Chengliang Zhang , Sandhya Dwarkadas , Chen Ding , Mitsunori Ogihara, Analysis of input-dependent program behavior using active profiling, Proceedings of the 2007 workshop on Experimental computer science, p.5-es, June 13-14, 2007, San Diego, California
[doi> 10.1145/1281700.1281705]
|
 |
76
|
|
| |
77
|
Shen, X., Zhong, Y., and Ding, C. 2004b. Phase-based miss rate prediction. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing.
|
| |
78
|
|
 |
79
|
|
| |
80
|
|
| |
81
|
|
 |
82
|
|
 |
83
|
|
 |
84
|
|
 |
85
|
|
| |
86
|
|
 |
87
|
|
| |
88
|
|
 |
89
|
|
 |
90
|
Rémi Triolet , Francois Irigoin , Paul Feautrier, Direct parallelization of call statements, Proceedings of the 1986 SIGPLAN symposium on Compiler construction, p.176-185, June 25-27, 1986, Palo Alto, California, United States
|
 |
91
|
|
 |
92
|
|
 |
93
|
|
| |
94
|
|
| |
95
|
|
| |
96
|
|
| |
97
|
|
 |
98
|
Qing Yi , Vikram Adve , Ken Kennedy, Transforming loops to recursion for multi-level memory hierarchies, Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, p.169-181, June 18-21, 2000, Vancouver, British Columbia, Canada
|
 |
99
|
Chengliang Zhang , Chen Ding , Mitsunori Ogihara , Yutao Zhong , Youfeng Wu, A hierarchical model of data locality, Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.16-29, January 11-13, 2006, Charleston, South Carolina, USA
|
 |
100
|
|
 |
101
|
|
| |
102
|
Zhong, Y., Ding, C., and Kennedy, K. 2002. Reuse distance analysis for scientific programs. In Proceedings of Workshop on Languages, Compilers, and Runtime Systems for Scalable Computers.
|
| |
103
|
|
 |
104
|
|
 |
105
|
Pin Zhou , Vivek Pandey , Jagadeesan Sundaresan , Anand Raghuraman , Yuanyuan Zhou , Sanjeev Kumar, Dynamic tracking of page miss ratio curve for memory management, Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
|
| |
106
|
|
|