ABSTRACT
As memory performance becomes increasingly important to overall system performance, the need to carefully schedule memory operations also increases. This paper presents a new approach to memory scheduling that considers the history of recently scheduled operations. This history-based approach provides two conceptual advantages: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, and (2) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We evaluate our solution using a cycle-accurate simulator for the recently announced IBM Power5. When compared with an in-order scheduler, our solution achieves IPC improvements of 10.9% on the NAS benchmarks and 63% on the data-intensive Stream benchmarks. Using microbenchmarks, we illustrate the growing importance of memory scheduling in the context of CMPs, hardware-controlled prefetching, and faster CPU speeds.
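The second advantage, matching the issued mix of Reads and Writes to the program's mix, can be illustrated with a toy model. The sketch below is not the paper's scheduler: the function names, the fixed-length history window, and the `target_read_fraction` parameter are all assumptions made for illustration, and real command timing is ignored entirely.

```python
from collections import deque

def pick_next(history, pending_reads, pending_writes, target_read_fraction):
    """Choose 'R' or 'W' so the recent history tracks the target read fraction.

    Hypothetical helper: `history` is a bounded deque of recently issued kinds.
    """
    if not pending_reads and not pending_writes:
        return None
    if not pending_writes:
        return "R"
    if not pending_reads:
        return "W"

    def error_after(kind):
        # Window the history would hold if `kind` were issued next.
        window = (list(history) + [kind])[-history.maxlen:]
        read_fraction = window.count("R") / len(window)
        return abs(read_fraction - target_read_fraction)

    # Prefer the kind that leaves the window closest to the target mix.
    return "R" if error_after("R") <= error_after("W") else "W"

def schedule(reads, writes, target_read_fraction, history_len=4):
    """Issue all pending operations, keeping each queue in its original order."""
    history = deque(maxlen=history_len)
    pending_reads, pending_writes = deque(reads), deque(writes)
    issued = []
    while pending_reads or pending_writes:
        kind = pick_next(history, pending_reads, pending_writes,
                         target_read_fraction)
        queue = pending_reads if kind == "R" else pending_writes
        issued.append((kind, queue.popleft()))
        history.append(kind)
    return issued

if __name__ == "__main__":
    # Four pending reads and two pending writes: a 2:1 read/write mix.
    for kind, addr in schedule([0, 1, 2, 3], [10, 11], 2 / 3, history_len=3):
        print(kind, addr)
```

An in-order scheduler would issue operations strictly by arrival, whereas this sketch interleaves the two queues so that neither is starved while the other drains. That interleaving is the intuition behind avoiding the controller bottlenecks the abstract mentions.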