ABSTRACT
The need to carefully schedule memory operations has increased as memory performance has become increasingly important to overall system performance. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, (2) it provides a mechanism for combining multiple constraints, which is important for increasingly complex DRAM structures, and (3) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We have previously evaluated this scheduler in the context of the IBM Power5. When compared with the state of the art, this scheduler improves performance by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively. This article expands our understanding of the AHB scheduler in a variety of ways. Looking backwards, we describe the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler is superior for the IBM Power5, which we argue will be representative of future microprocessor memory controllers. Looking forwards, we evaluate this scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of this scheduler increases as we move to multithreaded environments.