ABSTRACT
Simultaneous Multithreading (SMT) machines benefit from jobscheduling software that monitors how well coscheduled jobs share CPU resources, and coschedules jobs that interact well to make more efficient use of those resources. As a result, informed coscheduling can yield significant performance gains over naive schedulers. However, prior work on coscheduling focused on equal-priority job mixes, which is an unrealistic assumption for modern operating systems.

This paper demonstrates that a scheduler for an SMT machine can both satisfy process priorities and symbiotically schedule low- and high-priority threads to increase system throughput. Naive priority schedulers dedicate the machine to high-priority jobs to meet priority goals, and as a result reduce the opportunities for increased performance from multithreading and coscheduling. More informed schedulers, however, can dynamically monitor the progress and resource utilization of jobs on the machine, and dynamically adjust the degree of multithreading to improve performance while still meeting priority goals.

Using detailed simulation of an SMT architecture, we introduce and evaluate a series of five software and hardware-assisted priority schedulers. Overall, our results indicate that coscheduling priority jobs can increase system throughput by as much as 40%, and that (1) the benefit depends upon the relative priority of the coscheduled jobs, and (2) more sophisticated schedulers are more effective when the differences in priorities are greatest. We show that our priority schedulers can decrease average turnaround times for a random jobmix by as much as 33%.
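The trade-off between dedicating the machine to a high-priority job and coscheduling it with a low-priority job can be illustrated with a small calculation. The sketch below is not one of the paper's five schedulers; it is a simplified model built on assumptions introduced here for illustration: the priority goal is expressed as a hypothetical `target` fraction of the high-priority job's solo progress, and `hi_rate` and `lo_rate` are hypothetical measured progress rates of each job, relative to running alone, while the two are coscheduled.

```python
def coschedule_fraction(target, hi_rate):
    """Largest fraction of time the high-priority job can be coscheduled
    while still making at least `target` of its solo progress.

    Assumed model (illustrative only): the job progresses at rate 1.0 when
    it has the machine to itself and at `hi_rate` when coscheduled, so over
    a unit interval its progress is x * hi_rate + (1 - x).
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if not 0.0 < hi_rate <= 1.0:
        raise ValueError("hi_rate must be in (0, 1]")
    if hi_rate >= target:
        # Coscheduling all the time already meets the goal.
        return 1.0
    # Solve x * hi_rate + (1 - x) = target for x.
    return (1.0 - target) / (1.0 - hi_rate)


def informed_throughput(target, hi_rate, lo_rate):
    """Total work per unit time (in units of solo-job work) when the
    machine is shared for the allowed fraction and otherwise dedicated
    to the high-priority job."""
    x = coschedule_fraction(target, hi_rate)
    return x * (hi_rate + lo_rate) + (1.0 - x)


def naive_throughput():
    """A naive time-slicing scheduler runs one job at a time, so it
    completes one unit of solo-job work per unit time regardless of
    how the time is divided between the jobs."""
    return 1.0


if __name__ == "__main__":
    # Hypothetical numbers: the high-priority job must keep 80% of its
    # solo progress, and each job runs at 60% of solo speed when sharing.
    x = coschedule_fraction(0.8, 0.6)
    print(f"coscheduled fraction: {x:.2f}")
    print(f"informed throughput:  {informed_throughput(0.8, 0.6, 0.6):.2f}")
    print(f"naive throughput:     {naive_throughput():.2f}")
```

Under this model the gain from sharing shrinks as `target` approaches 1.0, because the high-priority job then needs the machine to itself for almost the whole interval. A real scheduler would also have to measure the rates online rather than take them as given.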