| Analysis and approximation of optimal co-scheduling on chip multiprocessors |
| Full text |
Pdf
(408 KB)
|
Source
|
PACT
archive
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
table of contents
Toronto, Ontario, Canada
SESSION: Multicore memory hierarchy design (part 2)
table of contents
Pages 220-229
Year of Publication: 2008
ISBN:978-1-60558-282-5
|
|
Authors
|
|
Yunlian Jiang
|
College of William and Mary, Williamsburg, VA, USA
|
|
Xipeng Shen
|
College of William and Mary, Williamsburg, VA, USA
|
|
Jie Chen
|
Thomas Jefferson National Accelerator Facility, Newport News, VA, USA
|
|
Rahul Tripathi
|
University of South Florida, Tampa, FL, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 13, Downloads (12 Months): 175, Citation Count: 3
|
|
|
ABSTRACT
Cache sharing among processors is important for Chip Multiprocessors to reduce inter-thread latency, but also brings cache contention, degrading program performance considerably. Recent studies have shown that job co-scheduling can effectively alleviate the contention, but it remains an open question how to efficiently find optimal co-schedules. Solving the question is critical for determining the potential of a co-scheduling system. This paper presents a theoretical analysis of the complexity of co-scheduling, proving its NP-completeness. Furthermore, for a special case when there are two sharers per chip, we propose an algorithm that finds the optimal co-schedules in polynomial time. For more complex cases, we design and evaluate a sequence of approximation algorithms, among which, the hierarchical matching algorithm produces near-optimal schedules and shows good scalability. This study facilitates the evaluation of co-scheduling systems, as well as offers some techniques directly usable in proactive job co-scheduling.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
P. Denning. Thrashing: Its causes and prevention. In Proceedings of the AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 915--922, 1968.
|
| |
5
|
M. DeVuyst, R. Kumar, and D. M. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a cmp of smt processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.
|
| |
6
|
J. Edmonds. Maximum matching and a polyhedron with 0,1-vertices. Journal of Research of the National Bureau of Standards B, 69B:125--130, 1965.
|
| |
7
|
A. El-Moursy, R. Garg, D. H. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a cmp of multi-threaded processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.
|
| |
8
|
Alexandra Fedorova , Margo Seltzer , Christoper Small , Daniel Nussbaum, Performance of multithreaded chip multiprocessors and implications for operating system design, Proceedings of the annual conference on USENIX Annual Technical Conference, p.26-26, April 10-15, 2005, Anaheim, CA
|
| |
9
|
|
 |
10
|
|
| |
11
|
M. Garey and D. Johnson. Computers and Intractability. Feeman, San Francisco, CA, 1979.
|
 |
12
|
Lisa R. Hsu , Steven K. Reinhardt , Ravishankar Iyer , Srihari Makineni, Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
[doi> 10.1145/1152154.1152161]
|
 |
13
|
Jaehyuk Huh , Changkyu Kim , Hazim Shafi , Lixin Zhang , Doug Burger , Stephen W. Keckler, A NUCA substrate for flexible CMP cache sharing, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
[doi> 10.1145/1088149.1088154]
|
| |
14
|
Y. Jiang and X. Shen. Exploration of the influence of program inputs on cmp co-scheduling. In European Conference on Parallel Computing (Euro-Par), August 2008.
|
| |
15
|
R. Karp. Reducibility among combinatiorial problems. In R. Miller and J. Thatcher, editors, Complexity of Computer Computations, pages 85--103. Plenum Press, 1972.
|
| |
16
|
|
 |
17
|
|
| |
18
|
J. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE TCCA Newsletter, 1995. http://www.cs.virginia.edu/stream.
|
| |
19
|
|
| |
20
|
|
| |
21
|
S. Parekh, S. Eggers, H. Levy, and J. Lo. Thread-sensitive scheduling for smt processors. Technical Report 2000-04-02, University of Washington, June 2000.
|
 |
22
|
|
| |
23
|
|
| |
24
|
X. Shen and J. Shaw. Scalable implementation of efficient locality approximation. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing, 2008.
|
 |
25
|
|
 |
26
|
|
 |
27
|
|
 |
28
|
|
 |
29
|
|
| |
30
|
|
| |
31
|
G. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2002.
|
| |
32
|
|
| |
33
|
Xiao Zhang , Sandhya Dwarkadas , Girts Folkmanis , Kai Shen, Processor hardware counter statistics as a first-class system resource, Proceedings of the 11th USENIX workshop on Hot topics in operating systems, p.1-6, May 07-09, 2007, San Diego, CA
|
 |
34
|
|
CITED BY 3
|
|
|
|
|
Josefa Díaz , J. Ignacio Hidalgo , Francisco Fernández , Oscar Garnica , Sonia López, Improving SMT performance: an application of genetic algorithms to configure resizable caches, Proceedings of the 11th annual conference companion on Genetic and evolutionary computation conference, July 08-12, 2009, Montreal, Québec, Canada
|
|
|
|
|