|
ABSTRACT
We study a distributed version of the order scheduling problem that arises when scheduling memory requests in shared DRAM systems of many-core architectures. In this problem, a set of n customer orders needs to be scheduled on multiple facilities. An order can consist of multiple requests, each of which has to be serviced on one designated facility, and an order is completed only when all its requests have been serviced. In the distributed setting, every facility has its own request buffer and must schedule the requests having only limited knowledge about the buffer state at other facilities In this paper, we quantify the trade-off between the amount of communication among different facilities and the quality of the resulting global solution. We show that without communication, the average completion time of all orders can be by a factor Ω(√n) worse than in the optimal schedule. On the other hand, there exists a 2-approximation algorithm if the complete buffer states are exchanged in n communication rounds. We then prove a general upper bound that characterizes the region between these extreme points. Specifically, we devise a distributed scheduling algorithm that, for any k, achieves an approximation ratio of O(k) in n/k communication rounds. Finally, we empirically test the performance of our different algorithms in a many-core environment using SPEC CPU2006 benchmarks as well as Windows desktop application traces.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Sanjay Bhansali , Wen-Ke Chen , Stuart de Jong , Andrew Edwards , Ron Murray , Milenko Drinić , Darek Mihočka , Joe Chau, Framework for instruction-level tracing and analysis of program executions, Proceedings of the 2nd international conference on Virtual execution environments, June 14-16, 2006, Ottawa, Ontario, Canada
[doi> 10.1145/1134760.1220164]
|
| |
2
|
Z. L. Chen and N. G. Hall. Supply Chain Scheduling: Assembly Systems. Working Paper, Department of Systems Engineering, University of Pennsylvania, 2000.
|
| |
3
|
I. Duenyas. Estimating the Throughput of Cyclic. Management Science, 39:616--625, 1993.
|
| |
4
|
J. M. Frailong, W. Jalby, and J. Lenfant. "XOR-Schemes: A flexible data organization in parallel memories". In Proc. of International Conference on Parallel Processing (ICPP), 1985.
|
| |
5
|
N. Garg, A. Kumar, and V. Pandit. Order Scheduling Models : Hardness and Algorithms. In Proc. of the Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2007.
|
| |
6
|
T. Karkhanis and J. E. Smith. A day in the life of a data cache miss. In Workshop on Memory Performance Issues, 2002.
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
Chi-Keung Luk , Robert Cohn , Robert Muth , Harish Patil , Artur Klauser , Geoff Lowney , Steven Wallace , Vijay Janapa Reddi , Kim Hazelwood, Pin: building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, June 12-15, 2005, Chicago, IL, USA
|
| |
11
|
Micron. 1Gb DDR2 SDRAM Component: MT47H128M8HQ-25, May 2007.
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
Harish Patil , Robert Cohn , Mark Charney , Rajiv Kapoor , Andrew Sun , Anand Karunanidhi, Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.81-92, December 04-08, 2004, Portland, Oregon
[doi> 10.1109/MICRO.2004.28]
|
| |
19
|
|
| |
20
|
|
 |
21
|
Scott Rixner , William J. Dally , Ujval J. Kapasi , Peter Mattson , John D. Owens, Memory access scheduling, Proceedings of the 27th annual international symposium on Computer architecture, p.128-138, June 2000, Vancouver, British Columbia, Canada
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
| |
25
|
Standard Performance Evaluation Corporation. SPEC CPU2006. http://www.spec.org/cpu2006/.
|
| |
26
|
C. S. Sung and S. H. Yoon. Minimizing Total Weighted Completion Time at a Pre-Assembly Stage Composed of Two Feeding Machines. International Journal of Production Economics, 54:247--255, 1998.
|
| |
27
|
G. Wang and T. C. E. Cheng. Customer Order Scheduling to Minimize Total Weighted Completion Time. In Proc. of the 1st Multidisciplinary Conference on Scheduling Theory and Applications, pages 409--416, 2003.
|
| |
28
|
L. A. Wolsey. Mixed Integer Programming Formulations for Production Planning and Scheduling Problems. In Invited talk at 12th ACM-SIAM Symposium on Mathematical Programming, 1985.
|
|