ABSTRACT
A coarse-grain multithreaded processor can hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems: the schedulability of a task-set (a guarantee that all tasks meet their deadlines) must be determined a priori using offline schedulability tests, so any computation/memory overlap must be statically accounted for. We develop a novel analytical framework that bounds the overlap between the computation of a pipeline-resident task and the ongoing memory transfers of other tasks. From this framework we derive a simple closed-form schedulability test that depends only on the aggregate computation (C) and memory (M) components of tasks. That is, the technique does not need to know where memory transfers occur within and among tasks, and it avoids searching all task permutations for a specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time systems, exceeding the schedulability limits of classic real-time theory for uniprocessors. Our techniques make it possible to capitalize on higher-frequency embedded processors despite the widening processor-memory speed gap. Experiments with task-sets from the C-lab benchmarks show improved schedulability, measured as the ability to schedule previously infeasible task-sets or to reduce utilization for already feasible task-sets. We also demonstrate proof of concept by deploying our method in a cycle-level simulator of an ARM11-like embedded microprocessor augmented with multiple register contexts, the same hardware multithreading support available in Ubicom's IP3023 embedded microprocessor.
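For context, the "classic real-time theory for uniprocessors" that the abstract uses as its baseline can be sketched as a utilization test in which each task's memory time is serialized with its computation. The sketch below is not the closed-form test derived in this work, which the abstract does not state. It assumes independent periodic tasks with deadlines equal to periods, preemptive EDF scheduling, and integer times; the function names and the (C, M, T) tuple layout are hypothetical.

```python
from fractions import Fraction


def classic_edf_utilization(tasks):
    """Utilization when memory time is serialized with computation.

    tasks: iterable of (C, M, T) integer tuples, where C is the aggregate
    computation time, M the aggregate memory time, and T the period
    (all in the same time unit). The tuple layout is an assumption.
    """
    return sum(Fraction(c + m, t) for c, m, t in tasks)


def classic_edf_schedulable(tasks):
    """Classic EDF bound for implicit-deadline periodic tasks: U <= 1.

    This is the baseline with no computation/memory overlap; it is NOT
    the overlap-aware test described in the abstract.
    """
    return classic_edf_utilization(tasks) <= 1


if __name__ == "__main__":
    # Hypothetical task-set: (C, M, T)
    tasks = [(2, 2, 8), (3, 3, 10)]
    print("serialized utilization:", classic_edf_utilization(tasks))
    print("feasible under classic test:", classic_edf_schedulable(tasks))
```

A task-set that this baseline rejects, such as the hypothetical one above, is the kind of "previously infeasible" case that an analysis crediting computation/memory overlap would aim to admit.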