Safely exploiting multithreaded processors to tolerate memory latency in real-time systems
Source: International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
Washington DC, USA
SESSION: Memory systems
Pages: 2-13
Year of Publication: 2004
ISBN: 1-58113-890-3
Authors
Ali El-Haj-Mahmoud  North Carolina State University, Raleigh, NC
Eric Rotenberg  North Carolina State University, Raleigh, NC
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1023833.1023837

ABSTRACT

A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate task when the active task issues a memory request, improving overall throughput. However, dynamic switching cannot be safely exploited to improve throughput in hard-real-time embedded systems. The schedulability of a task-set (a guarantee that all tasks meet their deadlines) must be determined a priori using offline schedulability tests, so any computation/memory overlap must be statically accounted for. We develop a novel analytical framework that bounds the overlap between the computation of a pipeline-resident task and the ongoing memory transfers of other tasks. From this framework we derive a simple closed-form schedulability test that depends only on the aggregate computation (C) and memory (M) components of tasks. That is, the technique does not need to know where memory transfers occur within and among tasks, and it avoids searching all task permutations for a specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time systems, exceeding the schedulability limits of classic real-time theory for uniprocessors. Our techniques make it possible to capitalize on higher-frequency embedded processors despite the widening processor-memory speed gap. Experiments with task-sets from the C-lab benchmarks show improved schedulability, measured as the ability to schedule previously infeasible task-sets or to reduce utilization for already feasible task-sets. We also demonstrate proof of concept by deploying our method in a cycle-level simulator of an ARM11-like embedded microprocessor augmented with multiple register contexts, the same hardware multithreading support available in Ubicom's IP3023 embedded microprocessor.
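
To make the baseline concrete, the following is a minimal Python sketch of the classic uniprocessor utilization tests that the abstract refers to as "classic real-time theory". It is not the paper's closed-form test, which the abstract does not state. The sketch assumes periodic tasks with deadlines equal to periods, and the `Task` class, its field names, and the numeric task-set are hypothetical values chosen only for illustration. Each task's worst-case execution time is taken as C + M, that is, with no computation/memory overlap.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    """Hypothetical periodic task: deadline is assumed equal to the period."""
    name: str
    compute: float  # aggregate computation component (C)
    memory: float   # aggregate memory-stall component (M)
    period: float   # period and relative deadline (T)


def utilization(tasks: Sequence[Task], include_memory: bool = True) -> float:
    """Sum of per-task demand divided by period.

    With include_memory=True the demand is C + M (no overlap assumed).
    With include_memory=False it is C only, an optimistic floor that
    would require every memory stall to be perfectly hidden.
    """
    return sum(
        (t.compute + (t.memory if include_memory else 0.0)) / t.period
        for t in tasks
    )


def edf_schedulable(tasks: Sequence[Task]) -> bool:
    """Classic EDF test for implicit-deadline periodic tasks: U <= 1."""
    return utilization(tasks) <= 1.0


def rm_bound(n: int) -> float:
    """Liu and Layland sufficient bound for rate-monotonic scheduling."""
    return n * (2 ** (1 / n) - 1)


def rm_schedulable(tasks: Sequence[Task]) -> bool:
    """Sufficient (not necessary) rate-monotonic test: U <= n(2^(1/n) - 1)."""
    return utilization(tasks) <= rm_bound(len(tasks))


# Hypothetical task-set; the numbers are illustrative, not from the paper.
TASKS = [
    Task("A", compute=2.0, memory=2.0, period=10.0),
    Task("B", compute=3.0, memory=3.0, period=15.0),
    Task("C", compute=4.0, memory=2.0, period=20.0),
]

if __name__ == "__main__":
    print("U with C + M:", utilization(TASKS))
    print("U with C only:", utilization(TASKS, include_memory=False))
    print("EDF feasible (no overlap):", edf_schedulable(TASKS))
    print("RM bound test passes (no overlap):", rm_schedulable(TASKS))
```

In this hypothetical task-set the no-overlap utilization exceeds 1, so the classic tests reject it, while the computation-only utilization is well below 1. The difference between the two figures is the demand attributable to memory stalls; a safe bound on computation/memory overlap, such as the one the abstract describes, would be needed before any of that demand could be discounted.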


