Per-thread cycle accounting in SMT processors
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems
Washington, DC, USA
SESSION: Prediction and accounting
Pages 133-144
Year of Publication: 2009
ISBN: 978-1-60558-406-5
ABSTRACT
This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates, while threads run simultaneously on the SMT processor, the execution time each thread would have had if it had executed alone. It does so by attributing each cycle of multi-threaded execution to one of three components: base, miss event, or waiting. The single-threaded (alone) execution time is then estimated as the sum of the base and miss event components; the waiting component represents the cycles lost due to SMT execution. The cycle accounting architecture incurs a reasonable hardware cost (around 1KB of storage) and estimates single-threaded performance with average prediction errors of around 7.2% for two-program workloads and 11.7% for four-program workloads. The cycle accounting architecture has several important applications to system software and its interaction with SMT hardware. First, the estimated single-threaded alone execution time gives system software an accurate picture of the processor cycles each thread actually consumed. Using the alone execution time instead of the total execution time (timeslice) may make system software scheduling policies more effective. Second, a new class of thread-progress-aware SMT fetch policies based on per-thread progress indicators enables system-software-level priorities to be enforced at the hardware level.
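The arithmetic behind the estimate can be illustrated with a minimal Python sketch. It assumes only what the abstract states: every cycle falls into exactly one of the base, miss event, or waiting components, and the alone execution time is the sum of the first two. The class name, field names, the `progress` ratio, and the counter values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ThreadCycleCounters:
    """Hypothetical per-thread counters; names are illustrative, not from the paper."""
    base: int        # cycles attributed to the base component
    miss_event: int  # cycles attributed to the miss event component
    waiting: int     # cycles attributed to waiting (lost due to SMT execution)

    @property
    def total_cycles(self) -> int:
        # Each cycle is accounted to exactly one of the three components.
        return self.base + self.miss_event + self.waiting

    @property
    def estimated_alone_cycles(self) -> int:
        # Alone execution time is estimated as base + miss event.
        return self.base + self.miss_event

    @property
    def progress(self) -> float:
        # Assumed progress indicator: estimated alone cycles per SMT cycle.
        if self.total_cycles == 0:
            return 0.0
        return self.estimated_alone_cycles / self.total_cycles


# Made-up counter values for one thread over some interval.
t0 = ThreadCycleCounters(base=600, miss_event=150, waiting=250)
print(t0.estimated_alone_cycles)  # 750
print(t0.progress)                # 0.75
```

In this made-up interval the thread occupied the core for 1000 cycles but would be charged only 750 under alone-time accounting; a scheduler or fetch policy could, in principle, compare such ratios across threads.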