ABSTRACT
Helper threading is a technique for accelerating a program by exploiting a processor's multithreading capability to run "assist" threads. Previous experiments on hyper-threaded processors have demonstrated significant speedups by using helper threads to prefetch hard-to-predict delinquent data accesses. In order to apply this technique to processors that do not have built-in hardware support for multithreading, we introduce virtual multithreading (VMT), a novel form of switch-on-event user-level multithreading, capable of fly-weight multiplexing of event-driven thread executions on a single processor without additional operating system support. The compiler plays a key role in minimizing synchronization cost by judiciously partitioning register usage among the user-level threads. The VMT approach makes it possible to launch dynamic helper thread instances in response to long-latency cache miss events, and to run helper threads in the shadow of cache misses when the main thread would otherwise be stalled.

The concept of VMT is prototyped on an Itanium® 2 processor using features provided by the Processor Abstraction Layer (PAL) firmware mechanism already present in currently shipping processors. On a 4-way MP physical system equipped with VMT-enabled Itanium 2 processors, helper threading via the VMT mechanism can achieve significant performance gains for a diverse set of real-world workloads, ranging from single-threaded workstation benchmarks to heavily multithreaded large-scale decision support systems (DSS) using the IBM DB2 Universal Database. We measure a wall-clock speedup of 5.8% to 38.5% for the workstation benchmarks, and 5.0% to 12.7% on various queries in the DSS workload.
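The switch-on-event idea can be illustrated with a toy cycle-count model. This is a minimal sketch, not the paper's implementation: the latencies, the `HELPER_COST` per prefetch, and the `run` function are hypothetical, and the model assumes that a prefetch issued by the helper completes before the main thread reaches that address. It only shows why work done in the shadow of a miss can shorten the main thread's total time.

```python
# Toy model of switch-on-event helper threading (all constants are assumptions).
MISS_LATENCY = 100  # assumed stall cycles for one cache miss in the main thread
HIT_LATENCY = 1     # assumed cycles for one cache hit
HELPER_COST = 20    # assumed helper cycles needed to issue one prefetch


def run(addresses, use_helper):
    """Return the cycles the main thread needs to touch every address in order."""
    cache = set()
    cycles = 0
    for i, addr in enumerate(addresses):
        if addr in cache:
            cycles += HIT_LATENCY
            continue
        # Cache miss: the main thread stalls for MISS_LATENCY cycles.
        cycles += MISS_LATENCY
        cache.add(addr)
        if use_helper:
            # The miss event triggers a switch to the helper, which runs only
            # inside the stall window and prefetches upcoming addresses.
            budget = MISS_LATENCY
            j = i + 1
            while budget >= HELPER_COST and j < len(addresses):
                cache.add(addresses[j])  # simplification: prefetch lands in time
                budget -= HELPER_COST
                j += 1
    return cycles


if __name__ == "__main__":
    trace = list(range(12))  # twelve distinct addresses, all cold
    print("without helper:", run(trace, use_helper=False))
    print("with helper:   ", run(trace, use_helper=True))
```

The model charges no cost for the thread switch itself and ignores the latency of the prefetches, so its numbers are far more optimistic than the measured speedups reported above; it is meant only to make the mechanism concrete.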