Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
Source
International Symposium on Computer Architecture
Proceedings of the 26th annual international symposium on Computer architecture
Atlanta, Georgia, United States
Pages: 16-27
Year of Publication: 1999
ISBN: 0-7695-0170-2
Authors
Sriram Vajapeyam, Supercomputer Education and Research Centre and Dept. of Computer Science & Automation, Indian Institute of Science, Bangalore, India 560012
P. J. Joseph, Dept. of Computer Science & Automation, Indian Institute of Science, Bangalore, India 560012
Tulika Mitra, SUNY, Stony Brook and Dept. of Computer Science & Automation, Indian Institute of Science, Bangalore, India 560012
Publisher
IEEE Computer Society, Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 3, Downloads (12 Months): 29, Citation Count: 4
ABSTRACT
Several ILP limit studies indicate the presence of considerable ILP across dynamically far-apart instructions in program execution. This paper proposes a hardware mechanism, dynamic vectorization (DV), as a tool for quickly building up a large logical instruction window. Dynamic vectorization converts repetitive dynamic instruction sequences into vector form, enabling the processing of instructions from beyond the corresponding program loop to be overlapped with the loop. This enables vector-like execution of programs with relatively complex static control flow that may not be amenable to static, compile time vectorization. Experimental evaluation shows that a large fraction of the dynamic instructions of four of the six SPECInt92 programs can be captured in vector form. Three of these programs exhibit significant potential for ILP improvements from dynamic vectorization, with speedups of more than a factor of 2 in a scenario of realistic branch prediction and perfect memory disambiguation. Under perfect branch prediction conditions, a fourth program also shows well over a factor of 2 speedup from DV. The speedups are due to the overlap of post-loop processing with loop processing.
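As a rough software illustration of the idea of converting repetitive dynamic instruction sequences into vector form, the sketch below collapses back-to-back repeats in a trace of program-counter values into (block, repeat count) entries. It is a toy model under stated assumptions, not the hardware mechanism evaluated in the paper: the trace, the function names `find_repeating_run` and `vectorize_trace`, and the `max_period` limit are all hypothetical.

```python
from typing import List, Tuple


def find_repeating_run(trace: List[int], start: int, max_period: int = 8) -> Tuple[int, int]:
    """Return (period, repeats) for the shortest block beginning at `start`
    that repeats back-to-back at least twice, or (0, 0) if there is none."""
    for period in range(1, max_period + 1):
        block = trace[start:start + period]
        if len(block) < period:
            break  # ran off the end of the trace
        repeats = 1
        pos = start + period
        while trace[pos:pos + period] == block:
            repeats += 1
            pos += period
        if repeats >= 2:
            return period, repeats
    return 0, 0


def vectorize_trace(trace: List[int], max_period: int = 8) -> List[Tuple[List[int], int]]:
    """Collapse repeated blocks into (block, repeat_count) entries.
    Instructions that do not repeat are kept as single-element blocks."""
    out = []
    i = 0
    while i < len(trace):
        period, repeats = find_repeating_run(trace, i, max_period)
        if period:
            out.append((trace[i:i + period], repeats))
            i += period * repeats
        else:
            out.append(([trace[i]], 1))
            i += 1
    return out


# Hypothetical PC trace: one pre-loop instruction, a 3-instruction loop
# body executed 4 times, then two post-loop instructions.
trace = [0x10] + [0x20, 0x24, 0x28] * 4 + [0x30, 0x34]
compact = vectorize_trace(trace)
print(compact)
```

In this toy trace the twelve loop instructions collapse into a single entry with a repeat count of 4, so the post-loop instructions sit only a few entries away from the start of the loop. That loosely mirrors the abstract's point that a compact vector form lets post-loop processing overlap with loop processing.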
CITED BY 4
Arun Kejariwal, Alexander V. Veidenbaum, Alexandru Nicolau, Milind Girkar, Xinmin Tian, Hideki Saito, Challenges in exploitation of loop parallelism in embedded applications, Proceedings of the 4th international conference on Hardware/software codesign and system synthesis, October 22-25, 2006, Seoul, Korea
José González, Qiong Cai, Pedro Chaparro, Grigorios Magklis, Ryan Rakvic, Antonio González, Thread fusion, Proceeding of the thirteenth international symposium on Low power electronics and design, August 11-13, 2008, Bangalore, India