ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Exploiting parallelism through run-time analysis on a vector processor (abstract)
Full text PdfPdf (97 KB)
Source ACM Annual Computer Science Conference archive
Proceedings of the 1990 ACM annual conference on Cooperation table of contents
Washington, D.C., United States
Page: 434  
Year of Publication: 1990
ISBN:0-89791-348-5
Authors
V. P. Krothapalli  Department of Computer and Information Science, The Ohio State University, Columbus, Ohio
P. Sadayappan  Department of Computer and Information Science, The Ohio State University, Columbus, Ohio
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 0,   Downloads (12 Months): 5,   Citation Count: 0
Additional Information:

abstract   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/100348.100461
What is a DOI?

ABSTRACT

Several vendors such as Cray, Fujitsu, NEC supply vector processors that use pipelined functional units. A pipelined functional unit can extract parallelism by operating on a stream of operands simultaneously, with each stage in the pipeline working on a different set of operands. Users of these machines typically program in FORTRAN, and rely on an optimizing compiler (vectorizer) to extract parallelism. These compilers analyze loops in a program and generate a vector instruction for each type of operation that can be performed simultaneously on different elements of an array. When the independence of operations on different elements of an array is not compile-time determinable, as for example is the case with unstructured sparse matrix computations, these compilers generate scalar code. In this paper we propose a scheme based on run-time dependence analysis to vectorize loops with dependencies unknown at compile-time. Our approach analyses a loop at run-time by executing a skeleton of the loop and assigns a level to each operation. Operations that are independent of any other operations in the loop are given a level of one. Other operations are given a level one greater than the maximum of levels of all dependent operations. A vector instruction, using indirect operand access, is generated for all the operations of a given type at each level. The approach was evaluated by implementing a sparse triangular solver on one processor of a Cray X-MP. A number of sparse matrices derived from the application domain of circuit simulation were used. An improvement in performance ranging between 100-500.


Collaborative Colleagues:
V. P. Krothapalli: colleagues
P. Sadayappan: colleagues