|
ABSTRACT
Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research of the past decade has revealed several fundamental problems: Synchronization at the instruction level, token matching, coloring and re-labeling operations have a negative impact on performance by significantly increasing the number of non-compute “overhead” cycles. Recently, many novel Hybrid von-Neumann Data Driven machines have been proposed to alleviate some of these problems. The major objective has been to reduce or eliminate unnecesssary synchronization costs through simplified operand matching schemes and increased task granularity. Moreover, the results from recent studies quantifying locality suggest sufficient spatial and temporal locality is present in dataflow execution to merit its exploitation.
In this paper we present a data structure for exploiting locality in a data driven environment: the Vector Cell. A Vector Cell consists of a number of fixed length chunks of data elements. Each chunk is tagged with a presence bit, providing intra-chunk strictness and inter-chunk non-strictness to data structure access. We describe the semantics of the model, processor architecture and instruction set as well as a Sisal to dataflow vectorizing compiler back-end. The model is evaluated by comparing its performance to those of both a classical fine-grain dataflow processor employing I-structures and a conventional pipelined vector processor. Results indicate the model is surprisingly resilient to long memory and communication latencies, and is able to dynamically exploit the underlying parallelism across multiple processing elements at run time.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Arvind and R. A. Iannucd. Instruction set definition for a tagged-token data-flow machine. TechnicM Report CSG Memo 212-3, Laboratory for Computer S~ence, MIT, 1983.
|
| |
2
|
Arvind, R. S. Nikh~, and K. K. Pingali. ~Structures: Data Structures for ParMl~ Computing. MIT Computation Structures Group Memo 269, Laboratory for Computer ScOnce, MIT, 1987.
|
| |
3
|
P. S. Barth, R. S. Nikh~, and Arvid. M-structures: Extending a paralld, non-strict, functional language with state. Technical Report Computation Structures Group Report 327, M/T, Laboratory for Computer S~ence, Cambridge, MA, March 1991.
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
| |
7
|
D. C. Cann. The Optim~ing S~al Compiler: Version 12.0. Techn~al Report CS-92-114, Colorado State UnN versity, 1992.
|
| |
8
|
|
 |
9
|
David E. Culler , Anurag Sah , Klaus E. Schauser , Thorsten von Eicken , John Wawrzynek, Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.164-175, April 08-11, 1991, Santa Clara, California, United States
|
| |
10
|
|
| |
11
|
J. B. Dennis and G. R. Gao. Maximum pipelining of array operations on static data-flow machine. In Int. Conf. on Parallel Processing, page 331pp, August 1983.
|
| |
12
|
J. B. Dennis and K. S. Weng. An abstract imp~mentation for concurent computation with streams. Int. Conf. on Parallel Processin~ pages 35-45, August 1979.
|
 |
13
|
|
| |
14
|
J. J. Dongarra and A. Hinds. Comparison of the Cray X-MP-4, Fujitsu VP-200, and Hitachi S-810/20. IEEE Compute~ 16:289-323, December 1985.
|
| |
15
|
G. K. Egan, J. J. Webb, and A. P. W. BShm. Some A~ chitectural Features of the CSIRA C H Data-Flow Compute~ Prent~e-Hall, 1991.
|
| |
16
|
P. Evripidou and J.-L. Gaudiot. The USC Decoupled Multilevel Data-Flow Execution Model Prent~HaH, 1991.
|
| |
17
|
J. T. Feo. The Livermore Loops in S~al. Techn~ cal Report UCID-21159, Computing Research Group, Lawrence Livermore National Laborator~ Livermore, CA 94550, August 1987.
|
| |
18
|
D. Gajski, D. A. Padua, D. J. Kuck, and R. H. Kuhn. A second opinion on Data Flow machines and languages. IEEE Compute~ pages 58-69, 1982.
|
| |
19
|
|
| |
20
|
J.-L. Gaudiot and W. A. Najjar. Macro-actor execution on multflevd data-driven architectures. In Prgc. of the IFIP Working Group 10. 3 Working Conference on Parallel Processing, Pisa, ItM~ 1988.
|
| |
21
|
I. Gottheb. Dynam~ Structured Dataflow:Preserving the Advantages of Sequential Proces~ng in a D~ta Driven Envkonment. In Int. Conf. on Parallel Pl~ocessin~ Chicago, 1988.
|
| |
22
|
V. G. Grafe and J. E. Hoch. Implementation of the EpsHon dataflow processor. In Proc. of the Twentythird Ann. Hawaii Int. Conf. on System Sciences, pages 19-29, 1990. volume 1.
|
 |
23
|
|
 |
24
|
|
| |
25
|
Cray Research Inc. CRAY Y-MP C90 Functional Description Manual. Technical Report HR-04028, Cray Research Inc., March 1992.
|
| |
26
|
J. McGraw, S. Skedzielewski, S. AHan, R. Oldehoeff, J. Glauert, C. Kirkham, B. Noyce, and R. Thomas. SISAL: Streams and Iteration in a Single Assignment Language: reference manual version 1.2. Manual M- 146, Rev. 1, Lawrence Livermore National Laborator~ Livermore, CA, March 1985.
|
| |
27
|
F. H. McMahon. Livermore FORTRAN kernds: A computer test of numerical performance range. Techng cM Report UCRL-53745, Lawrence Livermore National Laborator~ Livermore, CA, December 1986.
|
 |
28
|
|
| |
29
|
|
 |
30
|
|
| |
31
|
G. M. Papadopoulos. Implementation of a General- Purpose Dataflow M ulfiprocessor. Technical Report TR-432, MIT Laboratory for Computer Sconce, August 1988.
|
 |
32
|
|
 |
33
|
|
| |
34
|
J. R. Rice. Prob~ms to test paral}~ and vector languages. TechnicM Report CSD-TR 516, Purdue Univer~t~ 1985.
|
| |
35
|
J. R. Price. Problems to test parall~ and vector languages- 2. Technical Report CSD-TR 1016, Purdue University, 1990.
|
 |
36
|
|
 |
37
|
S. Sakai , y. Yamaguchi , K. Hiraki , Y. Kodama , T. Yuba, An architecture of a dataflow single chip processor, Proceedings of the 16th annual international symposium on Computer architecture, p.46-53, April 1989, Jerusalem, Israel
|
| |
38
|
S. Sak~i, Y. Yamaguchi, K. Hiraki, Y. Kodama, and T. Yuba. Pipefine optimization of a dataflow machine. In J-L. Gaudiot and L. Bic, editors, Advanced Topics in Data-Flow Computing, pages 225-246. Prentice Hall, 1991.
|
| |
39
|
S. Sake, Y. Yamaguchi, K. H~aki, and T. Yuba. Introduction of a Strongly-Connected-Arc-Mod~ in ~ Data- Driven Single Chip Processor EMC-R. In Data-flow Workshop, IECE, pages 231-238, 1987. In Japanese.
|
 |
40
|
Mitsuhisa Sato , Yuetsu Kodama , Shuichi Sakai , Yoshinori Yamaguchi , Yasuhito Koumura, Thread-based programming for the EM-4 hybrid dataflow machine, Proceedings of the 19th annual international symposium on Computer architecture, p.146-155, May 19-21, 1992, Queensland, Australia
|
| |
41
|
|
| |
42
|
S. Sekiguchi, K. Hiraki, and T. Shimadn. Effident vector proces~ng on a Dataflow supercomputer SIGMA-1. In Int. Conf. on Parallel Processin~ 1988.
|
| |
43
|
B. Smith. The end of architecture (keynote address). In Int. Ann. Syrup. on Computer Architecture, May 1990.
|
 |
44
|
|
 |
45
|
|
|