The WaveScalar architecture
Source
ACM Transactions on Computer Systems (TOCS)
Volume 25, Issue 2 (May 2007)
Article No. 4  
Year of Publication: 2007
ISSN:0734-2071
Authors
Steven Swanson  University of Washington, Seattle, WA
Andrew Schwerin  University of Washington, Seattle, WA
Martha Mercaldi  University of Washington, Seattle, WA
Andrew Petersen  University of Washington, Seattle, WA
Andrew Putnam  University of Washington, Seattle, WA
Ken Michelson  University of Washington, Seattle, WA
Mark Oskin  University of Washington, Seattle, WA
Susan J. Eggers  University of Washington, Seattle, WA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 160,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1233307.1233308

ABSTRACT

Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge that conventional superscalar designs will not be able to meet. We present WaveScalar as a scalable alternative to conventional designs. WaveScalar is a dataflow instruction set and execution model designed for scalable, low-complexity/high-performance processors. Unlike previous dataflow machines, WaveScalar can efficiently provide the sequential memory semantics that imperative languages require. To allow programmers to easily express parallelism, WaveScalar supports pthread-style, coarse-grain multithreading and dataflow-style, fine-grain threading. In addition, it permits blending the two styles within an application, or even a single function.
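The dataflow execution model mentioned above can be illustrated with a minimal sketch: there is no program counter, and an instruction fires as soon as all of its operands have arrived. Everything here (the `Node` class, the `run` function, the token format) is a hypothetical illustration, not the WaveScalar instruction set, and it does not model the sequential memory semantics or threading support the abstract describes.

```python
from collections import deque
import operator


class Node:
    """A hypothetical dataflow instruction that fires once all operands arrive."""

    def __init__(self, op, arity, consumers):
        self.op = op                # function applied to the operands
        self.arity = arity          # number of operands required to fire
        self.consumers = consumers  # list of (node_name, input_slot) destinations
        self.inputs = {}            # operands received so far, keyed by slot


def run(nodes, initial_tokens):
    """Execute a dataflow graph driven purely by operand arrival.

    Tokens are (node_name, input_slot, value) triples. Returns the values
    produced by nodes that have no consumers.
    """
    pending = deque(initial_tokens)
    results = {}
    while pending:
        name, slot, value = pending.popleft()
        node = nodes[name]
        node.inputs[slot] = value
        if len(node.inputs) == node.arity:      # firing rule: all operands present
            out = node.op(*(node.inputs[i] for i in range(node.arity)))
            node.inputs = {}                    # reset for the next invocation
            if not node.consumers:
                results[name] = out
            for dest, dest_slot in node.consumers:
                pending.append((dest, dest_slot, out))
    return results


def build_example():
    """Graph for (a + b) * (c - d)."""
    return {
        "add": Node(operator.add, 2, [("mul", 0)]),
        "sub": Node(operator.sub, 2, [("mul", 1)]),
        "mul": Node(operator.mul, 2, []),
    }


# Operands may arrive in any order; "add" and "sub" are independent of each other.
tokens = [("add", 0, 2), ("sub", 0, 7), ("add", 1, 3), ("sub", 1, 4)]
result = run(build_example(), tokens)
```

In this toy graph the `add` and `sub` nodes have no dependence on each other, so a dataflow machine could execute them in parallel; only `mul` must wait for both results.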

To execute WaveScalar programs, we have designed a scalable, tile-based processor architecture called the WaveCache. As a program executes, the WaveCache maps the program's instructions onto its array of processing elements (PEs). The instructions remain at their processing elements for many invocations, and as the working set of instructions changes, the WaveCache removes unused instructions and maps new ones in their place. The instructions communicate directly with one another over a scalable, hierarchical on-chip interconnect, obviating the need for long wires and broadcast communication.
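The mapping behaviour described above (instructions stay resident at a PE across invocations and are replaced as the working set changes) can be sketched as a small cache model. The class name `ToyWaveCache`, the least-loaded placement rule, and the least-recently-used replacement rule are all assumptions made for illustration; the abstract does not specify the WaveCache's placement or replacement policies.

```python
from collections import OrderedDict


class ToyWaveCache:
    """Hypothetical model of instructions being mapped onto an array of PEs."""

    def __init__(self, num_pes, slots_per_pe):
        # One ordered dict per PE; order tracks recency (oldest first).
        self.pes = [OrderedDict() for _ in range(num_pes)]
        self.slots_per_pe = slots_per_pe
        self.location = {}  # instruction id -> index of the PE holding it
        self.misses = 0

    def touch(self, instr):
        """Return the PE holding instr, mapping it in first on a miss."""
        if instr in self.location:
            pe = self.location[instr]
            self.pes[pe].move_to_end(instr)  # mark as most recently used
            return pe
        self.misses += 1
        # Assumed placement policy: pick the PE holding the fewest instructions.
        pe = min(range(len(self.pes)), key=lambda i: len(self.pes[i]))
        if len(self.pes[pe]) >= self.slots_per_pe:
            # Assumed replacement policy: evict the least recently used one.
            victim, _ = self.pes[pe].popitem(last=False)
            del self.location[victim]
        self.pes[pe][instr] = True
        self.location[instr] = pe
        return pe


# A loop whose four instructions fit in the array misses only on first use.
cache = ToyWaveCache(num_pes=2, slots_per_pe=2)
for _ in range(3):
    for instr in ["i0", "i1", "i2", "i3"]:
        cache.touch(instr)
```

Once the loop's instructions are resident, later iterations incur no further misses; touching a fifth instruction would force an eviction, which corresponds to the working-set change the abstract describes.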

This article presents the WaveScalar instruction set and evaluates a simulated implementation based on current technology. For single-threaded applications, the WaveCache achieves performance on par with conventional processors, but in less area. For coarse-grain threaded applications, the WaveCache achieves nearly linear speedup with up to 64 threads, and it can sustain 7 to 14 multiply-accumulates per cycle on fine-grain threaded versions of well-known kernels. Finally, we apply both styles of threading to equake from SPEC2000 and speed it up by 9x compared to the serial version.



REVIEW

Reviewer: Joseph M. Arul

The WaveScalar architecture aims to exploit the advantages of dataflow to achieve more parallelism with less silicon area, along with better performance and scalability than superscalar architectures. The program counter, which is a bottleneck in vo...
