ABSTRACT
Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge. Ever-increasing wire delay relative to switching speed and the exponential cost of circuit complexity make simply scaling up existing processor designs futile. In this paper, we present an alternative to superscalar design, WaveScalar. WaveScalar is a dataflow instruction set architecture and execution model designed for scalable, low-complexity/high-performance processors. WaveScalar is unique among dataflow architectures in efficiently providing traditional memory semantics. At last, a dataflow machine can run "real-world" programs, written in any language, without sacrificing parallelism.

The WaveScalar ISA is designed to run on an intelligent memory system. Each instruction in a WaveScalar binary executes in place in the memory system and explicitly communicates with its dependents in dataflow fashion. WaveScalar architectures cache instructions and the values they operate on in a WaveCache, a simple grid of "alu-in-cache" nodes. By co-locating computation and data in physical space, the WaveCache minimizes long-wire, high-latency communication.

This paper introduces the WaveScalar instruction set and evaluates a simulated implementation based on current technology. Results for the SPEC and Mediabench applications demonstrate that the WaveCache outperforms an aggressively configured superscalar design by 2-7 times, with ample opportunities for future optimizations.