ABSTRACT
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distributed to a number of parallel processing units which reside within a processor complex. Each of these units fetches and executes instructions belonging to its assigned task. The appearance of a single logical register file is maintained with a copy in each parallel processing unit. Register results are dynamically routed among the many parallel processing units with the help of compiler-generated masks. Memory accesses may occur speculatively without knowledge of preceding loads or stores. Addresses are disambiguated dynamically, many in parallel, and processing waits only for true data dependences.

This paper presents the philosophy of the multiscalar paradigm, the structure of multiscalar programs, and the hardware architecture of a multiscalar processor. The paper also discusses performance issues in the multiscalar model, and compares the multiscalar paradigm with other paradigms. Experimental results evaluating the performance of a sample of multiscalar organizations are also presented.
CITED BY 211
Eric Hao , Po-Yung Chang , Marius Evers , Yale N. Patt, Increasing the instruction fetch rate via block-structured instruction set architectures, Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, p.191-200, December 02-04, 1996, Paris, France
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, ACM SIGARCH Computer Architecture News, v.24 n.2, p.191-202, May 1996
Keith I. Farkas , Paul Chow , Norman P. Jouppi , Zvonko Vranesic, The multicluster architecture: reducing cycle time through partitioning, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.149-159, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Jack L. Lo , Joel S. Emer , Henry M. Levy , Rebecca L. Stamm , Dean M. Tullsen , S. J. Eggers, Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, ACM Transactions on Computer Systems (TOCS), v.15 n.3, p.322-354, Aug. 1997
Chong-Liang Ooi , Seon Wook Kim , Il Park , Rudolf Eigenmann , Babak Falsafi , T. N. Vijaykumar, Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor, Proceedings of the 15th international conference on Supercomputing, p.368-380, June 2001, Sorrento, Italy
Marco Fillo , Stephen W. Keckler , William J. Dally , Nicholas P. Carter , Andrew Chang , Yevgeny Gurevich , Whay S. Lee, The M-Machine multicomputer, Proceedings of the 28th annual international symposium on Microarchitecture, p.146-156, November 29-December 01, 1995, Ann Arbor, Michigan, United States
Walter Lee , Rajeev Barua , Matthew Frank , Devabhaktuni Srikrishna , Jonathan Babb , Vivek Sarkar , Saman Amarasinghe, Space-time scheduling of instruction-level parallelism on a raw machine, ACM SIGPLAN Notices, v.33 n.11, p.46-57, Nov. 1998
Iván Martel , Daniel Ortega , Eduard Ayguadé , Mateo Valero, Increasing effective IPC by exploiting distant parallelism, Proceedings of the 13th international conference on Supercomputing, p.348-355, June 20-25, 1999, Rhodes, Greece
Amirali Baniasadi , Andreas Moshovos, Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.337-347, December 2000, Monterey, California, United States
Stephen W. Keckler , William J. Dally , Daniel Maskit , Nicholas P. Carter , Andrew Chang , Whay S. Lee, Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor, ACM SIGARCH Computer Architecture News, v.26 n.3, p.306-317, June 1998
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Lieven Eeckhout , Tom Vander Aa , Bart Goeman , Hans Vandierendonck , Rudy Lauwereins , Koen De Bosschere, Application domains for fixed-length block structured architectures, Australian Computer Science Communications, v.23 n.4, p.35-44, January 2001
Doug Burger , Stephen W. Keckler , Kathryn S. McKinley , Mike Dahlin , Lizy K. John , Calvin Lin , Charles R. Moore , James Burrill , Robert G. McDonald , William Yoder , the TRIPS Team, Scaling to the End of Silicon with EDGE Architectures, Computer, v.37 n.7, p.44-55, July 2004
Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm , Dean M. Tullsen, Simultaneous Multithreading: A Platform for Next-Generation Processors, IEEE Micro, v.17 n.5, p.12-19, September 1997
Sarita V. Adve , Doug Burger , Rudolf Eigenmann , Alasdair Rawsthorne , Michael D. Smith , Catherine H. Gebotys , Mahmut T. Kandemir , David J. Lilja , Alok N. Choudhary , Jesse Z. Fang , Pen-Chung Yew, Changing Interaction of Compiler and Architecture, Computer, v.30 n.12, p.51-58, December 1997
Elliot Waingold , Michael Taylor , Devabhaktuni Srikrishna , Vivek Sarkar , Walter Lee , Victor Lee , Jang Kim , Matthew Frank , Peter Finch , Rajeev Barua , Jonathan Babb , Saman Amarasinghe , Anant Agarwal, Baring It All to Software: Raw Machines, Computer, v.30 n.9, p.86-93, September 1997
Cliff Jones , David Lomet , Alexander Romanovsky , Gerhard Weikum , Alan Fekete , Marie-Claude Gaudel , Henry F. Korth , Rogerio de Lemos , Eliot Moss , Ravi Rajwar , Krithi Ramamritham , Brian Randell , Luis Rodrigues, The atomic manifesto: a story in four quarks, ACM SIGMOD Record, v.34 n.1, March 2005
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Nitya Ranganathan , Doug Burger , Stephen W. Keckler , Robert G. McDonald , Charles R. Moore, TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP, ACM Transactions on Architecture and Code Optimization (TACO), v.1 n.1, p.62-93, March 2004
Ronny Krashinsky , Christopher Batten , Mark Hampton , Steve Gerding , Brian Pharris , Jared Casper , Krste Asanovic, The Vector-Thread Architecture, IEEE Micro, v.24 n.6, p.84-90, November 2004
Lance Hammond , Brian D. Carlstrom , Vicky Wong , Ben Hertzberg , Mike Chen , Christos Kozyrakis , Kunle Olukotun, Programming with transactional coherence and consistency (TCC), ACM SIGOPS Operating Systems Review, v.38 n.5, December 2004
Cliff Jones , David Lomet , Alexander Romanovsky , Gerhard Weikum , Alan Fekete , Marie-Claude Gaudel , Henry F. Korth , Rogerio de Lemos , Eliot Moss , Ravi Rajwar , Krithi Ramamritham , Brian Randell , Luis Rodrigues, The atomic manifesto: a story in four quarks, ACM SIGOPS Operating Systems Review, v.39 n.2, p.41-46, April 2005
Lance Hammond , Benedict A. Hubbert , Michael Siu , Manohar K. Prabhu , Michael Chen , Kunle Olukotun, The Stanford Hydra CMP, IEEE Micro, v.20 n.2, p.71-84, March 2000
David M. Brooks , Pradip Bose , Stanley E. Schuster , Hans Jacobson , Prabhakar N. Kudva , Alper Buyuktosunoglu , John-David Wellman , Victor Zyuban , Manish Gupta , Peter W. Cook, Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors, IEEE Micro, v.20 n.6, p.26-44, November 2000
Jose Renau , Karin Strauss , Luis Ceze , Wei Liu , Smruti Sarangi , James Tuck , Josep Torrellas, Thread-Level Speculation on a CMP can be energy efficient, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
Jose Renau , James Tuck , Wei Liu , Luis Ceze , Karin Strauss , Josep Torrellas, Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
María Jesús Garzarán , Milos Prvulovic , José María Llabería , Víctor Viñals , Lawrence Rauchwerger , Josep Torrellas, Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors, ACM Transactions on Architecture and Code Optimization (TACO), v.2 n.3, p.247-279, September 2005
Sanjeev Kumar , Michael Chu , Christopher J. Hughes , Partha Kundu , Anthony Nguyen, Hybrid transactional memory, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
R. González , A. Cristal , M. Pericas , M. Valero , A. Veidenbaum, An asymmetric clustered processor based on value content, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
Julia Chen , Philo Juang , Kevin Ko , Gilberto Contreras , David Penry , Ram Rangan , Adam Stoler , Li-Shiuan Peh , Margaret Martonosi, Hardware-modulated parallelism in chip multiprocessors, ACM SIGARCH Computer Architecture News, v.33 n.4, November 2005
Wei Liu , James Tuck , Luis Ceze , Wonsun Ahn , Karin Strauss , Jose Renau , Josep Torrellas, POSH: a TLS compiler that exploits program structure, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
José F. Martínez , Jose Renau , Michael C. Huang , Milos Prvulovic , Josep Torrellas, Cherry: checkpointed early resource recycling in out-of-order microprocessors, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Doug Burger , Stephen W. Keckler , Charles R. Moore, Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, ACM SIGARCH Computer Architecture News, v.31 n.2, May 2003
David N. Armstrong , Hyesoon Kim , Onur Mutlu , Yale N. Patt, Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.119-128, December 04-08, 2004, Portland, Oregon
Brian D. Carlstrom , JaeWoong Chung , Hassan Chafi , Austen McDonald , Chi Cao Minh , Lance Hammond , Christos Kozyrakis , Kunle Olukotun, Executing Java programs with transactional memory, Science of Computer Programming, v.63 n.2, p.111-129, 1 December 2006
Arun Kejariwal , Xinmin Tian , Wei Li , Milind Girkar , Sergey Kozhukhov , Hideki Saito , Utpal Banerjee , Alexandru Nicolau , Alexander V. Veidenbaum , Constantine D. Polychronopoulos, On the performance potential of different types of speculative thread-level parallelism (the DL version of this paper includes corrections that were not made available in the printed proceedings), Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
Richard B. Kujoth , Chi-Wei Wang , Jeffrey J. Cook , Derek B. Gottlieb , Nicholas P. Carter, A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor, Microprocessors & Microsystems, v.31 n.2, p.146-159, March, 2007
Shailender Chaudhry , Robert Cypher , Magnus Ekman , Martin Karlsson , Anders Landin , Sherman Yip , Håkan Zeffer , Marc Tremblay, Simultaneous speculative threading: a novel pipeline architecture implemented in sun's rock processor, ACM SIGARCH Computer Architecture News, v.37 n.3, June 2009
Jung Ho Ahn , Mattan Erez , William J. Dally, Tradeoff between data-, instruction-, and thread-level parallelism in stream processors, Proceedings of the 21st annual international conference on Supercomputing, June 17-21, 2007, Seattle, Washington
Ronny Krashinsky , Christopher Batten , Mark Hampton , Steve Gerding , Brian Pharris , Jared Casper , Krste Asanovic, The Vector-Thread Architecture, ACM SIGARCH Computer Architecture News, v.32 n.2, p.52, March 2004
Smruti R. Sarangi , Wei Liu , Josep Torrellas , Yuanyuan Zhou, ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.257-270, November 12-16, 2005, Barcelona, Spain
Taku Ohsawa , Masamichi Takagi , Shoji Kawahara , Satoshi Matsushita, Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.81-92, November 12-16, 2005, Barcelona, Spain
Jiwei Lu , Abhinav Das , Wei-Chung Hsu , Khoa Nguyen , Santosh G. Abraham, Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.93-104, November 12-16, 2005, Barcelona, Spain
Guilherme Ottoni , Ram Rangan , Adam Stoler , David I. August, Automatic Thread Extraction with Decoupled Software Pipelining, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.105-118, November 12-16, 2005, Barcelona, Spain
Michael Bedford Taylor , Walter Lee , Jason Miller , David Wentzlaff , Ian Bratt , Ben Greenwald , Henry Hoffmann , Paul Johnson , Jason Kim , James Psota , Arvind Saraf , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, ACM SIGARCH Computer Architecture News, v.32 n.2, p.2, March 2004
Richard A. Hankins , Gautham N. Chinya , Jamison D. Collins , Perry H. Wang , Ryan Rakvic , Hong Wang , John P. Shen, Multiple Instruction Stream Processor, ACM SIGARCH Computer Architecture News, v.34 n.2, p.114-127, May 2006
Lance Hammond , Vicky Wong , Mike Chen , Brian D. Carlstrom , John D. Davis , Ben Hertzberg , Manohar K. Prabhu , Honggo Wijaya , Christos Kozyrakis , Kunle Olukotun, Transactional Memory Coherence and Consistency, ACM SIGARCH Computer Architecture News, v.32 n.2, p.102, March 2004
Haitham Akkary , Komal Jothi , Renjith Retnamma , Satyanarayana Nekkalapu , Doug Hall , Shahrokh Shahidzadeh, On the potential of latency tolerant execution in speculative multithreading, Proceedings of the 1st international forum on Next-generation multicore/manycore technologies, November 24-25, 2008, Cairo, Egypt
Lukasz Ziarek , Suresh Jagannathan , Matthew Fluet , Umut A. Acar, Speculative N-Way barriers, Proceedings of the 4th workshop on Declarative aspects of multicore programming, January 20-20, 2009, Savannah, GA, USA
Easwaran Raman , Guilherme Ottoni , Arun Raman , Matthew J. Bridges , David I. August, Parallel-stage decoupled software pipelining, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
Easwaran Raman , Neil Vachharajani , Ram Rangan , David I. August, Spice: speculative parallel iteration chunk execution, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
Miquel Pericàs , Adrian Cristal , Francisco J. Cazorla , Ruben González , Alex Veidenbaum , Daniel A. Jiménez , Mateo Valero, A Two-Level Load/Store Queue Based on Execution Locality, ACM SIGARCH Computer Architecture News, v.36 n.3, p.25-36, June 2008
Jose Renau , Karin Strauss , Luis Ceze , Wei Liu , Smruti R. Sarangi , James Tuck , Josep Torrellas, Energy-Efficient Thread-Level Speculation, IEEE Micro, v.26 n.1, p.80-91, January 2006
Josep Torrellas , Luis Ceze , James Tuck , Calin Cascaval , Pablo Montesinos , Wonsun Ahn , Milos Prvulovic, The Bulk Multicore architecture for improved programmability, Communications of the ACM, v.52 n.12, December 2009