ABSTRACT
The Mips R10000 is a dynamic superscalar microprocessor that implements the 64-bit Mips-4 Instruction Set Architecture. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although instructions execute out of order, the processor still provides sequential memory consistency and precise exception handling. The R10000 is designed for high performance, even in large real-world applications that have poor memory locality. With speculative execution, it calculates memory addresses and initiates cache refills early. Its hierarchical nonblocking memory system helps hide memory latency with two levels of set-associative, write-back caches.
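The contrast the abstract draws between out-of-order execution and in-order graduation can be illustrated with a toy model. The sketch below is not the R10000's actual mechanism: the function name `simulate`, the per-instruction latencies, and the `width` parameter (the maximum number of instructions graduated per cycle) are all assumptions made for illustration. Every instruction is treated as issued at cycle 0, and an instruction graduates only once it has completed and everything ahead of it in program order has already graduated.

```python
from collections import deque


def simulate(latencies, width=4):
    """Toy model of out-of-order completion with in-order graduation.

    latencies[i] is the (hypothetical) cycle at which instruction i
    finishes executing; all instructions are assumed issued at cycle 0.
    Returns (completion_order, graduated), where graduated is a list of
    (instruction_index, graduation_cycle) pairs.
    """
    n = len(latencies)
    # Instructions finish executing in latency order (ties: program order).
    completion_order = sorted(range(n), key=lambda i: (latencies[i], i))

    active = deque(range(n))  # pending instructions, in program order
    graduated = []
    cycle = 0
    while active:
        cycle += 1
        retired = 0
        # Only the oldest pending instruction may graduate, and only
        # after it has completed; at most `width` graduate per cycle.
        while active and retired < width and latencies[active[0]] <= cycle:
            graduated.append((active.popleft(), cycle))
            retired += 1
    return completion_order, graduated


if __name__ == "__main__":
    done, grads = simulate([3, 1, 1, 2, 1])
    print("completion order:", done)
    print("graduation (instr, cycle):", grads)
```

In this example the slow first instruction holds back the faster ones behind it, so program order is preserved at graduation even though completion order differs. That ordering is what makes precise exceptions possible in such a design.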
CITED BY 187
Harvey J. Wassermann, Olaf M. Lubeck, Yong Luo, Federico Bassetti, Performance evaluation of the SGI Origin2000: a memory-centric characterization of LANL ASCI applications, Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM), p.1-11, November 15-21, 1997, San Jose, CA
Tao Li, Lizy Kurian John, Vijaykrishnan Narayanan, Anand Sivasubramaniam, Jyotsna Sabarinathan, Anupama Murthy, Using complete system simulation to characterize SPECjvm98 benchmarks, Proceedings of the 14th international conference on Supercomputing, p.22-33, May 08-11, 2000, Santa Fe, New Mexico, United States
Keith I. Farkas, Paul Chow, Norman P. Jouppi, Zvonko Vranesic, The multicluster architecture: reducing cycle time through partitioning, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.149-159, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Jack L. Lo, Joel S. Emer, Henry M. Levy, Rebecca L. Stamm, Dean M. Tullsen, S. J. Eggers, Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, ACM Transactions on Computer Systems (TOCS), v.15 n.3, p.322-354, Aug. 1997
Jih-Kwon Peir, Shih-Chang Lai, Shih-Lien Lu, Jared Stark, Konrad Lai, Bloom filtering cache misses for accurate data speculation and prefetching, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
Alper Buyuktosunoglu, David Albonesi, Stanley Schuster, David Brooks, Pradip Bose, Peter Cook, A circuit level implementation of an adaptive issue queue for power-aware microprocessors, Proceedings of the 11th Great Lakes symposium on VLSI, p.73-78, March 2001, West Lafayette, Indiana, United States
Jeff Gibson, Robert Kunz, David Ofelt, Mark Horowitz, John Hennessy, Mark Heinrich, FLASH vs. (simulated) FLASH: closing the simulation loop, ACM SIGPLAN Notices, v.35 n.11, p.49-58, Nov. 2000
Yong Luo, Olaf M. Lubeck, Harvey Wasserman, Federico Bassetti, Kirk W. Cameron, Development and validation of a hierarchical memory model incorporating CPU- and memory-operation overlap model, Proceedings of the 1st international workshop on Software and performance, p.152-163, October 12-16, 1998, Santa Fe, New Mexico, United States
Roger Espasa, Mateo Valero, James E. Smith, Out-of-order vector architectures, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.160-170, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Rajeev Balasubramonian, David Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas, Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.245-257, December 2000, Monterey, California, United States
Ravi Iyer, Nancy M. Amato, Lawrence Rauchwerger, Laxmi Bhuyan, Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications, Proceedings of the 13th international conference on Supercomputing, p.339-347, June 20-25, 1999, Rhodes, Greece
Koji Inoue, Tohru Ishihara, Kazuaki Murakami, Way-predicting set-associative cache for high performance and low energy consumption, Proceedings of the 1999 international symposium on Low power electronics and design, p.273-275, August 16-17, 1999, San Diego, California, United States
Parthasarathy Ranganathan, Vijay S. Pai, Sarita V. Adve, Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.199-210, June 23-25, 1997, Newport, Rhode Island, United States
Masahiro Goshima, Kengo Nishino, Toshiaki Kitamura, Yasuhiko Nakashima, Shinji Tomita, Shin-ichiro Mori, A high-speed dynamic instruction scheduling scheme for superscalar processors, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
Alper Buyuktosunoglu, David H. Albonesi, Stanley Schuster, David Brooks, Pradip Bose, Peter Cook, Power-efficient issue queue design, Power aware computing, Kluwer Academic Publishers, Norwell, MA, 2002
Michael Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas, L1 data cache decomposition for energy efficiency, Proceedings of the 2001 international symposium on Low power electronics and design, p.10-15, August 2001, Huntington Beach, California, United States
Jeff Gibson, Robert Kunz, David Ofelt, Mark Horowitz, John Hennessy, Mark Heinrich, FLASH vs. (Simulated) FLASH: closing the simulation loop, ACM SIGARCH Computer Architecture News, v.28 n.5, p.49-58, Dec. 2000
Daniel Chaver, Luis Piñuel, Manuel Prieto, Francisco Tirado, Michael C. Huang, Branch prediction on demand: an energy-efficient solution, Proceedings of the 2003 international symposium on Low power electronics and design, August 25-27, 2003, Seoul, Korea
Dan Teodosiu, Joel Baxter, Kinshuk Govil, John Chapin, Mendel Rosenblum, Mark Horowitz, Hardware fault containment in scalable shared-memory multiprocessors, ACM SIGARCH Computer Architecture News, v.25 n.2, p.73-84, May 1997
Venkata Syam P. Rapaka, Emil Talpes, Diana Marculescu, Mixed-clock issue queue design for energy aware, high-performance cores, Proceedings of the 2004 conference on Asia South Pacific design automation: electronic design and solution fair, p.380-383, January 27-30, 2004, Yokohama, Japan
Michael Schlansker, Thomas M. Conte, James Dehnert, Kemal Ebcioglu, Jesse Z. Fang, Carol L. Thompson, Compilers for Instruction-Level Parallelism, Computer, v.30 n.12, p.63-69, December 1997
Milo M. K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, Mikko H. Lipasti, Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
José F. Martínez, Jose Renau, Michael C. Huang, Milos Prvulovic, Josep Torrellas, Cherry: checkpointed early resource recycling in out-of-order microprocessors, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
Minas Dasygenis, Erik Brockmeyer, Bart Durinck, Francky Catthoor, Dimitrios Soudris, Antonios Thanailakis, A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, v.14 n.3, p.279-291, March 2006
Antonia Zhai, Christopher B. Colohan, J. Gregory Steffan, Todd C. Mowry, Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.39, March 20-24, 2004, Palo Alto, California
Oguz Ergin, Deniz Balkan, Kanad Ghose, Dmitry Ponomarev, Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.304-315, December 04-08, 2004, Portland, Oregon
Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder, Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.183-194, December 04-08, 2004, Portland, Oregon
Deniz Balkan, Joseph Sharkey, Dmitry Ponomarev, Kanad Ghose, SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
Francisco J. Mesa-Martínez, Michael C. Huang, Jose Renau, SEED: scalable, efficient enforcement of dependences, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
J. S. Hu, N. Vijaykrishnan, S. Kim, M. Kandemir, M. J. Irwin, Scheduling Reusable Instructions for Power Reduction, Proceedings of the conference on Design, automation and test in Europe, p.10148, February 16-20, 2004
Elham Safi, Patrick Akl, Andreas Moshovos, Andreas Veneris, Aggeliki Arapoyianni, On the latency, energy and area of checkpointed, superscalar register alias tables, Proceedings of the 2007 international symposium on Low power electronics and design, August 27-29, 2007, Portland, OR, USA
Miquel Pericàs, Adrian Cristal, Francisco J. Cazorla, Ruben González, Alex Veidenbaum, Daniel A. Jiménez, Mateo Valero, A Two-Level Load/Store Queue Based on Execution Locality, ACM SIGARCH Computer Architecture News, v.36 n.3, p.25-36, June 2008
Antonio Carlos S. Beck, Mateus B. Rutzig, Georgi Gaydadjiev, Luigi Carro, Transparent reconfigurable acceleration for heterogeneous embedded applications, Proceedings of the conference on Design, automation and test in Europe, March 10-14, 2008, Munich, Germany
Wei-Wu Hu, Ji-Ye Zhao, Shi-Qiang Zhong, Xu Yang, Elio Guidetti, Chris Wu, Implementing a 1GHz four-issue out-of-order execution microprocessor in a standard cell ASIC methodology, Journal of Computer Science and Technology, v.22 n.1, p.1-14, January 2007
Isidro Gonzalez, Marco Galluzzi, Alex Veidenbaum, Marco A. Ramirez, Adrian Cristal, Mateo Valero, A distributed processor state management architecture for large-window processors, Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, p.11-22, November 08-12, 2008