ABSTRACT
Tuning supercomputer application performance often requires analyzing the interaction of the application and the underlying architecture. In this paper, we describe support in the MIPS R10000 for non-intrusively monitoring a variety of processor events -- support that is particularly useful for characterizing the dynamic behavior of multi-level memory hierarchies, hardware-based cache coherence, and speculative execution. We first explain how performance data is collected using an integrated set of hardware mechanisms, operating system abstractions, and performance tools. We then describe several examples drawn from scientific applications that illustrate how the counters and profiling tools provide information that helps developers analyze and tune applications.