ABSTRACT
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback-directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution.

Our goal is to develop automatic techniques that are capable of finding and exploiting the large-scale behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware-independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large-scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
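As a rough illustration of the idea of summarizing an interval of execution with a vector, the sketch below builds one frequency vector per fixed-length interval of a trace of basic-block IDs and compares intervals by Manhattan distance. Everything here is an assumption made for illustration: the abstract does not define how the vectors are weighted, normalized, or compared, and the names `basic_block_vectors` and `manhattan_distance`, the interval length, and the toy trace are hypothetical.

```python
from collections import Counter


def basic_block_vectors(trace, interval_len):
    """Split a trace of basic-block IDs into fixed-size intervals and
    return one normalized frequency vector (as a dict) per interval.

    Assumption: each vector entry is the fraction of the interval's
    block executions that belong to that block.
    """
    vectors = []
    for start in range(0, len(trace), interval_len):
        chunk = trace[start:start + interval_len]
        counts = Counter(chunk)
        total = len(chunk)
        vectors.append({block: n / total for block, n in counts.items()})
    return vectors


def manhattan_distance(a, b):
    """Sum of absolute differences over every block seen in either vector."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Toy trace: two intervals spent in blocks 0 and 1, then two in blocks 2 and 3.
trace = [0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 3]
bbvs = basic_block_vectors(trace, interval_len=4)

same_phase = manhattan_distance(bbvs[0], bbvs[1])
different_phase = manhattan_distance(bbvs[0], bbvs[2])
```

In this toy case, intervals that execute the same blocks in the same proportions are at distance zero, while intervals that share no blocks are at the maximum distance; a clustering algorithm could then group intervals by such distances.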
CITED BY 235
Tor M. Aamodt , Pedro Marcuello , Paul Chow , Antonio González , Per Hammarlund , Hong Wang , John P. Shen, A framework for modeling and optimization of prescient instruction prefetch, ACM SIGMETRICS Performance Evaluation Review, v.31 n.1, June 2003
Dan Ernst , Shidhartha Das , Seokwoo Lee , David Blaauw , Todd Austin , Trevor Mudge , Nam Sung Kim , Krisztian Flautner, Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation, IEEE Micro, v.24 n.6, p.10-20, November 2004
Kevin Skadron , Margaret Martonosi , David I. August , Mark D. Hill , David J. Lilja , Vijay S. Pai, Challenges in Computer Architecture Evaluation, Computer, v.36 n.8, p.30-36, August 2003
Howard Chen , Wei-Chung Hsu , Jiwei Lu , Pen-Chung Yew , Dong-Yuan Chen, Dynamic trace selection using performance monitoring hardware sampling, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, March 23-26, 2003, San Francisco, California
Jumnit Hong , Eriko Nurvitadhi , Shih-Lien L. Lu, Design, implementation, and verification of active cache emulator (ACE), Proceedings of the international symposium on Field programmable gate arrays, February 22-24, 2006, Monterey, California, USA
Seokwoo Lee , Shidhartha Das , Valeria Bertacco , Todd Austin , David Blaauw , Trevor Mudge, Circuit-aware architectural simulation, Proceedings of the 41st annual conference on Design automation, June 07-11, 2004, San Diego, CA, USA
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Nitya Ranganathan , Doug Burger , Stephen W. Keckler , Robert G. McDonald , Charles R. Moore, TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP, ACM Transactions on Architecture and Code Optimization (TACO), v.1 n.1, p.62-93, March 2004
Seokwoo Lee , Shidhartha Das , Toan Pham , Todd Austin , David Blaauw , Trevor Mudge, Reducing pipeline energy demands with local DVS and dynamic retiming, Proceedings of the 2004 international symposium on Low power electronics and design, August 09-11, 2004, Newport Beach, California, USA
Nikolaos Hardavellas , Stephen Somogyi , Thomas F. Wenisch , Roland E. Wunderlich , Shelley Chen , Jangwoo Kim , Babak Falsafi , James C. Hoe , Andreas G. Nowatzyk, SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture, ACM SIGMETRICS Performance Evaluation Review, v.31 n.4, p.31-34, March 2004
Jared C. Smolens , Brian T. Gold , Jangwoo Kim , Babak Falsafi , James C. Hoe , Andreas G. Nowatzyk, Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth, IEEE Micro, v.24 n.6, p.22-29, November 2004
Martin Schulz , Brian S. White , Sally A. McKee , Hsien-Hsin S. Lee , Jürgen Jeitner, Owl: next generation system monitoring, Proceedings of the 2nd conference on Computing frontiers, May 04-06, 2005, Ischia, Italy
Ashok Jagannathan , Hannah Honghua Yang , Kris Konigsfeld , Dan Milliron , Mosur Mohan , Michail Romesis , Glenn Reinman , Jason Cong, Microarchitecture evaluation with floorplanning and interconnect pipelining, Proceedings of the 2005 conference on Asia South Pacific design automation, January 18-21, 2005, Shanghai, China
Aditya Toomula , Jaspal Subhlok, Replicating memory behavior for performance prediction, Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems, p.1-8, October 22-23, 2004, Houston, Texas
Chi-Keung Luk , Robert Cohn , Robert Muth , Harish Patil , Artur Klauser , Geoff Lowney , Steven Wallace , Vijay Janapa Reddi , Kim Hazelwood, Pin: building customized program analysis tools with dynamic instrumentation, ACM SIGPLAN Notices, v.40 n.6, June 2005
Kaushal Sanghai , Ting Su , Jennifer Dy , David Kaeli, A multinomial clustering model for fast simulation of computer architecture designs, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
Jason Cong , Ashok Jagannathan , Glenn Reinman , Yuval Tamir, Understanding the energy efficiency of SMT and CMP with multiclustering, Proceedings of the 2005 international symposium on Low power electronics and design, August 08-10, 2005, San Diego, CA, USA
Cristiano Pereira , Jeremy Lau , Brad Calder , Rajesh Gupta, Dynamic phase analysis for cycle-close trace generation, Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, September 19-21, 2005, Jersey City, NJ, USA
Robert Springer , David K. Lowenthal , Barry Rountree , Vincent W. Freeh, Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
Minglong Shao , Anastassia Ailamaki , Babak Falsafi, DBmbench: fast and accurate database workload representation on modern microarchitecture, Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research, p.254-267, October 17-20, 2005, Toronto, Ontario, Canada
Shobana Padmanabhan , Phillip Jones , David V. Schuehler , Scott J. Friedman , Praveen Krishnamurthy , Huakai Zhang , Roger Chamberlain , Ron K. Cytron , Jason Fritts , John W. Lockwood, Extracting and improving microarchitecture performance on reconfigurable architectures, International Journal of Parallel Programming, v.33 n.2, p.115-136, June 2005
Jared C. Smolens , Jangwoo Kim , James C. Hoe , Babak Falsafi, Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.257-268, December 04-08, 2004, Portland, Oregon
Harish Patil , Robert Cohn , Mark Charney , Rajiv Kapoor , Andrew Sun , Anand Karunanidhi, Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.81-92, December 04-08, 2004, Portland, Oregon
Eric Tune , Rakesh Kumar , Dean M. Tullsen , Brad Calder, Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.183-194, December 04-08, 2004, Portland, Oregon
John Cavazos , Christophe Dubach , Felix Agakov , Edwin Bonilla , Michael F. P. O'Boyle , Grigori Fursin , Olivier Temam, Automatic performance model construction for the fast software exploration of new hardware designs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
Luis M. Ramos , José Luis Briz , Pablo E. Ibáñez , Victor Viñals, Data prefetching in a cache hierarchy with high bandwidth and capacity, Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures, p.37-44, September 16-20, 2006, Seattle, Washington
Kartik K. Agaram , Stephen W. Keckler , Calvin Lin , Kathryn S. McKinley, Decomposing memory performance: data structures and phases, Proceedings of the 2006 international symposium on Memory management, June 10-11, 2006, Ottawa, Ontario, Canada
Wei Wu , Lingling Jin , Jun Yang , Pu Liu , Sheldon X.-D. Tan, A systematic method for functional unit power estimation in microprocessors, Proceedings of the 43rd annual conference on Design automation, July 24-28, 2006, San Francisco, CA, USA
Joshua J. Yi , Hans Vandierendonck , Lieven Eeckhout , David J. Lilja, The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
Hans Vandierendonck , Philippe Manet , Thibault Delavallee , Igor Loiselle , Jean-Didier Legat, By-passing the out-of-order execution pipeline to increase energy-efficiency, Proceedings of the 4th international conference on Computing frontiers, May 07-09, 2007, Ischia, Italy
Kenneth Hoste , Aashish Phansalkar , Lieven Eeckhout , Andy Georges , Lizy K. John , Koen De Bosschere, Performance prediction based on inherent program similarity, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
Nevine AbouGhazaleh , Alexandre Ferreira , Cosmin Rusu , Ruibin Xu , Frank Liberato , Bruce Childers , Daniel Mosse , Rami Melhem, Integrated CPU and L2 cache voltage scaling using machine learning, ACM SIGPLAN Notices, v.42 n.7, July 2007
Jungwoo Ha , Christopher J. Rossbach , Jason V. Davis , Indrajit Roy , Hany E. Ramadan , Donald E. Porter , David L. Chen , Emmett Witchel, Improved error reporting for software that uses black-box components, ACM SIGPLAN Notices, v.42 n.6, June 2007
David W. Oehmke , Nathan L. Binkert , Trevor Mudge , Steven K. Reinhardt, How to Fake 1000 Registers, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.7-18, November 12-16, 2005, Barcelona, Spain
Fernando Castro , Luis Pinuel , Daniel Chaver , Manuel Prieto , Michael Huang , Francisco Tirado, DMDC: Delayed Memory Dependence Checking through Age-Based Filtering, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, p.297-308, December 09-13, 2006
Arijit Biswas , Paul Racunas , Razvan Cheveresan , Joel Emer , Shubhendu S. Mukherjee , Ram Rangan, Computing Architectural Vulnerability Factors for Address-Based Structures, ACM SIGARCH Computer Architecture News, v.33 n.2, p.532-543, May 2005
Himanshu Kaul , Dennis Sylvester , David Blaauw , Trevor Mudge , Todd Austin, DVS for On-Chip Bus Designs Based on Timing Error Correction, Proceedings of the conference on Design, Automation and Test in Europe, p.80-85, March 07-11, 2005
Dan Ernst , Nam Sung Kim , Shidhartha Das , Sanjay Pant , Rajeev Rao , Toan Pham , Conrad Ziesler , David Blaauw , Todd Austin , Krisztian Flautner , Trevor Mudge, Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.7, December 03-05, 2003
Lei Gao , Stefan Kraemer , Rainer Leupers , Gerd Ascheid , Heinrich Meyr, A fast and generic hybrid simulation approach using C virtual machine, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
Stefan Kraemer , Lei Gao , Jan Weinstock , Rainer Leupers , Gerd Ascheid , Heinrich Meyr, HySim: a fast simulation framework for embedded software development, Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, September 30-October 03, 2007, Salzburg, Austria
Engin Ipek , Sally A. McKee , Karan Singh , Rich Caruana , Bronis R. de Supinski , Martin Schulz, Efficient architectural design space exploration via predictive modeling, ACM Transactions on Architecture and Code Optimization (TACO), v.4 n.4, p.1-34, January 2008
Vincent W. Freeh , David K. Lowenthal , Feng Pan , Nandini Kappiah , Rob Springer , Barry L. Rountree , Mark E. Femal, Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications, IEEE Transactions on Parallel and Distributed Systems, v.18 n.6, p.835-848, June 2007
Ke Meng , Russ Joseph , Robert P. Dick , Li Shang, Multi-optimization power management for chip multiprocessors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
Melhem Tawk , Khaled Z. Ibrahim , Smail Niar, Multi-granularity sampling for simulating concurrent heterogeneous applications, Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, October 19-24, 2008, Atlanta, GA, USA
Yunlian Jiang , Xipeng Shen , Jie Chen , Rahul Tripathi, Analysis and approximation of optimal co-scheduling on chip multiprocessors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
Hongzhou Chen , Lingdi Ping , Xuezeng Pan , Kuijun Lu , Xiaoning Jiang, A swarm-inspired resource distribution for SMT processors, Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Systems, November 25-28, 2008, Hyogo, Japan
Stefan Valentin Gheorghita , Martin Palkovic , Juan Hamers , Arnout Vandecappelle , Stelios Mamagkakis , Twan Basten , Lieven Eeckhout , Henk Corporaal , Francky Catthoor , Frederik Vandeputte , Koen De Bosschere, System-scenario-based design of dynamic embedded systems, ACM Transactions on Design Automation of Electronic Systems (TODAES), v.14 n.1, p.1-45, January 2009
F. Castro , D. Chaver , L. Pinuel , M. Prieto , F. Tirado, Using age registers for a simple load-store queue filtering, Journal of Systems Architecture: the EUROMICRO Journal, v.55 n.2, p.79-89, February 2009
Hongzhong Zheng , Jiang Lin , Zhao Zhang , Eugene Gorbatov , Howard David , Zhichun Zhu, Mini-rank: Adaptive DRAM architecture for improving memory power efficiency, Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, p.210-221, November 08-12, 2008
Muli Ben-Yehuda , David Breitgand , Michael Factor , Hillel Kolodner , Valentin Kravtsov , Dan Pelleg, NAP: a building block for remediating performance bottlenecks via black box network analysis, Proceedings of the 6th international conference on Autonomic computing, June 15-19, 2009, Barcelona, Spain
Joshua J. Yi , Lieven Eeckhout , David J. Lilja , Brad Calder , Lizy K. John , James E. Smith, The Future of Simulation: A Field of Dreams, Computer, v.39 n.11, p.22-29, November 2006
Ayse K. Coskun , Richard Strong , Dean M. Tullsen , Tajana Simunic Rosing, Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors, Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, June 15-19, 2009, Seattle, WA, USA