Prefetch optimizations on large-scale applications via parameter value prediction
Source
International Conference on Supercomputing
Proceedings of the 23rd International Conference on Supercomputing
Yorktown Heights, NY, USA
POSTER SESSION: Posters
Pages: 519-520  
Year of Publication: 2009
ISBN: 978-1-60558-498-0
Authors
Shih-wei Liao  Google Inc., Mountain View, CA, USA
Tzu-Han Hung  Princeton University, Princeton, NJ, USA
Donald Nguyen  University of Texas at Austin, Austin, TX, USA
Hucheng Zhou  Google, Beijing, China
Chinyen Chou  National Taiwan University, Taipei, Taiwan ROC
Chiaheng Tu  National Taiwan University, Taipei, Taiwan ROC
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1542275.1542359

ABSTRACT

A typical data center application requires the processor cycles of thousands of machines. Even a single-digit performance improvement can significantly reduce the cost and power consumption of a data center. Unfortunately, achieving sustained improvement, even if modest, is difficult. Data centers are dynamic environments where applications are frequently released and servers are continually upgraded. For maintainability and fault tolerance, the physical capabilities and configuration of the servers are abstracted from the application programmer.

We study application performance under different processor prefetch configurations. These configurations are largely transparent to the programmer, yet performance varies widely between the worst and best configurations, with relative performance improvement ranging from 1.4% to 75.1%. Alarmingly, one application that consumes many processor cycles improves by 23.6%.
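
The abstract does not define how relative improvement is computed. As a minimal sketch, assuming it is the run-time ratio of the worst configuration to the best, the comparison could look like the following. The configuration names and run times are hypothetical placeholders, not measurements from the paper.

```python
def relative_improvement(worst_time, best_time):
    """Percent improvement of the best configuration over the worst.

    Assumes the metric is based on run time (lower is better); the
    abstract does not state which metric the authors used.
    """
    if worst_time <= 0 or best_time <= 0:
        raise ValueError("run times must be positive")
    return (worst_time / best_time - 1.0) * 100.0


# Hypothetical run times in seconds, one entry per prefetch configuration.
run_times = {"config_a": 125.0, "config_b": 100.0, "config_c": 110.0}

worst = max(run_times.values())
best = min(run_times.values())
improvement = relative_improvement(worst, best)
print(f"best vs. worst configuration: {improvement:.1f}% improvement")
```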

Default prefetch configurations favor aggressive memory prefetching, which benefits most applications. Some data center applications, however, have highly tuned memory behavior, and aggressive prefetching severely degrades their performance. We develop a tuning framework that attempts to predict the optimal configuration from hardware performance counters. It applies to a large number of performance-critical data center applications without modifying their source code or binaries. The framework achieves performance within 1% of the best configuration's performance on a suite of important data center applications.
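
The abstract does not say how the framework maps counter readings to a configuration. One plausible approach, sketched here purely as an assumption, is nearest-neighbor matching: compare a new application's counter readings with those of already-profiled applications and reuse the configuration that worked best for the closest match. The counter names, configuration labels, and values below are all hypothetical.

```python
import math

# Hypothetical profiles: counter readings for previously measured
# applications, paired with the prefetch configuration that ran best.
# None of these names or values come from the paper.
PROFILES = [
    ({"l2_miss": 2.0, "prefetch_useless": 0.5, "bus_util": 0.20}, "all_prefetchers_on"),
    ({"l2_miss": 9.0, "prefetch_useless": 6.0, "bus_util": 0.85}, "all_prefetchers_off"),
    ({"l2_miss": 5.0, "prefetch_useless": 2.5, "bus_util": 0.50}, "adjacent_line_off"),
]


def distance(a, b):
    """Euclidean distance over the counters both samples report."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))


def predict_config(counters, profiles=PROFILES):
    """Return the configuration of the most similar profiled application."""
    _, config = min(profiles, key=lambda p: distance(counters, p[0]))
    return config


# Counters sampled from a (hypothetical) new application.
sample = {"l2_miss": 8.5, "prefetch_useless": 5.5, "bus_util": 0.80}
print(predict_config(sample))
```

Because the prediction relies only on counters sampled from the running process, a scheme like this needs no access to source code or binaries, which matches the constraint stated above. A practical version would also need to normalize the counters so that no single counter dominates the distance.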

