Producing wrong data without doing anything obviously wrong!
Architectural Support for Programming Languages and Operating Systems
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems
Washington, DC, USA
SESSION: Potpourri
Pages 265-276
Year of Publication: 2009
ISBN: 978-1-60558-406-5
Authors
Todd Mytkowicz, University of Colorado, Boulder, CO, USA
Amer Diwan, University of Colorado, Boulder, CO, USA
Matthias Hauswirth, University of Lugano, Lugano, Switzerland
Peter F. Sweeney, IBM Research, Hawthorne, NY, USA
ABSTRACT
This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment. What appears to be an innocuous aspect in the experimental setup may in fact introduce a significant bias in an evaluation. This phenomenon is called measurement bias in the natural and social sciences. Our results demonstrate that measurement bias is significant and commonplace in computer system evaluation. By significant we mean that measurement bias can lead to a performance analysis that either over-states an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs in all architectures that we tried (Pentium 4, Core 2, and m5 O3CPU), both compilers that we tried (gcc and Intel's C compiler), and most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we determined that none of the papers with experimental results adequately consider measurement bias. Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods, one for detecting (causal analysis) and one for avoiding (setup randomization) measurement bias.
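The idea behind setup randomization can be sketched in a few lines: instead of measuring a speedup under one fixed experimental setup, measure it under many randomly drawn setups and report the mean with a confidence interval. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function `run_benchmark` is a hypothetical stand-in that simulates a run time, and the 2% true improvement, the +/-5% setup-induced bias, and the normal-approximation interval are all assumed values chosen for the example.

```python
import math
import random
import statistics


def run_benchmark(optimized: bool, rng: random.Random) -> float:
    """Hypothetical stand-in for timing one benchmark run (seconds).

    Assumption: the optimization truly saves 2%, while the experimental
    setup perturbs each binary's run time by up to +/-5%.
    """
    true_time = 9.8 if optimized else 10.0
    setup_bias = rng.uniform(-0.05, 0.05)
    return true_time * (1.0 + setup_bias)


def speedups_over_random_setups(n_setups: int, seed: int = 0) -> list:
    """Measure the baseline/optimized speedup once per random setup."""
    rng = random.Random(seed)
    return [
        run_benchmark(False, rng) / run_benchmark(True, rng)
        for _ in range(n_setups)
    ]


def summarize(speedups: list) -> tuple:
    """Return the mean speedup and a 95% interval half-width.

    Uses a normal approximation, which assumes a reasonably large sample.
    """
    mean = statistics.mean(speedups)
    half_width = 1.96 * statistics.stdev(speedups) / math.sqrt(len(speedups))
    return mean, half_width


if __name__ == "__main__":
    speedups = speedups_over_random_setups(200)
    mean, half_width = summarize(speedups)
    print(f"speedup = {mean:.3f} +/- {half_width:.3f}")
    print(f"single-setup range: {min(speedups):.3f} to {max(speedups):.3f}")
```

With 200 simulated setups, some individual setups report a slowdown (speedup below 1.0) even though the modeled optimization is an improvement, which is the kind of wrong conclusion a single fixed setup can produce. Averaging over randomized setups trades that bias for variance that the interval makes visible.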