Fair and balanced?: bias in bug-fix datasets
Source
Foundations of Software Engineering
Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Amsterdam, The Netherlands
SESSION: Empirical software engineering
Pages: 121-130
Year of Publication: 2009
ISBN: 978-1-60558-001-2

Authors

Christian Bird
University of California, Davis, Davis, CA, USA

Adrian Bachmann
University of Zurich, Zurich, Switzerland

Eirik Aune
University of California, Davis, Davis, CA, USA

John Duffy
University of California, Davis, Davis, CA, USA

Abraham Bernstein
University of Zurich, Zurich, Switzerland

Vladimir Filkov
University of California, Davis, Davis, CA, USA

Premkumar Devanbu
University of California, Davis, Davis, CA, USA

ABSTRACT
Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses about how bugs are introduced, and to build statistical bug prediction models. Unfortunately, processes and humans are imperfect: only a fraction of bug fixes are actually labelled in source code version histories, and only those fixes become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.
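To make the linking step and the fairness question concrete, here is a minimal sketch of one way such a check could look. It assumes, hypothetically, that commit messages cite bug reports as `bug 1234`, `bug #1234`, or `#1234`, and that each fixed bug report carries a severity label. The regular expression, the function names `linked_bug_ids`, `split_fixed_bugs`, and `chi_square_2xk`, and the toy data are all illustrative; they are not the authors' linking procedure or statistical test. The sketch extracts the bug IDs mentioned in commits, splits fixed bugs into linked and unlinked groups, and computes a Pearson chi-square statistic on their severity distributions. A large statistic suggests that the linked sample does not look like the full population of fixes.

```python
import re
from collections import Counter

# Hypothetical convention: commits cite bugs as "bug 1234", "bug #1234", or "#1234".
# \b keeps words such as "debug 5" from being read as a bug reference.
BUG_REF = re.compile(r"(?:\bbug\s*#?|#)(\d+)", re.IGNORECASE)


def linked_bug_ids(commit_messages):
    """Return the set of bug IDs mentioned in any commit message."""
    ids = set()
    for msg in commit_messages:
        ids.update(int(m) for m in BUG_REF.findall(msg))
    return ids


def split_fixed_bugs(fixed_bugs, commit_messages):
    """Split fixed bugs (id -> severity) into linked and unlinked severity counts."""
    linked_ids = linked_bug_ids(commit_messages)
    linked, unlinked = Counter(), Counter()
    for bug_id, severity in fixed_bugs.items():
        target = linked if bug_id in linked_ids else unlinked
        target[severity] += 1
    return linked, unlinked


def chi_square_2xk(linked, unlinked):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Rows are linked/unlinked; columns are severity categories seen at least once.
    """
    cats = sorted(c for c in set(linked) | set(unlinked) if linked[c] + unlinked[c] > 0)
    rows = [[linked[c] for c in cats], [unlinked[c] for c in cats]]
    total = sum(sum(row) for row in rows)
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(cats))]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            if expected > 0:  # an empty row contributes nothing
                stat += (obs - expected) ** 2 / expected
    return stat, len(cats) - 1


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    fixed = {101: "critical", 102: "minor", 103: "minor",
             104: "critical", 105: "minor", 106: "critical"}
    commits = ["Fix bug 101: null check", "refactor parser",
               "bug #104 crash on exit", "update docs (#106)",
               "debug 105 printouts"]
    linked, unlinked = split_fixed_bugs(fixed, commits)
    frac = sum(linked.values()) / len(fixed)
    stat, df = chi_square_2xk(linked, unlinked)
    print(f"linked fraction: {frac:.2f}")
    print(f"chi-square = {stat:.3f}, df = {df}")
```

On the toy data, three of the six fixed bugs are linked, and all three are critical, which gives a statistic of 6.0 with 1 degree of freedom (above the 3.841 critical value at α = 0.05). With expected counts this small the chi-square approximation is unreliable, so a real analysis would need far larger samples or an exact test. The example also shows why linking is imperfect: any fix whose commit message names no bug at all is invisible to a pattern like this one.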
CITED BY

Thomas Zimmermann, Nachiappan Nagappan, Harald Gall, Emanuel Giger, Brendan Murphy, Cross-project defect prediction: a large scale experiment on data vs. domain vs. process, Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, August 24-28, 2009, Amsterdam, The Netherlands