Fair and balanced?: bias in bug-fix datasets
Source
Foundations of Software Engineering archive
Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering
Amsterdam, The Netherlands
SESSION: Empirical software engineering
Pages 121-130  
Year of Publication: 2009
ISBN: 978-1-60558-001-2
Authors
Christian Bird  University of California, Davis, Davis, CA, USA
Adrian Bachmann  University of Zurich, Zurich, Switzerland
Eirik Aune  University of California, Davis, Davis, CA, USA
John Duffy  University of California, Davis, Davis, CA, USA
Abraham Bernstein  University of Zurich, Zurich, Switzerland
Vladimir Filkov  University of California, Davis, Davis, CA, USA
Premkumar Devanbu  University of California, Davis, Davis, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1595696.1595716

ABSTRACT

Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect: only a fraction of bug fixes are actually labelled in source code version histories and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.

