Improving defect prediction using temporal features and non linear models
Source: Foundations of Software Engineering
Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting
Dubrovnik, Croatia
SESSION: Mining history
Pages: 11 - 18  
Year of Publication: 2007
ISBN:978-1-59593-722-3
Authors
Abraham Bernstein  University of Zurich, Switzerland
Jayalath Ekanayake  University of Zurich, Switzerland
Martin Pinzger  University of Zurich, Switzerland
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
CEPIS : The Council of European Professional Informatics Societies
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1294948.1294953

ABSTRACT

Predicting the defects in the next release of a large software system is very valuable for a project manager planning her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that the use of non-linear models, as opposed to traditional regression, is necessary to uncover some of the hidden interrelationships between the features and the defects and, in some cases, to maintain the accuracy of the prediction.

Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of reported bugs in the next month of the project. To that end, we compare standard tree-based induction algorithms with traditional regression.
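As a rough illustration of what a temporal feature of this kind could look like, the sketch below counts how many revisions and reported issues fall within the 90 days before a reference date. It is not the authors' extraction pipeline: the function name, the 90-day approximation of "three months", and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def count_events_in_window(timestamps, reference, window_days=90):
    """Count events with reference - window_days <= timestamp < reference."""
    start = reference - timedelta(days=window_days)
    return sum(1 for t in timestamps if start <= t < reference)

# Hypothetical revision and issue histories for a single source file.
revisions = [
    datetime(2006, 9, 1),
    datetime(2007, 1, 5),
    datetime(2007, 2, 20),
    datetime(2007, 3, 28),
]
issues = [datetime(2006, 12, 15), datetime(2007, 3, 1)]

# Hypothetical prediction date; the window covers the 90 days before it.
reference = datetime(2007, 4, 1)

features = {
    "revisions_last_3_months": count_events_in_window(revisions, reference),
    "issues_last_3_months": count_events_in_window(issues, reference),
}
print(features)
```

A feature vector like this, computed per file and per prediction date, is the sort of input a tree-based learner or a regression model could be trained on.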

Our non-linear models uncover the hidden relationships between features and defects and present them in an easy-to-understand form. The results also show that, using the temporal features, our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under the ROC curve 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman's correlation of 0.96).
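For readers unfamiliar with the measures quoted above, the following minimal sketch shows one common way to compute mean absolute error, Spearman's rank correlation, and the area under the ROC curve. The helper names and the toy data are assumptions for illustration only; they are not taken from the paper and do not reproduce its results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def average_ranks(values):
    """Rank values starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): defect counts per file and model outputs.
actual_counts = [0, 2, 1, 0]
predicted_counts = [0.5, 1.5, 1.0, 0.0]
has_defect = [1, 0, 1, 0]
defect_scores = [0.9, 0.2, 0.4, 0.5]

print(mean_absolute_error(actual_counts, predicted_counts))
print(spearman(actual_counts, predicted_counts))
print(roc_auc(has_defect, defect_scores))
```

Accuracy and area under the ROC curve can diverge when few files are defective, which is one reason to report both, as the abstract does.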



