ABSTRACT
Prediction of software defects works well within projects as long as there is a sufficient amount of data available to train the models. However, this is rarely the case for new software projects and for many companies. So far, only a few studies have focused on transferring prediction models from one project to another. In this paper, we study cross-project defect prediction models on a large scale. For 12 real-world applications, we ran 622 cross-project predictions. Our results indicate that cross-project prediction is a serious challenge: simply using models from projects in the same domain or with the same process does not lead to accurate predictions. To help software engineers choose models wisely, we identified factors that do influence the success of cross-project predictions. We also derived decision trees that can provide early estimates of precision, recall, and accuracy before a prediction is attempted.