ABSTRACT
Open source development projects typically maintain an open bug repository to which both developers and users can report bugs. Each report that appears in this repository must be triaged to determine whether it requires attention and, if it does, which developer will be assigned responsibility for resolving it. Large open source projects are burdened by the rate at which new bug reports appear in the repository. In this paper, we present a semi-automated approach intended to ease one part of this process: the assignment of reports to a developer. Our approach applies a machine learning algorithm to the open bug repository to learn the kinds of reports each developer resolves. When a new report arrives, the classifier produced by the machine learning technique suggests a small number of developers suitable to resolve the report. With this approach, we have reached precision levels of 57% and 64% on the Eclipse and Firefox development projects, respectively. We have also applied our approach to the gcc open source project, with less positive results. We describe the conditions under which the approach is applicable and report on the lessons we learned about applying machine learning to repositories used in open source development.
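The idea of learning which kinds of reports each developer resolves, and then suggesting a small number of developers for a new report, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the learning algorithm, so a simple multinomial Naive Bayes text classifier with add-one smoothing stands in for it, and the class name `DeveloperRecommender`, the developer names, and the sample reports are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


class DeveloperRecommender:
    """Hypothetical Naive Bayes recommender; not the paper's actual classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # developer -> word frequencies
        self.report_counts = Counter()           # developer -> reports resolved
        self.vocabulary = set()

    def train(self, resolved_reports):
        """Learn from (report_text, developer) pairs taken from resolved reports."""
        for text, developer in resolved_reports:
            words = tokenize(text)
            self.word_counts[developer].update(words)
            self.report_counts[developer] += 1
            self.vocabulary.update(words)

    def suggest(self, text, k=3):
        """Return up to k developers ranked by log-probability for the report."""
        words = tokenize(text)
        total_reports = sum(self.report_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for developer, resolved in self.report_counts.items():
            counts = self.word_counts[developer]
            total_words = sum(counts.values())
            # Prior: share of past reports resolved by this developer.
            score = math.log(resolved / total_reports)
            # Likelihood of each word, with add-one (Laplace) smoothing.
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[developer] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up training history: report text paired with the developer who resolved it.
history = [
    ("editor crashes when opening large java file", "alice"),
    ("editor syntax highlighting broken for java", "alice"),
    ("debugger breakpoint ignored in thread", "bob"),
    ("debugger step over hangs", "bob"),
    ("build path error on import project", "carol"),
]

recommender = DeveloperRecommender()
recommender.train(history)
suggestions = recommender.suggest("debugger hangs on breakpoint", k=2)
print(suggestions)
```

Returning a short ranked list rather than a single name mirrors the semi-automated framing: the classifier narrows the choice, and a human triager makes the final assignment.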