Applying classification techniques to remotely-collected program execution data
Source Foundations of Software Engineering archive
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Lisbon, Portugal
SESSION: Patterns and aspects
Pages: 146 - 155  
Year of Publication: 2005
ISBN:1-59593-014-0
Authors
Murali Haran  Penn State University, University Park, PA
Alan Karr  National Institute of Statistical Sciences, Triangle Park, NC
Alessandro Orso  Georgia Inst. of Technology, Atlanta, GA
Adam Porter  University of Maryland, College Park, MD
Ashish Sanil  National Institute of Statistical Sciences, Triangle Park, NC
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 49,   Citation Count: 11
DOI: http://doi.acm.org/10.1145/1081706.1081732

ABSTRACT

There is an increasing interest in techniques that support measurement and analysis of fielded software systems. One of the main goals of these techniques is to better understand how software actually behaves in the field. In particular, many of these techniques require a way to distinguish, in the field, failing from passing executions. So far, researchers and practitioners have only partially addressed this problem: they have simply assumed that program failure status is either obvious (i.e., the program crashes) or provided by an external source (e.g., the users). In this paper, we propose a technique for automatically classifying execution data, collected in the field, as coming from either passing or failing program runs. (Failing program runs are executions that terminate with a failure, such as a wrong outcome.) We use statistical learning algorithms to build the classification models. Our approach builds the models by analyzing executions performed in a controlled environment (e.g., test cases run in-house) and then uses the models to predict whether execution data produced by a fielded instance were generated by a passing or failing program execution. We also present results from an initial feasibility study, based on multiple versions of a software subject, in which we investigate several issues vital to the applicability of the technique. Finally, we present some lessons learned regarding the interplay between the reliability of classification models and the amount and type of data collected.
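
The train-in-house, predict-in-the-field workflow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract says only that "statistical learning algorithms" build the models, so the one-split decision stump below is a deliberately simple stand-in and not the authors' classifier. The feature vectors (one count per instrumented event) and their values are hypothetical as well.

```python
from typing import List, Tuple

Run = List[int]               # hypothetical profile: one count per instrumented event
Model = Tuple[int, int, bool]  # (feature index, threshold, fail_if_above)


def predict(model: Model, run: Run) -> bool:
    """Return True if the model classifies the run as failing."""
    feature, threshold, fail_if_above = model
    return fail_if_above if run[feature] > threshold else not fail_if_above


def train_stump(runs: List[Run], failed: List[bool]) -> Model:
    """Pick the single feature/threshold split with the fewest training errors."""
    best_model, best_errors = None, len(runs) + 1
    for feature in range(len(runs[0])):
        for threshold in sorted({run[feature] for run in runs}):
            for fail_if_above in (True, False):
                model = (feature, threshold, fail_if_above)
                errors = sum(predict(model, r) != f for r, f in zip(runs, failed))
                if errors < best_errors:
                    best_model, best_errors = model, errors
    return best_model


# Step 1: executions in a controlled environment, where the outcome is known
# (for example, test cases run in-house). All values are made up.
in_house_runs = [[12, 0, 3], [10, 0, 4], [11, 5, 3], [9, 7, 2], [13, 1, 4], [10, 6, 3]]
in_house_failed = [False, False, True, True, False, True]

model = train_stump(in_house_runs, in_house_failed)

# Step 2: execution data from fielded instances, where the outcome is unknown.
field_runs = [[11, 8, 3], [12, 0, 4]]
field_predictions = [predict(model, run) for run in field_runs]
print(model, field_predictions)
```

A stump was chosen only because it fits in a few lines of standard-library Python; a practical model would need to handle many more features and would have to be checked against held-out runs before its field predictions are trusted.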

