Addressing software dependability with statistical and machine learning techniques
Source: International Conference on Software Engineering
Proceedings of the 27th International Conference on Software Engineering
St. Louis, MO, USA
SESSION: State of the art
Pages: 8 - 8  
Year of Publication: 2005
ISBN:1-59593-963-2
Author
Armando Fox  Stanford University, Stanford, CA
Sponsors
ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 32,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1062455.1062462

ABSTRACT

Our ability to design and deploy large, complex systems is outpacing our ability to understand their behavior. How do we detect and recover from "heisenbugs," which account for up to 40% of failures in complex Internet systems, without extensive application-specific coding? Which users were affected, and for how long? How do we diagnose and correct problems caused by configuration errors or operator errors? Although these problems are posed at a high level of abstraction, all we can usually measure directly are low-level behaviors, which is analogous to driving a car while looking through a magnifying glass. Machine learning can bridge this gap using techniques that learn "baseline" models automatically or semi-automatically, allowing the characterization and monitoring of systems whose structure is not well understood a priori. I'll discuss initial successes and future challenges in using machine learning for failure detection and diagnosis, configuration troubleshooting, attribution (determining which low-level properties appear to be correlated with an observed high-level effect, such as decreased performance), and failure forecasting.