Investigation of failure causes in workload-driven reliability testing
Source: Foundations of Software Engineering
Fourth international workshop on Software quality assurance: in conjunction with the 6th ESEC/FSE joint meeting
Dubrovnik, Croatia
SESSION: Failure anticipation
Pages: 78 - 85  
Year of Publication: 2007
ISBN:978-1-59593-724-7
Authors
Domenico Cotroneo  Università degli Studi di Napoli Federico II, Naples, Italy
Roberto Pietrantuono  Università degli Studi di Napoli Federico II, Naples, Italy
Leonardo Mariani  Università degli Studi di Milano Bicocca, Milano
Fabrizio Pastore  Università degli Studi di Milano Bicocca, Milano
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
CEPIS : The Council of European Professional Informatics Societies
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1295074.1295089

ABSTRACT

Virtual execution environments and middleware are required to be extremely reliable because applications running on top of them are developed assuming their correctness, and platform-level failures can result in serious and unexpected application-level problems. Since software platforms and middleware are often executed for a long time without interruption, a large part of the testing process is devoted to investigating their behavior during long and stressful executions (these test cases are called workloads). When a problem is identified, software engineers examine log files to find its root cause. Unfortunately, because of the length of the workloads, log files can contain a huge amount of information, and manual analysis is often prohibitive. Thus, de facto, the identification of the root cause is mostly left to the intuition of the software engineer.

In this paper, we propose a technique to automatically analyze logs obtained from workloads to retrieve important information that can relate the failure to its cause. The technique works in three steps: (1) during workload executions, the system under test is monitored; (2) logs extracted from workloads that have been successfully completed are used to derive compact and general models of the expected behavior of the target system; (3) logs corresponding to workloads terminated unsuccessfully are compared with the inferred models to identify anomalous event sequences. Anomalies help software engineers identify failure causes. The technique can also be used during the operational phase to discover possible causes of unexpected failures by comparing logs from failing executions with models derived at testing time. Preliminary experiments conducted on the Java Virtual Machine indicate that several bugs can be rapidly identified thanks to the feedback provided by our technique.
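
As a rough illustration of steps (2) and (3), the following Python sketch learns a model from logs of successful workloads and flags event sequences in a failing log that the model never observed. The abstract does not specify how the models are inferred, so this sketch substitutes a deliberately simple stand-in: the set of event-to-event transitions seen in passing runs. The event names, the `learn_model` and `find_anomalies` functions, and the sample logs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Sentinel events marking the beginning and end of a log (assumed convention).
START, END = "<start>", "<end>"


def learn_model(passing_logs):
    """Step 2 (simplified): record every transition seen in passing runs."""
    allowed = defaultdict(set)
    for log in passing_logs:
        events = [START, *log, END]
        for prev, nxt in zip(events, events[1:]):
            allowed[prev].add(nxt)
    return allowed


def find_anomalies(model, failing_log):
    """Step 3 (simplified): list transitions the model has never seen.

    Each anomaly is (position, previous_event, unexpected_event), where
    position is the index in failing_log of the unexpected event
    (len(failing_log) if the log ended unexpectedly).
    """
    events = [START, *failing_log, END]
    anomalies = []
    for position, (prev, nxt) in enumerate(zip(events, events[1:])):
        if nxt not in model.get(prev, set()):
            anomalies.append((position, prev, nxt))
    return anomalies


# Hypothetical event logs; real logs would come from monitoring (step 1).
passing = [
    ["load_class", "compile", "run", "gc", "run", "exit"],
    ["load_class", "run", "gc", "gc", "run", "exit"],
]
failing = ["load_class", "compile", "gc", "run", "exit"]

model = learn_model(passing)
anomalies = find_anomalies(model, failing)
for position, prev, nxt in anomalies:
    print(f"event {position}: '{nxt}' never followed '{prev}' in passing runs")
```

A transition set is far cruder than the compact and general models the abstract refers to, but it shows the workflow: passing runs define expected behavior, and deviations in a failing run become candidate starting points for root-cause analysis.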


