|
ABSTRACT
Program verification is a promising approach to improving program quality, because it can search all possible program executions for specific errors. However, the need to formally describe correct behavior or errors is a major barrier to the widespread adoption of program verification, since programmers historically have been reluctant to write formal specifications. Automating the process of formulating specifications would remove a barrier to program verification and enhance its practicality.This paper describes specification mining, a machine learning approach to discovering formal specifications of the protocols that code must obey when interacting with an application program interface or abstract data type. Starting from the assumption that a working program is well enough debugged to reveal strong hints of correct protocols, our tool infers a specification by observing program execution and concisely summarizing the frequent interaction patterns as state machines that capture both temporal and data dependences. These state machines can be examined by a programmer, to refine the specification and identify errors, and can be utilized by automatic verification tools, to find bugs.Our preliminary experience with the mining tool has been promising. We were able to learn specifications that not only captured the correct protocol, but also discovered serious bugs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Thomas Ball , Rupak Majumdar , Todd Millstein , Sriram K. Rajamani, Automatic predicate abstraction of C programs, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.203-213, June 2001, Snowbird, Utah, United States
|
| |
2
|
|
 |
3
|
|
| |
4
|
A. W. Biermann and J. A. Feldman. On the synthesis of finite-state machines from samples of their behaviour. IEEE Transactions on Computers, 21:591-597, 1972.
|
| |
5
|
|
 |
6
|
Andy Chou , Junfeng Yang , Benjamin Chelf , Seth Hallem , Dawson Engler, An empirical study of operating systems errors, Proceedings of the eighteenth ACM symposium on Operating systems principles, October 21-24, 2001, Banff, Alberta, Canada
|
| |
7
|
|
 |
8
|
|
 |
9
|
|
 |
10
|
Dawson Engler , David Yu Chen , Seth Hallem , Andy Chou , Benjamin Chelf, Bugs as deviant behavior: a general approach to inferring errors in systems code, Proceedings of the eighteenth ACM symposium on Operating systems principles, October 21-24, 2001, Banff, Alberta, Canada
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
E Mark Gold. Language identification in the limit. Information and Control, 10:447-474, 1967.
|
| |
15
|
|
 |
16
|
Michael Kearns , Yishay Mansour , Dana Ron , Ronitt Rubinfeld , Robert E. Schapire , Linda Sellie, On the learnability of discrete distributions, Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, p.273-282, May 23-25, 1994, Montreal, Quebec, Canada
[doi> 10.1145/195058.195155]
|
 |
17
|
|
| |
18
|
|
 |
19
|
Michael D. Ernst , Adam Czeisler , William G. Griswold , David Notkin, Quickly detecting relevant program invariants, Proceedings of the 22nd international conference on Software engineering, p.449-458, June 04-11, 2000, Limerick, Ireland
[doi> 10.1145/337180.337240]
|
| |
20
|
Kevin P. Murphy. Passively learning finite automata. Technical Report 96-04-017, Santa Fe Institute, 1996.
|
| |
21
|
Anand Raman, Peter Andreae, and Jon Patrick. A beam search algorithm for pfsa inference. Pattern Analysis and Applications, 1(2), 1998.
|
| |
22
|
Anand V. Raman and Jon D. Patrick. The sk-strings method for inferring PFSA. In Proceedings of the workshop on automata induction, grammatical inference and language acquisition at the 14th international conference on machine learning (ICML97), 1997.
|
| |
23
|
|
 |
24
|
Dana Ron , Yoram Singer , Naftali Tishby, On the learnability and usage of acyclic probabilistic finite automata, Proceedings of the eighth annual conference on Computational learning theory, p.31-40, July 05-08, 1995, Santa Cruz, California, United States
[doi> 10.1145/225298.225302]
|
| |
25
|
David Rosenthal. Inter-client communication conventions manual (ICCCM), version 2.0. X Consortium, Inc. and Sun Microsystems, 1994. Part of the X11R6 distribution.
|
 |
26
|
|
| |
27
|
|
CITED BY 82
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
R. Sekar , V.N. Venkatakrishnan , Samik Basu , Sandeep Bhatkar , Daniel C. DuVarney, Model-carrying code: a practical approach for safe execution of untrusted applications, Proceedings of the nineteenth ACM symposium on Operating systems principles, October 19-22, 2003, Bolton Landing, NY, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vinod Ganapathy , Sanjit A. Seshia , Somesh Jha , Thomas W. Reps , Randal E. Bryant, Automatic discovery of API-level exploits, Proceedings of the 27th international conference on Software engineering, May 15-21, 2005, St. Louis, MO, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jinlin Yang , David Evans , Deepali Bhardwaj , Thirumalesh Bhat , Manuvir Das, Perracotta: mining temporal API rules from imperfect traces, Proceeding of the 28th international conference on Software engineering, May 20-28, 2006, Shanghai, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sebastian Elbaum , Kalyan-Ram Chilakamarri , Marc Fisher, II , Gregg Rothermel, Web application characterization through directed requests, Proceedings of the 2006 international workshop on Dynamic systems analysis, May 23-23, 2006, Shanghai, China
|
|
|
|
|
|
|
|
|
Ted Kremenek , Paul Twohey , Godmar Back , Andrew Ng , Dawson Engler, From uncertainty to belief: inferring the specification within, Proceedings of the 7th symposium on Operating systems design and implementation, November 06-08, 2006, Seattle, Washington
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sriram Sankaranarayanan , Swarat Chaudhuri , Franjo Ivančić , Aarti Gupta, Dynamic inference of likely data preconditions over predicates by tree learning, Proceedings of the 2008 international symposium on Software testing and analysis, July 20-24, 2008, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lin Tan , Xiaolan Zhang , Xiao Ma , Weiwei Xiong , Yuanyuan Zhou, AutoISES: automatically inferring security specifications and detecting violations, Proceedings of the 17th conference on Security symposium, p.379-394, July 28-August 01, 2008, San Jose, CA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shan Lu , Soyeon Park , Chongfeng Hu , Xiao Ma , Weihang Jiang , Zhenmin Li , Raluca A. Popa , Yuanyuan Zhou, MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs, ACM SIGOPS Operating Systems Review, v.41 n.6, December 2007
|
|
|
|
|
|
A. Prasad Sistla , V. N. Venkatakrishnan , Michelle Zhou , Hilary Branske, CMV: automatic verification of complete mediation for java virtual machines, Proceedings of the 2008 ACM symposium on Information, computer and communications security, March 18-20, 2008, Tokyo, Japan
|
|
|
|
|
|
Sruthi Bandhakavi , Prithvi Bisht , P. Madhusudan , V. N. Venkatakrishnan, CANDID: preventing sql injection attacks using dynamic candidate evaluations, Proceedings of the 14th ACM conference on Computer and communications security, October 28-31, 2007, Alexandria, Virginia, USA
|
|
|
|
|
|
|
|
|
Michael D. Ernst , Jeff H. Perkins , Philip J. Guo , Stephen McCamant , Carlos Pacheco , Matthew S. Tschantz , Chen Xiao, The Daikon system for dynamic detection of likely invariants, Science of Computer Programming, v.69 n.1-3, p.35-45, December, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David Lo , Hong Cheng , Jiawei Han , Siau-Cheng Khoo , Chengnian Sun, Classification of software behaviors for failure detection: a discriminative pattern mining approach, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France
|
|