ACM Home Page
Please provide us with feedback. Feedback
Automatic mining of functionally equivalent code fragments via random testing
Full text PdfPdf (493 KB)
Source
International Symposium on Software Testing and Analysis archive
Proceedings of the eighteenth international symposium on Software testing and analysis table of contents
Chicago, IL, USA
SESSION: Empirical studies table of contents
Pages 81-92  
Year of Publication: 2009
ISBN:978-1-60558-338-9
Authors
Lingxiao Jiang  University of California, Davis, Davis, CA, USA
Zhendong Su  University of California, Davis, Davis, CA, USA
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
SIGPLAN: ACM Special Interest Group on Programming Languages
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 32,   Downloads (12 Months): 69,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1572272.1572283
What is a DOI?

ABSTRACT

Similar code may exist in large software projects due to some common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivalence, it is not known whether significant fine-grained, code-level semantic duplications exist. Detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding, maintenance, and optimization.

In this paper, we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size - down to an executable statement. Our notion of functional equivalence is based on input and output behavior. Inspired by Schwartz's randomized polynomial identity testing, we develop our core algorithm using automated random testing: (1) candidate code fragments are automatically extracted from the input program; and (2) random inputs are generated to partition the code fragments based on their output values on the generated inputs. We implemented the algorithm and conducted a large-scale empirical evaluation of it on the Linux kernel 2.6.24. Our results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely due to copying and pasting code). The algorithm also scales to million-line programs; it was able to analyze the Linux kernel with several days of parallel processing.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
 
5
 
6
 
7
M. Bertran, F.-X. Babot, and A. Climent. An input/output semantics for distributed program equivalence reasoning. Electr. Notes Theor. Comput. Sci, 137(1):25--46, 2005.
 
8
9
 
10
G. Cousineau and P. Enjalbert. Program equivalence and provability. In MFCS: Symposium on Mathematical Foundations of Computer Science, 1979.
 
11
12
13
14
15
 
16
A. Groce and R. Joshi. Exploiting traces in program analysis. In TACAS, volume 3920 of LNCS, pages 379--393. Springer, 2006.
 
17
 
18
19
 
20
21
 
22
 
23
 
24
 
25
 
26
 
27
 
28
29
 
30
31
32
33
34
 
35
36
 
37
 
38
 
39

Collaborative Colleagues:
Lingxiao Jiang: colleagues
Zhendong Su: colleagues