ACM Home Page
Please provide us with feedback. Feedback
Scalable detection of semantic clones
Full text PdfPdf (294 KB)
Source
International Conference on Software Engineering archive
Proceedings of the 30th international conference on Software engineering table of contents
Leipzig, Germany
SESSION: Program comprehension table of contents
Pages 321-330  
Year of Publication: 2008
ISBN:978-1-60558-079-1
Authors
Mark Gabel  University of California at Davis, Davis, CA, USA
Lingxiao Jiang  University of California at Davis, Davis, CA, USA
Zhendong Su  University of California at Davis, Davis, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 28,   Downloads (12 Months): 210,   Citation Count: 6
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1368088.1368132
What is a DOI?

ABSTRACT

Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones.

In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
 
5
6
 
7
8
 
9
 
10
11
 
12
 
13
K. Kontogiannis, R. de Mori, E. Merlo, M. Galler, and M. Bernstein. Pattern matching for clone and concept detection. Automated Soft. Eng., 3(1/2), 1996.
 
14
15
16
 
17
18
 
19
 
20
21


Collaborative Colleagues:
Mark Gabel: colleagues
Lingxiao Jiang: colleagues
Zhendong Su: colleagues