Detecting higher-level similarity patterns in programs
Source: Foundations of Software Engineering
Proceedings of the 10th European Software Engineering Conference held jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering
Lisbon, Portugal
Session: Patterns and aspects
Pages: 156-165
Year of Publication: 2005
ISBN: 1-59593-014-0
Authors
Hamid Abdul Basit, National University of Singapore
Stan Jarzabek, National University of Singapore
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1081706.1081733

ABSTRACT

Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect identical or similar code fragments in software, so-called simple clones. While knowledge of simple clones is useful, detecting design-level similarities in software could ease maintenance even further and also help identify reuse opportunities. We observed that recurring patterns of simple clones, so-called structural clones, often indicate the presence of interesting design-level similarities; an example is a pattern of collaborating classes or components. Finding structural clones that signify potentially useful design information requires efficient techniques for analyzing the bulk of simple clone data and making non-trivial inferences from the abstracted information. In this paper, we describe a practical solution to the problem of detecting some basic but useful types of design-level similarities, such as groups of highly similar classes or files. First, we detect simple clones by applying conventional token-based techniques. Then we find patterns of co-occurring clones in different files using the Frequent Itemset Mining (FIM) technique. Finally, we perform file clustering to detect clusters of highly similar files that are likely to contribute to a design-level similarity pattern. The novelty of our approach is the application of data mining techniques to detect design-level similarities. Experiments confirmed that our method finds many useful structural clones and scales up to big programs. The paper describes our method for structural clone detection, a prototype tool called Clone Miner that implements the method, and experimental results.
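
The mining step described in the abstract can be pictured with a small sketch. This is an illustration only, not Clone Miner's implementation: each file is treated as a "transaction" whose items are the IDs of the simple clone classes found in it, and brute-force enumeration stands in for a real FIM algorithm. The function names, file names, and clone-class IDs are all hypothetical. The second function keeps only closed patterns (those with no larger pattern shared by exactly the same files), which gives candidate groups of similar files; it is a simple stand-in, not the file clustering step the paper describes.

```python
from collections import defaultdict
from itertools import combinations


def frequent_clone_sets(file_clones, min_support=2, min_size=2):
    """Map each frequent set of clone-class IDs to the files containing all of them.

    file_clones: dict of file name -> set of clone-class IDs found in that file.
    min_support: minimum number of files that must share the whole set.
    """
    # Invert the input: clone-class ID -> files in which it occurs.
    files_of = defaultdict(set)
    for name, classes in file_clones.items():
        for clone_id in classes:
            files_of[clone_id].add(name)

    # Only individually frequent IDs can take part in a frequent set.
    frequent_ids = sorted(c for c, fs in files_of.items() if len(fs) >= min_support)

    patterns = {}
    for size in range(min_size, len(frequent_ids) + 1):
        found_any = False
        for combo in combinations(frequent_ids, size):
            support = set.intersection(*(files_of[c] for c in combo))
            if len(support) >= min_support:
                patterns[frozenset(combo)] = support
                found_any = True
        if not found_any:
            break  # no frequent set of this size, so none of a larger size
    return patterns


def closed_patterns(patterns):
    """Drop patterns that a strictly larger pattern covers with the same files."""
    return {
        items: files
        for items, files in patterns.items()
        if not any(
            items < other and files == other_files
            for other, other_files in patterns.items()
        )
    }


# Hypothetical input: clone classes 1-4 detected in four files.
sample = {
    "A.java": {1, 2, 3},
    "B.java": {1, 2, 3},
    "C.java": {1, 2},
    "D.java": {4},
}
patterns = frequent_clone_sets(sample)
candidates = closed_patterns(patterns)
```

With this sample input, `candidates` holds two groups: clone classes {1, 2} shared by A.java, B.java, and C.java, and clone classes {1, 2, 3} shared by A.java and B.java. Brute-force enumeration is exponential in the number of clone classes, so a sketch like this is only workable on tiny inputs.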


