Evaluation of source code copy detection methods on FreeBSD
Source
International Conference on Software Engineering
Proceedings of the 2008 International Working Conference on Mining Software Repositories
Leipzig, Germany
Session: Changes and clones
Pages 61-66
Year of Publication: 2008
ISBN: 978-1-60558-024-1
Authors
Hung-Fu Chang, University of Southern California, Los Angeles, CA, USA
Audris Mockus, Avaya Labs Research, Basking Ridge, NJ, USA
Sponsors
SIGSOFT: ACM Special Interest Group on Software Engineering
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1370750.1370766

ABSTRACT

Studies have shown that substantial code reuse is common in both open source and commercial projects. However, the precise extent of reuse and its impact on productivity and quality have not been well investigated in the open source context. We previously introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and we partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it, together with four additional file copy detection methods that use the content of multiple versions of the source code, to the FreeBSD project. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. A remaining challenge is scaling the content-based methods to large repositories that contain all versions of open source files.
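The abstract describes the filename method only at a high level: given nothing but file pathnames, find directories that share filenames. The Python sketch below illustrates that idea, plus a simple content-based comparison, under stated assumptions. The function names, the thresholds `min_shared` and `min_ratio`, and the whitespace-normalized SHA-1 digest (used here as a stand-in for a content-based copy detector) are all hypothetical and are not taken from the paper.

```python
import hashlib
import posixpath
from collections import defaultdict
from itertools import combinations


def dirs_sharing_filenames(paths, min_shared=2, min_ratio=0.5):
    """Hypothetical sketch of the filename method: report directory pairs
    whose sets of file basenames overlap. Thresholds are assumptions."""
    # Group basenames by the directory that contains them.
    names_by_dir = defaultdict(set)
    for p in paths:
        d, name = posixpath.split(p)
        names_by_dir[d].add(name)

    # Inverted index: basename -> directories containing it, so only
    # directories that actually share a name are ever compared.
    dirs_by_name = defaultdict(set)
    for d, names in names_by_dir.items():
        for n in names:
            dirs_by_name[n].add(d)

    # Count shared basenames for each directory pair.
    shared_counts = defaultdict(int)
    for dirs in dirs_by_name.values():
        for a, b in combinations(sorted(dirs), 2):
            shared_counts[(a, b)] += 1

    # Keep pairs sharing enough names, relative to the smaller directory.
    results = []
    for (a, b), k in sorted(shared_counts.items()):
        smaller = min(len(names_by_dir[a]), len(names_by_dir[b]))
        if k >= min_shared and k / smaller >= min_ratio:
            results.append((a, b, k))
    return results


def identical_content_groups(files):
    """Hypothetical content-based check: group paths (mapping path -> text)
    whose whitespace-normalized contents have the same SHA-1 digest."""
    groups = defaultdict(list)
    for path, text in files.items():
        # Strip each line and drop blank lines before hashing.
        normalized = "\n".join(
            line.strip() for line in text.splitlines() if line.strip()
        )
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(path)
    return [sorted(ps) for ps in groups.values() if len(ps) > 1]


# Usage with made-up pathnames (not FreeBSD data):
paths = [
    "src/sys/net/if.c", "src/sys/net/if.h", "src/sys/net/route.c",
    "vendor/netstack/if.c", "vendor/netstack/if.h", "vendor/netstack/route.c",
    "src/bin/ls/ls.c",
]
print(dirs_sharing_filenames(paths))
# -> [('src/sys/net', 'vendor/netstack', 3)]
```

The trade-off the abstract points to is visible here. The filename sketch needs only pathnames, while the content sketch has to read and hash every file, which becomes expensive when a repository holds all versions of every file. Very common names such as `Makefile` would also inflate the pair counts in the filename sketch, so a practical version would likely need to filter them.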


