Constructing universal version history
Source: International Conference on Software Engineering
Proceedings of the 2006 international workshop on Mining software repositories
Shanghai, China
SESSION: Matching
Pages: 76 - 79  
Year of Publication: 2006
ISBN:1-59593-397-2
Authors
Hung-Fu Chang  University of Southern California, Los Angeles, CA
Audris Mockus  Avaya Labs Research, Basking Ridge, NJ
Sponsors
ACM: Association for Computing Machinery
SIGSOFT: ACM Special Interest Group on Software Engineering
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1137983.1138002

ABSTRACT

Developers often copy the code of parts of products, or of entire products, to start a new product or a new release. To understand software change history and to determine code authorship, we propose constructing a universal version history from multiple version control repositories. To that end, we create two practical methods for detecting code copying at the level of the source code file: the prefix-postfix algorithm and the prefix algorithm. The full pathname of a file and its version history are used to construct the file's universal version history by linking together the change histories of files that had the same code at any point in the past. Both algorithms assume that developers often duplicate files by copying entire directories. Once copying is identified, we propose an algorithm that links version histories from multiple repositories to construct the universal version history. The results show that about 41.32% of source files (in the repository involving more than 6M versions of around 2M files) were duplicated among Avaya's source code repositories for more than ten different projects. The prefix-postfix algorithm is more suitable than the prefix algorithm because it showed reasonable error rates when validated against known copying behaviors.
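
The linking idea in the abstract, joining the change histories of files that had the same code at some point in the past, can be illustrated with a minimal sketch. This is not the paper's prefix-postfix or prefix algorithm, whose details are not given here. It assumes each file is identified by a (repository, full pathname) pair and that each version is represented by a content digest. The function name `link_histories`, the data layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical identifier for a file: (repository name, full pathname).
FileId = Tuple[str, str]


def link_histories(histories: Dict[FileId, List[str]]) -> List[List[FileId]]:
    """Group files whose version histories share at least one content digest.

    `histories` maps each file to the digests of its versions (an assumed
    representation; any stable fingerprint of file content would do).
    """
    parent: Dict[FileId, FileId] = {f: f for f in histories}

    def find(f: FileId) -> FileId:
        # Walk to the group representative, shortening the path as we go.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    first_seen: Dict[str, FileId] = {}
    for file_id, digests in histories.items():
        for digest in digests:
            if digest in first_seen:
                # Same content appeared in another file: merge the two groups.
                parent[find(file_id)] = find(first_seen[digest])
            else:
                first_seen[digest] = file_id

    groups: Dict[FileId, List[FileId]] = defaultdict(list)
    for file_id in histories:
        groups[find(file_id)].append(file_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample data: util.c was copied from repoA into repoB at digest "h2".
sample = {
    ("repoA", "prod1/src/util.c"): ["h1", "h2"],
    ("repoB", "prod2/src/util.c"): ["h2", "h3"],
    ("repoB", "prod2/src/main.c"): ["h9"],
}
linked = link_histories(sample)
```

With the sample data, the two `util.c` files end up in one group because they share digest `h2`, while `main.c` stays on its own. A union-find structure is used here only because it keeps the grouping transitive: if A shares content with B and B with C, all three histories are linked.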


