Using code metric histograms and genetic algorithms to perform author identification for software forensics
Source
Genetic and Evolutionary Computation Conference
Proceedings of the 9th annual conference on Genetic and evolutionary computation
London, England
SESSION: Real-world applications: papers
Pages: 2082 - 2089  
Year of Publication: 2007
ISBN:978-1-59593-697-4
Authors
Robert Charles Lange  Drexel University, Philadelphia, PA
Spiros Mancoridis  Drexel University, Philadelphia, PA
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1276958.1277364

ABSTRACT

We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author of a piece of code from a pool of candidates. Author identification has applications in criminal justice, corporate litigation, and plagiarism detection. Furthermore, we can identify candidate developers who share similar styles, making our technique useful for software maintenance as well. Our method involves measuring the differences in histogram distributions for code metrics. Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics, and the time required to search the problem space exhaustively prevented us from adding more metrics. Using a genetic algorithm to perform the search, we were able to find good metric combinations in hours as opposed to weeks. The genetic algorithm has enabled us to begin adding new metrics to our catalog of available metrics. This paper documents the results of our experiments in author identification for software forensics and outlines future directions of research to improve the utility of our method.
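
The abstract names two ingredients: comparing per-metric histogram distributions, and using a genetic algorithm to search for a good combination of metrics. The sketch below is one possible reading of that idea, not the authors' implementation. The metric names, the toy data, the sum-of-absolute-differences distance, the nearest-profile classifier, and every genetic algorithm setting (population size, tournament selection, one-point crossover, mutation rate) are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Hypothetical metric names for illustration; these are NOT the paper's 18 metrics.
METRICS = ["line_length", "indent_width", "brace_style"]

def normalized_histogram(values):
    """Turn raw metric observations into a relative-frequency histogram."""
    counts = Counter(values)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_distance(h1, h2):
    """Sum of absolute bucket differences (an assumed distance measure)."""
    buckets = set(h1) | set(h2)
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in buckets)

def predict_author(sample, profiles, mask):
    """Pick the candidate whose histograms are closest over the selected metrics."""
    def total_distance(author):
        return sum(
            histogram_distance(sample[m], profiles[author][m])
            for m, used in zip(METRICS, mask) if used
        )
    # Sorting makes tie-breaking deterministic.
    return min(sorted(profiles), key=total_distance)

def fitness(mask, labeled_samples, profiles):
    """Number of labeled samples attributed correctly with this metric subset."""
    if not any(mask):
        return 0  # an empty subset cannot distinguish anyone
    return sum(predict_author(s, profiles, mask) == author
               for author, s in labeled_samples)

def evolve_mask(labeled_samples, profiles, pop_size=8, generations=20, seed=0):
    """Tiny genetic algorithm over bitmasks of metrics (all settings assumed)."""
    rng = random.Random(seed)
    n = len(METRICS)
    score = lambda m: fitness(m, labeled_samples, profiles)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(population, key=score)]  # elitism: keep the current best
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            cut = rng.randrange(1, n)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with probability 0.1 (mutation).
            child = tuple(bit ^ (rng.random() < 0.1) for bit in child)
            children.append(child)
        population = children
    return max(population, key=score)

# Toy author profiles and labeled samples, invented for this example.
profiles = {
    "alice": {
        "line_length": normalized_histogram([40, 40, 60, 80]),
        "indent_width": normalized_histogram([4, 4, 4, 4]),
        "brace_style": normalized_histogram(["same_line"] * 4),
    },
    "bob": {
        "line_length": normalized_histogram([40, 60, 60, 80]),
        "indent_width": normalized_histogram([2, 2, 2, 4]),
        "brace_style": normalized_histogram(["next_line"] * 4),
    },
}
labeled_samples = [
    ("alice", {
        "line_length": normalized_histogram([40, 60, 80, 80]),
        "indent_width": normalized_histogram([4, 4, 4]),
        "brace_style": normalized_histogram(["same_line"] * 3),
    }),
    ("bob", {
        "line_length": normalized_histogram([60, 60]),
        "indent_width": normalized_histogram([2, 2]),
        "brace_style": normalized_histogram(["next_line"] * 2),
    }),
]

best_mask = evolve_mask(labeled_samples, profiles)
selected = [m for m, used in zip(METRICS, best_mask) if used]
print("selected metrics:", selected)
```

With three toy metrics there are only eight possible subsets, so exhaustive search would be trivial here. The bitmask encoding matters when the catalog grows: each added metric doubles the number of subsets, which is consistent with the abstract's remark that exhaustive search over 18 metrics took weeks.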

