Using code metric histograms and genetic algorithms to perform author identification for software forensics
Source: Genetic And Evolutionary Computation Conference
Proceedings of the 9th annual conference on Genetic and evolutionary computation
London, England
Session: Real-world applications: papers
Pages: 2082-2089
Year of Publication: 2007
ISBN: 978-1-59593-697-4
ABSTRACT
We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author of a piece of code from a pool of candidates. Author identification has applications in criminal justice, corporate litigation, and plagiarism detection. Furthermore, we can identify candidate developers who share similar styles, making our technique useful for software maintenance as well. Our method involves measuring the differences in histogram distributions for code metrics. Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics, and the time required to search the problem space exhaustively prevented us from adding more metrics. Using a genetic algorithm to perform the search, we were able to find good metric combinations in hours as opposed to weeks. The genetic algorithm has enabled us to begin adding new metrics to our catalog of available metrics. This paper documents the results of our experiments in author identification for software forensics and outlines future directions of research to improve the utility of our method.
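The two ideas in the abstract, comparing per-metric histograms and searching for a good metric subset with a genetic algorithm, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not give these details: the absolute-difference distance, the nearest-profile decision rule, the accuracy-based fitness, the GA operators and parameters, and the toy metrics `line_length` and `nesting` are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def normalized_histogram(values):
    """Map each observed metric value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def histogram_distance(h1, h2):
    """Sum of absolute bin differences (an assumed distance measure)."""
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in set(h1) | set(h2))

def identify_author(sample, profiles, selected):
    """Return the candidate whose histograms are closest on the selected metrics."""
    def total_distance(author):
        return sum(histogram_distance(sample[m], profiles[author][m]) for m in selected)
    return min(profiles, key=total_distance)

def fitness(bits, metrics, labelled_samples, profiles):
    """Fraction of labelled samples attributed correctly (assumed fitness)."""
    selected = [m for m, bit in zip(metrics, bits) if bit]
    if not selected:
        return 0.0
    hits = sum(identify_author(s, profiles, selected) == a for a, s in labelled_samples)
    return hits / len(labelled_samples)

def evolve(metrics, labelled_samples, profiles, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Tiny GA over metric bit masks; assumes at least two metrics."""
    rng = random.Random(seed)
    n = len(metrics)
    score = lambda bits: fitness(bits, metrics, labelled_samples, profiles)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the two best masks over unchanged.
        next_gen = sorted(population, key=score, reverse=True)[:2]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, then one-point crossover.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b for b in child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)

# Toy data (hypothetical): line_length separates the authors, nesting does not.
METRICS = ["line_length", "nesting"]
PROFILES = {
    "alice": {"line_length": normalized_histogram([40, 40, 60, 80]),
              "nesting": normalized_histogram([1, 1, 2, 2])},
    "bob": {"line_length": normalized_histogram([100, 100, 120, 120]),
            "nesting": normalized_histogram([1, 1, 2, 2])},
}
SAMPLES = [
    ("alice", {"line_length": normalized_histogram([40, 60, 60, 80]),
               "nesting": normalized_histogram([1, 2, 2, 2])}),
    ("bob", {"line_length": normalized_histogram([100, 120, 120, 120]),
             "nesting": normalized_histogram([1, 1, 1, 2])}),
]

if __name__ == "__main__":
    best = evolve(METRICS, SAMPLES, PROFILES)
    print(best, fitness(best, METRICS, SAMPLES, PROFILES))
```

On this toy data, any mask that includes `line_length` attributes both samples correctly, while `nesting` alone cannot separate the two authors. With 18 metrics there are 2^18 = 262,144 possible masks, which is why a guided search becomes attractive when each fitness evaluation is expensive.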