Detecting code clones in binary executables
Source: International Symposium on Software Testing and Analysis
Proceedings of the eighteenth international symposium on Software testing and analysis
Chicago, IL, USA
SESSION: Testing and analysis tools #1
Pages 117-128
Year of Publication: 2009
ISBN: 978-1-60558-338-9
Authors
Andreas Sæbjørnsen, University of California, Davis, Davis, CA, USA
Jeremiah Willcock, Indiana University, Bloomington, IN, USA
Thomas Panas, Lawrence Livermore National Laboratory, Livermore, CA, USA
Daniel Quinlan, Lawrence Livermore National Laboratory, Livermore, CA, USA
Zhendong Su, University of California, Davis, Davis, CA, USA
ABSTRACT
Large software projects contain significant code duplication, mainly due to copying and pasting code. Many techniques have been developed to identify duplicated code to enable applications such as refactoring, detecting bugs, and protecting intellectual property. Because source code is often unavailable, especially for third-party software, finding duplicated code in binaries becomes particularly important. However, existing techniques operate primarily on source code, and no effective tool exists for binaries. In this paper, we describe the first practical clone detection algorithm for binary executables. Our algorithm extends an existing tree similarity framework based on clustering of characteristic vectors of labeled trees with novel techniques to normalize assembly instructions and to accurately and compactly model their structural information. We have implemented our technique and evaluated it on Windows XP system binaries totaling over 50 million assembly instructions. Results show that it is both scalable and precise: it analyzed Windows XP system binaries in a few hours and produced few false positives. We believe our technique is a practical, enabling technology for many applications dealing with binary code.
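As a rough illustration of the idea summarized above, the following minimal sketch normalizes assembly instructions and compares code regions by the distance between their characteristic vectors. It is not the authors' implementation: the operand categories (`REG`, `MEM`, `VAL`), the register list, the feature definition (counts of normalized instructions), and all function names are assumptions made for this example. The paper's clustering step is replaced here by a plain pairwise Euclidean distance.

```python
from collections import Counter
from math import sqrt

# Hypothetical, deliberately small x86 register set for the example.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize_operand(operand):
    """Map a concrete operand to an abstract category (assumed scheme)."""
    operand = operand.strip().lower()
    if "[" in operand:
        return "MEM"  # memory reference such as [ebp+8]
    if operand in REGISTERS:
        return "REG"  # any general-purpose register
    return "VAL"      # constants and everything else


def normalize(instruction):
    """Keep the mnemonic, abstract away register names and constants."""
    mnemonic, _, rest = instruction.strip().partition(" ")
    operands = [normalize_operand(o) for o in rest.split(",") if o.strip()]
    return " ".join([mnemonic.lower()] + operands)


def characteristic_vector(instructions):
    """Count occurrences of each normalized instruction in a code region."""
    return Counter(normalize(i) for i in instructions)


def distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


# Two regions that differ only in register names and constants,
# plus one unrelated region.
region_a = ["mov eax, [ebp+8]", "add eax, 4", "push eax"]
region_b = ["mov ecx, [ebp+12]", "add ecx, 16", "push ecx"]
region_c = ["xor eax, eax", "ret"]

vec_a = characteristic_vector(region_a)
vec_b = characteristic_vector(region_b)
vec_c = characteristic_vector(region_c)

print(distance(vec_a, vec_b))  # regions a and b normalize identically
print(distance(vec_a, vec_c))  # region c shares no features with a
```

Regions whose vectors lie close together are clone candidates. At the scale the abstract describes, pairwise comparison would be too slow, which is why the approach relies on clustering the vectors rather than comparing every pair.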