Polyglot: automatic extraction of protocol message format using dynamic binary analysis
Source
Conference on Computer and Communications Security
Proceedings of the 14th ACM Conference on Computer and Communications Security
Alexandria, Virginia, USA
Session: Protocols and spam filters
Pages: 317-329
Year of Publication: 2007
ISBN: 978-1-59593-703-2
Authors
Juan Caballero  Carnegie Mellon University, Pittsburgh, PA
Heng Yin  Carnegie Mellon University, Pittsburgh, PA & College of William and Mary, Williamsburg, VA
Zhenkai Liang  Carnegie Mellon University, Pittsburgh, PA
Dawn Song  Carnegie Mellon University, Pittsburgh, PA & UC Berkeley, Berkeley, CA
Sponsors
ACM: Association for Computing Machinery
SIGSAC: ACM Special Interest Group on Security, Audit, and Control
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1315245.1315286

ABSTRACT

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by clustering network traces. That kind of approach is limited by the lack of semantic information in network traces. In this paper, we propose a new approach using program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message formats included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
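
The intuition in the abstract can be illustrated with a small sketch. It is not Polyglot's algorithm, which works on program binaries; it is a toy model, assuming a hypothetical parser (`toy_parser`) whose byte comparisons are logged through a hypothetical wrapper (`TaintedByte`). Input offsets where a comparison against a constant succeeds are treated as separators, and the spans between them as fields.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Log of (input offset, constant compared against, comparison result)."""
    events: list = field(default_factory=list)


class TaintedByte:
    """One input byte that remembers its offset in the received message."""

    def __init__(self, value, offset, trace):
        self.value = value
        self.offset = offset
        self.trace = trace

    def equals(self, constant):
        # Every comparison the parser performs on input data is recorded.
        result = self.value == constant
        self.trace.events.append((self.offset, constant, result))
        return result


def toy_parser(message):
    """Hypothetical stand-in for a protocol implementation: splits on spaces."""
    fields, current = [], []
    for b in message:
        if b.equals(ord(" ")):
            fields.append(bytes(current))
            current = []
        else:
            current.append(b.value)
    fields.append(bytes(current))
    return fields


def infer_fields(raw):
    """Run the parser on tracked input and infer field boundaries from the log."""
    trace = Trace()
    message = [TaintedByte(v, i, trace) for i, v in enumerate(raw)]
    toy_parser(message)
    # Assumption: a successful comparison against a constant marks a separator.
    separators = [off for off, _, ok in trace.events if ok]
    bounds, start = [], 0
    for sep in separators:
        bounds.append((start, sep))   # half-open span [start, sep)
        start = sep + 1
    bounds.append((start, len(raw)))
    return separators, bounds


if __name__ == "__main__":
    seps, spans = infer_fields(b"GET /index.html HTTP/1.1")
    print(seps, spans)
```

The observer never looks at the message itself, only at what the parser did with each byte, which is the kind of semantic information that a network trace alone does not provide.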


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
[1] How Samba Was Written. http://samba.org/ftp/tridge/misc/french_cafe.txt.
[2] Icqlib: The ICQ Library. http://kicq.sourceforge.net/icqlib.shtml.
[3] Libyahoo2: A C Library for Yahoo! Messenger. http://libyahoo2.sourceforge.net.
[4] MSN Messenger Protocol. http://www.hypothetic.org/docs/msn/index.php.
[5] Qemu: Open Source Processor Emulator. http://fabrice.bellard.free.fr/qemu/.
[6] Tcpdump. http://www.tcpdump.org/.
[7] The UnOfficial AIM/OSCAR Protocol Specification. http://www.oilcan.org/oscar/.
[8] Wireshark, Network Protocol Analyzer. http://www.wireshark.org.
[9] M. A. Beddoe. Network Protocol Analysis Using Bioinformatics Algorithms. http://www.baselineresearch.net/PI/.
[10] N. Borisov, D. J. Brumley, H. J. Wang, and C. Guo. Generic Application-Level Protocol Analyzer and Its Language. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[11]
[12] J. Caballero, S. Venkataraman, P. Poosankam, M. G. Kang, D. Song, and A. Blum. FiG: Automatic Fingerprint Generation. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[13]
[14]
[15]
[16] D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. RFC 4234 (Draft Standard), October 2005.
[17]
[18] W. Cui, V. Paxson, N. C. Weaver, and R. H. Katz. Protocol-Independent Adaptive Replay of Application Dialog. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28] J. Newsome and D. Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2005.
[29] J. Newsome, D. Brumley, and D. Song. Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software. Network and Distributed System Security Symposium, San Diego, CA, February 2006.
[30]
[31]
[32]
[33]
[34]
[35]
[36] P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Cross-Site Scripting Prevention with Dynamic Data Tainting and Static Analysis. Network and Distributed System Security Symposium, San Diego, CA, February 2007.
[37]

