Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution. Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, Xiangyu Zhang.

Presentation transcript:

Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution
Zhiqiang Lin (1), Xuxian Jiang (2), Dongyan Xu (1), Xiangyu Zhang (1)
(1) Purdue University, (2) George Mason University
February 12th, 2008
The 15th Annual Network and Distributed System Security Symposium

Motivation
• Protocol reverse engineering
  • A process to recover protocol specifications
  • E.g., fields and their relationships
• Applications:
  • Network-based intrusion detection
  • Network management
  • Penetration testing
  • …

Challenges
[Figure: hex dump of a raw network message, offsets 0x0040 to 0x00c0]
• Multiple fields in a single message
• Non-static size of fields
• Complex relationships among protocol fields
  • Sequential
  • Parallel
  • Hierarchical

Challenges

    HTTP-Request = Request-Line
                   (( general-header | request-header | entity-header ) CRLF)*
                   CRLF
                   [ message-body ]
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Annotated relationships: Parallel, Sequential, Hierarchical
A BNF Specification of HTTP Request (RFC 2616)
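To make the three relationships in the grammar above concrete, here is a minimal Python sketch that splits a request along the same structure. The function name `parse_request` and the returned dictionary layout are illustrative assumptions, not part of the talk, and the sketch ignores most of RFC 2616 (no header folding, no validation).

```python
def parse_request(raw: bytes) -> dict:
    """Split an HTTP request along the structure of the BNF (hypothetical helper)."""
    # Hierarchical: HTTP-Request = Request-Line, header lines, CRLF, [message-body]
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.split(b"\r\n")

    # Sequential: Request-Line = Method SP Request-URI SP HTTP-Version
    method, uri, version = request_line.split(b" ")

    # Parallel: each header line is one alternative among the header kinds
    headers = []
    for line in header_lines:
        name, _, value = line.partition(b":")
        headers.append((name.decode("ascii"), value.strip().decode("ascii")))

    return {
        "method": method.decode("ascii"),
        "uri": uri.decode("ascii"),
        "version": version.decode("ascii"),
        "headers": headers,
        "body": body,
    }


request = parse_request(b"GET /news.html HTTP/1.0\r\nAccept: */*\r\n\r\n")
print(request["method"], request["uri"], request["version"])
```

This parser needs the grammar up front; the point of the talk is to recover that structure when no specification is available.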

Related Work
• Network Trace
  • Protocol Informatics
  • Discoverer [W. Cui et al., Security'07]
• Binary Analysis
  • Polyglot [J. Caballero et al., CCS'07]
  • Automatic Network Protocol Analysis [G. Wondracek et al., NDSS'08]

Observation

    int read_header(int sid)
    {
        if (sscanf(line, "%[^ ] %[^ ] %[^ ]",
                   conn[sid].dat->in_RequestMethod,
                   conn[sid].dat->in_RequestURI,
                   conn[sid].dat->in_Protocol) != 3)
        …
        while (strlen(line) > 0) {
            if (strncasecmp(line, "Cookie: ", 8) == 0)
                strncpy(conn[sid].dat->in_Cookie, (char *)&line+8, sizeof(conn[sid].dat->in_Cookie)-1);
            if (strncasecmp(line, "Host: ", 6) == 0)
                strncpy(conn[sid].dat->in_Host, (char *)&line+6, sizeof(conn[sid].dat->in_Host)-1);
            …
            if (strncasecmp(line, "User-Agent: ", 12) == 0)
                strncpy(conn[sid].dat->in_UserAgent, (char *)&line+12, sizeof(conn[sid].dat->in_UserAgent)-1);
        }
    }

Code snippet in http.c (null-httpd-0.5.0)

AutoFormat: Basic Idea
[Figure: the input bytes "G E T / n e w s …" (protocol fields) mapped to the execution context in which they are processed; labels: Context, One Field, Another Field]
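The basic idea can be illustrated with a small Python sketch: if every input byte is tagged with the context in which the program processed it, consecutive bytes that share a context form a candidate field. The context labels (`parse_method`, `skip_space`, `parse_uri`) and the helper `split_by_context` are made up for illustration; in AutoFormat the context is the logged call stack, not a string label.

```python
from itertools import groupby


def split_by_context(message, contexts):
    """Group consecutive bytes that were processed under the same context."""
    fields = []
    pos = 0
    for ctx, run in groupby(contexts):
        n = len(list(run))               # length of this run of equal contexts
        fields.append((ctx, message[pos:pos + n]))
        pos += n
    return fields


message = "GET /news.html"
# One (hypothetical) context label per input byte
contexts = ["parse_method"] * 3 + ["skip_space"] + ["parse_uri"] * 10

for ctx, text in split_by_context(message, contexts):
    print(f"{ctx:>12}: {text!r}")
```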

System Overview
Context-aware Execution Monitor
Input: GET /news.html
Log fields: input, call stack, EIP

0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
1 'E' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
2 'T' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
24 '\n' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x1F7F3 ->0xF5A8->ap_read_request->ap_getword_white
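The kind of log shown above can be imitated in a few lines of Python. This is only a toy analogue: AutoFormat instruments a native binary, while the sketch below wraps every byte read of a tiny hand-written parser and records the offset, the byte, and the Python call stack (the EIP column is omitted). All names here (`touch`, `get_line`, `get_word`, `read_request`, `LOG`) are hypothetical.

```python
import traceback

LOG = []  # entries: (offset, byte as char, call stack string)


def touch(data, offset):
    """Read one input byte and log it with the current call stack."""
    names = [frame.name for frame in traceback.extract_stack()[:-1]]
    start = names.index("read_request")   # keep frames from the parser entry down
    LOG.append((offset, chr(data[offset]), "->".join(names[start:])))
    return data[offset]


def get_line(data, pos):
    """Scan to the end of the line; return the offset just past the newline."""
    while touch(data, pos) != ord("\n"):
        pos += 1
    return pos + 1


def get_word(data, pos, end):
    """Read one whitespace-delimited word starting at pos."""
    start = pos
    while pos < end and touch(data, pos) not in b" \r\n":
        pos += 1
    return data[start:pos], pos + 1


def read_request(data):
    end = get_line(data, 0)                  # whole line seen in one context
    method, pos = get_word(data, 0, end)     # same bytes seen again in another
    uri, pos = get_word(data, pos, end)
    version, pos = get_word(data, pos, end)
    return method, uri, version


print(read_request(b"GET /news.html HTTP/1.0\r\n"))
for entry in LOG[:3]:
    print(entry)
```

As in the Apache log, byte 0 ('G') appears twice: once under the line-reading context and once under the word-reading context. That repetition is what lets the later steps recover a hierarchy.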

Protocol Field Identifier
• Analyze log file
  • Step 1: build protocol field tree from the logged data
  • Step 2: refine the tree using three heuristics
  • Step 3: output the result

Example: Apache log data
Message: GET /news.html HTTP/1.0\r\n\r\n
Highlighted bytes: GET

0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
1 'E' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
2 'T' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
24 '\n' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x4BA56A2 ->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
24 '\n' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x >0xF5A8->ap_read_request->ap_rgetline_core
23 '\r' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x >0xF5A8->ap_read_request->ap_rgetline_core
0 'G' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x1F7F3 ->0xF5A8->ap_read_request->ap_getword_white
1 'E' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x1F7F3 ->0xF5A8->ap_read_request->ap_getword_white
2 'T' main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection 0x1F7F3 ->0xF5A8->ap_read_request->ap_getword_white
…

Step 1: Building Protocol Field Tree
[Figure: tree whose root is the whole message "GET /news.html HTTP/1.0\r\n User-Agent: Wget/ (Red Hat modified)\r\nAccept: */*\r\n…"; node labels: "GET /news.html", "GET", "HTTP/1.0"]

Step 1: Building Protocol Field Tree
[Figure: initial protocol field tree for "GET /news.html HTTP/1.0\r\n"; node labels: "GET /news.html HTTP/1.0\r\n", "GET /news.html", "GET", "/", "news.html", "HTTP/1.0\r\n", "H", "TTP/1.0"]
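One plausible way to build such a tree from the log, offered as a simplified sketch and not as the authors' algorithm: collect, for each execution context, the maximal runs of consecutive byte offsets it touched, then nest the resulting ranges by containment. The `Node` class, the helper names, and the context labels are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int                 # first byte offset of the field
    end: int                   # one past the last byte offset
    children: list = field(default_factory=list)


def runs(offsets):
    """Collapse offsets into maximal consecutive (start, end) ranges."""
    out = []
    for o in sorted(set(offsets)):
        if out and out[-1][1] == o:
            out[-1][1] = o + 1
        else:
            out.append([o, o + 1])
    return [tuple(r) for r in out]


def build_tree(log, length):
    """log: iterable of (offset, context) pairs; length: message length."""
    by_ctx = {}
    for off, ctx in log:
        by_ctx.setdefault(ctx, []).append(off)
    ranges = {(0, length)}     # the root always covers the whole message
    for offs in by_ctx.values():
        ranges.update(runs(offs))

    root, stack = None, []
    # Sort so that a wider range comes before the ranges it contains
    for s, e in sorted(ranges, key=lambda r: (r[0], -r[1])):
        node = Node(s, e)
        while stack and not (stack[-1].start <= s and e <= stack[-1].end):
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            root = node
        stack.append(node)
    return root


message = "GET /news.html HTTP/1.0\r\n"
log = ([(i, "read_line") for i in range(25)]
       + [(i, "parse_method") for i in range(0, 3)]
       + [(i, "parse_uri") for i in range(4, 14)]
       + [(i, "parse_version") for i in range(15, 23)])
root = build_tree(log, len(message))
print([message[c.start:c.end] for c in root.children])
```

The sketch assumes ranges either nest or are disjoint; partially overlapping ranges are simply attached to the nearest enclosing node.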

Step 2: Refinement (Tokenization)
[Figure: tree before and after tokenization. Before, node labels include the split pieces "/", "news.html", "H", "TTP/1.0"; after, they appear as the tokens "/news.html" and "HTTP/1.0", alongside "GET /news.html HTTP/1.0\r\n", "GET /news.html", "GET", "HTTP/1.0\r\n"]
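A rough Python sketch of what tokenization achieves in the example above: adjacent leaf ranges are merged when no delimiter byte sits at the boundary between them, so "/" + "news.html" becomes "/news.html". The delimiter set and the merge rule are assumptions chosen to reproduce the slide's example, not the heuristic as defined by the authors.

```python
def tokenize(message, leaves, delimiters=" \r\n"):
    """Merge adjacent (start, end) leaf ranges that belong to one token."""
    merged = []
    for start, end in sorted(leaves):
        touching = bool(merged) and merged[-1][1] == start
        if (touching
                and message[start - 1] not in delimiters
                and message[start] not in delimiters):
            merged[-1] = (merged[-1][0], end)   # extend the previous token
        else:
            merged.append((start, end))
    return merged


message = "GET /news.html HTTP/1.0\r\n"
leaves = [(0, 3), (4, 5), (5, 14), (15, 16), (16, 23)]  # "GET", "/", "news.html", "H", "TTP/1.0"
print([message[s:e] for s, e in tokenize(message, leaves)])
```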

Step 2: Refinement (Redundant Node Deletion)
[Figure: tree before and after redundant node deletion. Before, node labels: "GET /news.html HTTP/1.0\r\n" (repeated), "GET /news.html", "GET", "/news.html" (repeated), "HTTP/1.0\r\n", "HTTP/1.0"; after: "GET /news.html HTTP/1.0\r\n", "GET /news.html", "GET", "/news.html", "HTTP/1.0\r\n"]

Step 2: Refinement (Node Insertion)
[Figure: tree before and after node insertion; visible node labels in both: "GET /news.html HTTP/1.0\r\n", "GET /news.html", "GET", "/news.html", "HTTP/1.0\r\n"]
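The two remaining refinements can be sketched as small tree rewrites. The rules used here are assumptions for illustration: "redundant" is taken to mean a node whose only child covers exactly the same bytes, and "insertion" is taken to mean adding leaf nodes for bytes of a parent that no child covers (for example, the space delimiters). Nodes are plain `(start, end, children)` tuples to keep the example short.

```python
def delete_redundant(node):
    """Drop a lone child that covers exactly the same bytes as its parent."""
    start, end, children = node
    children = [delete_redundant(c) for c in children]
    while len(children) == 1 and children[0][:2] == (start, end):
        children = children[0][2]            # splice the grandchildren in
    return (start, end, children)


def insert_missing(node):
    """Add leaf nodes for parent bytes that no child covers."""
    start, end, children = node
    if not children:
        return node
    out, pos = [], start
    for child in sorted(children, key=lambda c: c[:2]):
        if child[0] > pos:
            out.append((pos, child[0], []))  # gap before this child
        out.append(insert_missing(child))
        pos = child[1]
    if pos < end:
        out.append((pos, end, []))           # gap after the last child
    return (start, end, out)


message = "GET /news.html HTTP/1.0\r\n"
tree = (0, 25, [(0, 25, [(0, 3, []), (4, 14, []), (15, 23, [])])])
tree = insert_missing(delete_redundant(tree))
print([message[s:e] for s, e, _ in tree[2]])
```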

Step 3: Output the Result
[Figure: final protocol field tree for "GET /news.html HTTP/1.0\r\n" with nodes "GET", "/news.html", "HTTP/1.0\r\n"; annotated relationships: Parallel & Sequential, Hierarchical]

Evaluation
• Implemented on top of Valgrind
  • Also applies to QEMU, PIN
• Benchmark
  • 30 messages from six known protocols and one unknown protocol
• Evaluation metric
  • Re: ratio of exact match, Re = |A ∩ W| / |W|
  • A: set of fields identified by AutoFormat
  • W: set of fields identified by Wireshark
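The metric is a plain set ratio, which a short Python sketch makes explicit. Representing a field as a `(start, end)` byte range is an assumption for this example, and the two field sets below are invented to show the arithmetic; they are not results from the talk.

```python
def exact_match_ratio(identified, reference):
    """Re = |A ∩ W| / |W| for field sets A (tool output) and W (reference)."""
    a, w = set(identified), set(reference)
    if not w:
        raise ValueError("reference field set W is empty")
    return len(a & w) / len(w)


# Hypothetical field boundaries as (start, end) byte ranges
wireshark = {(0, 3), (4, 14), (15, 23), (23, 25)}
autoformat = {(0, 3), (4, 14), (15, 25)}

print(exact_match_ratio(autoformat, wireshark))  # 2 of 4 reference fields match exactly
```

Because only exact matches count, a field whose boundary is off by one byte contributes nothing to the ratio.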

Overall Result
• Re(F) = 88.5%, Re(H) = 98.0%, Re(P) = 100.0%
• Re = 93.4%
• Re(F): Re for finest-grained fields
• Re(H): Re for hierarchical fields
• Re(P): Re for parallel fields

Experimental Result: Slapper Worm
[Figure labels: nested data structure declaration; compiler-inserted gap]

Discussion
• Dynamic trace dependency
• Byte granularity
• Protocol state machine
• Obfuscated binaries

Conclusion
• AutoFormat
  • A tool for automatic protocol format extraction
• Key insight
  • A protocol implementation is programmed to recognize the protocol format, so it usually exhibits execution context that is specific to each protocol field.
  • This context can be leveraged to infer the hierarchical structure of protocol fields, and even to recover their BNF structure.

Thank you
For more information: {zlin, dxu,
Q & A