Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution


Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution
Zhiqiang Lin (1), Xuxian Jiang (2), Dongyan Xu (1), Xiangyu Zhang (1)
(1) Purdue University, (2) George Mason University
February 12th, 2008
The 15th Annual Network and Distributed System Security Symposium

Motivation
* Protocol reverse engineering
  * A process to recover protocol specifications
  * E.g., fields and their relationships
* Applications:
  * Network-based intrusion detection: e.g., DoS attacks and port scans on computer systems
  * Network management: correctly recognize and monitor traffic
  * Fuzz testing: a software testing technique
  * …

Challenges
[Figure: hex dump of a captured request message, offsets 0x0040 through 0x00c0]
* Multiple fields in a single message
* Non-static size of fields
* Complex relationships among protocol fields: sequential, parallel, hierarchical

Challenges
A BNF specification of an HTTP request (RFC 2616):
    HTTP-Request = Request-Line
                   (( general-header | request-header | entity-header ) CRLF)*
                   CRLF
                   [ message-body ]
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Note: SP and CRLF are separators.
* Hierarchical relation: a field can be further divided into multiple sub-fields.
* Sequential relation: captures the ordering between adjacent fields in a protocol.
* Parallel relation: the positions of two or more fields are exchangeable in the protocol specification.
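The three relations can be seen by splitting a request by hand along the BNF above. This is a minimal sketch, not part of AutoFormat (which does not know the grammar in advance); the function name `parse_http_request` and the `Host` value are made up for illustration.

```python
def parse_http_request(raw: bytes) -> dict:
    """Split a raw HTTP request along the BNF on this slide (illustrative only)."""
    # Sequential: the header section comes before the optional message body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # Hierarchical: the Request-Line is divided into three sub-fields by SP.
    method, uri, version = lines[0].split(b" ")
    # Parallel: header lines may appear in any order.
    headers = [tuple(line.split(b": ", 1)) for line in lines[1:]]
    return {
        "request_line": {"method": method, "uri": uri, "version": version},
        "headers": headers,
        "body": body,
    }


# Hypothetical request; only the request line is taken from the slides.
request = b"GET /news.html HTTP/1.0\r\nAccept: */*\r\nHost: example.com\r\n\r\n"
parsed = parse_http_request(request)
```

Swapping the two header lines changes only the order of `parsed["headers"]`, which is what makes those fields parallel.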

Related Work
* Network trace
  * Protocol Informatics
  * Discoverer [W. Cui et al., USENIX Security'07]
* Binary analysis
  * Polyglot [J. Caballero et al., CCS'07]
  * Automatic Network Protocol Analysis [G. Wondracek et al., NDSS'08]

Observation
Code snippet from http.c (null-httpd-0.5.0):
    int read_header(int sid) {
        sgets(line, sizeof(line)-1, conn[sid].socket);
        …
        if (sscanf(line, "%[^ ] %[^ ] %[^ ]",
                   conn[sid].dat->in_RequestMethod,
                   conn[sid].dat->in_RequestURI,
                   conn[sid].dat->in_Protocol)!=3)
        …
        while (strlen(line)>0) {
            if (strncasecmp(line, "Cookie: ", 8)==0)
                strncpy(conn[sid].dat->in_Cookie, (char *)&line+8, sizeof(conn[sid].dat->in_Cookie)-1);
            if (strncasecmp(line, "Host: ", 6)==0)
                strncpy(conn[sid].dat->in_Host, (char *)&line+6, sizeof(conn[sid].dat->in_Host)-1);
            …
            if (strncasecmp(line, "User-Agent: ", 12)==0)
                strncpy(conn[sid].dat->in_UserAgent, (char *)&line+12, sizeof(conn[sid].dat->in_UserAgent)-1);
        }
    }
* The REQUEST LINE field is divided into METHOD, REQUEST URI, and HTTP VERSION.
* Cookie, Host, and User-Agent are parallel fields.

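AutoFormat -- Basic Idea
[Figure: the input bytes G E T / n e w s … are linked from protocol fields to execution contexts; the bytes of one field share one context, the bytes of another field share another context]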
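The basic idea can be sketched as grouping consecutive input bytes that were processed in the same execution context. This is a simplified illustration, not the tool's code; the trace layout and the context names `parse_method` and `parse_uri` are hypothetical.

```python
from itertools import groupby


def group_by_context(trace):
    """Group consecutive (offset, byte, context) records that share a context.

    Returns one (first_offset, last_offset, text) tuple per candidate field.
    """
    fields = []
    for _context, run in groupby(trace, key=lambda record: record[2]):
        run = list(run)
        text = "".join(byte for _offset, byte, _ctx in run)
        fields.append((run[0][0], run[-1][0], text))
    return fields


# Hypothetical trace for the start of "GET /news.html".
trace = [
    (0, "G", "parse_method"),
    (1, "E", "parse_method"),
    (2, "T", "parse_method"),
    (4, "/", "parse_uri"),
    (5, "n", "parse_uri"),
    (6, "e", "parse_uri"),
]
candidate_fields = group_by_context(trace)
```

Here `candidate_fields` holds one entry for offsets 0 to 2 and one for offsets 4 to 6; the real system uses the full call stack and instruction address as context rather than a single name.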

System Overview
Context-aware Execution Monitor
Input: GET /news.html
Log (input offset, input byte, EIP, call stack):
0 'G' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
1 'E' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
2 'T' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
24 '\n' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
0 'G' 0x1F7F3 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
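A log record of this kind can be read into a small structure before analysis. The sketch below assumes each record is serialized on one line as offset, quoted byte, EIP, and call stack joined by `->`; that textual layout, the `LogEntry` class, and `parse_entry` are assumptions for illustration, since the slides do not specify the on-disk format.

```python
import re
from dataclasses import dataclass

# Assumed layout: <offset> '<byte>' <EIP in hex> <frame->frame->...>
ENTRY = re.compile(r"(\d+)\s+'(.+?)'\s+(0x[0-9A-Fa-f]+)\s+(\S+)$")


@dataclass(frozen=True)
class LogEntry:
    offset: int        # position of the byte in the input message
    byte: str          # the byte as printed in the log, e.g. G or \n
    eip: int           # instruction address that accessed the byte
    call_stack: tuple  # frames from outermost to innermost


def parse_entry(line: str) -> LogEntry:
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    offset, byte, eip, stack = match.groups()
    return LogEntry(int(offset), byte, int(eip, 16), tuple(stack.split("->")))


# Call stack shortened for readability; the frame names come from the slide.
entry = parse_entry("0 'G' 0x1F7F3 main->ap_read_request->ap_getword_white")
```

The pair of `eip` and `call_stack` is what serves as the execution context of a byte in the later steps.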

Protocol Field Identifier
* Analyze the log file
  * Step 1: build the protocol field tree from the logged data.
  * Step 2: refine the tree using three heuristics.
  * Step 3: output the result.

Example: Apache log data
0 'G' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
1 'E' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
2 'T' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
24 '\n' 0x4BA56A2 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
…
24 '\n' 0x… main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core
23 '\r' 0x… main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core
0 'G' 0x1F7F3 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
1 'E' 0x1F7F3 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
2 'T' 0x1F7F3 main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
…
[Figure labels: GET /news.html HTTP/1.0\r\n\r\n; GET]

Step 1 -- Building Protocol Field Tree
[Figure: protocol field tree. Node labels shown: root; GET /news.html HTTP/1.0\r\n; User-Agent: Wget/ (Red Hat modified)\r\nAccept: */*\r\n…; GET /news.html; GET; HTTP/1.0]
* The root contains the offsets of all input data.
* A parent node contains the offsets of its children.

Step 1: Building Protocol Field Tree
[Figure: initial field tree for GET /news.html HTTP/1.0\r\n. Node labels shown: GET /news.html HTTP/1.0\r\n; GET /news.html; GET; /; news.html; HTTP/1.0\r\n; H; TTP/1.0]
Problems in the initial tree:
* Overly fine-grained fields
* Redundancy in fields
* Missing SPACE before "/n"
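One plausible way to arrange candidate fields into such a tree is by offset containment: a field becomes a child of the smallest field that fully covers it. The sketch below is an assumption about how Step 1 could be realized, not the algorithm from the paper; `Node` and `build_tree` are illustrative names, and every range is assumed to lie inside the message.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int  # first offset covered by this field
    end: int    # last offset covered (inclusive)
    children: list = field(default_factory=list)


def build_tree(ranges, total_len):
    """Nest (start, end) offset ranges by containment under a root node."""
    root = Node(0, total_len - 1)
    stack = [root]
    # Visit wider ranges before the narrower ranges they contain.
    ordered = sorted(set(ranges), key=lambda r: (r[0], -(r[1] - r[0])))
    for start, end in ordered:
        if (start, end) == (root.start, root.end):
            continue  # the whole message is already the root
        # Pop until the top of the stack fully covers the new range.
        while not (stack[-1].start <= start and end <= stack[-1].end):
            stack.pop()
        node = Node(start, end)
        stack[-1].children.append(node)
        stack.append(node)
    return root


# Offsets of the nodes on this slide within "GET /news.html HTTP/1.0\r\n".
ranges = [(0, 24), (0, 13), (0, 2), (4, 4), (5, 13), (15, 24), (15, 15), (16, 22)]
tree = build_tree(ranges, total_len=25)
```

The resulting tree reproduces the problems listed on the slide: `/` and `news.html` are separate leaves, and offset 3 (the space) is covered by no child of `GET /news.html`.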

Step 2: Refinement (Tokenization)
* Merge two child nodes if their contents together form one token (a heuristic for text-based protocols).
* Example: / and news.html merge into /news.html; H and TTP/1.0 merge into HTTP/1.0.
[Figure: field tree before and after tokenization]
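The tokenization heuristic can be sketched as merging adjacent sibling ranges whose combined text contains no separator. Treating a token as a run of bytes without whitespace is an assumption made here for illustration; `SEPARATORS` and `merge_tokens` are hypothetical names.

```python
# Assumed separators for a text-based protocol such as HTTP.
SEPARATORS = set(" \t\r\n")


def merge_tokens(message, children):
    """Merge adjacent (start, end) sibling ranges that together form one token."""
    merged = []
    for start, end in sorted(children):
        if merged:
            prev_start, prev_end = merged[-1]
            combined = message[prev_start:end + 1]
            touching = prev_end + 1 == start
            if touching and not (SEPARATORS & set(combined)):
                merged[-1] = (prev_start, end)  # extend the previous field
                continue
        merged.append((start, end))
    return merged


message = "GET /news.html HTTP/1.0\r\n"
uri_side = merge_tokens(message, [(0, 2), (4, 4), (5, 13)])
version_side = merge_tokens(message, [(15, 15), (16, 22)])
```

With these inputs, `/` and `news.html` collapse into the single range 4 to 13 and `H` and `TTP/1.0` into 15 to 22, while `GET` stays separate because the space at offset 3 lies between it and the URI.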

Step 2: Refinement (Redundant Node Deletion)
* An internal node is redundant if it has only one child.
[Figure: field tree before and after deletion. Node labels after deletion: GET /news.html HTTP/1.0\r\n; GET /news.html; GET; /news.html; HTTP/1.0\r\n]
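A minimal sketch of this rule on a dictionary-based tree follows. Which of the two nodes survives is an assumption here: the sketch keeps the parent's span and drops the only child, which matches the node labels on the slide (HTTP/1.0\r\n remains, HTTP/1.0 disappears). The tree encoding and the name `prune` are illustrative.

```python
def prune(node):
    """Return a copy of the tree in which no internal node has exactly one child."""
    children = [prune(child) for child in node["children"]]
    if len(children) == 1:
        # Redundant level: keep this node's span, adopt the grandchildren.
        children = children[0]["children"]
    return {"span": node["span"], "children": children}


def leaf(start, end):
    return {"span": (start, end), "children": []}


# Tree for "GET /news.html HTTP/1.0\r\n" after tokenization.
tree = {
    "span": (0, 24),
    "children": [
        {"span": (0, 13), "children": [leaf(0, 2), leaf(4, 13)]},
        {"span": (15, 24), "children": [leaf(15, 22)]},
    ],
}
pruned = prune(tree)
```

After pruning, the node for offsets 15 to 24 is a leaf, and the subtree for `GET /news.html` is unchanged because it has two children.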

Step 2: Refinement (Node Insertion)
* Insert a new child node under a parent if the offsets of its children do not cover all offsets of the parent.
[Figure: field tree before and after insertion]
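The insertion rule amounts to filling the offset gaps between a parent and its children. The sketch below works on plain (start, end) ranges and leaves leaf nodes alone; `fill_gaps` is a hypothetical helper, not a function from the tool.

```python
def fill_gaps(parent, children):
    """Return the children of `parent` plus new ranges for uncovered offsets."""
    start, end = parent
    result = []
    cursor = start  # next parent offset not yet covered by a child
    for child_start, child_end in sorted(children):
        if child_start > cursor:
            result.append((cursor, child_start - 1))  # inserted node
        result.append((child_start, child_end))
        cursor = max(cursor, child_end + 1)
    if children and cursor <= end:
        result.append((cursor, end))  # uncovered tail of the parent
    return result


# "GET /news.html" spans offsets 0..13; its children miss the space at offset 3.
completed = fill_gaps((0, 13), [(0, 2), (4, 13)])
```

For this example the inserted node is the single offset 3, which recovers the SPACE separator that no parsing context had claimed.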

Step 3: Output the Result
[Figure: final field tree for GET /news.html HTTP/1.0\r\n with hierarchical, parallel, and sequential relations marked]
* Parallel:
  * Collect the execution history of each node.
  * For a parent, if its child nodes share a similar history, mark it.
* Sequential:
  * A pre-order traversal of the tree lists the leaf nodes and the parents of multiple parallel nodes.
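The parallel check can be sketched by grouping sibling fields on their execution history. The sketch simplifies "similar" to "identical set of functions", which is an assumption; the function name `parallel_groups` and the example histories (loosely modeled on the null-httpd snippet) are hypothetical.

```python
from collections import defaultdict


def parallel_groups(histories):
    """Group sibling fields whose execution history is the same set of functions."""
    groups = defaultdict(list)
    for field_name, functions in histories.items():
        groups[frozenset(functions)].append(field_name)
    # Only groups with at least two members describe parallel fields.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical histories for three sibling fields of one HTTP request.
histories = {
    "Request-Line": ["read_header", "sscanf"],
    "Host": ["read_header", "strncasecmp", "strncpy"],
    "User-Agent": ["read_header", "strncasecmp", "strncpy"],
}
groups = parallel_groups(histories)
```

Here `Host` and `User-Agent` end up in one group, while the request line, which is handled by different code, stays on its own.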

Evaluation
* Implemented on top of Valgrind (for the context-aware execution monitor)
  * Also applies to QEMU, PIN
* Benchmark
  * 30 messages with six known protocols and one unknown protocol.
* Evaluation metric
  * Re: ratio of exact match, Re = |A ∩ W| / |W|
  * A: set of fields identified by AutoFormat
  * W: set of fields identified by Wireshark
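The metric is a plain set ratio and can be computed directly. In this sketch each field is represented as a (start, end) offset pair, which is an assumption about how an exact match would be checked; the field sets are made up, and returning `None` for an empty W is a choice made here for illustration.

```python
def exact_match_ratio(autoformat_fields, wireshark_fields):
    """Re = |A ∩ W| / |W|, or None when W is empty and the ratio is undefined."""
    a = set(autoformat_fields)
    w = set(wireshark_fields)
    if not w:
        return None
    return len(a & w) / len(w)


# Hypothetical field sets for one message, given as (start, end) offsets.
A = {(0, 2), (4, 13), (15, 24)}
W = {(0, 2), (4, 13), (15, 22), (23, 24)}
re_value = exact_match_ratio(A, W)
```

Two of the four reference fields are matched exactly in this example, so `re_value` is 0.5.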

Overall Result
* Averages: Re(F) = 88.5%, Re(H) = 98.0%, Re(P) = 100.0%, Re = 93.4%
* Re(F): Re for finest-grained fields
* Re(H): Re for hierarchical fields
* Re(P): Re for parallel fields
* 100% match with Wireshark
* (-): |P| for Wireshark = 0

Discussion
* Dynamic trace dependency: AutoFormat does not detect message formats that are not present in the execution trace.
* Byte granularity: AutoFormat does not detect protocol fields at the bit level.
* Protocol state machine: AutoFormat does not correlate multiple messages of the same protocol session.
* Obfuscated binaries: AutoFormat does not handle this type of input.

Conclusion
* The paper also includes the Slapper worm messages as part of a second set of experimental results.
* AutoFormat: a tool for automatic protocol format extraction.
* Key insight: a protocol implementation is programmed to recognize the protocol format and usually contains protocol field-specific execution context. That context can be leveraged to infer the hierarchical structure of protocol fields, and even their BNF structures.

Thank you
For more information: {zlin, dxu,
Q & A