Binary Analysis for Botnet Reverse Engineering & Defense Dawn Song UC Berkeley.

Slides:



Advertisements
Similar presentations
Computer Networks TCP/IP Protocol Suite.
Advertisements

Analyzing and Exploiting Network Behaviors of Malware Jose Andre Morales Areej Al-Bataineh Shouhuai XuRavi Sandhu SecureComm Singapore, 2010 ©2010 Institute.
Chapter 7 Constructors and Other Tools. Copyright © 2006 Pearson Addison-Wesley. All rights reserved. 7-2 Learning Objectives Constructors Definitions.
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Challenges in Making Tomography Practical
1 Resonance: Dynamic Access Control in Enterprise Networks Ankur Nayak, Alex Reimers, Nick Feamster, Russ Clark School of Computer Science Georgia Institute.
17 Copyright © 2005, Oracle. All rights reserved. Deploying Applications by Using Java Web Start.
Cyber Defence Data Exchange and Collaboration Infrastructure (CDXI)
0 - 0.
Addition Facts
1 DTI/EPSRC 7 th June 2005 Reacting to HCI Devices: Initial Work Using Resource Ontologies with RAVE Dr. Ian Grimstead Richard Potter BSc(Hons)
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, ThanassisAvgerinos,
Construction process lasts until coding and testing is completed consists of design and implementation reasons for this phase –analysis model is not sufficiently.
Overview Environment for Internet database connectivity
Processor Data Path and Control Diana Palsetia UPenn
Scalable Parallel Intrusion Detection Fahad Zafar Advising Faculty: Dr. John Dorband and Dr. Yaacov Yeesha 1 University of Maryland Baltimore County.
Campus02.at don't stop thinking about tomorrow DI Anton Scheibelmasser Setubal ICINCO /25 Device integration into automation systems with.
Tivoli Service Request Manager
1 Symbol Tables. 2 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice.
1 1 Mechanical Design and Production Dept, Faculty of Engineering, Zagazig University, Egypt. Mechanical Design and Production Dept, Faculty of Engineering,
Trap Diagnostic Facility Todays Software Diagnostic Tool with innovative features for the z/OS software developer Arney Computer Systems.
25 July, 2014 Hailiang Mei, TU/e Computer Science, System Architecture and Networking 1 Hailiang Mei Remote Terminal Management.
31242/32549 Advanced Internet Programming Advanced Java Programming
Programming for GCSE Topic 10.2: Designing for File I/O T eaching L ondon C omputing William Marsh School of Electronic Engineering and Computer Science.
Executional Architecture
DB Relay An Introduction. INSPIRATION Database access is WAY TOO HARD The crux.
Enhancing Spotfire with the Power of R
Addition 1’s to 20.
What’s New in WatchGuard Dimension v1.2
1 1999/Ph 514: Channel Access Concepts EPICS Channel Access Concepts Bob Dalesio LANL.
CGI & HTML forms CGI Common Gateway Interface  A web server is only a pipe between user-agents  and content – it does not generate content.
1/16 Steven Leung Introduction to HTML/CGI/JavaScript Intro to HTML/CGI/JavaScript How the Web Works HTML: Basic Concept CGI: Basic Concept JavaScript:
Presenter: James Huang Date: Sept. 29,  HTTP and WWW  Bottle Web Framework  Request Routing  Sending Static Files  Handling HTML  HTTP Errors.
CS 4700 / CS 5700 Network Fundamentals
Modeling Main issues: What do we want to build How do we write this down.
Dr. XiaoFeng Wang Spring 2006 Packet Vaccine: Black-box Exploit Detection and Signature Generation XiaoFeng Wang, Zhuowei Li Jun Xu, Mike Reiter Chongkyung.
1 Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan.
Impeding Malware Analysis Using Conditional Code Obfuscation Paper by: Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee Conference: Network.
1 Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation David Brumley, Juan.
Ragib Hasan Johns Hopkins University en Spring 2011 Lecture 10 04/18/2011 Security and Privacy in Cloud Computing.
1 Loop-Extended Symbolic Execution on Binary Programs Pongsin Poosankam ‡* Prateek Saxena * Stephen McCamant * Dawn Song * ‡ Carnegie Mellon University.
W3af LUCA ALEXANDRA ADELA – MISS 1. w3af  Web Application Attack and Audit Framework  Secures web applications by finding and exploiting web application.
Lucent Technologies – Proprietary Use pursuant to company instruction Learning Sequential Models for Detecting Anomalous Protocol Usage (work in progress)
MSIT 458 – The Chinchillas. Offense Overview Botnet taxonomies need to be updated constantly in order to remain “complete” and are only as good as their.
Penetration Testing Security Analysis and Advanced Tools: Snort.
Vulnerability-Specific Execution Filtering (VSEF) for Exploit Prevention on Commodity Software Authors: James Newsome, James Newsome, David Brumley, David.
Computer Networking From LANs to WANs: Hardware, Software, and Security Chapter 12 Electronic Mail.
C++ Code Analysis: an Open Architecture for the Verification of Coding Rules Paolo Tonella ITC-irst, Centro per la Ricerca Scientifica e Tecnologica
Paradyn Project Dyninst/MRNet Users’ Meeting Madison, Wisconsin August 7, 2014 The Evolution of Dyninst in Support of Cyber Security Emily Gember-Jacobson.
FiG: Automatic Fingerprint Generation Shobha Venkataraman Joint work with Juan Caballero, Pongsin Poosankam, Min Gyung Kang, Dawn Song & Avrim Blum Carnegie.
Computer Science Detecting Memory Access Errors via Illegal Write Monitoring Ongoing Research by Emre Can Sezer.
1 An Advanced Hybrid Peer-to-Peer Botnet Ping Wang, Sherri Sparks, Cliff C. Zou School of Electrical Engineering & Computer Science University of Central.
EECS 354 Network Security Reverse Engineering. Introduction Preventing Reverse Engineering Reversing High Level Languages Reversing an ELF Executable.
NMD202 Web Scripting Week3. What we will cover today Includes Exercises PHP Forms Exercises Server side validation Exercises.
25th & 26th August 2009ICAT developer workshop 1.
Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution Zhiqiang Lin 1 Xuxian Jiang 2, Dongyan Xu 1, Xiangyu Zhang 1 1.
Week 10-11c Attacks and Malware III. Remote Control Facility distinguishes a bot from a worm distinguishes a bot from a worm worm propagates itself and.
Christopher Kruegel University of California Engin Kirda Institute Eurecom Clemens Kolbitsch Thorsten Holz Secure Systems Lab Vienna University of Technology.
Networking Tutorial Special Interest Group for Software Engineering Luke Rajlich.
Deriving Input Syntactic Structure From Execution Zhiqiang Lin Xiangyu Zhang Purdue University November 11 th, 2008 The 16th ACM SIGSOFT International.
DETECTING TARGETED ATTACKS USING SHADOW HONEYPOTS AUTHORS: K. G. Anagnostakisy, S. Sidiroglouz, P. Akritidis, K. Xinidis, E. Markatos, A. D. Keromytisz.
ASP-2-1 SERVER AND CLIENT SIDE SCRITPING Colorado Technical University IT420 Tim Peterson.
A Binary Agent Technology for COTS Software Integrity Anant Agarwal Richard Schooler InCert Software.
Role Of Network IDS in Network Perimeter Defense.
Multimedia Retrieval Architecture Electrical Communication Engineering, Indian Institute of Science, Bangalore – , India Multimedia Retrieval Architecture.
Cosc 4765 Antivirus Approaches. In a Perfect world The best solution to viruses and worms to prevent infected the system –Generally considered impossible.
Automatic Network Protocol Analysis
WWW and HTTP King Fahd University of Petroleum & Minerals
CSC-682 Advanced Computer Security
Presentation transcript:

Binary Analysis for Botnet Reverse Engineering & Defense Dawn Song UC Berkeley

Binary Analysis Is Important for Botnet Defense Botnet programs: no source code, only binary Botnet defense needs internal understanding of botnet programs – C&C reverse engineering Different possible commands, encryption/decryption – Botnet traffic rewriting – Botnet infiltration – Botnet vulnerability discovery

BitBlaze Binary Analysis Infrastructure: Architecture The first infrastructure: – Novel fusion of static, dynamic, formal analysis methods Loop extended symbolic execution Grammar-aware symbolic execution – Whole system analysis (including OS kernel) – Analyzing packed/encrypted/obfuscated code Vine: Static Analysis Component TEMU: Dynamic Analysis Component Rudder: Mixed Execution Component BitBlaze Binary Analysis Infrastructure

Dissecting Malware Dissecting Malware BitBlaze Binary Analysis Infrastructure Detecting Vulnerabilities Detecting Vulnerabilities Generating Filters Generating Filters BitBlaze: Security Solutions via Program Binary Analysis Unified platform to accurately analyze security properties of binaries Security evaluation & audit of third-party code Defense against morphing threats Faster & deeper analysis of malware

The BitBlaze Approach & Research Foci Semantics based, focus on root cause: Automatically extracting security-related properties from binary code for effective vulnerability detection & defense 1.Build a unified binary analysis platform for security – Identify & cater common needs of different security applications – Leverage recent advances in program analysis, formal methods, binary instrumentation/analysis techniques for new capabilities 2.Solve real-world security problems via binary analysis Extracting security related models for vulnerability detection Generating vulnerability signatures to filter out exploits Dissecting malware for real-time diagnosis & offense: e.g., botnet infiltration More than a dozen security applications & publications

Plans Building on BitBlaze to develop new techniques Automatic Reverse Engineering of C&C protocols of botnets Automatic rewriting of botnet traffic to facilitate botnet infiltration Vulnerability discovery of botnet

Preliminary Work Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering Binary code extraction and interface identification for botnet traffic rewriting Botnet analysis for vulnerability discovery

Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering Juan Caballero Pongsin Poosankam Christian Kreibich Dawn Song

Automatic Protocol Reverse-Engineering Process of extracting the application-level protocol used by a program, without the specification – Automatic process – Many undocumented protocols (C&C, Skype, Yahoo) Encompasses extracting: 1.the Protocol Grammar 2.the Protocol State Machine Message format extraction is prerequisite

Challenges for Active Botnet Infiltration 2.Access to one side of dialog only 1.Understand both sides of C&C protocol –Message structure –Field semantics 3.Handle encryption/obfuscation Goal: Rewrite C&C messages on either dialog side

Technical Contributions 1.Buffer deconstruction, a technique to extract the format of sent messages Earlier work only handles received messages 2.Field semantics inference techniques, for messages sent and received 3.Designing and developing Dispatcher 4.Extending a technique to handle encryption 5.Rewriting a botnet dialog using information extracted by Dispatcher

Message Format Extraction Extract format of a single message Required by Grammar and State Machine extraction GET / HTTP/1.1 HTTP/ OK [Polyglot] [Dispatcher]

Message Field Tree Field Range: [3:3] Field Boundary: Fixed Field Semantics: Delimiter Field Keywords: Target: Version HTTP/ OK\r\n\r\n MSG [0:18] Status Line [0:16] Version [0:7] Delimiter [8:8] Status-Code [9:11] Delimiter [12:12] Reason [13:14] Delimiter [15:16] Delimiter [17:18] Message format extraction has 2 steps: 1.Extract tree structure 2.Extract field attributes

Sent vs. Received Both protocol directions from single binary Different problems – Taint information harder to leverage – Focus on how message is constructed, not processed Different techniques needed: – Tree structure Buffer Deconstruction – Field attributes New heuristics

Outline Introduction Problem Techniques Buffer Deconstruction Evaluation Field Semantics Inference Handling encryption

Buffer Deconstruction Intuition – Programs keep fields in separate memory buffers – Combine those buffers to construct sent message Output buffer – Holds message when send function invoked – Or holds unencrypted message before encryption Recursive process – Decompose a buffer into buffers used to fill it – Starts with output buffer – Stops when theres nothing to recurse

Buffer Deconstruction Message field tree = inverse of output buffer structure Output is structure of message field tree – No field attributes, except range Output Buffer (19) A(17) G(2)D(1)E(3)F(1)C(8)H(2) [0:18] [0:16] [17:18] [0:7][8:8][9:11][12:12][13:14][15:16] MSG Delimiter Status Line ReasonStatus Code DelimiterVersion B(2) Delimiter HTTP/ OK\r\n\r\n

Field Attributes Inference Attributes capture extra information – E.g., inter-field relationships AttributeValue Field Range[StartOffset : EndOffset] Field BoundaryFixed, Length, Delimiter Field SemanticsIP address, Timestamp, … Field Keywords Techniques identify –Keywords –Length fields –Delimiters –Variable-length field –Arrays

Field Semantics CookiesKeyboard input Error codesKeywords File dataLength File informationPadding FilenamesPorts Hash / ChecksumRegistry data HostnamesSleep timers Host informationStored data IP addressesTimestamps A field attribute in the message field tree Captures the type of data in the field Programs contain much semantic info leverage it! Semantics in well-defined functions and instructions – Prototype Similar to type inference Differs for received and sent messages

Field Semantic Inference GET /index.html HTTP/1.1 struct stat { … off_t st_size; /* total size in bytes */ … } int stat(const char*path, struct stat *buf); OUT IN HTTP/ OK Content-Length: 25 Hello world! File path File length stat(index.html, &file_info);

Detecting Encoding Functions Encoding functions = (de)compression, (de)(en)cryption, (de)obfuscation… High ratio of arithmetic & bitwise instructions Use read/write set to identify buffers Work-in-progress on extracting and reusing encoding functions

MegaD C&C protocol type MegaD_Message = record { msg_len : uint16; encrypted_payload: bytestring &length = 8*msg_len; } &byteorder = bigendian; type encrypted_payload = record { version : uint16; mtype : uint16; data : MegaD_data (mtype); }; type MegaD_data (msg_type: uint16) = case msg_type of { 0x00 -> m00 : msg_0; […] default -> unknown : bytestring &restofdata; }; C&C on tcp/443 using proprietary encryption Use Dispatchers output to generate grammar – 15 different messages seen (7 recv, 8 sent) – 11 field semantics

C&C Server Cmd?EHLO MegaD Dialog Test SMTP Failed SMTP Test Server

Template Server C&C Server EHLOCmd? Failed MegaD Rewriting Test SMTP Get Template Template? Grammar Success SMTP Test Server

Summary Buffer deconstruction, a technique to extract the format of sent messages Field semantics inference techniques, for messages sent and received Designed and developed Dispatcher Extended technique to handle encryption Rewrote MegaD dialog using information extracted by Dispatcher