Eureka: A Framework for Enabling Static Analysis on Malware


Eureka: A Framework for Enabling Static Analysis on Malware MARS.MTC.SRI.COM

Motivation
The malware landscape is diverse and constantly evolving: large botnets; diverse propagation vectors, exploits, and C&C; capabilities including backdoors, keylogging, rootkits, logic bombs, and time bombs. Malware is no longer about script kiddies; it is real business. Manual reverse engineering is close to impossible, so automated techniques are needed to extract system logic, interactions, and side effects.

Dynamic vs. Static Malware Analysis
Dynamic analysis: techniques that profile the actions of a binary at runtime. It has the better track record to date (CWSandbox, TTAnalyze) but provides only a partial "effects-oriented profile" of malware potential.
Static analysis: can provide complementary insights, with the potential for a more comprehensive assessment.

Malware Evasions and Obfuscations
To defeat signature-based detection schemes: polymorphism and metamorphism, which started appearing in viruses of the 1990s, primarily to defeat AV tools.
To defeat dynamic malware analysis: anti-debugging, anti-tracing, and anti-memory-dumping; VMM detection and emulator detection.
To defeat static malware analysis: encryption (packing), API and control-flow obfuscations, and anti-disassembly.

System Goals
Desiderata for a static analysis framework: unpack over 90% of contemporary malware; handle most if not all packers; deobfuscate API references; automate identification of capabilities; provide feedback on unpacking success; simplify and annotate call graphs to illustrate interactions between key logical blocks.

The Eureka Framework
A novel unpacking technique based on coarse-grained execution tracing, with heuristic-based and statistics-based unpacking. Implements several techniques to handle obfuscated API references. Uses multiple metrics to evaluate unpacking success. Annotated call graphs provide a bird's-eye view of system interaction.

The Eureka Workflow
[Figure: workflow diagram with these stages and artifacts: packed binary; disassembly (IDA Pro); packed .ASM; statistics-based evaluator; unpack evaluation; tracing of malware syscalls in a VM; Eureka's unpacker; syscall trace; favorable execution point; heuristic-based offline analysis; unpacked binary; disassembly (IDA Pro); unpacked .ASM; Eureka's API resolver (control- and data-flow analysis); unobfuscated .ASM; detailed call graph; annotated call graphs.]

Coarse-grained Execution Monitoring
Generalized unpacking principle: execute the binary until it has sufficiently revealed itself, then dump the process execution image for static analysis.
Monitoring execution progress: Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table). A callback is invoked on each NTDLL system call, with filtering based on the malware process's PID.

Related Work
PolyUnpack (Royal et al., ACSAC 2006): static model built using program analysis; fine-grained execution tracking detects execution steps outside the model.
Renovo (Kang et al., WORM 2007): fine-grained execution tracking using QEMU; the dumping trigger is the execution of newly written code.
OmniUnpack (Martignoni et al., ACSAC 2007): coarse-grained monitoring using page-level protection mechanisms. Co

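Design Space
Table columns: System, Environment, Granularity, Trigger, Child process monitoring, Output layers, Speed, Evasions.
Poly-Unpack: environment inside VM; granularity instruction; trigger model; child process monitoring no; output layers 1; speed slow; evasions 1, 2, 3.
Renovo: environment outside VM; trigger heuristic; child process monitoring yes; output layers many; evasions 2, 4.
Omni-Unpack: granularity page; speed fast; evasions 2, 3.
Eureka: granularity system call; trigger statistic; output layers 1, many.
Evasions: (1) multiple packing; (2) partial code-revealing packers; (3) VM detection; (4) emulator detection.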

Heuristic-based Unpacking
How do you determine when to dump?
Heuristic #1: dump as late as possible (NtTerminateProcess).
Heuristic #2: dump when the program generates errors (NtRaiseHardError).
Heuristic #3: dump when the program forks a child process (NtCreateProcess).
Issues: weak adversarial model (the heuristics are easy to evade); does not work well for packed non-malware programs.
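The three heuristics amount to watching the syscall trace for a small set of trigger calls. The sketch below illustrates that idea only; the trace format (a list of syscall names) and the function name `first_dump_point` are assumptions made for illustration, not Eureka's driver interface.

```python
# Trigger syscalls are the ones named on the slide; the reasons paraphrase it.
DUMP_TRIGGERS = {
    "NtTerminateProcess": "dump as late as possible",
    "NtRaiseHardError": "program generated an error",
    "NtCreateProcess": "program forks a child process",
}


def first_dump_point(trace):
    """Return (index, syscall, reason) for the first trigger call, or None.

    `trace` is assumed to be a list of syscall names in execution order.
    """
    for index, syscall in enumerate(trace):
        reason = DUMP_TRIGGERS.get(syscall)
        if reason is not None:
            return index, syscall, reason
    return None


# Hypothetical trace for illustration only.
trace = ["NtOpenFile", "NtAllocateVirtualMemory", "NtCreateProcess", "NtTerminateProcess"]
print(first_dump_point(trace))
```

A fixed trigger list like this is exactly what makes the approach easy to evade, which is the issue the slide raises.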

Statistics-based Unpacking
Observations: statistical properties of a packed executable differ from those of the unpacked executable; as malware executes, the code-to-data ratio increases.
Complications: code and data sections are interleaved in PE executables; data directories (import tables) look similar to data but are often found in code sections; properties of data sections vary with packers.

Statistics-based Unpacking (2)
Our approach: model the statistical properties of unpacked code; the volume of unpacked code must strictly increase.
Estimating unpacked code: n-gram analysis to look for frequent instructions. We use bigrams (2-grams) because x86 opcodes are 1 or 2 bytes. Subroutine code was extracted from 9 benign executables. Frequent bigrams: FF 15 (call), FF 75 (push), E8 _ _ _ FF (call), E8 _ _ _ 00 (call).
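As a rough illustration of the counting step, the sketch below scans a byte buffer for the four patterns listed on the slide. It assumes the process image is available as a Python `bytes` object and that `_` stands for a wildcard byte; the function name `count_code_bigrams` is hypothetical.

```python
def count_code_bigrams(image: bytes) -> dict:
    """Count the four byte patterns named on the slide in a raw memory image."""
    counts = {"FF 15": 0, "FF 75": 0, "E8 _ _ _ FF": 0, "E8 _ _ _ 00": 0}
    for i in range(len(image) - 1):
        first, second = image[i], image[i + 1]
        if first == 0xFF and second == 0x15:
            counts["FF 15"] += 1          # indirect call through a memory operand
        elif first == 0xFF and second == 0x75:
            counts["FF 75"] += 1          # push of a stack-frame operand
        elif first == 0xE8 and i + 4 < len(image):
            last = image[i + 4]           # high byte of the 32-bit relative offset
            if last == 0xFF:
                counts["E8 _ _ _ FF"] += 1
            elif last == 0x00:
                counts["E8 _ _ _ 00"] += 1
    return counts


# Small hand-made buffer: one instance of each pattern.
sample = bytes.fromhex("ff1500104000" "e810000000" "ff7508" "e8f0ffffff")
print(count_code_bigrams(sample))
```

The scan is byte-aligned and ignores instruction boundaries, so it can also match inside data; a count that rises over time, rather than any single match, is the signal of interest.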

Statistics-based Unpacking (3)
Bigram counts per executable, in this column order: Calc (117 KB), Explorer (1010 KB), Ipconfig (59 KB), lpr (11 KB), Mshearts (131 KB), Notepad (72 KB), Ping (21 KB), Shutdown (23 KB), Taskman (19 KB).
FF 15 (call): 246, 3045, 184, 24, 192, 415, 58, 132, 126
FF 75 (push): 235, 2494, 272, 33, 274, 254, 41, 63, 85
E8 _ _ _ 0xff (only eight of nine values are present): 1583, 2201, 181, 19, 369, 180, 87, 49
E8 _ _ _ 0x00: 746, 1091, 152, 62, 641, 108, 57, 66, 50

Statistics-based Unpacking (4)
Feasibility test: a corpus of (pre- and post-unpacked) executables unpacked with heuristic unpacking. 1090 executables: 125 originally unpacked, 965 unpacked. Simple bigram counting was able to distinguish 922 out of 965 unpacked executables (95% success rate).

STOP Algorithm
STOP (Statistical Test for Online unPacking) is an online algorithm for determining the dumping trigger. It is a simple hypothesis test for a change in mean.
Null hypothesis: the mean bigram count has not increased.
Assumption: bigram counts are normally distributed with prior mean μ0.
If (μ1 - μ0) / σ1 > 1.645, we reject the null hypothesis with a confidence level of 0.95.
The test is repeated to determine the beginning and the end of unpacking.
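The decision rule can be sketched in a few lines. The slide does not say how μ1 and σ1 are estimated, so the sketch assumes they are the sample mean and sample standard deviation of a recent window of bigram counts; the function name `mean_has_increased` is hypothetical.

```python
from statistics import mean, stdev

Z_CRITICAL = 1.645  # one-sided critical value for the 0.95 level, as on the slide


def mean_has_increased(prior_mean, recent_counts):
    """Apply the slide's rule (mu1 - mu0) / sigma1 > 1.645.

    Assumption: mu1 and sigma1 come from `recent_counts`, a window of at
    least two bigram-count samples taken at recent system calls.
    """
    mu1 = mean(recent_counts)
    sigma1 = stdev(recent_counts)
    if sigma1 == 0:
        # No spread in the window: fall back to a plain comparison.
        return mu1 > prior_mean
    return (mu1 - prior_mean) / sigma1 > Z_CRITICAL


# Hypothetical numbers for illustration only.
print(mean_has_increased(10.0, [50, 52, 48, 51, 49]))  # counts jumped well above the prior mean
print(mean_has_increased(10.0, [9, 12, 8, 11, 10]))    # counts still hover around the prior mean
```

Running the same test again with the post-increase mean as the new prior is one way to look for the end of unpacking, in line with the slide's note that the test is repeated.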

API Resolution
User-level malware programs require system calls to perform malicious actions, and use the Win32 API to access user-level libraries. Obfuscations impede malware analysis using IDA Pro or OllyDbg: packers use non-standard linking and loading of DLLs, and obfuscated API resolution.

Standard API Resolution
API calls are calls to various user-level DLLs linked by the Windows linker/loader. Legitimate executables have an import table, which is used to fill the IAT with virtual addresses at run time.
[Figure: import table entry X names KERNEL32.OpenFile; dynamic linking fills IAT (Import Address Table) slot X with B+R, where B marks the start of KERNEL32.DLL and R the exported entry point of OpenFile. Call forms shown: CALL F (call by thunk, with F: JMP [X]), CALL [X] (indirect call), and CALL X.]

Standard API Resolution
IDA identifies the imports in the IAT by looking at the import table.

API Obfuscation by Packers
The import table is removed, so the IAT is not filled in by the linker and loader. The unpacker fills in the IAT, or a similar data structure, by itself. This makes it hard to identify the corresponding API call in the executable.
[Figure: CALL F and CALL X; thunk F: JMP [X]; import entry X for KERNEL32.OpenFile; IAT (Import Address Table) slot X holding B+R; KERNEL32.DLL at B exporting the OpenFile entry point at R.]

Identifying APIs by Address
For each DLL, build a database of relative and absolute addresses. The default image address is the base address. Calculate the corresponding virtual address for each exported API, then match the addresses used in calls against the database.
[Figure: KERNEL32.DLL at 7c800000 with the OpenFile entry point at 7c810332; the IAT (Import Address Table) slot X used by CALL [X] holds 7c810332.]
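The lookup described here is a base-plus-offset table. The sketch below uses the addresses shown on the slide (KERNEL32.DLL at 7c800000, OpenFile at 7c810332); the export table is given as a plain dictionary and the helper names are hypothetical, since the slide does not say how the export data is read.

```python
def build_address_db(dll_name, image_base, export_rvas):
    """Map each absolute virtual address to a 'DLL.Api' name.

    `export_rvas` is assumed to map API names to relative addresses,
    as they would be read from the DLL's export table.
    """
    return {image_base + rva: f"{dll_name}.{api}" for api, rva in export_rvas.items()}


def resolve_call_target(address_db, target):
    """Return the API name for a call target address, or None if unknown."""
    return address_db.get(target)


# Values from the slide: base 0x7C800000, OpenFile at relative address 0x10332.
kernel32_db = build_address_db("KERNEL32", 0x7C800000, {"OpenFile": 0x10332})
print(resolve_call_target(kernel32_db, 0x7C810332))
```

Keeping the relative addresses separately from the base is what allows the same table to be reused when a DLL is mapped at a non-default address.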

Handling DLL Load Obfuscations
Intercept dynamic loading at arbitrary addresses: look for NtOpenSection and NtMapViewOfSection in the trace. Search for DLL headers in memory during dumping. This can even identify DLL code that is copied to an arbitrary location.
[Figure: KERNEL32.DLL with the OpenFile entry point at RVA 10332; the IAT (Import Address Table) slot X holds 21810332.]
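Searching a dump for DLL headers can be sketched with the standard PE layout: an image starts with the bytes `MZ`, the 32-bit little-endian field at offset 0x3C gives the offset of the PE header, and that header starts with `PE\0\0`. The sketch assumes the dump is a flat `bytes` object; the function name `find_pe_headers` is hypothetical and this is not Eureka's actual scanner.

```python
import struct


def find_pe_headers(dump: bytes):
    """Return offsets in `dump` where a plausible PE image header starts."""
    offsets = []
    start = dump.find(b"MZ")
    while start != -1:
        # The DOS header is 0x40 bytes; its last field points to the PE header.
        if start + 0x40 <= len(dump):
            (e_lfanew,) = struct.unpack_from("<I", dump, start + 0x3C)
            pe = start + e_lfanew
            if dump[pe:pe + 4] == b"PE\x00\x00":
                offsets.append(start)
        start = dump.find(b"MZ", start + 1)
    return offsets


# Minimal fake image: DOS header whose last field points just past itself.
fake_image = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
dump = b"\x00" * 16 + fake_image + b"MZ junk"
print(find_pe_headers(dump))
```

Each offset found this way is a candidate DLL base; adding an export's relative address to it yields the absolute address to match against call targets.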

Handling Thunks
Identify subroutines that consist of a JMP instruction only, and treat any call to these subroutines as an API call.
[Figure: example involving IsDebuggerPresent.]
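One way to picture the thunk check is sketched below. It assumes 32-bit x86 and recognizes only the indirect form `JMP [mem32]` (opcode bytes FF 25 followed by a 4-byte address); restricting the check to that encoding, the function name `thunk_pointer_slot`, and the pointer table contents are all assumptions made for illustration.

```python
import struct


def thunk_pointer_slot(code: bytes):
    """If `code` is exactly one `JMP [mem32]` instruction, return the address
    of the pointer slot it jumps through; otherwise return None."""
    if len(code) == 6 and code[0] == 0xFF and code[1] == 0x25:
        (slot,) = struct.unpack_from("<I", code, 2)
        return slot
    return None


# Hypothetical pointer table: slot address -> API already resolved by address.
resolved_slots = {0x0041E308: "KERNEL32.IsDebuggerPresent"}

thunk = bytes.fromhex("ff2508e34100")      # JMP [0x0041E308]
not_a_thunk = bytes.fromhex("5589e5")      # push ebp; mov ebp, esp

print(resolved_slots.get(thunk_pointer_slot(thunk)))
print(thunk_pointer_slot(not_a_thunk))
```

Once a subroutine is classified this way, every call site that targets it can be relabeled with the API name behind the slot.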

Using Dataflow Analysis
Identify register-based indirect calls.
[Figure: disassembly with def and use annotations; API shown: GetEnvironmentStringW.]

Handling Dynamic Pointer Updates
Identify register-based indirect calls. A def to dword_41e308 is found; look for a probable call to GetProcAddress earlier. dword_41e304 has no static value to look up the API.
[Figure: disassembly with def and use annotations and the call to GetProcAddress.]

Evaluation Metrics
Measuring analyzability:
Code-to-data ratio: use a disassembler to separate code and data. Most successfully unpacked malware has a code-to-data ratio over 50%.
API resolution success: the percentage of API calls that have been resolved, out of the set of all call sites. A higher percentage implies that the malware is more amenable to static analysis.

Graph Generation
Call graph simplification: most malware contains hundreds of functions. Remove nodes without APIs, connecting their inbound and outbound edges.
Micro-ontology labeling: gives a bird's-eye view of the malware instance. Translate API functions into categories based on functionality. Categories are based on Microsoft's classifications: Common Filesystem, Random, Time, Registry, Socket, File Management.
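The simplification step can be read as: drop every function that makes no API call and reconnect its callers directly to its callees. The sketch below implements that reading, which is an interpretation of the slide rather than Eureka's documented algorithm; the graph representation (a dict from function name to a set of callees) and all function names are hypothetical.

```python
def simplify_call_graph(graph, api_callers):
    """Remove nodes not in `api_callers`, bridging their callers to their callees.

    `graph` maps every function name to the set of functions it calls.
    """
    graph = {node: set(callees) for node, callees in graph.items()}  # work on a copy
    for node in [n for n in graph if n not in api_callers]:
        callees = graph.pop(node) - {node}   # ignore direct self-recursion
        for targets in graph.values():
            if node in targets:
                targets.discard(node)
                targets |= callees           # caller now reaches the callees directly
    return graph


# Hypothetical call graph: 'dispatch' and 'wrapper' make no API calls.
call_graph = {
    "start": {"dispatch"},
    "dispatch": {"send_data", "wrapper"},
    "wrapper": {"write_log"},
    "send_data": set(),
    "write_log": set(),
}
simplified = simplify_call_graph(call_graph, {"start", "send_data", "write_log"})
print(simplified)
```

After this pass, each remaining node can be labeled with the categories of the APIs it calls to produce the bird's-eye view.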

Storm Worm Case Study Storm Worm: Bird’s Eye View (Semi-manually generated)

Storm Worm Case Study (2) Control Flow Graph: eDonkey Handler

Eureka Ontology Graph

Experimental Evaluation
Evaluation using three different datasets:
Goat (packed benign executable) dataset: 15 common packers; provides ground truth for which packer is used and what is expected after unpacking.
Spam malware corpus.
Honeynet malware corpus.

Goat Dataset
Table columns: Packer, Poly-Unpack, Renovo, Eureka, Eureka-API.
Cells listed per packer: Armadillo: No, Partial, Yes, 64%. ASPack: 99%. ASProtect: -. ExeCryptor: 2%. ExeStealth: 97%. FSG: 0%. MEW.

Goat Dataset
Table columns: Packer, Poly-Unpack, Renovo, Eureka, Eureka-API.
Cells listed per packer: MoleBox: No, Yes, 98%. Morphine: Partial, 0%. Obsidium: 99%. PeCompact. Themida: -. UPX. WinUPack. Yoda: 97%.

Evaluation (ASPack)

Evaluation (MoleBox)

Evaluation (Armadillo)

Spam Corpus Evaluation
Evaluation of a corpus of 481 executables collected at spam traps. 470 executables were successfully unpacked (over 97% success): 401 using the heuristic unpacker alone, and the rest using the statistical hypothesis test. Most API references were successfully deobfuscated.

Spam Corpus Evaluation (2)
Packer      Count   Eureka   Eureka-API
Unknown     186     184      85%
UPX         134     132      78%
Virus       79      79       79%
PEX         18      18       58%
MEW         12      11       70%
Rest (10)   52      46       83%

Spam Corpus Evaluation (3)
Virus Family   Count   Eureka   Eureka-API
TRSmall        98      98       93%
TRDldr         63      61       48%
Bagle          67      67       84%
Mydoom         45      44       99%
Klez           77      77       78%
Rest (39)      131     123

Honeynet Corpus Evaluation
Evaluation of a corpus of 435 executables collected at the SRI honeynet. 178 of the 435 were packed with Themida (only partially analyzable). Of the 257 non-Themida binaries, 20 did not execute on Windows XP. Eureka unpacks 228 of the 237 remaining binaries, with high API resolution rates on the unpacked binaries.

Honeynet Corpus Evaluation (2)
Packer      Count*   Eureka   Eureka-API
PolyEne     109      109      97%
FSG         36       35       94%
Unknown     33       29       67%
ASPack      23       22       93%
tELock      9        9        91%
Rest (9)    27       24       62%
*Includes all binaries except those packed with Themida

Honeynet Corpus Evaluation (3)
Virus Family   Count*   Eureka   Eureka-API
Korgo          70       70       86%
Virut          24       24       90%
Padobot        21       21       82%
Sality         17       17       96%
Parite         15       15
Rest (19)      90       81
*Includes all binaries except those packed with Themida

Runtime Performance