RAID 2010
Hybrid Analysis and Control of Malware
Hybrid Analysis of Program Binaries
Barton P. Miller, Kevin A. Roundy
Computer Science Department
Need for forensic analysis
  Malware attacks cost billions of dollars annually [1]
  65% of users feel the effects of cybercrime [2]
  Resolving an average cybercrime takes 28 days [2]
  90% of malware resists analysis [3]
  [Figure: malware binary shown as raw bytes]
  Our approach
    Analyze code before executing it
    Provide a CFG-based interface for instrumentation
    Bring malware under the analyst's control
  [1] Computer Economics [2] Norton [3] McAfee. 2008
Malware analysis factory
  [Figure: malware binary shown as raw bytes, fed to SD-Dyninst]
  SD-Dyninst
    code coverage instrumentation
    network call instrumentation
  Factory outputs
    Stack trace at 1st network communication
    Control flow graph showing code coverage
    Defensive tactics report: unpacked code, overwritten code, control flow obfuscations
    Trace of Win API calls
Storm worm: obfuscated control flow
  [Figure: control flow from the entry point, including a CALL/JMP pair, a JMP, POP, INC, PUSH, RET sequence, a CALL ptr[eax] with an unknown target, and XOR eax,eax followed by MOV ecx,*[eax] leading to an exception handler with an unknown target]
  Legend: obfuscated control flow, handler-based ctrl flow, unpacked code, overwritten code
Storm worm: unpacked code
  [Figure: unpacked code shown as raw bytes, with the entry point marked]
  Legend: obfuscated control flow, handler-based ctrl flow, unpacked code, overwritten code
Overwritten code: Upack packer
  [Figure: overwritten code shown as raw bytes, with the entry point marked]
  Legend: obfuscated control flow, handler-based ctrl flow, unpacked code, overwritten code
Factory results for Conficker A
  [Figure: regions labeled "initial bootstrap code" and "packed payload"]
Factory results for Conficker A
  [Figure legend: API func, non-executed block, static block, unpacked block]
Factory results for Conficker A
  Instrument select and perform a stack-walk
  Stack-walk of Conficker's communications thread
    Frame pc=0x7c func: DbgBreakPoint at 7x901230 [Win DLL]
    Frame pc=0x10003c83 func: DYNbreakPoint at 0x100003c70 [instrument.]
    Frame pc=0x100016f7 func: DYNstopThread at 0x [instrument.]
    Frame pc=0x71ab2dc0 func: select at 0x71ab2dc0 [Win DLL]
    Frame pc=0x401f34 func: nosym1f058 at 0x41f058 [Conficker]
Outline
  Related work
  Hybrid analysis algorithm
  Parsing
  Dynamic analysis components
  Results
Non-Defensive Binary Analysis
  [Diagram: program binary, static tool (pre-execution), CFG of static code, dynamic instrumenter, process (uncontrolled execution)]
  Static tool
    parsing, value-set analysis, binary slicing
    e.g., Dyninst, CodeSurfer-x86
  Dynamic instrumenter
    CFG-based API for instrumentation
    e.g., ATOM, Vulcan (static), Dyninst (dynamic)
Non-Defensive Binary Analysis: analysis-resistant binary
  [Diagram: analysis-resistant binary containing obfuscated code, static code, and dynamic code; static tool (pre-execution), CFG, dynamic instrumenter, process (uncontrolled execution)]
  Static tool
    parsing, value-set analysis, binary slicing
    e.g., Dyninst, CodeSurfer-x86
  Dynamic instrumenter
    CFG-based API for instrumentation
    e.g., ATOM, Vulcan (static), Dyninst (dynamic)
Non-Defensive Binary Analysis: trace-based tools
  [Diagram: analysis-resistant binary containing obfuscated code, static code, and dynamic code; dynamic instrumenter (pre-execution), process (uncontrolled execution), trace, trace analysis (post-execution analysis), CFG]
  Dynamic instrumenter
    Instruction-filter based API for instrumentation
    e.g., PIN, Valgrind, DynamoRIO, DIOTA
  Trace analysis
    e.g., Madou et al.; Quist, Liebrock. 2009
Our approach: SD-Dyninst
  [Diagram: analysis-resistant binary containing obfuscated code, static code, and dynamic code; parser (pre-execution), CFG, dynamic instrumenter, process (uncontrolled execution); the dynamic instrumenter passes (source,dest) pairs back to the parser]
  CFG-based API for instrumentation
Code discovery algorithm
  Hybrid algorithm:
    1. Parse from known entry points
    2. Instrument control flow that may lead to new code
    3. Resume execution
  Events that lead to new code: instrument (e.g., CALL ptr[eax]), exception (e.g., DIV eax, 0), overwrite
  [Figure: CFG with unresolved targets marked "?", filled in step by step as the algorithm iterates]
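The three steps of the hybrid algorithm can be illustrated with a minimal Python sketch. It is not SD-Dyninst code: the toy program encoding (`"fall"`, `"jmp"`, `"icall"`, `"halt"`), the `runtime_targets` table that stands in for what instrumentation would observe, and the function names are all assumptions made for illustration.

```python
def parse(program, entry, known):
    """Step 1: follow statically resolvable control flow from `entry`.

    Returns the addresses of indirect calls whose targets are unknown
    statically; these are the points that would be instrumented (step 2).
    """
    work, unresolved = [entry], set()
    while work:
        addr = work.pop()
        if addr in known or addr not in program:
            continue
        known.add(addr)
        kind, target = program[addr]
        if kind == "fall":
            work.append(addr + 1)
        elif kind == "jmp":
            work.append(target)
        elif kind == "icall":
            # Conservative: do not assume the call returns, so the
            # bytes after it are not parsed.
            unresolved.add(addr)
    return unresolved


def hybrid_discover(program, runtime_targets, entry, max_steps=1000):
    """Alternate between parsing and (simulated) execution."""
    known = set()
    instrumented = parse(program, entry, known)
    pc = entry
    for _ in range(max_steps):          # step 3: resume execution
        if pc not in program:
            break
        kind, target = program[pc]
        if kind == "halt":
            break
        if kind == "fall":
            pc += 1
        elif kind == "jmp":
            pc = target
        else:                           # instrumented indirect call fires
            dest = runtime_targets[pc]
            if dest not in known:       # new code: parse it before running it
                instrumented |= parse(program, dest, known)
            pc = dest
    return known, instrumented


# Hypothetical toy program: address -> (kind, static target)
PROGRAM = {0: ("fall", None), 1: ("icall", None), 2: ("fall", None),
           10: ("fall", None), 11: ("jmp", 20), 20: ("halt", None)}
RUNTIME_TARGETS = {1: 10}               # what the instrumentation would see
```

In this toy run, address 2 is never added to the known code because the indirect call at address 1 is not assumed to return.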
Accurate parsing
  Standard control-flow traversal [1]
    start from known entry points
    follow control flow to find code
  New: conservative assumption
    un-analyzed (pointer-based) calls may not return
  New: stack tamper detection
    backwards slice at return instruction
  [Figure: call 40d00a; pop ebp; inc ebp; push ebp; ret; garbage]
  [1] Sites et al., Binary Translation
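The pop/inc/push/ret sequence on this slide moves the return address by one byte before returning. A minimal Python sketch of how such tampering could be detected follows. It uses a small forward symbolic evaluation of the stack rather than the backwards slice named on the slide, and the instruction encoding and the `return_offset` name are assumptions for illustration.

```python
def return_offset(insns):
    """Return d such that `ret` goes to (return address + d), or None.

    The stack holds symbolic values: an integer d means "the return
    address pushed by the call, plus d"; None means "unknown".
    """
    stack = [0]                      # the call pushed its return address
    regs = {}
    for op, *args in insns:
        if op == "pop":
            regs[args[0]] = stack.pop() if stack else None
        elif op == "push":
            stack.append(regs.get(args[0]))
        elif op == "inc":
            value = regs.get(args[0])
            regs[args[0]] = None if value is None else value + 1
        elif op == "ret":
            return stack.pop() if stack else None
        else:
            return None              # unmodeled instruction: give up
    return None


# The callee from the slide: returns one byte past the normal return address.
TAMPERING = [("pop", "ebp"), ("inc", "ebp"), ("push", "ebp"), ("ret",)]
NORMAL = [("ret",)]
```

A nonzero offset means the parser should continue at the adjusted address instead of at the instruction following the call.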
Instrumentation-based discovery
  Invalid control transfers: call into an invalid region
  Indirect jumps/calls: call ptr [eax], jmp eax (targets unknown)
  Abnormal return instructions: push eax; ret
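The three kinds of control transfer listed above can be told apart with a simple check per instruction. The Python sketch below is illustrative only: the `(mnemonic, operand)` encoding, the `classify` name, and the rule that a `ret` directly preceded by a `push` counts as abnormal are assumptions, not the tool's actual criteria.

```python
def classify(prev, insn, code_ranges):
    """Return why `insn` needs instrumentation, or None if it does not.

    prev, insn   -- (mnemonic, operand) pairs; operand is an int for a
                    direct target and a string for a register/memory operand
    code_ranges  -- list of (low, high) address ranges holding valid code
    """
    op, operand = insn
    if op in ("call", "jmp"):
        if not isinstance(operand, int):
            return "indirect"            # e.g. call ptr [eax], jmp eax
        if not any(low <= operand < high for low, high in code_ranges):
            return "invalid-target"      # direct transfer into an invalid region
    if op == "ret" and prev is not None and prev[0] == "push":
        return "abnormal-return"         # e.g. push eax; ret
    return None


CODE = [(0x401000, 0x402000)]            # hypothetical code section
```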
Instrumentation-based discovery
  [Diagram: in the process, call ptr[eax] (target unknown) is instrumented with findTarget(ptr[eax]); findTarget reports the new target 0x402d8a to SD-Dyninst; execution then resumes]
SD-Dyninst: overwritten code discovery
  Overwrite detection: possible strategies
    Check each executed instruction for changes [1]
    Monitor writes to code
  Page-level write detection [2]
    Remove write permissions from code pages
    Write to code causes exception
    Handle exception
  [Diagram: code page permissions change between RWE and R E; a write triggers the code write handler]
  [1] Royal et al. PolyUnpack. ACSAC '06 [2] Maebe, De Bosschere. AADEBUG '03
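Page-level write detection can be simulated in a few lines of Python. The sketch below models memory and page permissions in plain data structures rather than using real OS protection calls; `Memory`, `WriteFault`, and `OverwriteMonitor` are hypothetical names introduced for illustration.

```python
PAGE_SIZE = 0x1000


class WriteFault(Exception):
    """Stands in for the exception raised on a write to a non-writable page."""
    def __init__(self, addr):
        super().__init__(f"write fault at {addr:#x}")
        self.addr = addr


class Memory:
    def __init__(self):
        self.data = {}       # address -> byte value
        self.perms = {}      # page number -> set of "R", "W", "E"

    def map_page(self, page, perms):
        self.perms[page] = set(perms)

    def write(self, addr, value):
        if "W" not in self.perms.get(addr // PAGE_SIZE, set()):
            raise WriteFault(addr)
        self.data[addr] = value


class OverwriteMonitor:
    def __init__(self, mem, code_pages):
        self.mem = mem
        self.code_pages = set(code_pages)
        self.overwritten = set()
        for page in self.code_pages:          # remove write permissions
            mem.perms[page].discard("W")

    def write(self, addr, value):
        try:
            self.mem.write(addr, value)
        except WriteFault as fault:           # write to code causes exception
            page = fault.addr // PAGE_SIZE
            if page not in self.code_pages:
                raise                         # not ours to handle
            self.overwritten.add(page)        # handle exception: record it,
            self.mem.perms[page].add("W")     # restore write permission,
            self.mem.write(addr, value)       # and let the write proceed
```

Writes to data pages pass through untouched, so only the first write to each protected code page pays the cost of the fault.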
SD-Dyninst: overwritten code discovery
  When to update
    Cases to consider
      large incremental overwrites
      writes to data
      writes to own page
    Delay the update until the write routine terminates
  [Diagram: a write to an R E code page triggers the code write handler; the CFG update routine runs later]
SD-Dyninst: overwritten code discovery
  Delayed updates: two components
    1. Handle overwrite signal
       a) instrument write loop
       b) copy overwritten page
       c) restore write permissions
    2. Update CFG when writes end
       a) remove overwritten and unreachable blocks
       b) parse at entry points to overwritten regions
       c) remove write permissions
  [Diagram: a write triggers the code write handler (cb), which leaves the page RWE; the CFG update routine (cb) returns it to R E]
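The two components of a delayed update can be sketched in Python as a pair of callbacks. This is a simplified model under stated assumptions: pages are `bytearray` objects, basic blocks are `(start, end)` offset pairs within a page, step 1a (instrumenting the write loop) is represented only by the caller invoking the second callback when the writes end, and removal of unreachable blocks in step 2a is omitted. All names are hypothetical.

```python
def on_overwrite_signal(pages, perms, snapshots, page):
    """Component 1: runs on the first write to a protected code page."""
    snapshots[page] = bytes(pages[page])      # 1b: copy overwritten page
    perms[page].add("W")                      # 1c: restore write permissions


def on_writes_end(pages, perms, snapshots, blocks, page):
    """Component 2: runs once the write routine has terminated.

    Returns the start offsets at which parsing should restart.
    """
    old, new = snapshots.pop(page), pages[page]
    changed = {i for i, (a, b) in enumerate(zip(old, new)) if a != b}
    stale = [blk for blk in blocks[page]
             if any(blk[0] <= i < blk[1] for i in changed)]
    for blk in stale:
        blocks[page].remove(blk)              # 2a: remove overwritten blocks
    perms[page].discard("W")                  # 2c: remove write permissions
    return sorted(start for start, _ in stale)  # 2b: where to re-parse
```

Comparing the snapshot against the page once, after the writes end, means a loop that overwrites a region byte by byte costs one CFG update rather than one per write.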
Handler-based CF obfuscations [1]
  Monitored program
    xor eax,eax
    mov ecx,*[eax]
    push eax
    ...
  Operating system
    Exception state: eip
  Access violation handler
    …
    mov *[ebp+10],eax
    mov 402d8a,edx
    mov edx,*[eax+b8]
  Exception state after the handler: eip = 402d8a
  [1] Popov, Debray, Andrews. Usenix Danekhar
Resolving handler-based CF
  Monitored program
    xor eax,eax
    mov ecx,*[eax]
    push eax
    ...
  Operating system
    Exception state: eip
  Access violation handler
    …
    mov *[ebp+10],eax
    mov 402d8a,edx
    mov edx,*[eax+b8]
  Exception state after the handler: eip = 402d8a
  SD-Dyninst
    instrument exit
    analyze code at new target
  [1] Popov, Debray, Andrews. Usenix Danekhar
Fully analyzed packed programs
  Packer              Malware market share [1]
  UPX                 9.45%
  PolyEnE             6.21%
  EXECryptor          4.06%
  Themida             2.95%
  PECompact           2.59%
  Upack               2.08%
  nPack               1.74%
  Aspack              1.29%
  FSG                 1.26%
  Nspack              0.89%
  Asprotect           0.43%
  Armadillo           0.37%
  Yoda's Protector    0.33%
  WinUPack            0.17%
  MEW                 0.13%
  Further columns, with entries marked "yes": Self-checksumming, Self-modifying, Exception-based ctrl, Obfuscated
  [1] Packer (r)evolution. Panda Research, Two-month average Feb-March 2008.
Fully analyzed packed programs
  Self-checksumming techniques
  Packer and malware market share [1]: MEW 0.13%, WinUPack 0.17%, Yoda's Protector 0.33%, Armadillo 0.37%, Asprotect 0.43%, Nspack 0.89%, FSG 1.26%, Aspack 1.29%, nPack 1.74%, Upack 2.08%, PECompact 2.59%, Themida 2.95%, EXECryptor 4.06%, PolyEnE 6.21%, UPX 9.45%
  Further columns: SD-Dyninst (entries marked "yes", including Nspack), Time to unpack
  Time to unpack
    uninstrumented times are about .02 secs
    unoptimized overwrite detection
    expensive overwrite detection
  [1] Packer (r)evolution. Panda Research, Two-month average Feb-March
Instrumentation costs
  Columns
    Packer
    Pre-payload execution time: SD-Dyninst, Renovo, Saffron (Intel-PIN), Ether Unpack
    Instrumented locations: SD-Dyninst, Renovo, Saffron (Intel-PIN)
  Rows (values as they appear in the extracted text)
    UPX         ,2784,526
    Aspack      4.45 fail ,0454,141
    FSG         ,82231,854
    WinUpack    ,82632,945
    MEW         4.06 fail ,18635,466
Conclusion
  Analysis before execution allows for
    Understanding & control before execution
    Selective monitoring
    Build-your-own analysis factory
  Ongoing work
    Handling self-checksumming code
    Releasing Dyninst w/ SD-Dyninst inside