IDA and obfuscated code Hex-Rays Ilfak Guilfanov
2 Presentation Outline Is obfuscated code a problem for IDA Pro? IDA Pro expects nice proper code A lost battle? At first sight, yes Solutions exist They are numerous... Future development Your feedback Online copy of this presentation is available at
3 Sample obfuscated code IDA is a static analysis tool and it makes many assumptions about the input code When these assumptions are violated, the analysis goes wrong An extremely simple case, call instructions are expected to return to the next instruction: problem The solution will be presented later...
4 Obfuscation categories Redundancy Blow the code size: code cleaning is necessary Camouflage Hide & seek: the seeker is to win Anti-debugger tricks Tricks can be learned even by old dogs Since it is “just” obfuscation, a determined reverse engineer will eventually overcome it
5 Redundancy Instructions with no effect Useless jumps Complex computations with a constant result Code duplication
6 Instructions with no effect In fact CL is zero
7 Instructions with no effect - countermeasures Replace them by 'nop's Collapse regions of useless instructions into one line (select useless instructions, then View, Hide) Ideally, a plugin would clean up the code automatically. The Hex-Rays decompiler ignores useless instructions because it simply removes all dead code, but it cannot handle obfuscated code well yet – expect improvements in this direction
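The "replace them by 'nop's" countermeasure can be sketched outside IDA as a plain byte patch. This is a minimal illustration in Python, not an IDA script: the function name `nop_out` and the sample bytes are hypothetical, and inside IDA the same idea would go through the database patching functions instead.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def nop_out(code: bytes, start: int, length: int) -> bytes:
    """Return a copy of `code` with `length` bytes from `start` replaced by NOPs."""
    if start < 0 or length < 0 or start + length > len(code):
        raise ValueError("range outside the buffer")
    patched = bytearray(code)
    patched[start:start + length] = bytes([NOP]) * length
    return bytes(patched)


# Hypothetical sample: push eax; xchg bl, bl (no effect); pop eax; ret
sample = bytes.fromhex("50 86db 58 c3")
cleaned = nop_out(sample, 1, 2)  # overwrite the two bytes of "xchg bl, bl"
print(cleaned.hex())
```

Patching keeps all addresses stable, which is why it is usually preferable to deleting bytes.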
8 Useless jumps Text view is pretty useless:
9 Useless jumps Graph view is slightly better: A plugin to clean the graph and combine adjacent nodes would be really useful (can be done without modifying the database)
10 Graph view and plugins Graphs generated by IDA can be modified by a plugin on the fly – just hook to the grcode_changed_graph event This allows for improving the graph. Some ideas: Combine sequential nodes into one Hide dead code paths Remove dead edges Add annotations to graph nodes/edges Automatically recognize and collapse patterns (e.g. strlen) Local optimization (within a node; constant folding, etc) All this can be really useful for obfuscated code!
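The first idea, combining sequential nodes, can be sketched on a plain successor map. This is a hedged Python illustration of the graph transformation only; it does not use the IDA graph API, and the node names and the `merge_sequential` helper are hypothetical. It assumes every successor also appears as a key of the map.

```python
def merge_sequential(succs):
    """Merge A -> B when B is A's only successor and A is B's only predecessor.

    Returns (new successor map, merged node -> list of original nodes).
    """
    succs = {n: list(s) for n, s in succs.items()}
    blocks = {n: [n] for n in succs}
    changed = True
    while changed:
        changed = False
        # Recompute predecessors after every merge.
        preds = {n: [] for n in succs}
        for n, ss in succs.items():
            for s in ss:
                preds[s].append(n)
        for a in list(succs):
            if len(succs[a]) != 1:
                continue
            b = succs[a][0]
            if b == a or len(preds[b]) != 1:
                continue
            succs[a] = succs.pop(b)      # A inherits B's outgoing edges
            blocks[a] += blocks.pop(b)   # remember what A now contains
            changed = True
            break
    return succs, blocks


# Hypothetical function chopped up by useless jumps (j1, j2).
graph = {
    "entry": ["j1"], "j1": ["j2"], "j2": ["cond"],
    "cond": ["then", "exit"], "then": ["exit"], "exit": [],
}
merged, contents = merge_sequential(graph)
print(merged)
print(contents["entry"])
```

In a real plugin the same merge would be applied to the graph handed over by the event, leaving the database untouched.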
11 Constant result calculations Some constant calculations can be easily handled Ctrl-R
12 When there are too many offsets... The answer is obvious – write a script or a plugin :) Here's a very simple one-line script:
    OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0)
To make your life even easier, you may assign the script to a hotkey: press Shift-F2 and enter:
    static main() { AddHotkey("w", "make_ebp_offset"); }
    static make_ebp_offset() { OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0); }
This trick and many others are explained on
13 What if there are thousands of such offsets?... Improve the script to check all instructions for the desired pattern. Here's how to organize a loop over all instructions:
    auto ea, ea2;
    ea2 = MaxEA();
    for ( ea=MinEA(); ea < ea2; ea=NextHead(ea, ea2) )
    {
      if ( !isCode(GetFlags(ea)) )
        continue;
      if ( GetMnem(ea) == "mov" && GetOpnd(ea, 0) == "ebp" )
        Message("%a: found mov ebp!\n", ea);
    }
14 What if these offsets appear and vanish dynamically? Well, then you have to create a plugin. It would: Recognize the desired pattern Modify the database (create an offset, code, add cmt, etc) Such plugins are fully automatic They hook to analysis events (frequently to custom_emu) This is the most powerful technique but, alas, it requires DLL programming in C and using the SDK Just three wishes for your plugins: Maybe a switch to turn your plugin off is a good idea Try to be user-friendly (for example, check if there is a comment before calling set_cmt; otherwise you may overwrite a user-defined comment) Do not exit to OS in the case of errors
15 Constant calculations – some ideas Create a script or plugin to: Add calculation results as comments (what about a script that traces the application and adds register values as comments for each instruction?) Modify the database and simplify instructions
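The "add calculation results as comments" idea can be sketched as a tiny straight-line emulator. This is a Python illustration under stated assumptions: the `(mnemonic, destination, source)` tuple format, the `fold_constants` name and the sample sequence are all hypothetical, only a handful of 32-bit operations are modelled, and flags, memory and idioms such as `xor reg, reg` on an unknown register are ignored.

```python
MASK32 = 0xFFFFFFFF

OPS = {
    "mov": lambda dst, src: src,
    "add": lambda dst, src: (dst + src) & MASK32,
    "sub": lambda dst, src: (dst - src) & MASK32,
    "xor": lambda dst, src: dst ^ src,
    "and": lambda dst, src: dst & src,
    "or":  lambda dst, src: dst | src,
}


def fold_constants(insns):
    """Return one comment per instruction with the destination's known value."""
    regs = {}       # register name -> known 32-bit constant
    comments = []
    for mnem, dst, src in insns:
        src_val = regs.get(src) if isinstance(src, str) else src & MASK32
        dst_val = regs.get(dst)
        known = src_val is not None and (mnem == "mov" or dst_val is not None)
        if known:
            regs[dst] = OPS[mnem](dst_val, src_val)
            comments.append(f"{dst} = {regs[dst]:#010x}")
        else:
            regs.pop(dst, None)  # result depends on an unknown input
            comments.append(f"{dst} = ?")
    return comments


# Hypothetical obfuscated way of loading 0x57 into eax.
sequence = [
    ("mov", "eax", 0x12345678),
    ("xor", "eax", 0x12345678),
    ("add", "eax", 0x57),
    ("mov", "ebx", "eax"),
    ("sub", "ebx", "ecx"),   # ecx is unknown here
]
for line in fold_constants(sequence):
    print(line)
```

Tracing the application under the debugger, as the slide suggests, gives the same comments without having to model the instructions at all.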
16 Camouflage Opaque predicates Proprietary virtual machine Encryption/compression Message-driven systems No direct references – PIC (position independent code) Hidden execution flow using SEH Rootkit techniques Hidden entry point (TLS callbacks, entry point in the resources section or in the header)
17 Opaque predicates The definition says that an opaque predicate is a predicate (an expression that evaluates to either "true" or "false") for which the outcome is known by the programmer a priori, but which, for a variety of reasons, still needs to be evaluated at run time In fact, some expressions evaluate to any integer value: GetLastError returns 0x57 (Invalid Parameter)
18 Opaque predicates They may come in many varieties. Since we cannot determine the outcome statically, we have to find it out ourselves and inform IDA about the predicate outcome Prune dead code paths and simplify the code Working on graph view or pseudocode is easier Automate this? How? Future versions of IDA/Hex-Rays will offer some solutions Interactivity and extensibility help
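Once the outcome of a predicate is known, pruning the dead paths is a reachability walk that follows only the live edge. This is a hedged Python sketch, not IDA functionality: the graph layout (a conditional node lists `[taken, fallthrough]`), the `prune_dead_paths` name and the sample graph are assumptions made for illustration.

```python
def prune_dead_paths(cfg, entry, outcomes):
    """Return the sub-graph reachable from `entry` given known predicate outcomes.

    cfg:      node -> successors; conditional nodes list [taken, fallthrough]
    outcomes: conditional node -> True (always taken) or False (never taken)
    """
    live = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in live:
            continue
        succs = cfg[node]
        if node in outcomes and len(succs) == 2:
            taken, fallthrough = succs
            succs = [taken] if outcomes[node] else [fallthrough]
        live[node] = list(succs)
        stack.extend(succs)
    return live


# Hypothetical graph: "check" is an opaque predicate that is never taken,
# so the "junk" block is dead.
cfg = {
    "start": ["check"],
    "check": ["junk", "real"],
    "junk": ["real"],
    "real": [],
}
live = prune_dead_paths(cfg, "start", {"check": False})
print(sorted(live))
```

The outcomes still have to come from the analyst (or from a debugging session); the sketch only automates the clean-up that follows.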
19 Proprietary virtual machine Many implementations use this obfuscation method Requires reverse engineering the virtual machine Examples: Themida & Code Virtualizer ( Various malware In the general case, building a processor module for the VM is required Let me show you a simple case
20 Bagle malware case This mass mailer contains the following code sequence:
21 Bagle - opcodes Opcode handlers are very simple, I renamed them:
22 Bagle – opcode table After renaming all handlers the opcode table was:
23 Bagle – create opcode enumeration The following script created an enumeration for all VM opcodes based on the handler names:
24 Bagle – enumeration ready We can use this enumeration in the disassembly now Just declare an array of bytes and convert them to VM_CODES All this without quitting IDA (in fact, I was in the middle of a debugging session since there was another layer of protection before the VM)
25 Bagle – virtual machine readable Create an array of bytes, declare them as VM_CODES:
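The two steps above (derive an enumeration from the handler names, then display the VM bytes through it) can be sketched without IDA. This is a Python illustration only: the handler names below are hypothetical placeholders, not the names used for the Bagle sample, and the table is assumed to be indexed by opcode value.

```python
# Hypothetical handler names in opcode-table order (index == opcode value).
HANDLERS = ["vm_nop", "vm_push_imm", "vm_xor", "vm_call", "vm_exit"]


def build_enum(handlers):
    """Map each opcode value to a symbolic name derived from its handler."""
    return {opcode: name.upper() for opcode, name in enumerate(handlers)}


def decode(program, enum):
    """Render each VM byte as its symbolic name, or as raw data if unknown."""
    return [enum.get(b, f"db {b:#04x}") for b in program]


VM_CODES = build_enum(HANDLERS)
listing = decode(bytes([1, 2, 4, 9]), VM_CODES)
print(listing)
```

The sketch treats every byte as an opcode, so immediate operands embedded in the VM program would be mislabelled; handling them needs per-opcode operand sizes.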
26 Bagle – VM logic visible The logic of the VM program became visible but there were immediate constants in the code that required manual intervention:
27 Bagle – VM decoding automated The following script solves the problem:
28 Bagle – comfortable analysis of VM After assigning a hotkey to the previous script, it was almost the same as having a processor module for the VM However, another level of deobfuscation is required (0x63FE34B2 ^ 0x9C01CB4D = 0xFFFFFFFF)
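The XOR on this slide can be checked in a couple of lines of Python. The two constants are bitwise complements of each other, which is why the result is all ones; the variable names are only for illustration.

```python
a = 0x63FE34B2
b = 0x9C01CB4D

result = a ^ b
print(hex(result))

# b is the 32-bit bitwise complement of a, so the XOR sets every bit.
complement_of_a = ~a & 0xFFFFFFFF
print(hex(complement_of_a))
```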
29 VM - summary We have to Analyze VM opcodes Give them meaningful, descriptive names In simple cases, simple enumeration will do the job In complex cases, a processor module has to be developed It is not _that_ difficult after all ;) Rolf Rolles created a processor module for a VM:
30 Executable packing Plethora of packing methods, good and bad Manual unpacking is always possible; automatic unpacking would be ideal There are sample scripts and plugins in IDA uunp – proof of concept unpacker plugin, exists as an IDC script as well unpack – another sample unpacker IDA stayed away from this arms race There are many other solutions available (unpackers, process dumpers, etc)
31 Executable packing - approaches Static analysis too time consuming requires tedious manual work Dynamic analysis (debugger) much faster requires special sandboxed environment vulnerable to anti-debugger tricks Code emulation a good idea any widespread emulator will be attacked emulation imperfections are a problem No ideal solution...
32 Encryption Methods vary from simple XOR encryption to serious encryption schemes like AES, Blowfish, etc Since the key must be present to run the executable, the strength of the encryption method does not matter Ideally we just let the application decrypt itself and then take a memory snapshot If only part of the executable is decrypted at a time, then we need to automate the process of taking memory snapshots
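For the simplest case on this slide, XOR encryption, the point that the key ships with the executable can be illustrated in Python. The key and payload below are made up, and `xor_crypt` is a hypothetical helper; for stronger ciphers the slide's advice (let the application decrypt itself and take a memory snapshot) avoids reimplementing anything.

```python
from itertools import cycle


def xor_crypt(data: bytes, key: bytes) -> bytes:
    """XOR `data` with a repeating `key`; the same call encrypts and decrypts."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


KEY = b"\x5a\xa5"                 # hypothetical key recovered from the binary
plain = b"payload"
encrypted = xor_crypt(plain, KEY)
decrypted = xor_crypt(encrypted, KEY)
print(encrypted.hex(), decrypted)
```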
33 Position independent code No fixed addresses means no xrefs Analysis is harder but user-defined offsets can help
34 Anti-debugging tricks I'm sure you know better since you are the practitioners :) IDA related: Its default settings are not good for hostile code debugging Exceptions are handled by the debugger – change it in the debugger settings Just two simple methods
35 Use tracing to find anti-debugging tricks Tracing is slow but it may be used to find why/when/how the process misbehaves Sample trace log from naïve code:
36 Simple method to neutralize found tricks Use a “conditional” breakpoint to neutralize tricks encountered while single-stepping The breakpoint condition for the call instruction is ip=ip+2 Breakpoint conditions may call all defined IDC functions (including user-defined ones) – can be used for logging and changing the application behavior
37 Debugger – current state IDA debugger advantages The annotated database is available during debugging All facilities continue to work: FLIRT signatures, function prototypes and argument names, structures, enumerations, your scripts and plugins, etc... Scriptable Available on multiple platforms (+remote debugging) Shortcomings Slow operation Multithreaded applications poorly handled Only application level debugging is available We continue to work on the shortcomings Future versions will be more fit for hostile code analysis
38 Debugger - ideas A debugger plugin to configure the 'stealth' mode Exceptions are passed to the application Calls to IsDebuggerPresent, NtSetInformationThread and similar functions are intercepted Emulating debugger module A 'stealth' debugger module Do not use the standard debugger interface (CreateProcess/WaitForDebugEvent) Inject a debugger DLL into the process and communicate with it (the must-have functionality is breakpoint handling and memory access) Higher level debugging Skip hidden code areas, group nodes in the graph view Source level debugging using the pseudocode view
39 Summary Obfuscation methods vary, no single recipe for all cases The key is to be able to represent the code nicely on the screen The problem is generic: what to do if IDA displays things not the way I want? The answer is: modify the output! Use interactive commands, menus, etc Represent data in a meaningful way Hide irrelevant information Patch the database and simplify it Create scripts, plugins, processor modules to avoid routine work
40 The obfuscating call instruction The function returns a few bytes further than it would normally:
41 Example: solution to obfuscating call The idea: intercept emulation of calls to “ex_obfuscating” and create correct xrefs Just a few lines of code (unfortunately, a plugin) Can be made more complex if necessary The source code of the sample plugin can be found at See the next slide for the essential part of the plugin
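The address arithmetic behind "create correct xrefs" can be sketched in Python on raw bytes. This is not the plugin: the function name, the sample bytes and the number of skipped bytes are assumptions, and the linear scan for the `E8` opcode can produce false positives that a plugin avoids by working on instructions IDA has already decoded.

```python
import struct

CALL_REL32_SIZE = 5  # opcode E8 followed by a 32-bit relative displacement


def find_obfuscated_returns(code, base, target, skip):
    """List (call_ea, resume_ea) for every `call rel32` to `target`.

    `skip` is the number of bytes the callee jumps over on return
    (an assumed value here; it has to be read from the callee).
    """
    results = []
    for off in range(len(code) - 4):
        if code[off] != 0xE8:
            continue
        rel = struct.unpack_from("<i", code, off + 1)[0]
        next_ea = base + off + CALL_REL32_SIZE
        if (next_ea + rel) & 0xFFFFFFFF == target:
            results.append((base + off, next_ea + skip))
    return results


# Hypothetical layout: a call at 0x401000 to 0x401100, two junk bytes, a nop.
code = bytes.fromhex("e8fb000000" "ebfe" "90")
hits = find_obfuscated_returns(code, 0x401000, 0x401100, skip=2)
for call_ea, resume_ea in hits:
    print(f"{call_ea:#x}: execution resumes at {resume_ea:#x}")
```

Each `(call_ea, resume_ea)` pair is where a flow cross-reference would be added instead of the usual fall-through to the next instruction.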
42 Plugin to handle weird call instructions
43 Deobfuscated code Note the arrow on the left side of the listing Graph could be simplified further by a plugin
44 The “thank you” slide Thank you for your attention! Questions?