Eureka: A Framework for Enabling Static Malware Analysis. 13th European Symposium on Research in Computer Security (ESORICS), 2008. WANG Zhi
Outline: 1. Overview of Generic Unpacker; 2. System Call Level Heuristic; 3. Statistics-Based Unpacking; 4. Evaluation Metrics
Overview of Unpacker. Static analysis: decompile the binary and analyze its logical structure, flow, and stored data. Dynamic analysis: monitor the behavior of the malware binary at runtime, with either a fine-grained (instruction-level) monitor or a coarse-grained (page-level) monitor.
Generic Automatic Unpackers. Tools: PolyUnpack, Renovo, OmniUnpack, Eureka. Tracking granularity: instruction-level, page-level, system call level. Trigger: model-based, heuristic, heuristic and statistical. Speed: from slow to fast. The variability in unpacking strategies comes from the granularity at which unpacking behavior is tracked.
Eureka. Coarse-grained execution tracing: NtTerminateProcess, NtCreateProcess. Statistical bigram analysis.
Coarse-grained Execution Tracing. Eureka uses the event of program exit as a trigger: NtTerminateProcess implies that the unpacked malicious payload has been successfully decrypted. A large fraction of current malware uses a new process (NtCreateProcess) to execute the unpacked malicious payload, so process creation is a second trigger.
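The two triggers on this slide can be sketched as a scan over a recorded system call trace. This is a minimal illustration, not Eureka's implementation: the trace is assumed to be a plain list of call names, and `find_snapshot_trigger` and `example_trace` are hypothetical names introduced here.

```python
# The two system calls named on the slide as unpacking triggers.
TRIGGER_CALLS = {"NtTerminateProcess", "NtCreateProcess"}


def find_snapshot_trigger(trace):
    """Return (index, call name) of the first trigger call, or None.

    `trace` is assumed to be a list of system call names in the order
    they were observed (a hypothetical format chosen for this sketch).
    """
    for index, call in enumerate(trace):
        if call in TRIGGER_CALLS:
            return index, call
    return None


# Hypothetical trace: the packer allocates and re-protects memory,
# then launches the payload in a new process before exiting.
example_trace = [
    "NtAllocateVirtualMemory",
    "NtProtectVirtualMemory",
    "NtCreateProcess",
    "NtTerminateProcess",
]

trigger = find_snapshot_trigger(example_trace)
```

A memory snapshot would be taken at the returned position. A trace with neither call returns `None`, which is the case where this heuristic alone does not fire.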
Problems. Not all malware exits; some keeps an executing version resident in memory. Packers can generate spurious process-creation events. Malware authors can simply avoid exiting the malware process. The above two simple heuristics may work for a large fraction of today's malware (as much as 80%), but the same may not hold for future malware.
Statistical bigram analysis. Mine statistical patterns in x86 code using simple n-gram analysis. Use IDA Pro to extract the regions of executables that are marked as functions. Look for the most common bigrams (opcode pairs or 2-byte opcodes) and spaced bigrams (byte pairs separated by 1 or more bytes). Finding: FF 15 (call), FF 75 (push), E8---00 and E8---FF are prevalent in x86 code.
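The counting step can be illustrated with a short sketch. It assumes the three dashes in E8---FF denote three intervening bytes (the first three bytes of the call's 32-bit relative offset). The function names and the sample byte string are made up for illustration.

```python
from collections import Counter


def count_bigrams(code: bytes) -> Counter:
    """Count adjacent byte pairs, e.g. (0xFF, 0x15)."""
    return Counter(zip(code, code[1:]))


def count_spaced_bigrams(code: bytes, gap: int) -> Counter:
    """Count byte pairs separated by exactly `gap` intervening bytes."""
    return Counter(zip(code, code[gap + 1:]))


# Hand-assembled sample (an assumption, not taken from the slides):
#   FF 15 00 10 40 00   call dword ptr [0x401000]
#   FF 75 08            push dword ptr [ebp+8]
#   E8 F6 FF FF FF      call with a negative relative offset
#   E8 10 00 00 00      call with a small positive relative offset
sample = bytes.fromhex("ff1500104000" "ff7508" "e8f6ffffff" "e810000000")

adjacent = count_bigrams(sample)
spaced = count_spaced_bigrams(sample, gap=3)
```

Here `adjacent[(0xFF, 0x15)]` and `adjacent[(0xFF, 0x75)]` are each 1, and `spaced[(0xE8, 0xFF)]` and `spaced[(0xE8, 0x00)]` are each 1. The last byte of a relative call offset is 0xFF for short backward calls and 0x00 for short forward calls, which is why these two spaced pairs are common in real code.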
Occurrence summary of bigrams. Programs: calc, explorer, notepad, ping, shutdown. Bigrams: FF 15 (call), FF 75 (push), E8---FF (call), E8---00 (call).
Bigram Counts: bigram counts during execution of a goat file packed with Aspack
Bigram Counts: bigram counts during execution of a goat file packed with Molbox
Bigram Counts: bigram counts during execution of a goat file packed with Armadillo
Bigram Counts. There are consistent and significant shifts in the bigram counts. The simple bigram counting approach had a success rate of over 95% in distinguishing between packed and unpacked malware instances.
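One way to turn a shift in bigram counts into a decision is to compare a later memory snapshot against the initial image. The sketch below is illustrative only: the threshold `min_increase` and the helper names are assumptions, not values or functions from Eureka, and the four marker pairs are read with the three dashes taken as three intervening bytes.

```python
# (first byte, second byte, number of intervening bytes)
MARKERS = [
    (0xFF, 0x15, 0),  # FF 15    (call)
    (0xFF, 0x75, 0),  # FF 75    (push)
    (0xE8, 0xFF, 3),  # E8---FF  (call)
    (0xE8, 0x00, 3),  # E8---00  (call)
]


def marker_count(image: bytes) -> int:
    """Total occurrences of the four marker bigrams in `image`."""
    total = 0
    for first, second, gap in MARKERS:
        pairs = zip(image, image[gap + 1:])
        total += sum(1 for a, b in pairs if a == first and b == second)
    return total


def looks_unpacked(baseline: bytes, snapshot: bytes, min_increase: int = 3) -> bool:
    """Flag a snapshot whose marker count grew by at least `min_increase`.

    `min_increase` is an arbitrary illustrative threshold.
    """
    return marker_count(snapshot) - marker_count(baseline) >= min_increase


# Hypothetical data: a marker-free baseline and a snapshot with real x86 code.
baseline = bytes([0x41] * 32)
snapshot = bytes.fromhex("ff1500104000ff7508e8f6ffffffe810000000")
```

With this data, `marker_count(baseline)` is 0 and `marker_count(snapshot)` is 4, so `looks_unpacked(baseline, snapshot)` returns `True`. A real threshold would need to be calibrated on actual packed and unpacked samples.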
Evaluation Metrics: Code-to-data ratio. An observable difference between packed and unpacked code is the amount of identifiable code and data found in the binary. Use IDA Pro to identify valid code sequences; in IDA Pro, data are represented by db, dw, or dd. In packed executables the ratio is below 3%; in unpacked executables it is above 50%.
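The metric can be sketched as a small classifier over byte counts. The slide does not spell out the formula, so this example assumes the ratio is code bytes divided by all classified bytes (code plus data). The input counts are hypothetical values that a disassembler would supply, and the function names are made up for this sketch. The 3% and 50% thresholds come from the slide.

```python
def code_fraction(code_bytes: int, data_bytes: int) -> float:
    """Fraction of classified bytes identified as code.

    Assumed definition: code / (code + data).
    """
    total = code_bytes + data_bytes
    if total == 0:
        raise ValueError("no classified bytes to compare")
    return code_bytes / total


def classify(code_bytes: int, data_bytes: int) -> str:
    """Apply the slide's thresholds: below 3% packed, above 50% unpacked."""
    ratio = code_fraction(code_bytes, data_bytes)
    if ratio < 0.03:
        return "likely packed"
    if ratio > 0.50:
        return "likely unpacked"
    return "inconclusive"
```

For example, `classify(2, 98)` returns `"likely packed"` and `classify(60, 40)` returns `"likely unpacked"`. Values between the two thresholds are reported as inconclusive, because the slide gives no rule for that range.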
Code-to-data ratio: packed vs. unpacked
Code-to-data ratio. Grey areas stand for data; blue areas stand for code. Panels: packed notepad.exe memory space; original notepad.exe memory space.