Polymorphic Malware Detection
Connor Schnaith, Taiyo Sogawa
9 April 2012
Motivation
“5000 new malware samples per day” --David Perry of Trend Micro
Large variance between attacks
Polymorphic attacks
  o Perform the same function
  o Altered immediate values or addressing
  o Added extraneous instructions
Current detection methods insufficient
  o Signature-based matching not accurate
  o Behavioral-based detection requires human analysis and engineering
Malware Families
Classified into related clusters (families)
  o Tracking of development
  o Correlating information
  o Identifying new variants
Based on similarity of code
Koobface, Bredolab, PoisonIvy, Conficker (7 mil. infected)
Source: Carrera, Ero, and Peter Silberman. "State of Malware: Family Ties." Media.blackhat.com Web. 7 Apr
~300 samples of malware with 60% similarity threshold
Current Research
Techniques for identifying malicious behavior
  o Mining and clustering
  o Building behavior trees
Industry
  o ThreatFire and Sana Security developing behavioral-based malware detection
Design challenges
Discerning malicious portions of code
  o Dynamic program slicing
  o Accounting for control flow dependencies
Reliable automation
  o Must run reliably without human intervention
  o Minimal false positives
Holmes: Main Ideas
Two major tasks
  o Mining significant behaviors from a set of samples
  o Synthesizing an optimally discriminative specification from multiple sets of samples
Key distinction in approach
  o "positive" set - malicious
  o "negative" set - benign
  o Malware: fully described in the positive set, while not fully described in the negative set
Main Ideas: behavior mining
Extracts portions of the dependence graphs of programs from the positive set that correspond to behaviors that are significant to the programs’ intent.
The algorithm determines what behaviors are significant (next slide)
Can be thought of as contrasting the graphs of positive programs against the graphs of negative programs, and extracting the subgraphs that provide the best contrast.
Main ideas: behavior mining
A "behavior" is a data dependence graph G = (V, E, a, B)
  o V is the set of vertices, which correspond to operations (system calls)
  o E is the set of edges, which correspond to dependencies between operations
  o a is the labeling function that associates nodes with the operations they represent
  o B is the labeling function that associates the edges with the logic that represents the dependencies
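The four-part definition above maps naturally onto a small data structure. The following is a minimal sketch, not the representation used by Holmes: the class name, field names, and the example operations and constraint string are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorGraph:
    """A behavior G = (V, E, a, B), following the slide's notation."""
    vertices: set = field(default_factory=set)      # V: operation nodes
    edges: set = field(default_factory=set)         # E: (src, dst) dependencies
    node_label: dict = field(default_factory=dict)  # a: node -> operation name
    edge_label: dict = field(default_factory=dict)  # B: (src, dst) -> constraint

    def add_operation(self, node, operation):
        self.vertices.add(node)
        self.node_label[node] = operation

    def add_dependency(self, src, dst, constraint):
        # An edge may only connect operations that are already in V.
        if src not in self.vertices or dst not in self.vertices:
            raise ValueError("both endpoints must be added first")
        self.edges.add((src, dst))
        self.edge_label[(src, dst)] = constraint


# Illustrative example: a handle produced by one call is consumed by another.
g = BehaviorGraph()
g.add_operation(1, "NtOpenFile")
g.add_operation(2, "NtReadFile")
g.add_dependency(1, 2, "output handle of node 1 == input handle of node 2")
```

Keeping the two labeling functions as separate dictionaries mirrors the definition directly; a graph library could be substituted without changing the idea.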
Main ideas: behavior mining
A program P exhibits a behavior G if it can produce an execution trace T with the following properties
  o Every operation in the behavior corresponds to an operation invocation in the trace, and its arguments satisfy certain logical constraints
  o The logic formula on each edge connecting behavior operations is satisfied by a corresponding pair of operation invocations in the trace
Must capture information flow in dependence graphs
  o Two key characteristics
      i. The path taken by the data in the program
      ii. Security labels assigned to the data source and the data sink
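The two trace properties can be checked mechanically on small inputs. The sketch below is a brute-force illustration under simplifying assumptions: trace events are plain dictionaries, constraints are Python predicates, and the operation names `read_registry`, `open_file`, and `send` are hypothetical. It does not reproduce how Holmes matches behaviors.

```python
from itertools import permutations


def exhibits(trace, node_checks, edge_checks):
    """Return True if some assignment of trace events to behavior nodes
    satisfies every node constraint and every edge constraint.

    trace       -- list of dicts, one per operation invocation
    node_checks -- {node: predicate(event)}
    edge_checks -- {(src, dst): predicate(src_event, dst_event)}
    """
    nodes = list(node_checks)
    # Try every injective assignment; only practical for tiny examples.
    for events in permutations(trace, len(nodes)):
        assignment = dict(zip(nodes, events))
        if not all(node_checks[n](assignment[n]) for n in nodes):
            continue
        if all(check(assignment[s], assignment[d])
               for (s, d), check in edge_checks.items()):
            return True
    return False


# Hypothetical behavior: data read from the bugfix registry key is sent out.
node_checks = {
    "source": lambda e: (e["op"] == "read_registry"
                         and e.get("label") == "IsRegistryKeyForBugfix"),
    "sink": lambda e: e["op"] == "send",
}
edge_checks = {
    ("source", "sink"): lambda src, dst: src.get("out") == dst.get("arg"),
}

leaky_trace = [
    {"op": "read_registry", "label": "IsRegistryKeyForBugfix", "out": "buf1"},
    {"op": "open_file", "out": "h3"},
    {"op": "send", "arg": "buf1"},
]
harmless_trace = [
    {"op": "read_registry", "label": "IsRegistryKeyForBugfix", "out": "buf1"},
    {"op": "send", "arg": "buf2"},
]
```

Here `exhibits(leaky_trace, node_checks, edge_checks)` holds because the registry data reaches the network sink, while `harmless_trace` contains both operations but no data flow between them.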
Security Label                 Description
NameOfSelf                     The name of the currently executing program
IsRegistryKeyForBootList       A Windows registry key listing software set to start on boot
IsRegistryKeyForWindows        A registry key that contains configuration settings for the operating system
IsSystemDirectory              The Windows system directory
IsRegistryKeyForBugfix         The Windows registry key containing list of installed bugfixes and patches
IsRegistryKeyForWindowsShell   The Windows registry key controlling the shell
IsDevice                       A named kernel device
IsExecutableFile               Executable file
Main ideas: behavior mining
Information gain is used to determine if a behavior is significant
  o A behavior that is not significant is ignored when constructing the dependency graph
Information gain is defined in terms of Shannon entropy: it measures how much additional information a behavior provides for deciding whether a graph G is in G+ or G-
Shannon entropy
  o H(G+ U G-) corresponds to the uncertainty about whether a graph G belongs to G+ or G-
  o Partitioning G+ U G- into smaller subsets decreases that uncertainty
  o The partition is made with a subgraph isomorphism test: graphs that contain a candidate subgraph versus graphs that do not
Main ideas: behavior mining
A significant behavior g is a subgraph of a dependence graph in G+ such that Gain(G+ U G-, g) is maximized
Information gain is used as the quality measure to guide the behavior mining process
Some non-significant actions can still be classified as significant
  o These actions may or may not throw off the algorithm that determines if the program is malicious
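To make Gain(G+ U G-, g) concrete, the sketch below computes the standard entropy-based information gain for a split of the samples by "contains g" versus "does not contain g". It assumes the usual textbook definition of information gain; the `contains` predicate stands in for the subgraph isomorphism test, and the toy graphs are represented as sets of behavior names purely for illustration.

```python
from math import log2


def entropy(pos, neg):
    """Shannon entropy in bits of a set with pos malicious and neg benign graphs."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(positive, negative, contains):
    """Entropy reduction from splitting positive U negative by contains(graph)."""
    total = len(positive) + len(negative)
    if total == 0:
        return 0.0
    pos_in = sum(1 for graph in positive if contains(graph))
    neg_in = sum(1 for graph in negative if contains(graph))
    pos_out = len(positive) - pos_in
    neg_out = len(negative) - neg_in
    before = entropy(len(positive), len(negative))
    after = ((pos_in + neg_in) / total) * entropy(pos_in, neg_in) \
        + ((pos_out + neg_out) / total) * entropy(pos_out, neg_out)
    return before - after


# Toy data: each "graph" is just a set of behavior names (an assumption).
positive = [{"a", "b"}, {"a", "c"}]
negative = [{"c"}, {"b", "c"}]

gain_a = information_gain(positive, negative, lambda graph: {"a"} <= graph)
gain_c = information_gain(positive, negative, lambda graph: {"c"} <= graph)
```

In this toy data, behavior "a" appears in every positive graph and in no negative graph, so it separates the two sets perfectly and has a higher gain than "c", which appears on both sides.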
Main ideas: behavior mining
Significant behaviors mined from the malware Ldpinch
  o Leaking bugfix information over the network
  o Adding a new entry to the system autostart list
  o Bypassing firewall to allow for malicious traffic
Could say any program that exhibits all three of these behaviors should be flagged malicious
  o This is too specific of a statement
      i. Doesn't account for variations within a family
      ii. It is known that smaller subsets of behaviors that only include one of these actions could still be malicious
      iii. Need discriminative specifications
Main ideas: discriminative specifications
Groups the mined behaviors into characteristic subsets
  o A program matches the specification if it matches all of the behaviors in a subset
  o "Discriminative" in that it matches the malicious but not the benign programs
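The matching rule can be read as "any subset, all of its behaviors". A minimal sketch follows, assuming behaviors are identified by plain strings and that a specification is a list of behavior subsets; the behavior names and the example specification are hypothetical and are not taken from the Holmes results.

```python
def matches_specification(program_behaviors, specification):
    """True if the program exhibits every behavior in at least one subset."""
    observed = set(program_behaviors)
    return any(set(subset) <= observed for subset in specification)


# Hypothetical specification with two characteristic subsets.
specification = [
    {"leak_bugfix_info", "add_autostart_entry"},
    {"bypass_firewall"},
]

variant = {"leak_bugfix_info", "add_autostart_entry", "open_window"}
benign = {"add_autostart_entry", "open_window"}
```

The variant matches through the first subset even though it never touches the firewall, which is the flexibility a single "all three behaviors" rule lacks. The benign program shares one behavior with that subset but not all of them, so it is not flagged.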
Main ideas: discriminative specifications
Each subset of behaviors induces a cluster of samples
  o Malicious and benign samples are organized into these clusters
  o Goal: find an optimal clustering that places the malicious samples in the positive subset and the benign samples in the negative subset
Main ideas: discriminative specifications
Three part algorithm
  o Formal concept analysis
  o Simulated annealing
  o Constructing optimal specifications
Formal concept analysis
  o O is a cluster of samples
  o A is the set of mined behaviors common to the samples in O
  o A concept is the pair (A, O)
Set of concepts: {c1, c2, c3,..., cN}
Behavior specification: S(c1, c2, c3,..., cN)
Main ideas: discriminative specifications
Formal Concept Analysis (continued)
  o Begins by constructing all concepts and computes pairwise intersection of the intent sets of these concepts
  o Repeated until a fixpoint is reached and no new concepts can be constructed
  o When the algorithm terminates, left with an explicit listing of all of the sample clusters that can be specified in terms of one or more mined behaviors
Goal is to find {c1, c2, c3,..., cN} such that S(c1, c2, c3,..., cN) is optimal (based on threshold)
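The fixpoint construction can be sketched in a few lines. This is a simplified illustration, assuming the starting intents are the behavior sets of the individual samples and that empty intents are skipped (since a cluster must be specified by one or more behaviors); the sample names and behaviors are made up.

```python
def build_concepts(sample_behaviors):
    """Close the intent sets under pairwise intersection, then attach extents.

    sample_behaviors -- {sample_name: iterable of behaviors}
    Returns {intent (frozenset of behaviors): extent (frozenset of samples)}.
    """
    samples = {name: set(b) for name, b in sample_behaviors.items()}
    intents = {frozenset(b) for b in samples.values() if b}
    changed = True
    while changed:  # repeat until no new intent appears (fixpoint)
        changed = False
        for first in list(intents):
            for second in list(intents):
                common = first & second
                if common and common not in intents:
                    intents.add(common)
                    changed = True
    # The extent of an intent is every sample exhibiting all of its behaviors.
    return {
        intent: frozenset(name for name, behaviors in samples.items()
                          if intent <= behaviors)
        for intent in intents
    }


concepts = build_concepts({
    "s1": {"a", "b"},
    "s2": {"b", "c"},
    "s3": {"a", "b", "c"},
})
```

For these three samples the closure adds one new intent, {"b"}, whose cluster is all three samples; the other intents are the samples' own behavior sets.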
Main ideas: discriminative specifications
Simulated annealing
  o Probabilistic technique for finding an approximate solution to a global optimization problem
  o At each step, a candidate solution i is examined and one of its neighbors j is selected for comparison
  o The algorithm moves to j with some probability
  o A cooling parameter T is reduced throughout the process, and when it gets to a minimum the process stops
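The loop described above can be written generically. The sketch below assumes the common Metropolis acceptance rule (always accept an improvement, accept a worse neighbor with probability exp(-delta / T)) and a geometric cooling schedule; the slide only says "with some probability", so both choices, along with the parameter names and the toy cost function, are assumptions.

```python
import math
import random


def simulated_annealing(initial, neighbor, cost, t_start=1.0, t_min=1e-3,
                        cooling=0.95, rng=None):
    """Minimize cost(); returns the best solution seen before T reaches t_min."""
    rng = rng or random.Random(0)
    current = best = initial
    t = t_start
    while t > t_min:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Assumed Metropolis rule: worse moves are accepted less often as T drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
        if cost(current) < cost(best):
            best = current
        t *= cooling  # geometric cooling (an assumption)
    return best


# Toy problem: walk along the integers looking for the minimum of (x - 3)^2.
def toy_cost(x):
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


result = simulated_annealing(0, toy_neighbor, toy_cost)
```

In the specification-synthesis setting, a candidate solution would be a set of concepts and the cost would reflect how well the resulting specification separates positive from negative samples; the toy integer walk only demonstrates the control flow.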
Main ideas: discriminative specifications
Constructing Optimal Specifications
Inputs: threshold t, a set containing positive and negative samples, and a set of behaviors mined with the previous process
Algorithm is called SpecSynth
  o Constructs full set of concepts
  o Removes redundant concepts
  o Runs simulated annealing until convergence, then returns the best solution
Holmes: Mining and Clustering
Evaluation and Results: Holmes
Used six malware families to develop specifications
Tested final product against 19 malware families
Collected 912 malware samples and 49 benign
Holmes Continued
Experiments carried out over varying threshold values (t)
System accuracy is highly sensitive to the threshold
Perhaps only efficient for a specific subset of malware
Holmes Scalability
Worst-case complexity is exponential
Behaviors of repeated executions (Stration and Delf) took hours to analyze
Scalability for Holmes is a nightmare! “scary and scaled”
USENIX
The Advanced Computing Systems Association (Unix Users Group)
2009 article: automatic behavior matching
  o Behavior graphs (slices)
  o Tracking data and control dependencies
  o Matching functions
  o Performance evaluations
Source: Kolbitsch, Clemens. "Effective and Efficient Malware Detection at the End Host." Usenix Security Symposium (2009). Web. 8 Apr
USENIX: Producing Behavior Graphs
Instruction log
  o Trace instruction dependencies
  o Slicing doesn't reflect stack manipulation
Memory log
  o Access memory locations
Figure: Partial behavior graph of Netsky (Kolbitsch et al)
USENIX: Behavior Slices to Functions
Use instruction and memory log to determine input arguments
Identify repeated instructions as loops
Include memory read functions
We can now compare to known malware
Evaluation
Six families used for development (mostly mass-mailing worms)
Expanded test set
Performance Evaluation
Installed Internet Explorer, Firefox, Thunderbird, Putty, and Notepad on Windows XP test machine
Single-core, 1.8 GHz, 1GB RAM, Pentium 4 processor
USENIX Limitations
Evading system emulator
  o USENIX detector uses the QEMU emulator
  o Delays
  o Time-triggered behavior
  o Command and control mechanisms
Modifying the algorithm's behavior
  o A more fundamental change, but cannot be detected using the same signatures
End-host based system
  o Cannot track network activity
Questions/Discussion