Software Obfuscation Anirban Majumdar University of Trento

Software Obfuscation Anirban Majumdar University of Trento anirban@disi.unitn.it

March 25 2008 2 Previous Talk (Mariano) The state of the art in computer security research. The problem of mobile code and malicious hosts.

March 25 2008 3 Previous Talk (Mariano) The problem of software piracy and malicious reverse engineering … 2005 BSA reports USD34 bn loss per year to software firms due to piracy.

March 25 2008 4 Research Problem: Software Protection Valuable software is distributed in highly portable intermediate language formats: MSIL for .NET, bytecode for Java. Intermediate code can be reverse engineered. Non-malicious RE: for testing, integration, extending, … Malicious RE: for security attacks, piracy, Trojan horse insertion, … How can we “harden” software so that it resists reverse engineering?

March 25 2008 5 Existing Protection Techniques 1. Hardware dongle: not viable; retrofit hardware needed. 2. Server-side execution: high-bandwidth, always-on requirement. Additional network security problems (authorisation, authentication) need to be taken care of. 3. Encryption: chicken-and-egg conundrum … the decryption routine is visible. Works only if the entire decryption/execution takes place in hardware.

March 25 2008 6 Talk Outline Define obfuscation. Provide taxonomy of obfuscations with examples. Play two obfuscation games to defeat reverse engineering. Future of obfuscation.

March 25 2008 7 What is obfuscation? It is a software protection technique. Transforms the application into one that is functionally identical to the original but is more difficult to reverse engineer. Can never completely protect an application from malicious reverse engineering. Given sufficient time and resources, an adversary can reverse engineer any obfuscated code.

March 25 2008 8 Potential application domains Good ones … Obscure program logic. Hide ownership information (e.g. watermarks --- discussed by Mariano) Bad ones … Development of polymorphic virus or code that contains obfuscated malicious payload. Code Plagiarism!

March 25 2008 9 Defining Obfuscation Let P → P’ be a transformation from source program P to target program P’. P → P’ is an obfuscating transformation if P and P’ have the same observable behaviour; i.e. the following two conditions hold (Collberg and Thomborson): If P fails to terminate or terminates with an error, then P’ may or may not terminate. Otherwise, P’ must terminate and produce the same output as P. Two important conditions that need to be preserved: functionality – the obfuscated program should have the same input/output behaviour as the input program (semantics preserving transformation), and unintelligibility – the obfuscated program should be unintelligible to the adversary in some sense.

March 25 2008 10 Quality of Software Obfuscation Evaluated according to four criteria: Potency: How much obscurity it adds to the program (we can use Software Complexity Metrics to determine this). Resilience: How difficult it is for an automatic deobfuscator to break (a combination of programmer effort and deobfuscator effort). Stealth: How well the obfuscated code blends in with the rest of the program (a context-sensitive metric). Cost: How much computational overhead (time/space penalty) it adds to the obfuscated application (this can be measured but is probably the least important evaluation criterion).

March 25 2008 11 Goals of obfuscation … Ideal obfuscator (Ehud Barak, PhD, 2004):- Should simulate the “black box” property. Fails if there exists at least one program that cannot be obfuscated by this method; i.e. an adversary can learn something from an examination of the obfuscated version of this program that cannot be learned by merely executing the program repeatedly. Practical obfuscator (What we have now):- Use transforms such that the resources required for undoing them are too expensive for attackers.

March 25 2008 13 Taxonomy of Obfuscations Layout obfuscation: Changes or removes useful information from the IL without affecting real instructions. E.g. comment stripping, identifier renaming. Data Obfuscation: Targets data and data structures in the program. E.g. changing data encoding, splitting/merging arrays. Control-flow obfuscation: Affects the control-flow within the code. E.g. Reordering statements, introducing dummy control-flow.

March 25 2008 14 Layout Obfuscation Changes or removes useful information from the IL without affecting real instructions. E.g. comment stripping, identifier renaming. Used in commercial obfuscators like DashO for Java and Dotfuscator for MSIL … both from PreEmptive Corp.
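To make the layout transformations concrete, here is a minimal sketch of comment stripping and identifier renaming. It works on Python source text rather than on Java bytecode or MSIL, and it is not how DashO or Dotfuscator operate; the function name `rename_identifiers` and the sample source are assumptions made purely for illustration.

```python
import builtins
import io
import keyword
import tokenize

def rename_identifiers(source):
    """Strip comments and replace user-chosen names with a0, a1, ...

    Deliberately naive: it renames every name token that is neither a
    keyword nor a builtin, so it is only safe on small self-contained code.
    """
    mapping = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue                      # comment stripping
        text = tok.string
        if (tok.type == tokenize.NAME
                and not keyword.iskeyword(text)
                and not hasattr(builtins, text)):
            text = mapping.setdefault(text, "a%d" % len(mapping))
        out.append((tok.type, text))
    return tokenize.untokenize(out), mapping

SOURCE = '''
def total_price(unit_price, quantity):
    # apply a flat 10 percent discount
    discount = unit_price * quantity // 10
    return unit_price * quantity - discount
'''

obfuscated, mapping = rename_identifiers(SOURCE)
```

The instructions are untouched, so the renamed function computes the same result; only the information that helps a human reader is gone.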

March 25 2008 15

March 25 2008 16 Data Obfuscations Variable Encoding
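The figure for this slide is not part of the transcript. As a hedged illustration of variable encoding, the sketch below stores a loop counter i as i' = 8*i + 3 and rewrites every use accordingly. The constants 8 and 3 and the function names are assumptions chosen for the example, not values taken from the talk.

```python
def sum_below(limit):
    """Original: i runs over 1 .. limit-1."""
    i, total = 1, 0
    while i < limit:
        total += i
        i += 1
    return total

def sum_below_encoded(limit):
    """Same computation with i stored as i' = 8*i + 3 (assumed constants)."""
    i_enc, total = 11, 0             # 8*1 + 3
    while i_enc < 8 * limit + 3:     # i < limit  <=>  8*i + 3 < 8*limit + 3
        total += (i_enc - 3) // 8    # decode only where the true value is needed
        i_enc += 8                   # i + 1 becomes i' + 8
    return total
```

An attacker reading the encoded version sees 11, 8 and 3 instead of the natural bounds, while the observable behaviour is unchanged.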

March 25 2008 17 Data Obfuscations Variable splitting and merging Arrays can be split into several sub-arrays, two or more arrays can be merged into one bigger array, folded so as to increase the number of dimensions, or flattened to decrease the number of dimensions.
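A minimal sketch of two of these array transformations, assuming non-negative indices only. `SplitArray` stores even-indexed and odd-indexed elements in separate sub-arrays, and `fold` turns a one-dimensional list into rows; both names are hypothetical and chosen for this example.

```python
class SplitArray:
    """Stores a logical array as two sub-arrays: even and odd indices."""

    def __init__(self, values):
        self.even = list(values[0::2])
        self.odd = list(values[1::2])

    def __len__(self):
        return len(self.even) + len(self.odd)

    def __getitem__(self, k):
        return (self.even if k % 2 == 0 else self.odd)[k // 2]

    def __setitem__(self, k, value):
        (self.even if k % 2 == 0 else self.odd)[k // 2] = value

def fold(values, width):
    """Fold a 1-D list into rows of `width` elements (last row may be short)."""
    return [values[r:r + width] for r in range(0, len(values), width)]

def folded_get(rows, width, k):
    """Read logical element k from the folded (2-D) representation."""
    return rows[k // width][k % width]
```

Every access to the original array has to be rewritten to go through the new index arithmetic, which is what obscures the data structure.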

March 25 2008 18

March 25 2008 19 Control-flow Obfuscations Aggregation/De-Aggregation: The original control-flow logic is disturbed by coalescing unrelated methods or splitting related methods. E.g. DOJ (Design Obfuscator for Java). Method inlining, outlining, cloning, and loop transformations also fall in this class. Ordering: This category performs reordering operations on statements, loops, and expressions to disturb the locality of related information. Spurious Computations: This type of obfuscation modifies the real control-flow by adding spurious computation blocks. E.g. opaque predicates.

March 25 2008 20 The branch dispatcher model [Wang 2001 PhD]

March 25 2008 21 The branch dispatcher model [Wang 2001 PhD]
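The diagrams for these two slides are missing from the transcript. The sketch below shows the general idea of a dispatcher-based (flattened) control flow on a small example: every basic block ends by setting a dispatch variable, and a single dispatcher loop selects the next block. The choice of gcd and the block labels are assumptions for illustration, not material from the talk.

```python
def gcd_structured(a, b):
    """Ordinary structured version."""
    while b != 0:
        a, b = b, a % b
    return a

def gcd_flattened(a, b):
    """Same algorithm; all blocks sit at one level under a dispatcher."""
    nxt = "test"                      # dispatch variable: label of next block
    while True:                       # the dispatcher
        if nxt == "test":
            nxt = "body" if b != 0 else "exit"
        elif nxt == "body":
            a, b = b, a % b
            nxt = "test"
        elif nxt == "exit":
            return a
```

In the flattened form the control-flow graph no longer shows the loop structure directly; recovering it requires knowing the values the dispatch variable can take at each block.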

March 25 2008 22

March 25 2008 23

March 25 2008 24 Opaque Predicates An opaque predicate (Φ): a conditional expression ⇒ thus called a predicate; its value is known to the obfuscator but difficult for the adversary to deduce (by statically analysing the code) ⇒ thus called opaque. The opacity property of predicates determines the resilience of control-flow transformations, i.e. ↑ opacity of a predicate ⇒ ↑ difficulty in determining its outcome by static analysis.

March 25 2008 25 Opaque Predicates Φ^T / Φ^F: Φ always evaluates to T/F (opaquely true/false predicate). Φ^?: Φ may sometimes evaluate to T and sometimes to F (opaquely unknown predicate).

March 25 2008 26 Embedding of opaque predicates (Dummy Code insertion)
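The figure for this slide is not in the transcript. A minimal sketch of dummy code insertion, assuming integer inputs: an opaquely true predicate guards the real statement, and the other branch holds code that is never executed. The predicate v*(v+1) being even is a standard number-theoretic fact used here as a stand-in; the function names are hypothetical.

```python
def scale_original(values, k):
    return [v * k for v in values]

def scale_obfuscated(values, k):
    out = []
    for v in values:
        if (v * v + v) % 2 == 0:       # opaquely true: v*(v+1) is always even
            out.append(v * k)           # real code
        else:
            out.append((v * k) ^ 0x5A)  # dummy code, never reached
    return out
```

A static analyser that cannot prove the predicate has to treat both branches as feasible, so the dummy branch pollutes its view of the program.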

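March 25 2008 27 Embedding of opaque predicates (Loop condition extension) i = 1; while (i < 100) { … i++; } Can be transformed into: i = 1; j = 100; while ((i < 100) && (j*j*(j+1)*(j+1)%4 == 0)^T) { … i++; j = j*i+3; } The superscript T marks the added conjunct as opaquely true: j(j+1) is always even, so its square is always divisible by 4.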
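The transformation on this slide can be checked directly. The sketch below is a Python rendering of the two loops, with an iteration counter added (an assumption, standing in for the elided loop body) so that the two versions can be compared.

```python
def count_original():
    i, iterations = 1, 0
    while i < 100:
        iterations += 1      # stands in for the elided loop body
        i += 1
    return iterations

def count_extended():
    i, j, iterations = 1, 100, 0
    # The second conjunct is opaquely true: j*(j+1) is even, so its
    # square is divisible by 4 whatever value j takes.
    while i < 100 and (j * j * (j + 1) * (j + 1)) % 4 == 0:
        iterations += 1
        i += 1
        j = j * i + 3
    return iterations
```

Both loops run the same number of times; the extra conjunct only adds an apparent dependency of the loop guard on j.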

March 25 2008 28 Opaque Predicates based on aliasing Aliasing occurs when two variables refer to the same memory location. In the presence of aliasing, inter-procedural static analysis is intractable. This intractability property of pointer aliasing can be used to construct opaque predicates. Construction based on the fact that it is impossible for approximate static analysers to detect all aliases all of the time. The basic idea: Construct a dynamic data structure and maintain a set of pointers on it. Make opaque predicates from these pointers. Insert code for manipulating these pointer locations, yet maintain the invariant condition.

March 25 2008 29 Opaque Predicates based on aliasing
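The figure for this slide is missing. The following sketch illustrates the basic idea under simplifying assumptions: two disjoint circular lists, pointers p and q that are only ever moved within their own list (so they never alias), and pointers f and g that are always moved in lockstep (so they always alias). The names and list sizes are hypothetical.

```python
import random

class Node:
    def __init__(self, tag):
        self.tag = tag
        self.next = self

def make_ring(tag, size):
    """Build a circular singly linked list with `size` nodes."""
    head = Node(tag)
    cur = head
    for _ in range(size - 1):
        cur.next = Node(tag)
        cur = cur.next
    cur.next = head
    return head

def advance(node, steps):
    for _ in range(steps):
        node = node.next
    return node

ring_g, ring_h = make_ring("G", 7), make_ring("H", 5)
p, q = ring_g, ring_h        # invariant: p stays in G, q stays in H
f = g = ring_g               # invariant: f and g always alias

def shuffle_pointers(rng):
    """Pointer updates scattered through the program; both invariants hold."""
    global p, q, f, g
    p = advance(p, rng.randrange(1, 10))
    q = advance(q, rng.randrange(1, 10))
    k = rng.randrange(1, 10)
    f, g = advance(f, k), advance(g, k)

def opaquely_false():
    return p is q            # p and q live in different rings

def opaquely_true():
    return f is g            # f and g moved by the same amount
```

The obfuscator knows the invariants by construction, whereas an analyser has to solve an alias analysis problem over a heap structure that keeps changing.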

March 25 2008 30 Opaque Predicates based on concurrency Parallel programs are more difficult to analyse than their sequential counterparts because of their interleaving semantics. Parallel semantics can be incorporated in an otherwise sequential program using threads. If asynchronous events dictate the scheduling policy of threads, a large amount of nondeterminism may be generated which can be used to construct opaque predicates.

March 25 2008 32 Obfuscatory Strength Evaluation through Reverse Engineering We do not know, in practice, how to arbitrarily generate sufficiently hard obfuscated problem instances such that all program analysis techniques would fail (e.g. give imprecise, unanalysable results, or be unscalable, run out of memory, crash, or never terminate). What sort of analysis tools are useful for the automated understanding of the code obfuscated with “computationally intractable” transforms? Can general purpose program analysis tools be used to assess the obfuscatory strength of aliasing transforms or do we need to develop customised analysis tools instead? Can we guarantee that all general tools of that category (and its improved versions) can “crack” any general instance of code obfuscated with a particular obfuscation?

March 25 2008 33 Program Slicing A reverse engineering technique often used to aid program comprehension. A slice consists of the program parts that potentially affect the values computed at a particular point. We will restrict ourselves just to backwards slices and output statements.

March 25 2008 34 Experimental Design We would like to restrict the usefulness of slicing for program comprehension. Use CodeSurfer to slice our programs. We slice our unobfuscated program and use this information to create obfuscations that are targeted to restrict the effectiveness of slicing.

March 25 2008 35 Adding dependencies We consider the nodes from the SDG that are left behind after slicing – we call such nodes the orphans. Add in obfuscations that create dependencies between the slicing variable and the variables contained within the orphans.

March 25 2008 36 A Particular Example As a running example, we will use the following method which calculates the sum and product of the first n positive integers.

March 25 2008 37 A Particular Example As a running example, we will use the following method which calculates the sum and product of the first n positive integers. The backwards slice from out(y) is indicated in red.

March 25 2008 38 A Particular Example As a running example, we will use the following method which calculates the sum and product of the first n positive integers. The backwards slice from out(y) is indicated in red. The goal is to include the orphans in the slice for y.
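The method itself was shown as an image and is not in the transcript. The sketch below uses a plausible reconstruction of it (the statement list, including the guard i <= n, is an assumption) together with a deliberately simple backward slicer. The slicer is flow-insensitive: it treats every definition of a variable as reaching every use, which over-approximates what a tool such as CodeSurfer computes.

```python
# (id, text, variables defined, variables used, controlling statement or None)
PROGRAM = [
    (1, "i = 1",         {"i"}, set(),      None),
    (2, "x = 0",         {"x"}, set(),      None),
    (3, "y = 1",         {"y"}, set(),      None),
    (4, "while i <= n:", set(), {"i", "n"}, None),
    (5, "x = x + i",     {"x"}, {"x", "i"}, 4),
    (6, "y = y * i",     {"y"}, {"y", "i"}, 4),
    (7, "i = i + 1",     {"i"}, {"i"},      4),
    (8, "out(x)",        set(), {"x"},      None),
    (9, "out(y)",        set(), {"y"},      None),
]

def backward_slice(program, criterion):
    """Statements that may affect `criterion` via data or control dependence."""
    by_id = {stmt[0]: stmt for stmt in program}
    in_slice, work = set(), [criterion]
    while work:
        sid = work.pop()
        if sid in in_slice:
            continue
        in_slice.add(sid)
        _, _, _, uses, ctrl = by_id[sid]
        if ctrl is not None:
            work.append(ctrl)             # control dependence
        for other_id, _, defs, _, _ in program:
            if defs & uses:
                work.append(other_id)     # data dependence
    return in_slice

slice_y = backward_slice(PROGRAM, 9)
orphans = {stmt[0] for stmt in PROGRAM} - slice_y
```

With this reconstruction the slice on out(y) is statements 1, 3, 4, 6, 7 and 9, and the orphans are the three statements that only concern x.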

March 25 2008 39 Inserting a Bogus Predicate We can add an opaquely true (or false) predicate so that y appears to depend on x. We use the relationship:

March 25 2008 40 Inserting a Bogus Predicate We can add an opaquely true (or false) predicate so that y appears to depend on x. We use the relationship: Here’s the full method…

March 25 2008 41 Inserting a Bogus Predicate We can add an opaquely true (or false) predicate so that y appears to depend on x. We use the relationship: With the slice…
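The relationship and the method shown on these slides did not survive in the transcript, so the following is only a sketch of the technique. It assumes the running example has the form of `sumprod` below and uses a stand-in predicate, x*(x+1) being even, which is not necessarily the one from the talk.

```python
def sumprod(n):
    """Assumed form of the running example: sum and product of 1..n."""
    i, x, y = 1, 0, 1
    while i <= n:
        x = x + i
        y = y * i
        i = i + 1
    return x, y

def sumprod_bogus(n):
    i, x, y = 1, 0, 1
    while i <= n:
        x = x + i
        if (x * x + x) % 2 == 0:   # opaquely true (stand-in predicate)
            y = y * i              # real update
        else:
            y = y + x              # never executed; makes y appear to use x
        i = i + 1
    return x, y
```

Because the definition of y is now control dependent on a test of x, a slicer has to pull the statements defining x into the slice for y.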

March 25 2008 42 Creating a Variable Encoding We can transform y so that the definition of y seems to depend on x. When we define x, we also have to define y.

March 25 2008 43 Creating a Variable Encoding We can transform y so that the definition of y seems to depend on x. When we define x, we also have to define y. Here’s the obfuscation….

March 25 2008 44 Creating a Variable Encoding We can transform y so that the definition of y seems to depend on x. When we define x, we also have to define y. With the slice…
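The obfuscated method was shown as an image. As a hedged sketch of the idea, the code below stores y + x in place of y; this particular encoding, the variable name `yx` and the form of `sumprod` are assumptions for illustration.

```python
def sumprod(n):
    """Assumed form of the running example: sum and product of 1..n."""
    i, x, y = 1, 0, 1
    while i <= n:
        x = x + i
        y = y * i
        i = i + 1
    return x, y

def sumprod_encoded(n):
    i, x = 1, 0
    yx = 1 + x                     # yx stores y + x, with y initially 1
    while i <= n:
        x = x + i
        yx = yx + i                # x changed, so the stored y + x must follow
        yx = (yx - x) * i + x      # y = y * i, expressed on the encoding
        i = i + 1
    return x, yx - x               # decode y only at the output
```

Each definition of x now forces a redefinition of the encoded y, so the statements that were orphans become genuine data dependences of the output.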

March 25 2008 45 Adding a variable into the loop We can add a new variable into the loop that depends on x and y: Change the guard. Initialise j so that the loop invariant is maintained.

March 25 2008 46 Adding a variable into the loop We can add a new variable into the loop that depends on x and y: Change the guard. Initialise j so that the loop invariant is maintained. The new loop…

March 25 2008 47 Adding a variable into the loop We can add a new variable into the loop that depends on x and y: Change the guard. Initialise j so that the loop invariant is maintained. With the slice…
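The new loop itself is not in the transcript. One way to realise the three steps, sketched under the assumption that the invariant is j == x + y + i (chosen for this example, not taken from the talk):

```python
def sumprod(n):
    """Assumed form of the running example: sum and product of 1..n."""
    i, x, y = 1, 0, 1
    while i <= n:
        x = x + i
        y = y * i
        i = i + 1
    return x, y

def sumprod_extra_var(n):
    i, x, y = 1, 0, 1
    j = x + y + i                        # initialise so that j == x + y + i
    while j - x - y <= n:                # guard i <= n rewritten via j, x, y
        j = j + i + y * (i - 1) + 1      # keep the invariant across the updates
        x = x + i
        y = y * i
        i = i + 1
    return x, y
```

The guard now reads x and y, so every statement in the loop, including the update of y, is control dependent on the computation of x.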

March 25 2008 48 Another example Consider the program wc which counts the number of lines (nl), words (nw) and characters (nc) in a file. The backwards slice from nl.

March 25 2008 49 A Particular Example Consider the program wc which counts the number of lines (nl), words (nw) and characters (nc) in a file. The backwards slice from nl … Our goal is to include these orphans in the slice.

March 25 2008 50 An Example Obfuscation As an obfuscation, we add a bogus predicate (that is always false) to create dependencies.

March 25 2008 51 An Example Obfuscation As an obfuscation, we add a bogus predicate (that is always false) to create dependencies. This predicate uses the invariant:

March 25 2008 52 An Example Obfuscation As an obfuscation, we add a bogus predicate (that is always false) to create dependencies. The backwards slice from nl. Now we’ve included all of the orphans in the slice for nl.

March 25 2008 53 An Example Obfuscation As an obfuscation, we add a bogus predicate (that is always false) to create dependencies. We have also included the orphans of the slices for the other two output variables.
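The invariant used on these slides is missing from the transcript. The sketch below assumes a classic character-by-character wc and uses the invariant nl + nw <= nc (each character increments nc and at most one of nl and nw) as a stand-in; the bogus branch is illustrative only.

```python
def wc(text):
    nl = nw = nc = 0
    in_word = False
    for c in text:
        nc += 1
        if c == "\n":
            nl += 1
        if c in " \n\t":
            in_word = False
        elif not in_word:
            in_word = True
            nw += 1
    return nl, nw, nc

def wc_obfuscated(text):
    nl = nw = nc = 0
    in_word = False
    for c in text:
        nc += 1
        if nl + nw > nc:                 # opaquely false: nl + nw <= nc always
            nl, nw, nc = nw, nc, nl      # dead code tying the counters together
        if c == "\n":
            nl += 1
        if c in " \n\t":
            in_word = False
        elif not in_word:
            in_word = True
            nw += 1
    return nl, nw, nc
```

Since the dead branch assigns each counter from another one, the slice for any of the three outputs now has to include the statements that compute the other two.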

March 25 2008 54 Slicing Metrics

March 25 2008 55 Results for wordcount Measurements obtained from CodeSurfer

March 25 2008 56 Orphans and Residues

March 25 2008 57 Residue Metrics

March 25 2008 58 Table of Results

March 25 2008 59 Graph of Results

March 25 2008 60 State of the art … We have source code and IL obfuscators such as DashO and SandMark (PreEmptive, UAuckland, UArizona, …). We have been able to prove correctness of obfuscations (Majumdar and Drape). Have used opaque predicates to hide watermarks (Nagra). Have a few patents (Identifier renaming – Paul Tyma, DOJ – Mikhail Sosonkin, Opaque Predicates – Clark Thomborson). But …

March 25 2008 61 … the problem is far from being over. We do not have good instance generators: how do we automatically embed opaque predicates at “interesting” points, and how do we generate them? We don’t have black-box security … so cryptographers are not happy with us. We do not have a good theoretical model of the security that obfuscation provides. We do not have killer use cases.

March 25 2008 62

March 25 2008 63 Reverse Engineering Layout Obfuscation Tool: A Java refactoring tool called KABA which uses concept analysis [Snelting-Tip algorithm]. Concept lattices are natural inheritance structures and a natural application domain is in the understanding of class hierarchies for object-oriented languages. KABA uses the Snelting-Tip concept analysis algorithm in order to determine a behaviour preserving refactoring transform which is optimal with respect to a given set of instance usage.

March 25 2008 64 Class B is a subclass of A, redefines f() and accesses the inherited fields x, y. The main program creates two objects of type A and two objects of type B, and performs some field accesses and method calls.

March 25 2008 65 The concept lattice Lattices are marked with class members above, and with variables or objects below. The members above an element (i.e. a new class) define the new class’ members; variables below an element will obtain this element as their new type. The two objects of class B have different behaviour, as one calls g and the other calls h. Therefore, the original class B is split into two unrelated classes. The two objects of original type A have related behaviour, as A2 accesses everything accessed by A1 and also A.f(). Therefore, the original class A is split into a class and a subclass. A1 only contains A.x but not A.y. A.z is not live and does not appear in the concept lattice.
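The lattice figure is not in the transcript. To show how such a lattice is derived, here is a small formal concept analysis sketch. The member-access table is a simplified assumption consistent with the description above (it is not KABA's actual input, and this brute-force enumeration is not the Snelting-Tip algorithm itself).

```python
from itertools import combinations

# Assumed, simplified table: object -> members it accesses. A.z is not
# live, so it does not appear.
CONTEXT = {
    "a1": {"A.x"},
    "a2": {"A.x", "A.y", "A.f"},
    "b1": {"A.x", "A.y", "B.f", "B.g"},
    "b2": {"A.x", "A.y", "B.f", "B.h"},
}
ALL_MEMBERS = frozenset().union(*CONTEXT.values())

def intent(objects):
    """Members shared by every object in `objects`."""
    common = set(ALL_MEMBERS)
    for obj in objects:
        common &= CONTEXT[obj]
    return frozenset(common)

def extent(members):
    """Objects that access every member in `members`."""
    return frozenset(o for o, used in CONTEXT.items() if members <= used)

def concepts():
    """All (extent, intent) pairs, found by closing every subset of objects."""
    found = set()
    names = list(CONTEXT)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            members = intent(subset)
            found.add((extent(members), members))
    return found
```

In this table the member set of a1 is strictly contained in that of a2, which corresponds to a class with a subclass, while the member sets of b1 and b2 are incomparable, which corresponds to two unrelated classes.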

March 25 2008 66

March 25 2008 67 Concept lattices for term1 and renamed term1
