Published by Bernard Grant. Modified over 9 years ago.
1
Códigos y Criptografía (Codes and Cryptography), Francisco Rodríguez Henríquez
Software Security Through Code Obfuscation
2
Outline
Introduction
– Definition
– Problem Statement
Code Obfuscation Process
Transformations
– Metrics for Obfuscation Transformations
– Classification of Transforms
De-Obfuscation
– Commonly Employed Techniques
The Power of Obfuscation
3
Why Code Obfuscation?
Intellectual property protection:
– Legal protection
– Technical protection: obfuscation, encryption, server-side execution, trusted native code
4
Justification
If Bob is able to retrieve Alice's original source, he can extract proprietary information such as her data structures and algorithms.
[Diagram: on the server side, Alice compiles her source into object code and obfuscates it; the obfuscated object code is shipped to Bob's client, where it is executed. Bob may try to de-compile and de-obfuscate it to recover the source.]
5
Code Obfuscation Process
Determining potency vs. cost:
– Potency: the level of obfuscation applied to the code.
– Cost: the maximum execution time/space that the obfuscation adds to the application.
– To decide which level of obfuscation we want, we must first decide how much program efficiency we are willing to give up; hence the trade-off between potency and cost.
6
Source Pre-Processing
– Much like a compiler, this step gathers information about the application in order to determine which transformations will lead to the desired level of obfuscation.
– Types of information gathered:
  – symbol table
  – data flow
  – data dependence
  – language constructs
  – programming idioms
7
Transformations
– The source goes through a number of predefined obfuscating transformations until the desired balance of potency vs. cost is reached.
– Definition of an obfuscating transformation: let $P \xrightarrow{T} P'$ be a transformation of a source program $P$ into a target program $P'$. $P \xrightarrow{T} P'$ is an obfuscating transformation if $P$ and $P'$ have the same observable behavior. More precisely, for $P \xrightarrow{T} P'$ to be a legal transformation, the following must hold:
  – If $P$ fails to terminate or terminates with an error, then $P'$ may or may not terminate.
  – Otherwise, $P'$ must terminate and produce the same output as $P$.
– We classify an obfuscating transformation according to the kind of information it targets and its level of potency.
8
Evaluation of Obfuscating Transforms (3 Metrics)
– Measure of potency
– Measure of resilience
– Measure of execution cost
Formal definition of the quality of an obfuscating transform $T$:
$T_{qual}(P) = [T_{pot}(P), T_{res}(P), T_{cost}(P)]$
9
Measure of Potency
Let $T$ be a behavior-conserving transformation such that $P \xrightarrow{T} P'$ transforms a source program $P$ into a target program $P'$. Let $E(P)$ be the complexity of $P$, as defined by one of the known software complexity metrics.
– The potency of $T$ with respect to $P$ is $T_{pot}(P) = E(P')/E(P) - 1$.
– $T$ is a potent obfuscating transformation if $T_{pot}(P) > 0$, i.e. if $P'$ is more complex than $P$.
– In the examples that follow, potency is rated qualitatively (e.g. low, medium).
– For a transform to be sufficiently potent, it should:
  – increase overall program size and introduce new classes/methods;
  – introduce new predicates and increase the nesting level of conditional/looping constructs;
  – increase the number of method arguments and inter-class instance-variable dependencies;
  – increase the height of the inheritance tree;
  – increase long-range variable dependencies.
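As a worked illustration of $T_{pot}$, the Python sketch below uses the number of nodes in a program's abstract syntax tree as a stand-in for $E(P)$. This metric, the helper names `complexity` and `potency`, and the sample programs are assumptions for illustration only; the slides leave the choice of software complexity metric open.

```python
import ast


def complexity(source: str) -> int:
    """Hypothetical stand-in for E(P): the number of nodes in the program's AST,
    a crude program-length metric (not one prescribed by the slides)."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def potency(original: str, obfuscated: str) -> float:
    """T_pot(P) = E(P') / E(P) - 1; the transform is potent if this is > 0."""
    return complexity(obfuscated) / complexity(original) - 1


# The slide's statements written as Python (the code is only parsed, never run).
P = "v = i * r\np = (v * v) * r\n"

# P' guards the second statement with an always-true predicate and adds a dead branch.
P_PRIME = (
    "v = i * r\n"
    "if (v * v + v) % 2 == 0:\n"
    "    p = (v * v) * r\n"
    "else:\n"
    "    p = v - r\n"
)

print(f"T_pot = {potency(P, P_PRIME):.2f}")
```

Under this node-count metric a pure identifier renaming scores $T_{pot} = 0$, which shows that the potency assigned to a transform depends on which complexity metric $E$ is used.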
10
Measure of Resilience
– Resilience (Merriam-Webster): 1: the capability of a strained body to recover its size and shape after deformation caused especially by compressive stress; 2: an ability to recover from or adjust easily to misfortune or change.
– A transform is potent if it manages to confuse a human reader, but it is resilient if it confuses an automatic de-obfuscator.
– We base resilience primarily on the scope of a transform's effect: a transform that affects the entire program is more likely to yield a resilient program than one whose effect is local.
– Resilience is measured on a scale from trivial to one-way; a one-way transformation yields code $P'$ from which it is impossible to recover $P$.
11
Measure of Execution Cost
– The third component of a transformation's quality is its cost: the execution time/space penalty that the transformation imposes on the obfuscated application.
– Cost is measured on a four-point scale:
  – Dear: executing $P'$ requires exponentially more resources than $P$.
  – Costly: executing $P'$ requires $O(n^p)$, $p > 1$, more resources than $P$.
  – Cheap: executing $P'$ requires $O(n)$ more resources than $P$.
  – Free: executing $P'$ requires $O(1)$ more resources than $P$.
12
Classification of Transformations: Layout Transformations
Trivial but irreversible transformations. Examples:
– Formatting removal: $T_{qual}(P) = [\text{low}, \text{one-way}, \text{free}]$. Removes source-code formatting such as tabs and carriage returns. This transformation is free, yet irreversible.
  Code before:
  voltage = current * resistance;
  power = (voltage * voltage) * resistance;
  Code after:
  voltage=current*resistance;power=(voltage*voltage)*resistance;
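A minimal Python sketch of formatting removal, applied to the slide's two statements. It assumes simple C-like code with no string literals, comments, or preprocessor lines; the function name `strip_formatting` is hypothetical. A production tool would work on tokens rather than regular expressions (for instance, this sketch would turn `a - -b` into `a--b`, which a C compiler reads differently).

```python
import re


def strip_formatting(src: str) -> str:
    """Formatting removal for simple C-like statements (sketch: string
    literals, comments and preprocessor lines are not handled)."""
    # 1. Collapse every run of spaces, tabs and newlines into a single space.
    collapsed = re.sub(r"\s+", " ", src).strip()
    # 2. Drop each remaining space unless it separates two word characters,
    #    e.g. keep "int x" but turn "x = y" into "x=y".
    return re.sub(r"(?<=\W) | (?=\W)", "", collapsed)


SRC = """voltage = current * resistance;
power = (voltage * voltage) * resistance;
"""

print(strip_formatting(SRC))
# prints: voltage=current*resistance;power=(voltage*voltage)*resistance;
```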
13
Classification of Transformations: Layout Transformations
– Scrambling identifier names: $T_{qual}(P) = [\text{medium}, \text{one-way}, \text{free}]$. Removes the pragmatic information carried by identifier names, which gives a higher level of potency; however, once the names are replaced, the transformation cannot be undone.
  Code before:
  voltage=current*resistance;power=(voltage*voltage)*resistance;
  Code after:
  v4=i12*r15; p6=(v4*v4)*r15;
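The following Python sketch scrambles identifiers with the standard `ast` module (`ast.unparse` requires Python 3.9 or later). It works on the slide's statements rewritten as Python, names variables `v0`, `v1`, ... instead of the slide's `v4`, `i12`, ..., and renames every name it sees, including built-ins such as `print`, so it is only suitable for self-contained snippets. The class and function names are hypothetical.

```python
import ast


class Scrambler(ast.NodeTransformer):
    """Renames every variable to a meaningless name (v0, v1, ...)."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # First time a name is seen, give it the next opaque name.
        if node.id not in self.mapping:
            self.mapping[node.id] = f"v{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node


def scramble(src: str) -> str:
    tree = Scrambler().visit(ast.parse(src))
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+


SRC = "voltage = current * resistance\npower = (voltage * voltage) * resistance\n"
print(scramble(SRC))
# prints:
# v0 = v1 * v2
# v3 = v0 * v0 * v2
```

The old-to-new mapping lives only inside the transformer object and is discarded when `scramble` returns, which is what makes the transformation one-way.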
14
Classification of Transformations: Control Transformations
Their purpose is to obscure the control flow of the source application.
– Control aggregation transformations break up computations that logically belong together or merge computations that do not.
– Control ordering transformations randomize the order in which computations are carried out.
– Control computation transformations insert new redundant or dead code, or make algorithmic changes.
Transformations that alter the flow of control typically carry the largest computational overhead.
15
Opaque Predicates
The real challenge in designing control-altering transformations is to make them cheap and resistant to attack by de-obfuscators. To accomplish this, many transformations are based on opaque variables and opaque predicates.
– A variable $V$ is opaque if it has some property $q$ that is known a priori to the obfuscator but is difficult for a de-obfuscator to deduce.
– Likewise, a predicate $P$ (a boolean expression) is opaque if its outcome is known to the obfuscator, while a de-obfuscator can deduce it only with great difficulty.
Creating opaque variables and predicates that are hard for a de-obfuscator to crack yet cheap to evaluate is a major area of research in code obfuscation, and is the key to highly resilient control transformations.
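A small Python sketch of an opaque variable and an opaque predicate built from it, assuming integer inputs. The product of two consecutive integers is always even, so the branch below always takes the real path; the function name `area` is a hypothetical example. A construct this well known is also easy for a pattern-matching de-obfuscator to recognize, so it illustrates the idea rather than a resilient choice.

```python
def area(w: int, h: int) -> int:
    # Opaque variable: v = w * (w + 1) has the property q = "v is even",
    # which the obfuscator knows because v is a product of consecutive integers.
    v = w * (w + 1)
    # Opaque predicate built on v: always True, but a de-obfuscator has to
    # prove property q before it can tell which branch is dead.
    if v % 2 == 0:
        return w * h      # real computation: always executed
    return w + h          # bogus computation: never executed
```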
16
Aggregation Transformations
Example of an applied control aggregation transformation: cloned methods.
– A reverse engineer trying to understand the purpose of a subroutine will often examine its signature and body, as well as the different contexts in which it is called.
– To obfuscate this, we apply a transform that obscures a method's call sites, making it appear that different routines are being called.
– We create several different versions of a method by applying various transformations to the original code, and at runtime we use different predicates to select which version to run.
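A minimal Python sketch of cloned methods, assuming a hypothetical `checksum` routine. The two clones compute the same result with differently shaped code, and each call picks one at run time; `random.random()` stands in for the run-time predicate, whereas a real obfuscator would typically use opaque predicates for the choice.

```python
import random


def checksum_a(data: bytes) -> int:
    """Clone A: running sum reduced modulo 256 at every step."""
    total = 0
    for b in data:
        total = (total + b) % 256
    return total


def checksum_b(data: bytes) -> int:
    """Clone B: same result, different shape (one reduction at the end)."""
    return sum(data) % 256


def checksum(data: bytes) -> int:
    # Each call site dispatches to one of the clones at run time, so a
    # reverse engineer sees what look like two unrelated routines.
    impl = checksum_a if random.random() < 0.5 else checksum_b
    return impl(data)
```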
17
Aggregation Transformations
18
Aggregation Transformations
In object-oriented languages such as Java, control is organized around data structures rather than the reverse. Therefore, the most important part of reverse engineering such languages is recovering their data structures. Aggregation transformations change how data is grouped into arrays and objects.
Example: restructuring arrays. Next we see a number of transformations that obscure an array:
– Splitting an array into several sub-arrays (statements 1-2).
– Merging two arrays into one array (statements 3-5).
– Folding an array, which increases its number of dimensions (statements 6-7).
– Flattening an array, which reduces its number of dimensions (statements 8-9).
Splitting and folding increase the complexity of our array structures, while merging and flattening decrease it. The purpose is to introduce structure where little existed before and to remove structure where it once existed, which greatly increases the obscurity of the program.
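The figure's statements (1-9) are not reproduced in this transcript, so the Python sketch below shows the four restructurings as index mappings under assumed, simple choices: splitting by even/odd index, merging by concatenation, folding into fixed-width rows, and flattening rows back into one list. All function names are hypothetical.

```python
from typing import List, Tuple


def split(a: List[int]) -> Tuple[List[int], List[int]]:
    """Split A into B (even indices) and C (odd indices)."""
    return a[0::2], a[1::2]


def split_get(b: List[int], c: List[int], i: int) -> int:
    """Read A[i] after splitting: it lives in B[i // 2] or C[i // 2]."""
    return b[i // 2] if i % 2 == 0 else c[i // 2]


def merge(b: List[int], c: List[int]) -> List[int]:
    """Merge B and C into one array M: B[i] is M[i], C[j] is M[len(b) + j]."""
    return b + c


def fold(a: List[int], width: int) -> List[List[int]]:
    """Fold 1-D A into rows: A[i] becomes F[i // width][i % width]."""
    return [a[k:k + width] for k in range(0, len(a), width)]


def flatten(rows: List[List[int]]) -> List[int]:
    """Flatten 2-D D into 1-D: with full rows of length w, D[i][j] is L[i * w + j]."""
    return [x for row in rows for x in row]
```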
19
Aggregation Transformations
20
Loop Transformations
(a) Next, we see loop blocking applied to the given loop. Loop blocking aims to improve the cache behavior of a loop by breaking up the iteration space so that the inner loop fits in the cache.
(b) Here we apply loop unrolling, which replicates the body of the loop one or more times. If the loop bounds are known at compile time, the loop can be unrolled in its entirety.
(c) Loop fission is applied in this example: a loop with a compound body is turned into several loops with the same iteration space.
All three loop transformations increase the source application's total size and number of conditions, and loop blocking also introduces extra nesting. Used in isolation, they provide little resilience; applied in series, however, the resilience of the combined transformation increases considerably, so that a de-obfuscator needs significant analysis to undo it.
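Because the slide's loops are not reproduced here, the following Python sketch shows the three transformations on hypothetical example loops: a blocked matrix transpose, a sum unrolled by a factor of 2, and a compound loop body split by fission. Python gives no control over cache behavior, so the blocked version only illustrates the restructured (and more deeply nested) control flow, not a speedup.

```python
from typing import List


def transpose(b: List[List[int]], n: int) -> List[List[int]]:
    """Original loop nest: a[i][j] = b[j][i]."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a[i][j] = b[j][i]
    return a


def transpose_blocked(b: List[List[int]], n: int, block: int = 4) -> List[List[int]]:
    """(a) Loop blocking: visit the iteration space tile by tile."""
    a = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    a[i][j] = b[j][i]
    return a


def total_unrolled(xs: List[int]) -> int:
    """(b) Loop unrolling by a factor of 2, with a clean-up step for odd lengths."""
    total, i, n = 0, 0, len(xs)
    while i + 1 < n:
        total += xs[i]
        total += xs[i + 1]
        i += 2
    if i < n:
        total += xs[i]
    return total


def scale_then_add(a: List[int], b: List[int]) -> None:
    """Original loop with a compound body."""
    for i in range(len(a)):
        a[i] *= 2
        b[i] += a[i]


def scale_then_add_fissioned(a: List[int], b: List[int]) -> None:
    """(c) Loop fission: same iteration space, split into two loops.
    Legal here because iteration i of the second loop only reads a[i]."""
    for i in range(len(a)):
        a[i] *= 2
    for i in range(len(a)):
        b[i] += a[i]
```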
21
Loop Transformations
22
Computation Transformations
Example of an applied control computation transformation: inserting dead or irrelevant code.
– A) We insert an opaque predicate $P^T$ into $S = S_1; \ldots; S_n$, splitting it in two. The predicate is irrelevant because it always evaluates to True. A deliberately trivial example is an if-statement such as: if (1 < 5) ; else ; in practice the predicate should be one that a de-obfuscator cannot evaluate easily.
– B) We again break $S$ into two halves, then create two different obfuscated versions $S^a$ and $S^b$ by applying different computational transforms to the second half of $S$. As a result, it is not obvious to a reverse engineer that $S^a$ and $S^b$ perform the same function. A predicate $P^?$, which may evaluate either way, selects between the two at runtime.
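A minimal Python sketch of cases A and B, assuming a hypothetical three-statement block $S = S_1; S_2; S_3$. The opaquely true test `x * x >= 0` is as easy to see through as `if (1 < 5)`, and the parameter `r` stands in for any run-time value that $P^?$ could depend on.

```python
def s_original(x: int) -> int:
    y = x + 3          # S1
    z = y * y          # S2
    return z - x       # S3


def s_case_a(x: int) -> int:
    """A) S1; if P^T: (S2; S3). The test x * x >= 0 holds for every integer."""
    y = x + 3
    if x * x >= 0:     # P^T: always True, so the final return is dead code
        z = y * y
        return z - x
    return -y          # irrelevant code, never executed


def s_case_b(x: int, r: int) -> int:
    """B) S1, then one of two differently written versions of (S2; S3)."""
    y = x + 3
    if r % 2 == 0:                    # P^?: may evaluate either way at run time
        return y * y - x              # S^a
    return (x + 3) ** 2 - (y - 3)     # S^b: same value, because y - 3 == x
```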
23
Computation Transformations
C) Finally, we proceed as in (B), but we introduce a bug into $S^b$ and make sure that the predicate $P^T$ always evaluates to True, so that $S^a$ is always selected. A reverse engineer who de-obfuscates $S^b$ as if it were equivalent to $S^a$ is therefore led to incorrect, non-functioning source code.
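A Python sketch of case C on the same kind of hypothetical $S_1; S_2; S_3$ block. Here $P^T$ is the number-theoretic predicate $7x^2 - 1 \neq y^2$, which holds for all integers because squares are congruent to 0, 1, 2 or 4 modulo 7, while $7x^2 - 1 \equiv 6 \pmod 7$. The buggy $S^b$ is therefore never executed.

```python
def s_case_c(x: int) -> int:
    """C) P^T always selects the correct version S^a; the buggy S^b is dead."""
    y = x + 3
    # P^T: squares are 0, 1, 2 or 4 (mod 7), while 7*x*x - 1 is 6 (mod 7),
    # so 7*x*x - 1 can never equal y*y for integers and the test is always True.
    if 7 * x * x - 1 != y * y:
        return y * y - x     # S^a: correct
    return y * y + x         # S^b: deliberately wrong, never executed
```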
24
Data Transformations
These aim to obscure the data structures used in the source application, which is most important for keeping proprietary structures hidden from a reverse engineer.
– Storage transformations: choose an unnatural storage class for dynamic as well as static data, making it difficult for a de-obfuscator to determine the type of data stored.
– Encoding transformations: choose an unnatural encoding for common data types.
25
Data Transformations
Example: change encoding. Here we encode a simple variable $i$ by replacing it with $i' = c_1 \cdot i + c_2$, where $c_1$ and $c_2$ are constants. In the example, we choose $c_1$ to be a power of 2 for efficiency, with $c_1 = 8$ and $c_2 = 3$. This transformation adds a small amount of execution time while obfuscating the original purpose of $i$.
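A Python sketch of the change-encoding example with the slide's constants $c_1 = 8$ and $c_2 = 3$. The loop body `a[i] = 2 * i` is a hypothetical stand-in for the figure's code; the point is that the loop test, the increment, and every use of $i$ are rewritten in terms of $i' = 8i + 3$, with decoding done by shifts because $c_1$ is a power of 2.

```python
def fill_original(a: list) -> None:
    i = 1
    while i < 1000:
        a[i] = 2 * i
        i += 1


def fill_encoded(a: list) -> None:
    # i is never stored directly; only i' = 8*i + 3 exists (c1 = 8, c2 = 3).
    i_enc = 8 * 1 + 3                  # i = 1     ->  i' = 11
    while i_enc < 8 * 1000 + 3:        # i < 1000  ->  i' < 8003
        # a[i] = 2*i: since i' - 3 == 8*i, shifting right by 3 gives i
        # and shifting right by 2 gives 2*i.
        a[(i_enc - 3) >> 3] = (i_enc - 3) >> 2
        i_enc += 8                     # i += 1    ->  i' += c1
```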
26
Ordering Transformations
These randomize the order in which data structures are declared in a source application. In particular, we aim to randomize the order of methods and instance variables within classes, and of formal parameters within methods.
Example: opaque encoding function.
27
De-obfuscation Techniques
Identifying opaque constructs
– Identifying and evaluating opaque constructs is the most difficult part of de-obfuscation. These constructs fall into three main categories: local, global, and inter-procedural.
28
De-obfuscation Techniques
Identification by pattern matching
– Uses knowledge of the strategies employed by obfuscators to identify opaque predicates. This knowledge can be gathered by de-compiling and analyzing the output of popular obfuscators.
– To prevent this attack, avoid using canned opaque constructs, and choose constructs that are syntactically similar to those used in the real application.
29
De-obfuscation Techniques
Identification by program slicing
– Used by a reverse engineer to counter the fact that logically related pieces of code have been broken up and dispersed over the program. Also used to separate “live” code from “dead” code.
– Countering this technique requires adding parameter aliases and variable dependencies that increase the slice size, making de-obfuscation a more computationally intensive process.
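A minimal Python sketch of backward slicing, assuming straight-line code in which each statement is modeled as (variable defined, set of variables used). The name `backward_slice` and the sample program are hypothetical. Real slicers also handle control flow, procedure calls and aliasing, which is exactly why adding aliases and dependencies inflates the slices they compute.

```python
from typing import List, Set, Tuple

# Each statement: (variable it defines, variables it uses)
Stmt = Tuple[str, Set[str]]


def backward_slice(prog: List[Stmt], criterion: str) -> List[int]:
    """Indices of the statements that can affect `criterion`'s final value."""
    needed = {criterion}
    kept: List[int] = []
    for idx in range(len(prog) - 1, -1, -1):   # walk backwards
        defined, used = prog[idx]
        if defined in needed:
            kept.append(idx)
            needed.discard(defined)   # this definition satisfies the need...
            needed |= used            # ...but creates needs for its operands
    return sorted(kept)


PROG: List[Stmt] = [
    ("v", {"i", "r"}),   # 0: v = i * r
    ("k", {"k"}),        # 1: k = k + 1      (inserted by the obfuscator)
    ("t", {"k", "v"}),   # 2: t = k * v      (irrelevant to p)
    ("p", {"v", "r"}),   # 3: p = (v * v) * r
]

print(backward_slice(PROG, "p"))   # [0, 3]
```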
30
Statistical Analysis
– Used to analyze the outcomes of all predicates in an obfuscated program. Any predicate that evaluates to true on every one of multiple test runs is flagged, since it may turn out to be an opaque predicate.
– A powerful way to prevent this attack is to design opaque predicates so that several of them would have to be cracked at the same time in order to retrieve any information.
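The Python sketch below illustrates statistical analysis on a hypothetical instrumented program in which every branch condition is passed through a recorder. Predicates whose outcome never changed across all inputs are reported as suspects. A genuine program invariant would be flagged in the same way, so the result is a list of candidates, not proof.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

Recorder = Callable[[str, bool], bool]


def profile(program: Callable[[int, Recorder], int],
            inputs: Iterable[int]) -> Dict[str, Set[bool]]:
    """Run an instrumented program on many inputs; record each predicate's outcomes."""
    seen: Dict[str, Set[bool]] = defaultdict(set)

    def record(label: str, value: bool) -> bool:
        seen[label].add(bool(value))
        return value

    for x in inputs:
        program(x, record)
    return seen


def suspects(seen: Dict[str, Set[bool]]) -> List[str]:
    """Predicates that never changed value across all runs: possible opaque predicates."""
    return sorted(label for label, outcomes in seen.items() if len(outcomes) == 1)


def sample_program(x: int, pred: Recorder) -> int:
    """Hypothetical program in which every branch condition goes through `pred`."""
    if pred("p1", x > 10):
        x -= 10
    if pred("p2", (x * (x + 1)) % 2 == 0):   # opaquely true
        x += 1
    return x
```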
31
Statistical Analysis
Example: protecting against statistical analysis. Here we aim to thwart statistical analysis by giving our opaque predicates side effects. In the example, an obfuscator has determined that statements $S_1$ and $S_2$ must always execute the same number of times. The statements are then guarded by opaque predicates that call functions $Q_1$ and $Q_2$, which respectively increment and decrement a global variable $k$. If a de-obfuscator replaces only one of the predicates with True, the updates to $k$ no longer cancel out and $k$ will eventually overflow, so the de-obfuscated program terminates with an error.
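A Python sketch of the side-effect defense, under stated assumptions: $Q_1$ increments and $Q_2$ decrements a shared counter $k$, and a hypothetical 16-bit range is enforced explicitly because Python integers never overflow on their own. The flag `q1_replaced_by_true` simulates a de-obfuscator that rewrites only the first guard as `True`; all names are illustrative.

```python
K_MIN, K_MAX = -(2 ** 15), 2 ** 15 - 1   # hypothetical 16-bit range for k


class Counter:
    """Holds the global variable k shared by the side-effecting predicates."""

    def __init__(self) -> None:
        self.k = 0

    def _check(self) -> None:
        # Emulate fixed-width overflow (Python ints never overflow by themselves).
        if not K_MIN <= self.k <= K_MAX:
            raise OverflowError("k overflowed")

    def q1(self) -> bool:
        self.k += 1                                # side effect
        self._check()
        return (self.k * (self.k + 1)) % 2 == 0    # opaquely true

    def q2(self) -> bool:
        self.k -= 1                                # side effect
        self._check()
        return (self.k * (self.k + 1)) % 2 == 0    # opaquely true


def run(n: int, q1_replaced_by_true: bool = False) -> int:
    """S1 and S2 execute equally often, guarded by Q1 and Q2 respectively."""
    c = Counter()
    work = 0
    for _ in range(n):
        if q1_replaced_by_true or c.q1():   # guard of S1
            work += 1                       # S1
        if c.q2():                          # guard of S2
            work += 2                       # S2
    return work
```

In the unmodified program $k$ only ever moves between 0 and 1; once $Q_1$ is no longer called, every $Q_2$ call pushes $k$ further down until the range check fails.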
32
The Power of Obfuscation
In reality, an obfuscated program consists of two programs merged into one: a real program that performs a useful task and a bogus program that computes useless information. The sole purpose of the bogus program is to confuse reverse engineers by hiding the real program behind irrelevant code.
Encryption vs. obfuscation:
– Both are attempts at hiding data from “prying” eyes.
– Both have a shelf life that lasts until it becomes possible to “crack” the given protection.
Future areas of research:
– New obfuscating transformations
– Interaction and ordering between different transformations (optimization)
– Relationship between potency and cost (which transformations give the most “bang for the buck”)
Other uses of obfuscation:
– Tracing software piracy: each customer would receive a differently obfuscated version of the same code, making it easy to identify which customer redistributed their copy.
– Mobile agent security: enforcing “black-box” security on untrusted hosts.