Security by Obscurity: Code Obfuscation

Presentation on theme: "Security by Obscurity: Code Obfuscation"— Presentation transcript:

1 Security by Obscurity: Code Obfuscation
Kai-fan Lee 11/22/2018

2 Introduction
Current state of protecting intellectual property:
  Legal protection
  Server-side execution
  Code encryption
  Code obfuscation: a transformation that turns a program P into P' such that P' preserves the observable behavior of P but is much more difficult to analyze

So far, the security systems we have discussed that protect the secrecy of data are cryptosystems. Today I am going to talk about another technique that provides secrecy: code obfuscation, which is aimed at protecting software intellectual property. This is an important topic because, with the rapid growth of the Internet and many programs shipping in an architecture-neutral distribution format (ANDF) such as Java class files, a determined attacker can easily reverse engineer a program and gain an unfair advantage over the software vendor. Before going into code obfuscation, let's review the current state of intellectual property protection.

Legal protection: effective, but it is hard for a small company to use legal means against a giant corporation like Microsoft.
Server-side execution: the software resides on a server that the customer cannot access. The vendor essentially provides a thin client and sells the software as a service, but performance can be a problem.
Code encryption: the executable is encrypted before it is shipped, and the customer's machine decrypts and executes it. This is very secure, but it needs specialized hardware; otherwise the decrypted code sits in memory, where an attacker can sniff it.
Code obfuscation: the topic of the rest of this talk.

3 Goals of Obfuscation
Collberg's 4 criteria:
  Potency: adds obscurity to confuse a human reader
  Stealth: the transformation should not look obvious, e.g. isPrime( …71)
  Resilience: hard to remove by automatic methods
  Cost: should not add too much overhead

The definition of obfuscation is fairly high level and does not tell us how to obfuscate a program. Collberg, a pioneer of code obfuscation, therefore proposed the following criteria for evaluating an obfuscating transformation.

Potency: typical measures include increased code size, deeper nesting of predicates, and a deeper class hierarchy.
Stealth: the obfuscated section should not stand out. For example, isPrime( …1351) clearly looks like artificial code.
Resilience: a transformation may confuse a human reader, yet a decompiler may still undo it with local or global static analysis.
Cost: the obfuscated program should not incur so much time or space overhead that it hinders performance.

4 Classification Of Obfuscation
Layout Transformation
Preventive Transformation, e.g. Mocha (decompiler) vs. HoseMocha (obfuscator)
Data Transformation
  Storage, e.g. convert static data to a procedure
  Encoding, e.g. redefine data values
  Aggregation
  Ordering
Control Transformation
  Aggregation, e.g. inlining and outlining
  Ordering: spaghetti code
  Computation, e.g. loop transformation, dead code insertion

Collberg also defined a classification of obfuscating transformations.

Layout transformation: changes the layout of the program, e.g. removing comments and line numbers, and scrambling variable and function names. A variable named serial_number, for example, draws a cracker's attention. You can either scramble such names (rename serial_number to foo) or swap them (swap serial_number with counter).

Preventive transformation: exploits weaknesses in current decompilers and deobfuscators. For example, Mocha is a decompiler that turns Java class files back into Java source code. The obfuscator HoseMocha inserts a bogus instruction after the "return" bytecode instruction, which causes Mocha to crash.

Data transformation: affects the data structures in the program. It can be broken into three categories.
  Storage and encoding: change the way data is stored or interpreted. For storage, e.g. change a local variable into a global variable, guided by dataflow analysis. For encoding, since the representation of a data value is a convention rather than an absolute, we can reinterpret it. For example, binary 1100 is conventionally read as decimal 12, but it could stand for anything.
  Ordering: change the order in which variables are declared. This is not very potent but very resilient, since a deobfuscator cannot tell what the original order was.
  Aggregation: split or group data. For example, we can merge, split, or insert classes in a Java inheritance hierarchy. These are simple set operations: a derived class takes the union of the inherited fields and its own, and overrides methods. We can also split an array into two, merge two arrays into one, or turn a one-dimensional array into a multidimensional one.

Control transformation: disguises the real control flow of the program.
  Aggregation: merge or break up computations. An example is method inlining and outlining. Inlining replaces a procedure call with the body of the procedure, while outlining does the opposite and turns a sequence of statements into a procedure. Both create great confusion because modern programming languages rely on the procedure as the basic building block.
  Ordering: analyze the control flow graph and the data flow of a function, then insert random jump statements to modify the control flow of the code segment. In short, make the code more like spaghetti.
  Computation: insert "dead" code or modify the program's algorithm or logic. Dead code is code that is never executed and only causes confusion, e.g. if (1>5) Sdead. We can also create non-reducible control flow by introducing a jump into the body of a loop, which a deobfuscator will have a hard time untangling.
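The encoding and aggregation data transformations described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the constants SCALE and OFFSET, and the split-by-index-parity layout are hypothetical choices made for this example, not taken from the slides or from any particular obfuscator.

```python
# Illustrative sketch only: all names and constants here are hypothetical.

def count_items_plain(items):
    """Original program: an ordinary counter."""
    count = 0
    for _ in items:
        count += 1
    return count


SCALE, OFFSET = 8, 3  # arbitrary encoding: stored value = SCALE * c + OFFSET


def count_items_encoded(items):
    """Encoding transformation: the counter c is stored as 8*c + 3."""
    enc = OFFSET                      # encodes c = 0
    for _ in items:
        enc += SCALE                  # encodes c + 1
    return (enc - OFFSET) // SCALE    # decode only where the value leaves


def split_array(values):
    """Aggregation transformation: split one array into two by index parity."""
    return values[0::2], values[1::2]


def read_split(evens, odds, i):
    """Read element i of the original array through the two halves."""
    return evens[i // 2] if i % 2 == 0 else odds[i // 2]


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50]
    evens, odds = split_array(data)
    print(count_items_plain(data), count_items_encoded(data))
    print([read_split(evens, odds, i) for i in range(len(data))])
```

Both versions of the counter return the same result, which is the "same observable behavior" requirement from the definition of obfuscation; only the intermediate representation differs.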

5 Opaque Construct
Dead code insertion is the most often used transformation and the easiest to implement.
Ex: P^T, e.g. (5>1), is a predicate that always evaluates to true; P^F, e.g. (1>5), is a predicate that always evaluates to false.
  if (5>1)^T { S; } else { Sbug; }
  if (1>5)^F { Sbug; } S;
  while (E and (5>1)^T) { S; }

So far we have covered many ad hoc transformations, but the ones that are easiest to implement and most often used are code reordering and dead code insertion. On their own these transformations are nearly useless, because they can be removed easily. We need some means of protecting the inserted dead code so that it will not be removed.

Ex1: if the "always true" predicate holds, execute the correct code; otherwise execute the dead code.
Ex2: if the "always false" predicate holds, execute the dead code, which never happens; otherwise execute the correct code.
Problem: dead code can be removed easily.
Solution: an opaque construct at a point p of a program is a variable V or a program fragment P whose value is well known at obfuscation time but very hard to determine after obfuscation.
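The three shapes of guarded dead code on this slide can be written out as runnable code. The sketch below is a minimal illustration, assuming stand-in functions total (playing the role of S) and total_bogus (playing the role of Sbug); these names are hypothetical. The predicates are the trivial constants from the slide, so this shows the structure of the transformation, not a resilient one.

```python
# Illustrative sketch: total plays the role of S, total_bogus the role of Sbug.
# The predicates are the trivial constants from the slide, which a static
# analyzer can fold away; they show the shape only.

def always_true():
    return 5 > 1    # P^T


def always_false():
    return 1 > 5    # P^F


def total(values):
    """S: the real computation."""
    return sum(values)


def total_bogus(values):
    """Sbug: plausible-looking code that must never run."""
    return sum(values) + 1


def shape_if_true(values):
    # if P^T { S } else { Sbug }
    if always_true():
        return total(values)
    else:
        return total_bogus(values)


def shape_if_false(values):
    # if P^F { Sbug } S
    if always_false():
        return total_bogus(values)
    return total(values)


def shape_loop(values):
    # while (E and P^T) { S }: the extra conjunct never changes the loop
    acc, i = 0, 0
    while i < len(values) and always_true():
        acc += values[i]
        i += 1
    return acc


if __name__ == "__main__":
    print(shape_if_true([1, 2, 3]), shape_if_false([1, 2, 3]), shape_loop([1, 2, 3]))
```

All three variants compute the same sum as the unobfuscated total; the bogus branch exists only to mislead a reader.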

6 Opaque Construct (Cont.)
Mathematical truth:
  ((x + x^2) mod 2 = 0)^T
  ((28x^2 - 13x - 5) mod 9 = 0)^T
  Decent resilience, but not very potent or stealthy
Pointer alias problem: NP-hard to solve
  (g != h)^T
  (f != h)^T

We know that an opaque construct adds resilience to the code inserted into our program. But how do we build a good one?

With the naive approach (constant comparisons such as 5>1), the opaque construct can itself be removed easily by simple local static analysis.

The second approach, mathematical truths, is a lot safer: its resilience depends on the strength of the deobfuscator's theorem prover. It may not be very potent, though, because a human reader can often recognize a well-known mathematical fact.

The third approach relies on the difficulty of pointer alias analysis. Two pointers alias whenever they reference the same memory address, and deciding this statically is known to be NP-hard. The construct is therefore hard for a deobfuscator to break, and it is very potent, because pointers are so common in programs that the predicate blends in.

The strength of an obfuscated program often rests on its opaque constructs. In many ways, an opaque construct is analogous to the key in a cryptosystem.
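To make two of these constructions concrete, here is a small sketch. It checks the first mathematical predicate, which holds because x + x^2 = x(x + 1) is a product of two consecutive integers, and it builds alias-based predicates in the spirit of the slide: f and g move inside one linked structure while h moves inside a separate one, so g != h and f != h always hold. The Node class, the ring sizes, and the seeded random walk are assumptions made for this illustration, not the construction used by any specific obfuscator.

```python
import random

# Illustrative sketch: Node, make_ring, and the ring sizes are hypothetical.


def math_opaque_true(x: int) -> bool:
    """(x + x^2) mod 2 == 0 for every integer x, since x(x + 1) is even."""
    return (x + x * x) % 2 == 0


class Node:
    def __init__(self):
        self.next = self


def make_ring(n):
    """Build a circular linked list of n nodes and return one of them."""
    nodes = [Node() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.next = b
    return nodes[0]


def advance(p, steps):
    """Move a reference forward along the ring it belongs to."""
    for _ in range(steps):
        p = p.next
    return p


def alias_predicates(seed):
    """Return (g != h, f != h) after moving the references around.

    f and g wander inside ring A and h inside ring B. The rings are
    disjoint, so both predicates are always true, but showing that
    statically requires reasoning about which nodes can alias.
    """
    rng = random.Random(seed)
    ring_a, ring_b = make_ring(4), make_ring(5)
    f = advance(ring_a, rng.randrange(10))
    g = advance(ring_a, rng.randrange(10))
    h = advance(ring_b, rng.randrange(10))
    return (g is not h), (f is not h)


if __name__ == "__main__":
    print(all(math_opaque_true(x) for x in range(-100, 100)))
    print(alias_predicates(seed=1))
```

Note that f and g live in the same ring, so a comparison between them would be true only some of the time; only comparisons across the two disjoint rings give a predicate with a fixed value.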

7 What goes wrong?
Hard to debug
May promote piracy

An obfuscated program is hard to debug, because line numbers are usually removed and variable and function names are scrambled. Software vendors therefore have to keep two copies of the program, one for development and one for shipping. They also have to make sure the two behave the same, which adds extra testing work. Obfuscation may even promote piracy, since a pirate can buy a legal copy, obfuscate it themselves, and then resell it as a supposedly legal re-engineered copy.

8 Conclusion/Questions?
Will play an important role in the future because of ANDF
Microsoft already plans to ship Visual Studio .NET with a third-party obfuscator
Thank You!!

With the advent of dynamically linked languages compiled to an intermediate form, such as Java and C#, research on code obfuscation is starting to intensify. Microsoft has already anticipated this problem and will include a third-party obfuscation tool with Visual Studio .NET. Code obfuscation is going to play a major role in software development in the future.

