1 Obfuscation and Tamperproofing Clark Thomborson 19 March 2010
2 What Secrets are in Software? Source Code, Algorithms: competitors might provide similar functionality at less R&D cost. Constants: pirates might exploit knowledge of a decryption key or other compact secret. Internal function points: pirates might tamper with critical code e.g. if ( not licensed ) exit( ). External interfaces: competitors might exploit a “service entrance”; attackers might create a “backdoor”.
3 Security Boundary for Obfuscated Code [Diagram: source code P → Compiler → executable X → Obfuscator → obfuscated code O(X) → CPU and GUI. Secrets: algorithm, function points, secret keys, secret interface, source code P.] O(X) has the same behaviour as X. It is released to attackers who want to know secrets: source code P, algorithm, unobfuscated X, function points, …
4 Security Boundary for Encrypted Code [Diagram: source code P → Compiler → executable X → Encrypter → encrypted code E(X) → Decrypter → decrypted X → CPU and GUI. Secrets: algorithm, function points, secret keys, secret interface, source code P.] Encryption requires a black-box CPU. Note: I/O must be limited. No debuggers allowed!
5 Design Issues for Encrypted Code Key distribution. –Tradeoff: security for expense & functionality. Branches into an undecrypted block will stall the CPU until the target is decrypted. This runtime penalty is proportional to block size. Stronger encryption → larger blocks → larger runtime penalty. Another tradeoff. The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks. A black-box system with a large and fast RAM (more than will fit in the caches of a single-chip CPU) will be either expensive or insecure. A third tradeoff.
6 Debugging Encrypted Code [Diagram: source code P → Compiler → executable X → Encrypter → E(X) → Decrypter → decrypted X → CPU and GUI. Secrets: algorithm, function points, secret keys, secret interface, source code P.] Usually, a secret interface is an Easter Egg: easy to find if you know where to look! This is a confidentiality risk. Mitigation: special hardware required to access the secret interface.
7 Tampering Attack on Encrypted Code [Diagram: source code P → Compiler → executable X → Encrypter → E(X) → Decrypter → decrypted X → CPU and GUI, with a tampered E’(X) alongside E(X). Secrets: algorithm, function points, secret keys, secret interface, source code P.] Random x86 code is likely to crash or loop (Barrantes, 2003). Mitigation: cryptographically signed code. The system should test the signature on an executable before running it.
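The "test the signature before running" mitigation can be sketched as follows. This is a minimal illustration, not the scheme used in the talk: the slide does not name a signature algorithm, so the sketch substitutes HMAC-SHA256 from the Python standard library (a symmetric message authentication code) for a real public-key signature. The names `sign`, `verify_then_run` and the `run` callback are hypothetical.

```python
import hashlib
import hmac


def sign(executable: bytes, key: bytes) -> bytes:
    """Compute an authentication tag over the (encrypted) executable.

    Assumption: HMAC-SHA256 stands in for a public-key signature here.
    """
    return hmac.new(key, executable, hashlib.sha256).digest()


def verify_then_run(executable: bytes, tag: bytes, key: bytes, run) -> bool:
    """Run the executable only if its tag verifies; report whether it ran."""
    expected = sign(executable, key)
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, tag):
        return False
    run(executable)
    return True
```

With this check in place, a tampered E’(X) presented with the tag of the original E(X) is rejected before any of it reaches the decrypter.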
8 Intrusion Attack on Encrypted Code [Diagram: source code P → Compiler → executable X → Encrypter → E(X) → Decrypter → decrypted X → CPU and GUI. Secrets: algorithm, function points, secret keys, secret interface, source code P.] The attacker might find a way to inject code through the GUI. Mitigations: secure programming techniques, type-safe programming languages, safety analysis on X, runtime intrusion detection, sandboxing, …
9 Tampering Attack on Obfuscated Code [Diagram: source code P → Compiler → executable X → Obfuscator → O(X) → CPU and GUI, with a tampered O’(X) alongside O(X). Secrets: algorithm, function points, secret keys, secret interface, source code P.] Mitigation 1: O(X) might check its own signature. Note: O’(X) might not include this check! Mitigation 2: obfuscate X so heavily that the attacker is only able to inject random code.
10 Typical Obfuscation Techniques Lexical obfuscations: –Obscure names of variables, methods, classes, interfaces, etc. (We obscure opcodes in our new framework.) Data obfuscations: –Obscure values of variables, e.g. encoding several booleans in one int, or encoding one int in several floats; –Obscure data structures, e.g. transforming 2-d arrays into vectors, and vice versa. Control obfuscations: –Inlining and outlining, to obscure procedural abstractions; –Opaque predicates, to obscure control flow. –(Control flow is obscured in our new obfuscation, because branching opcodes look like non-branching opcodes.)
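Two of the techniques listed above are easy to show in miniature: encoding several booleans in one int (a data obfuscation) and an opaque predicate (a control obfuscation). The function names below are hypothetical, and the predicate is a textbook example rather than one taken from the talk.

```python
def pack_flags(flags):
    """Encode a list of booleans in one int; bit i holds flags[i]."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word


def unpack_flag(word, i):
    """Recover boolean i from the packed int."""
    return (word >> i) & 1 == 1


def opaquely_true(x):
    """Opaque predicate: always True, but not obviously so to a reader.

    x * (x + 1) is a product of two consecutive integers, so it is even.
    """
    return (x * (x + 1)) % 2 == 0


def is_licensed(word, x):
    """Control flow guarded by the opaque predicate; the else branch is dead."""
    if opaquely_true(x):
        return unpack_flag(word, 0)
    return not unpack_flag(word, 0)  # never executed; present only as a decoy
```

The decoy branch costs nothing at run time, but a static analyser that cannot prove the predicate constant must treat both branches as reachable.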
11 Obfuscated Interpretation [Diagram: Fetch Unit → Decode Unit → FSM Unit → Execute Unit; Start / Stop.] Put a secret FSM in the CPU fetch-execute hardware, or in the interpreter. The FSM translates opcodes immediately after the decode. Software is “diversified” before it is obfuscated: basic blocks are subdivided, scrambled into random order, and instructions within blocks are reordered randomly (where possible). Diversified software must be custom-translated for each FSM. –This implies that the software producer must know the serial number of its customer’s FSM. –We cannot allow the attacker to learn this information. –This is a classic key-distribution problem. Unfortunately, the keying is symmetric, because our opcode translation is not a one-way function. Individualised FSMs could be distributed as obfuscated software or firmware, or might be hard-wired into CPU chips.
12 Obfuscated 2-op Ass’y Code
Cleartext:
        Let x = n
        Let p = 1
  Loop: if x == 0 exit
        add p, x
        sub x, 1
        goto Loop;
Obfuscated text:
        Let x = n
        Let p = 1
  Loop: if x == 0 exit
        sub p, x
        add x, 1
        add p, 0      (“dummy instruction” to force FSM transition)
        goto Loop;
FSM translator (in CPU pipeline): [Diagram labels: add/sub, sub/add, add/add, sub/sub; add; sub; Starting State.]
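The behaviour of the translating FSM on this example can be sketched as a small Mealy machine. The slide's state diagram did not survive extraction, so the transition table below is an assumption: it is one two-state machine that is consistent with the listing, reading each label as fetched/executed opcode. The state names `S0` and `S1`, the target of each transition, and the choice to leave `if` and `goto` untranslated are all hypothetical.

```python
# Assumed transition table: (state, fetched opcode) -> (executed opcode, next state).
# In S0 the opcodes are swapped; in S1 they pass through unchanged.
FSM = {
    ("S0", "sub"): ("add", "S0"),
    ("S0", "add"): ("sub", "S1"),
    ("S1", "add"): ("add", "S0"),
    ("S1", "sub"): ("sub", "S1"),
}


def translate(opcodes, state="S0"):
    """Translate a fetched opcode stream; return executed opcodes and end state."""
    executed = []
    for op in opcodes:
        out, state = FSM[(state, op)]
        executed.append(out)
    return executed, state


def run(body, n, fsm=None):
    """Execute the loop from the slide (exit when x == 0) and return p.

    body is a list of (opcode, destination register, source register or constant).
    If fsm is given, every fetched opcode passes through it before execution.
    """
    regs = {"x": n, "p": 1}
    state = "S0"
    while regs["x"] != 0:
        for opcode, dst, src in body:
            if fsm is not None:
                opcode, state = fsm[(state, opcode)]
            value = regs[src] if isinstance(src, str) else src
            regs[dst] += value if opcode == "add" else -value
    return regs["p"]


CLEAR = [("add", "p", "x"), ("sub", "x", 1)]
OBFUSCATED = [("sub", "p", "x"), ("add", "x", 1), ("add", "p", 0)]
```

Under this assumed table, the dummy `add p, 0` executes as a no-op and returns the machine to its starting state, so every loop iteration is translated identically and `run(OBFUSCATED, n, FSM)` matches `run(CLEAR, n)`.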
13 Obfuscated Java Bytecode
        iconst_0
        istore_2
        iload_1
        istore_1
        if_icmpne Label3
Label1:
        irem
        iload_2
        iload_1
        iload_1
Label4:
        goto Label2
        iadd
        istore_2
        bipush 1
        bipush 1
        iload_1
        pop
Label2:
        iinc bipush 1
        goto Label4
Label3:
        iconst_1
        iload_0
        if_icmple Label1
        iadd
        ireturn
The translating FSM has 8 states, one for each opcode it translates: {goto, if_icmpne, iload_1, iconst_1, iconst_2, iadd, iload_2, irem}. Could you de-obfuscate this? Could you develop a “class attack”? Note: each CPU has a different FSM.
14 Security Analysis Tampering: an attacker should not be able to modify the obfuscated code. –Level 1 Attack: an attacker makes a desired change in program behaviour with a small number of localized changes to representation and semantics, e.g. changing “ if (licensed) goto L ” into “ goto L ”. –Level 2 Attack: an attacker makes a large change in program representation, e.g. by decompiling and recompiling. This may obliterate a watermark, and it will facilitate other attacks.
15 Prohibited Actions (cont.) Reverse Engineering: an attacker should not be able to modify or re-use substantial portions (constants, objects, loops, functions) of an obfuscated code. –Level 3 Attack: an attacker makes large-scale changes in program behaviour, for example by de-obfuscating a decryption key to produce a “cracked” program. Automated De-obfuscation: “class attack”. –Level 4 Attack: an attacker makes large-scale changes to the behaviour of a large number of obfuscated programs, for example by publishing a cracking tool suitable for use by script-kiddies.
16 3-D Threat Model A. An adversary might have relevant knowledge & tools; B. An adversary might have relevant powers of observation; C. An adversary might have relevant control powers (i.e. causing the CPU to fetch and execute arbitrary code). Goal of security analysis: what adversarial powers enable a level-k attack?
17 A. Knowledge and Tools Level A0: adversary has an obfuscated code X’ and a computer system with a FSM that correctly translates and executes X’. Level A1: adversary attended this seminar. Level A2: adversary knows how to use a debugger with a breakpoint facility. Level A3: adversary has tracing software that collects sequences of de-obfuscated instruction executions, correlated with sequences of obfuscated instructions; and adversary can do elementary statistical computations on these traces. Level A4: adversary has an implementation of every FSM $F_k(x)$, obfuscator $F_k^{-1}(x)$, and an efficient way to derive obfuscation key k from X’. Our framework seems secure against level-A1 adversaries. Level-A2 adversaries with sufficient motivation (and a debugger) will eventually progress to Level-A3 and then Level-A4 (which enables a level-4 “class attack”).
18 B. Observations Level-B0 observation: run X’ on a computer, observe output. Level-B1 observation: given X’’ and an input I, determine whether X’’(I) differs from X’(I) in its I/O behaviour. Level-B2 observation: record a few opcodes and operands before and after FSM translation. (Use level-A2 tool.) Level-B3 observation: record a complete trace of de-obfuscated instructions from a run of P’. Level-B4 observation: determine the index x of a FSM which could produce a given trace from a run of P’. We estimate that $O(n^2)$ level-B2 observations are enough to promote a level-A2 adversary to level-A3, for FSMs with n states. (The adversary could look for commonly-repeated patterns immediately before branches; these are likely to be “dummy sequences”. Branches may be recognized by their characteristic operand values.) Level-B4 requires great cryptographic skill or level-C2 control.
19 C. Control Steps Level-C0 control: keyboard and mouse inputs for a program run. Level-C1 control: adversary makes arbitrary changes to the executable P’, then runs the resulting P’’. Level-C2 control: adversary injects a few (arbitrarily chosen) opcodes into the fetch unit of the CPU after it reaches an execution breakpoint that is chosen by the adversary. (Use level-A2 tool: debugger.) Level-C3 control: adversary restarts the FSM, then injects arbitrary inputs into the fetch unit at full execution bandwidth. Level-C4 control: adversary can inject arbitrary inputs into software implementations of FSM $F(x)$ and obfuscator $F^{-1}(x)$ for all x. Level-C2 adversaries will eventually reach Levels C3 and then C4.
20 Summary and Discussion New framework for obfuscated interpretation –Faster and cheaper than encryption schemes. –Secure, unless an attacker is able to observe and control the FSM using a debugger (= a level-2 adversary). –We are still trying to develop an obfuscation-by-translation scheme that can be cracked only by a cryptographer who is also expert in compiler technology (= a level-4 adversary).
21 Future Work Prototype implementation for Java bytecode. Dummy insertions need not occur immediately before branches. –When translating a basic block, we will randomly choose among the efficiently-executable synonyms that end in the desired state. –This is the usual process of code optimization, plus randomization and a side-constraint. Operand obfuscation!! –Operand values leak information about opcodes.
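For comparison with the randomized synonym selection proposed here, the simpler baseline (encode a basic block, then append dummy opcodes until the FSM is back in the desired state) can be sketched as below. The two-state transition table, the state names `S0` and `S1`, and the helper names `encode` and `dummy_path` are assumptions made for illustration; the table is one machine consistent with the 2-op assembly example, not the FSM from the talk. Dummy opcodes are assumed to be emitted with operands that make them no-ops, such as `add p, 0`.

```python
from collections import deque

# Assumed table: (state, fetched opcode) -> (executed opcode, next state).
FSM = {
    ("S0", "sub"): ("add", "S0"),
    ("S0", "add"): ("sub", "S1"),
    ("S1", "add"): ("add", "S0"),
    ("S1", "sub"): ("sub", "S1"),
}


def encode(clear_ops, state, fsm=FSM):
    """Find fetched opcodes that the FSM translates into clear_ops.

    Returns the fetched opcodes and the state the FSM ends in.
    """
    fetched = []
    for wanted in clear_ops:
        for (s, op), (out, nxt) in fsm.items():
            if s == state and out == wanted:
                fetched.append(op)
                state = nxt
                break
        else:
            raise ValueError(f"no opcode yields {wanted!r} in state {state}")
    return fetched, state


def dummy_path(state, target, fsm=FSM):
    """Shortest run of dummy opcodes driving the FSM from state to target (BFS)."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for (s, op), (_out, nxt) in fsm.items():
            if s == current and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [op]))
    raise ValueError("target state is unreachable")
```

Encoding the clear loop body `["add", "sub"]` from state `S0` gives `["sub", "add"]` and leaves the assumed machine in `S1`; `dummy_path("S1", "S0")` then returns a single `add`, which corresponds to the `add p, 0` in the obfuscated listing. Because these dummies always sit at the end of a block, they form the repeated pre-branch pattern that an adversary could look for, which is the weakness the randomized approach aims to remove.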