1 Obfuscation and Tamperproofing Clark Thomborson 19 March 2010.


2 What Secrets are in Software?
Source code, algorithms: competitors might provide similar functionality at less R&D cost.
Constants: pirates might exploit knowledge of a decryption key or other compact secret.
Internal function points: pirates might tamper with critical code, e.g. if ( not licensed ) exit( ).
External interfaces: competitors might exploit a “service entrance”; attackers might create a “backdoor”.

3 Security Boundary for Obfuscated Code
[Diagram labels: Source code P (Algorithm, Function Points, Secret Keys, Secret Interface), Compiler, Executable X, Obfuscator, Obfuscated code O(X), CPU, GUI]
O(X) has the same behaviour as X.
O(X) is released to attackers who want to know secrets: source code P, algorithm, unobfuscated X, function points, …

4 Security Boundary for Encrypted Code
[Diagram labels: Source code P (Algorithm, Function Points, Secret Keys, Secret Interface), Compiler, Executable X, Encrypter, Encrypted code E(X), Decrypter, Decrypted X, CPU, GUI]
Encryption requires a black-box CPU.
Note: I/O must be limited. No debuggers allowed!

5 Design Issues for Encrypted Code
Key distribution
– Tradeoff: security for expense & functionality.
Branches into an undecrypted block will stall the CPU until the target is decrypted. This runtime penalty is proportional to block size.
Stronger encryption → larger blocks → larger runtime penalty. Another tradeoff.
The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks.
A black-box system with a large and fast RAM (more than will fit in the caches of a single-chip CPU) will be either expensive or insecure. A third tradeoff.
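The block-size and buffer-size tradeoffs on this slide can be explored with a toy cost model. The sketch below is illustrative only: the function name stall_cycles, the cycles_per_byte parameter, and the least-recently-used eviction policy are all assumptions, not part of the talk.

```python
from collections import OrderedDict


def stall_cycles(branch_targets, block_size, buffer_blocks, cycles_per_byte=1):
    """Total stall for a sequence of branch-target addresses (toy model).

    Assumptions: a miss decrypts one whole block, so its cost is
    proportional to block_size; the RAM buffer holds `buffer_blocks`
    decrypted blocks and evicts the least recently used one.
    """
    buffer = OrderedDict()  # decrypted block numbers, oldest first
    stalls = 0
    for address in branch_targets:
        block = address // block_size
        if block in buffer:
            buffer.move_to_end(block)  # hit: no stall, refresh LRU position
            continue
        stalls += block_size * cycles_per_byte  # miss: decrypt the whole block
        buffer[block] = True
        if len(buffer) > buffer_blocks:
            buffer.popitem(last=False)  # evict the least recently used block
    return stalls


# Branching back and forth between two blocks: a one-block buffer
# stalls on every branch, a two-block buffer only on the first two.
targets = [0, 100, 0, 100]
small_buffer = stall_cycles(targets, block_size=64, buffer_blocks=1)
large_buffer = stall_cycles(targets, block_size=64, buffer_blocks=2)
```

In this model, doubling block_size doubles the cost of each miss, while adding buffer space reduces the number of misses. That is the tension the slide describes between encryption strength, speed, and the size of the protected RAM.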

6 Debugging Encrypted Code
[Diagram labels: Source code P (Algorithm, Function Points, Secret Keys, Secret Interface), Compiler, Executable X, Encrypter, E(X), Decrypter, Decrypted X, CPU, GUI]
Usually, a secret interface is an Easter Egg: easy to find if you know where to look! A confidentiality risk.
Mitigation: special hardware required to access the secret interface.

7 Tampering Attack on Encrypted Code
[Diagram labels: Source code P (Algorithm, Function Points, Secret Keys, Secret Interface), Compiler, Executable X, Encrypter, E(X), E’(X), Decrypter, Decrypted X, CPU, GUI]
Random x86 code is likely to crash or loop (Barrantes, 2003).
Mitigation: cryptographically signed code. The system should test the signature on an executable before running it.
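The "test the signature before running" mitigation can be sketched as follows. This is a minimal illustration, not the scheme from the talk. It uses an HMAC (a keyed hash from Python's standard library) as a stand-in for a public-key signature, and the names sign and load_and_run are hypothetical. A real system would more likely verify a signature made with the vendor's private key.

```python
import hashlib
import hmac


def sign(code: bytes, key: bytes) -> bytes:
    """Producer side: compute an authentication tag over the executable."""
    return hmac.new(key, code, hashlib.sha256).digest()


def load_and_run(code: bytes, tag: bytes, key: bytes) -> str:
    """Loader side: refuse to run code whose tag does not verify."""
    expected = sign(code, key)
    # compare_digest avoids leaking the mismatch position through timing
    if not hmac.compare_digest(expected, tag):
        return "rejected"
    return "running"  # placeholder for handing the code to the decrypter/CPU


key = b"hypothetical device key"
executable = b"\x90\x90\xc3"
tag = sign(executable, key)
tampered = executable + b"\xcc"  # E'(X): the attacker's modified copy
```

Here load_and_run(executable, tag, key) accepts the original, while the tampered copy fails verification because its tag no longer matches.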

8 Intrusion Attack on Encrypted Code
[Diagram labels: Source code P (Algorithm, Function Points, Secret Keys, Secret Interface), Compiler, Executable X, Encrypter, E(X), Decrypter, Decrypted X, CPU, GUI]
The attacker might find a way to inject code through the GUI.
Mitigations: secure programming techniques, type-safe programming languages, safety analysis on X, runtime intrusion detection, sandboxing, …

9 Tampering Attack on Obfuscated Code
[Diagram labels: Source code P (Algorithm, Function Points, Secret Keys, Secret Interface), Compiler, Executable X, Obfuscator, O(X), O’(X), CPU, GUI]
Mitigation 1: O(X) might check its own signature. Note: O’(X) might not include this check!
Mitigation 2: obfuscate X so heavily that the attacker is only able to inject random code.

10 Typical Obfuscation Techniques
Lexical obfuscations:
– Obscure names of variables, methods, classes, interfaces, etc. (We obscure opcodes in our new framework.)
Data obfuscations:
– Obscure values of variables, e.g. encoding several booleans in one int, or encoding one int in several floats;
– Obscure data structures, e.g. transforming 2-d arrays into vectors, and vice versa.
Control obfuscations:
– Inlining and outlining, to obscure procedural abstractions;
– Opaque predicates, to obscure control flow.
– (Control flow is obscured in our new obfuscation, because branching opcodes look like non-branching opcodes.)
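Three of the techniques listed on this slide can be shown in miniature. The sketch below is illustrative: every function name is hypothetical, and the opaque predicate shown (the product of two consecutive integers is always even) is just one standard textbook choice.

```python
def pack_flags(flags):
    """Data obfuscation: encode several booleans in one int (bit i = flags[i])."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word


def flag_at(word, i):
    """Recover boolean number i from the packed int."""
    return bool((word >> i) & 1)


def flatten(matrix):
    """Data-structure obfuscation: turn a 2-d array into a vector.

    Returns the vector and the column count needed to index it.
    """
    cols = len(matrix[0])
    return [value for row in matrix for value in row], cols


def element(vector, cols, row, col):
    """Index the flattened vector as if it were still 2-d."""
    return vector[row * cols + col]


def opaquely_true(x):
    """Opaque predicate: x * (x + 1) is always even, so this is always True,
    although that is not obvious to a naive static reader."""
    return (x * (x + 1)) % 2 == 0


def guarded(x):
    """Control obfuscation: the decoy branch below is never taken."""
    if opaquely_true(x):
        return "real code"
    return "decoy code"


packed = pack_flags([True, False, True])
vector, cols = flatten([[1, 2], [3, 4]])
```

With these inputs, packed holds all three booleans in a single int, and element(vector, cols, 1, 0) reads what used to be matrix[1][0].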

11 Obfuscated Interpretation
[Diagram labels: Fetch Unit, Decode Unit, FSM Unit, Execute Unit, Start / Stop]
Put a secret FSM in the CPU fetch-execute hardware, or in the interpreter. The FSM translates opcodes immediately after the decode.
Software is “diversified” before it is obfuscated: basic blocks are subdivided, scrambled into random order, and instructions within blocks are reordered randomly (where possible).
Diversified software must be custom-translated for each FSM.
– This implies that the software producer must know the serial number of its customer’s FSM.
– We cannot allow the attacker to learn this information.
– This is a classic key-distribution problem. Unfortunately, the keying is symmetric, because our opcode translation is not a one-way function.
Individualised FSMs could be distributed as obfuscated software or firmware, or might be hard-wired into CPU chips.

12 Obfuscated 2-op Assembly Code
Cleartext:
    Let x = n
    Let p = 1
    Loop: if x == 0 exit
    add p, x
    sub x, 1
    goto Loop;
Obfuscated text:
    Let x = n
    Let p = 1
    Loop: if x == 0 exit
    sub p, x
    add x, 1
    add p, 0
    goto Loop;
The instruction “add p, 0” is a “dummy instruction” to force an FSM transition.
[Diagram: FSM translator (in CPU pipeline), with a marked Starting State; labels: add/sub, sub/add, add/add, sub/sub, add, sub]
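The translation on this slide can be simulated with a small interpreter. The slide's diagram lists the edge labels add/sub, sub/add, add/add and sub/sub, but the transcript does not say which state each edge leaves. The two-state table below is therefore an assumption, chosen only because it decodes the obfuscated listing back to the cleartext. The state names "swap" and "plain" and the tuple encoding of instructions are also hypothetical.

```python
# Assumed table: (state, fetched opcode) -> (executed opcode, next state).
# "swap" is taken to be the starting state.
FSM = {
    ("swap", "add"): ("sub", "plain"),
    ("swap", "sub"): ("add", "swap"),
    ("plain", "add"): ("add", "swap"),
    ("plain", "sub"): ("sub", "plain"),
}
START = "swap"

# Loop bodies from the slide; x and p are initialised inside run().
CLEARTEXT = [
    ("exit_if_zero", "x", None),  # Loop: if x == 0 exit
    ("add", "p", "x"),
    ("sub", "x", 1),
    ("goto", 0, None),
]
OBFUSCATED = [
    ("exit_if_zero", "x", None),
    ("sub", "p", "x"),
    ("add", "x", 1),
    ("add", "p", 0),  # dummy: a no-op that returns the FSM to its start state
    ("goto", 0, None),
]


def run(program, n, fsm=None):
    """Interpret a program for n >= 0; return (final p, executed add/sub opcodes).

    If fsm is given, every fetched add/sub is translated through it first.
    Other opcodes are assumed to pass through without changing the state.
    """
    regs = {"x": n, "p": 1}
    state = START
    executed = []
    pc = 0
    while True:
        op, dst, src = program[pc]
        if op == "exit_if_zero":
            if regs[dst] == 0:
                return regs["p"], executed
            pc += 1
        elif op == "goto":
            pc = dst
        else:
            if fsm is not None:
                op, state = fsm[(state, op)]
            executed.append(op)
            value = regs[src] if isinstance(src, str) else src
            regs[dst] += value if op == "add" else -value
            pc += 1


clear_result, _ = run(CLEARTEXT, 4)
obf_result, obf_trace = run(OBFUSCATED, 4, FSM)
```

Under this assumed table, the first loop iteration of the obfuscated program executes add, sub, add, and the FSM is back in its starting state at the goto. Returning to the starting state is what lets the next iteration decode the same way.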

13 Obfuscated Java Bytecode
    iconst_0
    istore_2
    iload_1
    istore_1
    if_icmpne Label3
Label1:
    irem
    iload_2
    iload_1
    iload_1
Label4:
    goto Label2
    iadd
    istore_2
    bipush 1
    bipush 1
    iload_1
    pop
Label2:
    iinc bipush 1
    goto Label4
Label3:
    iconst_1
    iload_0
    if_icmple Label1
    iadd
    ireturn
The translating FSM has 8 states, one for each opcode it translates: {goto, if_icmpne, iload_1, iconst_1, iconst_2, iadd, iload_2, irem}
Could you de-obfuscate this? Could you develop a “class attack”?
Note: each CPU has a different FSM.

14 Security Analysis
Tampering: an attacker should not be able to modify the obfuscated code.
– Level 1 Attack: an attacker makes a desired change in program behaviour with a small number of localized changes to representation and semantics, e.g. changing “if (licensed) goto L” into “goto L”.
– Level 2 Attack: an attacker makes a large change in program representation, e.g. by decompiling and recompiling. This may obliterate a watermark, and it will facilitate other attacks.

15 Prohibited Actions (cont.)
Reverse Engineering: an attacker should not be able to modify or re-use substantial portions (constants, objects, loops, functions) of an obfuscated code.
– Level 3 Attack: an attacker makes large-scale changes in program behaviour, for example by de-obfuscating a decryption key to produce a “cracked” program.
Automated De-obfuscation: “class attack”.
– Level 4 Attack: an attacker makes large-scale changes to the behaviour of a large number of obfuscated programs, for example by publishing a cracking tool suitable for use by script-kiddies.

16 3-D Threat Model
A. An adversary might have relevant knowledge & tools;
B. An adversary might have relevant powers of observation;
C. An adversary might have relevant control powers (i.e. causing the CPU to fetch and execute arbitrary codes).
Goal of security analysis: what adversarial powers enable a level-k attack?

17 A. Knowledge and Tools
Level A0: adversary has an obfuscated code X’ and a computer system with a FSM that correctly translates and executes X’.
Level A1: adversary attended this seminar.
Level A2: adversary knows how to use a debugger with a breakpoint facility.
Level A3: adversary has tracing software that collects sequences of de-obfuscated instruction executions, correlated with sequences of obfuscated instructions; and adversary can do elementary statistical computations on these traces.
Level A4: adversary has an implementation of every FSM F_k(x), obfuscator F_k^{-1}(x), and an efficient way to derive obfuscation key k from X’.
→ Our framework seems secure against level-A1 adversaries.
→ Level-A2 adversaries with sufficient motivation (and a debugger) will eventually progress to Level-A3 and then Level-A4 (which enables a level-4 “class attack”).

18 B. Observations
Level-B0 observation: run X’ on a computer, observe output.
Level-B1 observation: given X’’ and an input I, determine whether X’’(I) differs from X’(I) in its I/O behaviour.
Level-B2 observation: record a few opcodes and operands before and after FSM translation. (Use level-A2 tool.)
Level-B3 observation: record a complete trace of de-obfuscated instructions from a run of P’.
Level-B4 observation: determine the index x of a FSM which could produce a given trace from a run of P’.
→ We estimate that O(n^2) level-B2 observations are enough to promote a level-A2 adversary to level-A3, for FSMs with n states. (The adversary could look for commonly-repeated patterns immediately before branches; these are likely to be “dummy sequences”. Branches may be recognized by their characteristic operand values.)
→ Level-B4 requires great cryptographic skill or level-C2 control.

19 C. Control Steps
Level-C0 control: keyboard and mouse inputs for a program run.
Level-C1 control: adversary makes arbitrary changes to the executable P’, then runs the resulting P’’.
Level-C2 control: adversary injects a few (arbitrarily chosen) opcodes into the fetch unit of the CPU after it reaches an execution breakpoint that is chosen by the adversary. (Use level-A2 tool: debugger.)
Level-C3 control: adversary restarts the FSM, then injects arbitrary inputs into the fetch unit at full execution bandwidth.
Level-C4 control: adversary can inject arbitrary inputs into software implementations of FSM F(x) and obfuscator F^{-1}(x) for all x.
→ Level-C2 adversaries will eventually reach Levels C3 and then C4.

20 Summary and Discussion
New framework for obfuscated interpretation
– Faster and cheaper than encryption schemes.
– Secure, unless an attacker is able to observe and control the FSM using a debugger (= a level-2 adversary).
– We are still trying to develop an obfuscation-by-translation scheme that can be cracked only by a cryptographer who is also expert in compiler technology (= a level-4 adversary).

21 Future Work
Prototype implementation for Java bytecode.
Dummy insertions need not occur immediately before branches.
– When translating a basic block, we will randomly choose among the efficiently-executable synonyms that end in the desired state.
– This is the usual process of code optimization, plus randomization and a side-constraint.
Operand obfuscation!!
– Operand values leak information about opcodes.
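The idea of choosing randomly among encodings that end in the desired FSM state can be sketched for a toy add/sub translator. This is a simplified illustration rather than the planned prototype. The two-state table is an assumption (the transcript does not give the actual FSM), dummies are only appended at the end of the block instead of being interleaved, and the names encode_block, decode and max_dummies are hypothetical.

```python
import itertools
import random

# Assumed table: (state, fetched opcode) -> (executed opcode, next state).
FSM = {
    ("swap", "add"): ("sub", "plain"),
    ("swap", "sub"): ("add", "swap"),
    ("plain", "add"): ("add", "swap"),
    ("plain", "sub"): ("sub", "plain"),
}
START = "swap"
OPCODES = ("add", "sub")


def decode(fetched, state=START):
    """Run fetched opcodes through the FSM; return (executed opcodes, final state)."""
    executed = []
    for op in fetched:
        out, state = FSM[(state, op)]
        executed.append(out)
    return executed, state


def encode_block(cleartext, target=START, max_dummies=2, rng=random):
    """Encode one basic block as (fetched opcode, is_dummy) pairs.

    The block is assumed to be entered in state START. Dummy entries stand
    for instructions with operand 0, which are no-ops whether they decode
    to add or to sub.
    """
    state = START
    encoded = []
    for wanted in cleartext:
        # Invert the translation in the current state.
        fetched = next(f for f in OPCODES if FSM[(state, f)][0] == wanted)
        encoded.append((fetched, False))
        state = FSM[(state, fetched)][1]
    # All dummy tails of bounded length that end in the target state.
    tails = [
        tail
        for length in range(max_dummies + 1)
        for tail in itertools.product(OPCODES, repeat=length)
        if decode(tail, state)[1] == target
    ]
    # rng.choice raises IndexError if no tail within max_dummies reaches target.
    return encoded + [(op, True) for op in rng.choice(tails)]


block = encode_block(["add", "sub"], rng=random.Random(0))
```

For the cleartext block add, sub, the real instructions are always encoded as sub, add under this table. Only the dummy tail varies between runs, so two encodings of the same block can differ in length and content while decoding to the same behaviour.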