MIMIC: Computing Models for Opaque Code. Stefan Heule, Manu Sridharan, Satish Chandra. Stanford University, Samsung Research America. September 4, 2015

Presentation transcript:

MIMIC: Computing Models for Opaque Code
Stefan Heule, Manu Sridharan, Satish Chandra
Stanford University, Samsung Research America
September 4, 2015; FSE; Bergamo, Italy

Computing Models for Opaque Code
Opaque code
  – Code is executable
  – Source not available, or hard to process
Challenge: Program analysis in the presence of opaque code
Model
  – Representation suitable for program analysis

Setting
Opaque code in JavaScript
  – Standard library has native implementation: Arrays, Regex, Date, etc.
  – Code obfuscated before deployment
Plain code:
    var arr = ['a','b','c','d'];
    var x = arr.shift();
    // x is 'a'
    // arr is now ['b','c','d']
The same code after obfuscation:
    var _0x4240=["\x61","\x62","\x63","\x64", "\x73\x68\x69\x66\x74"];
    var arr=[_0x4240[0],_0x4240[1], _0x4240[2],_0x4240[3]];
    var x=arr[_0x4240[4]]();

Goals

Observing Opaque Code
Opaque code is executable
  – Observe return values
  – Observe heap accesses on shared objects
How can we get such detailed execution traces?
  – Ideally without having to change the JavaScript runtime
    ['a','b','c','d'].shift();
    read field 'length' of arg0 // 4
    read field 0 of arg0 // 'a'
    has field 1 of arg0
    read field 1 of arg0 // 'b'
    write 'b' to field 0 of arg0
    has field 2 of arg0
    read field 2 of arg0 // 'c'
    write 'c' to field 1 of arg0
    has field 3 of arg0
    read field 3 of arg0 // 'd'
    write 'd' to field 2 of arg0
    delete field 3 of arg0
    write 3 to field 'length' of arg0
    return 'a'

Execution Traces in JavaScript
ECMAScript 6 will introduce proxies
Proxies are JavaScript objects with programmer-defined semantics
  – Intercept field reads, writes, enumerations of fields, etc.
    var proxy = new Proxy(target, handler);

Example of Proxies in ECMAScript 6
Strategy: proxy arguments to opaque code, record interactions
    var handler = {
      get: function(target, name) {
        return name in target ? target[name] : 42;
      }
    };
    var p = new Proxy({}, handler);
    p.a = 1;
    console.log(p.a) // prints 1
    console.log(p.b) // prints 42
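
The strategy of proxying arguments and recording interactions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `recordTrace` and the event format are assumptions, and only four proxy traps are logged.

```javascript
// Hypothetical helper (not from the slides): wrap an object in a Proxy
// that appends one event to `trace` for every intercepted operation.
function recordTrace(target, trace) {
  return new Proxy(target, {
    get(obj, name) {
      trace.push({ kind: "read", field: String(name), value: obj[name] });
      return obj[name];
    },
    has(obj, name) {
      trace.push({ kind: "has", field: String(name) });
      return name in obj;
    },
    set(obj, name, value) {
      trace.push({ kind: "write", field: String(name), value });
      obj[name] = value;
      return true;
    },
    deleteProperty(obj, name) {
      trace.push({ kind: "delete", field: String(name) });
      return delete obj[name];
    },
  });
}

const trace = [];
const arr = ["a", "b", "c", "d"];
// Call through Array.prototype so the lookup of `shift` itself is not logged.
const result = Array.prototype.shift.call(recordTrace(arr, trace));
trace.push({ kind: "return", value: result });

console.log(trace.map((e) => e.kind).join("; "));
```

On an engine that follows the ECMAScript definition of `Array.prototype.shift`, the printed sequence of event kinds should line up with the trace shown on the "Observing Opaque Code" slide.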

Traces Don’t Tell the Full Story
Traces contain partial information only
  – Where do values come from? → Input generation
  – What is the program counter? → Control flow reconstruction
  – What non-heap-manipulating computation is happening? → Random search
    read field 'length' of arg0 // 4
    read field 0 of arg0 // 'a'
    has field 1 of arg0
    read field 1 of arg0 // 'b'
    write 'b' to field 0 of arg0
    has field 2 of arg0
    read field 2 of arg0 // 'c'
    write 'c' to field 1 of arg0
    has field 3 of arg0
    read field 3 of arg0 // 'd'
    write 'd' to field 2 of arg0
    delete field 3 of arg0
    write 3 to field 'length' of arg0
    return 'a'

Approach Overview
Given opaque function + initial inputs
[Diagram: Initial Inputs → Input Gen → All Inputs → Loop Detect → Loop Structure → Random Search → Final Model]

Goals of Input Generation
1. Flow of values
2. Dependencies on input values
  – e.g., Array.prototype.indexOf
3. Expose corner cases
  – e.g., empty array

Input Generation
Iterate Until Fixpoint or Enough Inputs
1. Start with initial inputs
2. Record traces for inputs
3. Extract locations from traces that are being read
4. Generate inputs that differ in those locations
5. Also, generate heuristically interesting inputs
Initial input: ['a','b','c','d']
Generated inputs: [], ['b','b','c','d'], ['a','b','c'], ['b','foo','bar','def'], …
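
One round of steps 3 to 5 might look like the sketch below. The function name `generateInputs`, the placeholder value `"fresh"`, and the choice of mutations are assumptions made for illustration; the slides do not say how new values are picked.

```javascript
// Hypothetical sketch: derive new array inputs from one traced input.
// `readFields` lists the field names the opaque code was observed to read.
function generateInputs(input, readFields) {
  const variants = [];
  for (const field of readFields) {
    if (field === "length") {
      // Vary the length: drop the last element, or append a new one.
      variants.push(input.slice(0, -1));
      variants.push([...input, "fresh"]);
    } else if (field in input) {
      // Change exactly the location that was read.
      const copy = input.slice();
      copy[field] = "fresh";
      variants.push(copy);
    }
  }
  // Heuristically interesting corner case: the empty array.
  variants.push([]);
  return variants;
}

console.log(generateInputs(["a", "b"], ["length", "0"]));
```

Repeating this on the traces of the newly generated inputs, until no new read locations appear or enough inputs exist, gives the fixpoint iteration the slide describes.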

Control Structure Reconstruction
What statement did a trace event originate from?
  – Trivial for straight-line code
  – Less clear for loops
Abstract trace to skeleton
    read field 'length' of arg0 // 4
    read field 0 of arg0 // 'a'
    has field 1 of arg0
    read field 1 of arg0 // 'b'
    write 'b' to field 0 of arg0
    has field 2 of arg0
    read field 2 of arg0 // 'c'
    write 'c' to field 1 of arg0
    has field 3 of arg0
    read field 3 of arg0 // 'd'
    write 'd' to field 2 of arg0
    delete field 3 of arg0
    write 3 to field 'length' of arg0
    return 'a'
Skeleton:
    read; read; has; read; write; has; read; write; has; read; write; delete; write; return;
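
Abstracting a trace to its skeleton amounts to keeping only the kind of each event. A minimal sketch, assuming events are objects with a `kind` property (this representation and the name `toSkeleton` are illustrative, not from the slides):

```javascript
// Keep only the event kinds, dropping fields and values.
function toSkeleton(trace) {
  return trace.map((event) => event.kind + ";").join(" ");
}

// Hand-written trace for shift() on a one-element array (an assumption
// for illustration, not taken from the slides).
const exampleTrace = [
  { kind: "read", field: "length", value: 1 },
  { kind: "read", field: "0", value: "a" },
  { kind: "delete", field: "0" },
  { kind: "write", field: "length", value: 0 },
  { kind: "return", value: "a" },
];

console.log(toSkeleton(exampleTrace)); // prints "read; read; delete; write; return;"
```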

Control Structure Reconstruction (2)
Problem can be viewed as learning a regular language from only positive examples
  – Theoretical result: impossible [Gold ’67]
    read; read; (has; (delete;| read; write;))* delete; write;
    read; read; has; read; write; has; read; write; has; read; write; delete; write; return;

Control Structure Reconstruction (3)
Limit ourselves to at most one loop
There still might be multiple possible loop structures
  – Generate many proposals
  – Rank them based on how many traces they explain; heuristic to break ties
    read; read; (has; (delete;| read; write;))* delete; write;
    read; read; (has; delete;| has; read; write;)* delete; write;
    read; read; (has; (read; write;| delete; has; read; write;))* delete; write;
    read; read; (has; (read; write;| delete;))*has; read; write; delete; write;
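
Ranking proposals by how many traces they explain can be illustrated with ordinary regular expressions over skeleton strings. In this sketch the two proposals are adapted from the slide (with the trailing `return;` added and spaces removed), while the skeleton set and the function name `rank` are assumptions for illustration.

```javascript
// Two loop proposals, written as JavaScript regular expressions.
const proposals = [
  /^read;read;(has;(delete;|read;write;))*delete;write;return;$/,
  /^read;read;(has;(read;write;|delete;))*has;read;write;delete;write;return;$/,
];

// Skeletons of shift() on dense arrays of length 4, 2, and 1 (hand-derived).
const skeletons = [
  "read;read;has;read;write;has;read;write;has;read;write;delete;write;return;",
  "read;read;has;read;write;delete;write;return;",
  "read;read;delete;write;return;",
];

// Count how many skeletons each proposal matches, best first.
function rank(candidates, observed) {
  return candidates
    .map((proposal) => ({
      proposal,
      explained: observed.filter((s) => proposal.test(s)).length,
    }))
    .sort((a, b) => b.explained - a.explained);
}

for (const { proposal, explained } of rank(proposals, skeletons)) {
  console.log(explained, proposal.source);
}
```

Both proposals explain the four-element trace, but only the first also explains the one-element trace, where the loop body never runs. This is why diverse inputs matter for telling loop structures apart.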

Control Structure Reconstruction (4)

Control Structure Reconstruction (5)
Given a loop proposal, we get an initial model
Loop proposal:
    read; ( has; ( delete; | read; write; ) )* delete; write; return;
Model with unknowns:
    var n0 = arg0.length
    var n1 = arg0[0]
    for (var i = 0; i < ?; i += 1) {
      var n2 = ? in arg0
      if (?) {
        delete arg0[?]
      } else {
        var n4 = arg0[?]
        arg0[?] = ?
      }
    }
    delete arg0[?]
    arg0.length = ?
    return ?
Initial model:
    var n0 = arg0.length
    var n1 = arg0[0]
    for (var i = 0; i < 0; i += 1) {
      var n2 = 1 in arg0
      if (false) {
        delete arg0[0]
      } else {
        var n4 = arg0[1]
        arg0[0] = 'b'
      }
    }
    delete arg0[4]
    arg0.length = 4
    return 'a'

Randomized Search
Then, apply random search (inspired by Markov Chain Monte Carlo (MCMC) sampling)
  – Randomly mutate the current program
  – Evaluate it with a fitness function
  – Accept “better” programs, and sometimes worse ones, too
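
The mutate, evaluate, accept loop can be sketched generically. Everything here is an illustrative assumption rather than the authors' implementation: the acceptance rule (always accept candidates that are no worse, accept worse ones with probability exp(-beta · increase)), the parameter `beta`, the seeded generator, and the toy problem of finding an integer offset.

```javascript
// Small seeded pseudo-random generator so that runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Lower scores are better; a score of zero means a perfect candidate.
function search(initial, mutate, score, iterations, rng, beta = 1.0) {
  let current = initial;
  let currentScore = score(initial);
  let best = current;
  let bestScore = currentScore;
  for (let i = 0; i < iterations && bestScore > 0; i++) {
    const candidate = mutate(current, rng);
    const candidateScore = score(candidate);
    // Accept candidates that are no worse; sometimes accept worse ones too.
    const accept =
      candidateScore <= currentScore ||
      rng() < Math.exp(-beta * (candidateScore - currentScore));
    if (accept) {
      current = candidate;
      currentScore = candidateScore;
    }
    if (currentScore < bestScore) {
      best = current;
      bestScore = currentScore;
    }
  }
  return { best, bestScore };
}

// Toy problem: find the offset c in a loop bound `n0 + c`; the target is -1.
const score = (c) => Math.abs(c - -1);
const mutate = (c, rng) => (rng() < 0.5 ? c - 1 : c + 1);
const outcome = search(5, mutate, score, 500, makeRng(42));
console.log(outcome.best, outcome.bestScore);
```

Occasionally accepting a worse candidate lets the search leave local minima, which a purely greedy search could not do.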

Randomized Search (2)

Fitness Function
Fitness function
  – Run model on all inputs
  – Compare all traces against real traces
Score
  – Zero: if trace matches perfectly
  – Partial score if only parts of the trace match
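
A fitness function of this shape can be sketched as follows. The slides do not specify how partial matches are scored, so the position-by-position mismatch count used here is an assumption, as are the names `traceDistance` and `fitness` and the string encoding of events.

```javascript
// Number of positions at which two traces differ (missing events count too).
function traceDistance(actual, expected) {
  const n = Math.max(actual.length, expected.length);
  let mismatches = 0;
  for (let i = 0; i < n; i++) {
    if (actual[i] !== expected[i]) mismatches++;
  }
  return mismatches;
}

// Run the model on every input and sum the distances to the real traces.
// `model` is a function from an input to the trace it would produce.
function fitness(model, inputs, realTraces) {
  return inputs.reduce(
    (sum, input, i) => sum + traceDistance(model(input), realTraces[i]),
    0
  );
}

// Illustrative data: one input and its (hand-written) real trace.
const inputs = [[]];
const realTraces = [["read length", "write length", "return"]];
const perfectModel = () => ["read length", "write length", "return"];
const partialModel = () => ["read length", "return"];

console.log(fitness(perfectModel, inputs, realTraces)); // prints 0
console.log(fitness(partialModel, inputs, realTraces)); // prints 2
```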

Randomized Search (3)

Random Search Result
After some number of iterations, score goes to zero
  – This is a model
What about the empty array?
  – Doesn’t actually match the control flow structure
    var n0 = arg0.length
    var n1 = arg0[0]
    for (var i = 0; i < (n0-1); i += 1) {
      var n2 = (i+1) in arg0
      if (n2) {
        var n3 = arg0[i+1]
        arg0[i] = n3
      } else {
        delete arg0[i]
      }
    }
    delete arg0[i]
    arg0.length = i
    return n1

Full Overview
[Diagram. Stages: Input Gen, Loop Detect, Categorizer, Search, Merge, Search, Cleanup. Data: Initial Inputs, All Inputs, Loop Structure, Input Cat 1 … Input Cat n, Model 1 … Model n, Unknown Conditions Model, Final Model. Annotation: Repeat until success]

Input Categories for shift
For shift
  – Category for empty array
  – Category for non-empty arrays
    var n0 = arg0.length
    if (false) {
      /* model for non-empty arr */
    } else {
      arg0.length = 0
    }

    var n0 = arg0.length
    if (n0) {
      /* model for non-empty arr */
    } else {
      arg0.length = 0
    }

Implementation Notes
Randomly generated models might not terminate
  – Stop execution if trace gets too long
Newly allocated objects
  – Don’t show up in trace (only when returned)
  – Approach: Allocate at beginning of model, then randomly search for population code
    if (false) { result[0] = 0; }
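
Stopping a runaway candidate once its trace gets too long could be done with a recorder that enforces a budget. This is only a sketch under assumptions: the names `TraceTooLong`, `makeRecorder`, and `runWithBudget` are hypothetical, and it presumes every loop iteration of a model records at least one event.

```javascript
// Thrown when a model records more events than the budget allows.
class TraceTooLong extends Error {}

function makeRecorder(limit) {
  const events = [];
  return {
    events,
    record(event) {
      if (events.length >= limit) throw new TraceTooLong("trace budget exceeded");
      events.push(event);
    },
  };
}

// Run a model, aborting it once the trace reaches `limit` events.
function runWithBudget(model, input, limit) {
  const recorder = makeRecorder(limit);
  try {
    model(input, recorder);
    return { events: recorder.events, aborted: false };
  } catch (err) {
    if (err instanceof TraceTooLong) {
      return { events: recorder.events, aborted: true };
    }
    throw err;
  }
}

// A candidate model that would never terminate on its own.
const runaway = (input, recorder) => {
  for (;;) recorder.record("read length");
};

console.log(runWithBudget(runaway, [], 50).aborted); // prints true
```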

Optimizations
Only use subset of inputs, not all inputs
  – Heuristic to choose 20 diverse inputs
      How long is the trace?
      How does the initial model score?
  – At the end, validate with all inputs
      If it fails, restart with failed inputs added
Embarrassingly parallel search: exploit multiple cores
Don’t propose nonsensical programs
  – Type analysis
    var n0 = arg0[arg0];

Evaluation
JavaScript Array Standard Library
Problems
1. Multiple loops
2. Bugs in proxy implementation (not officially released)
3. Missing program mutations
✓ every, filter, forEach, indexOf, lastIndexOf, map, pop, push, reduce, reduceRight, shift, some
✗ (with problem numbers) concat 1,2; join 2; reverse 1; slice 2; sort 1; splice 1,2; toString 3; unshift 1

Evaluation (2)
We contributed some of our models to WALA, a static analysis library for JavaScript
  – New models increase analysis precision
  – Also found a previous model to be wrong, and several to be incomplete (sparse arrays)

Evaluation (3)

Summary
Opaque code problematic for analysis
Automatically synthesize models
  – Using MCMC random search
  – Program traces to evaluate models
Source code and replication package