Differential Slicing: Identifying Causal Execution Differences for Security Applications Noah M. Johnson 1, Juan Caballero 2, Kevin Zhijie Chen 1, Stephen.

Slides:



Advertisements
Similar presentations
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Advertisements

School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Intermediate Code Generation
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
SSA.
CS412/413 Introduction to Compilers Radu Rugina Lecture 37: DU Chains and SSA Form 29 Apr 02.
Ensuring Operating System Kernel Integrity with OSck By Owen S. Hofmann Alan M. Dunn Sangman Kim Indrajit Roy Emmett Witchel Kent State University College.
KLIMAX: Profiling Memory Write Patterns to Detect Keystroke-Harvesting Malware Stefano Ortolani 1, Cristiano Giuffrida 1, and Bruno Crispo 2 1 Vrije Universiteit.
SMU SRG reading by Tey Chee Meng: Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications by David Brumley, Pongsin Poosankam,
Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
TaintCheck and LockSet LBA Reading Group Presentation by Shimin Chen.
Program Representations Xiangyu Zhang. CS590F Software Reliability Why Program Representations  Initial representations Source code (across languages).
4 July 2005 overview Traineeship: Mapping of data structures in multiprocessor systems Nick de Koning
Representing programs Goals. Representing programs Primary goals –analysis is easy and effective just a few cases to handle directly link related things.
PSUCS322 HM 1 Languages and Compiler Design II Basic Blocks Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring.
3/17/2008Prof. Hilfinger CS 164 Lecture 231 Run-time organization Lecture 23.
1 CS 201 Compiler Construction Lecture 1 Introduction.
Code Generation for Basic Blocks Introduction Mooly Sagiv html:// Chapter
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Program Representations Xiangyu Zhang. CS590Z Software Defect Analysis Program Representations  Static program representations Abstract syntax tree;
Achieving Trusted Systems by Providing Security and Reliability Ravishankar K. Iyer, Zbigniew Kalbarczyk, Jun Xu, Shuo Chen, Nithin Nakka and Karthik Pattabiraman.
Run-time Environment and Program Organization
C++ for Engineers and Scientists Third Edition
1 RISE: Randomization Techniques for Software Security Dawn Song CMU Joint work with Monica Chew (UC Berkeley)
Handouts Software Testing and Quality Assurance Theory and Practice Chapter 5 Data Flow Testing
Automated Diagnosis of Software Configuration Errors
Automated malware classification based on network behavior
Vulnerability-Specific Execution Filtering (VSEF) for Exploit Prevention on Commodity Software Authors: James Newsome, James Newsome, David Brumley, David.
Secure Web Applications via Automatic Partitioning Stephen Chong, Jed Liu, Andrew C. Meyers, Xin Qi, K. Vikram, Lantian Zheng, Xin Zheng. Cornell University.
Presented By Dr. Shazzad Hosain Asst. Prof., EECS, NSU
Heng Yin, Dawn Song, Manuel Egele, Christopher Kruegel, and Engin Kirda Presentation by Mridula Menon N.
Behavior-based Spyware Detection By Engin Kirda and Christopher Kruegel Secure Systems Lab Technical University Vienna Greg Banks, Giovanni Vigna, and.
Computer Security and Penetration Testing
Programmer's view on Computer Architecture by Istvan Haller.
1 A Static Analysis Approach for Automatically Generating Test Cases for Web Applications Presented by: Beverly Leung Fahim Rahman.
KEVIN COOGAN, GEN LU, SAUMYA DEBRAY DEPARTMENT OF COMUPUTER SCIENCE UNIVERSITY OF ARIZONA 報告者:張逸文 Deobfuscation of Virtualization- Obfuscated Software.
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities Authors: Jun Xu, Peng Ning, Chongkyung Kil, Yan Zhai, Chris Bookholt In ACM CCS’05.
Compiler Construction
Introduction to Software Testing Chapter 8.1 Building Testing Tools –Instrumentation Paul Ammann & Jeff Offutt
Using Memory Management to Detect and Extract Illegitimate Code for Malware Analysis Carsten Willems 1, Thorsten Holz 1, Felix Freiling 2 1 Ruhr-University.
Christopher Kruegel University of California Engin Kirda Institute Eurecom Clemens Kolbitsch Thorsten Holz Secure Systems Lab Vienna University of Technology.
Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin.
1 CS 201 Compiler Construction Introduction. 2 Instructor Information Rajiv Gupta Office: WCH Room Tel: (951) Office.
Hassen Grati, Houari Sahraoui, Pierre Poulin DIRO, Université de Montréal Extracting Sequence Diagrams from Execution Traces using Interactive Visualization.
Detecting Equality of Variables in Programs Bowen Alpern, Mark N. Wegman, F. Kenneth Zadeck Presented by: Abdulrahman Mahmoud.
Amit Malik SecurityXploded Research Group FireEye Labs.
1 Data Flow Analysis Data flow analysis is used to collect information about the flow of data values across basic blocks. Dominator analysis collected.
Efficient Computing k-Coverage Paths in Multihop Wireless Sensor Networks XuFei Mao, ShaoJie Tang, and Xiang-Yang Li Dept. of Computer Science, Illinois.
Dead Code Elimination This lecture presents the algorithm Dead from EaC2e, Chapter 10. That algorithm derives, in turn, from Rob Shillner’s unpublished.
CS 614: Theory and Construction of Compilers Lecture 15 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software Paper by: James Newsome and Dawn Song.
Copyright © 2014 Pearson Addison-Wesley. All rights reserved. Chapter 9 Pointers and Dynamic Arrays.
Error Explanation with Distance Metrics Authors: Alex Groce, Sagar Chaki, Daniel Kroening, and Ofer Strichman International Journal on Software Tools for.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
1 Test Coverage Coverage can be based on: –source code –object code –model –control flow graph –(extended) finite state machines –data flow graph –requirements.
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities Authors: Jun Xu, Peng Ning, Chongkyung Kil, Yan Zhai, Chris Bookholt Cyber Defense.
CS223: Software Engineering Lecture 26: Software Testing.
Memory Protection through Dynamic Access Control Kun Zhang, Tao Zhang and Santosh Pande College of Computing Georgia Institute of Technology.
Ghent University Veerle Desmet Lieven Eeckhout Koen De Bosschere Using Decision Trees to Improve Program-Based and Profile-Based Static Branch Prediction.
Phoenix Based Dynamic Slicing Debugging Tool Eric Cheng Lin Xu Matt Gruskin Ravi Ramaseshan Microsoft Phoenix Intern Team (Summer '06)
Automated Pattern Based Mobile Testing
Program Slicing Baishakhi Ray University of Virginia
UNIT V Run Time Environments.
CSC-682 Advanced Computer Security
FIGURE Illustration of Stack Buffer Overflow
Software Testing and QA Theory and Practice (Chapter 5: Data Flow Testing) © Naik & Tripathy 1 Software Testing and Quality Assurance Theory and Practice.
Presentation transcript:

Differential Slicing: Identifying Causal Execution Differences for Security Applications Noah M. Johnson 1, Juan Caballero 2, Kevin Zhijie Chen 1, Stephen McCamant 1, Pongsin Poosankam 1, 3, Daniel Reynaud 1, and Dawn Song 1 1 University of California, Berkeley 2 IMDEA Software Institute 3 Carnegie Mellon University 2011 IEEE Symposium on Security and Privacy 左昌國 ADLab, NCU-CSIE

Introduction Problem Definition and Overview Trace Alignment Slice-Align Evaluation Related Work Conclusion Outline 2

Why does the program crash? At what situation does the malware do malicious behaviors? How do you solve above problems if you don’t have the source code? Introduction 3 Static analysis Dynamic analysis … Too much time spent

This paper, proposes “Differential Slicing” Given 2 execution traces of a program with a target difference Automatically finds the input and environment differences that caused the target difference Generates a causal difference graph Simply expressed what happened Introduction 4

The goal is to “understand” the target difference To identify the input differences that caused the target difference. To understand the sequence of events that let from the input differences to the target difference.  To build the causal difference graph Problem Definition and Overview 5

6 $ vuln_cmp bar bazaar Strings are not equal $ vuln_cmp “” foo > Passing trace Failing trace Then the passing trace and the failing trace can be used for Trace Alignment. Input differences? (byte level) Target difference

Problem Definition and Overview 7 Aligned region Disaligned region

8 Divergence point Flow difference Value difference Flow differences = disaligned statements

Causal difference graph The causal difference graph contains the sequences of execution differences leading from the input differences to the target differences. Problem Definition and Overview 9

6k lines of Objective Caml codeObjective Caml Trace alignment and post-dominator module : 4k lines Slice-Align module : 2k lines Problem Definition and Overview 10

Dominate A node d dominates node n iff every path from entry node to n passes through d. (node d is a dominator of node n) Node id immediately dominates n if id dominates n, and no other node p such that id dominates p and p dominates n. (id is the only immediate dominator of n) Post Dominate Same as dominate, from node n to the exit node Immediate post dominator Trace Alignment 11 A B C D E F G

Execution Indexing Execution Indexing captures the structure of the program at any given point in the execution, identifying the execution point, and uses that structure to establish a correspondence between execution points across multiple executions of the program. Xin et al. use an indexing stack to deal with branch or method call. Trace Alignment 12 A B C D E F G G F Current node A C D F G stack

Trace Alignment 13 A B C D E F G G F Current node A C D F G stack G A B G

worklist A pool of instructions to be operated Slice-Align 14

Slice-Align 15 worklist Input difference

Edge pruning and address normalization Pruning edges in the graph when an operand of an aligned instruction has the same value in both execution traces. Heap pointer pruning The pointer is pruned if 1.The allocation site for the live buffers that contain the pointed-to addresses are aligned 2.The offset of those pointed-to addresses, with respect to the start address of the live buffer they belong to, is the same Stack pointer pruning (in the thread stack range) normalized by subtracting the stack base address Data section pointer pruning (in the same module) normalized by subtracting the module base address Slice-Align 16

Evaluation 17

Evaluating the Causal Difference Graph Evaluation 18

Graph size #IDiff = number of input differences Evaluation 19

Performance Less than 1 hour to generate a graph evaluation 20

User Study(informal) Subject A: an analyst at a commercial security research company Subject B: a research scientist Evaluation 21

Identifying input differences in malware analysis W32/Conficker.A Keyboard layout: Ukrainian(failing trace), US-English(passing trace) Target difference: CreateThread API call Result: Input difference: user32.dll::GetKeyboardLayoutList function return value W32/Netsky.C Makes the computer speaker beep continuously if the system time between 6am and 9pm on Feb. 26, 2004 Target Difference: Beep function call Resault: Input difference: kernel32.dll::GetLocalTime system call Evaluation 22

Producing causal difference graph Input difference information Execution difference from input difference to target difference Reducing the graph size Reducing the input difference candidates Conclusion 23