TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection. Tielei Wang1, Tao Wei1, Guofei Gu2, Wei Zou1 (1 Peking University, China; 2 Texas A&M University, US)

TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection
Tielei Wang1, Tao Wei1, Guofei Gu2, Wei Zou1
1 Peking University, China; 2 Texas A&M University, US

Terms
Checksum: a way to check the integrity of data. Used in network protocols and files.
Fuzzing: generating malformed inputs and feeding them to the application.
Dynamic taint analysis: runs a program and observes which computations are affected by predefined taint sources (e.g. input).
[Figure labels: data, checksum field, checksum function]

The problem
The input mutation space is enormous.
Most malformed inputs are dropped at an early stage if the program employs a checksum mechanism.

The problem
1 void decode_image(FILE* fd){
2 ...
3 int length = get_length(fd);
4 int recomputed_chksum = checksum(fd, length);
5 int chksum_in_file = get_checksum(fd);
//line 6 is used to check the integrity of inputs
6 if(chksum_in_file != recomputed_chksum)
7 error();
8 int Width = get_width(input_file);
9 int Height = get_height(input_file);
10 int size = Width*Height*sizeof(int);
11 int* p = malloc(size);
12 ...
13 for(i=0; i<Height; i++){// read ith row to p
14 read_row(p+Width*i, i, fd);
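
The effect of such a check on blind fuzzing can be sketched in a few lines of C. This is only an illustration under stated assumptions: the additive checksum, the 4-byte little-endian trailer and the names `byte_sum` and `passes_checksum` are hypothetical and do not come from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum algorithm: a plain sum of the payload bytes. */
uint32_t byte_sum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Hypothetical layout: payload followed by a 4-byte little-endian
 * checksum field. Returns 1 if the stored checksum matches the
 * recomputed one, 0 otherwise. */
int passes_checksum(const uint8_t *buf, size_t len) {
    if (len < 4)
        return 0;
    size_t payload_len = len - 4;
    uint32_t stored = (uint32_t)buf[payload_len]
                    | ((uint32_t)buf[payload_len + 1] << 8)
                    | ((uint32_t)buf[payload_len + 2] << 16)
                    | ((uint32_t)buf[payload_len + 3] << 24);
    return stored == byte_sum(buf, payload_len);
}
```

With this layout, changing any single payload byte without also updating the trailer makes `passes_checksum` return 0, so a mutated input is rejected before the parsing code is reached.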

The idea
Infer whether and where a program checks the integrity of its input.
Identify which input bytes can flow into sensitive points: taint analysis at byte level monitors how the application uses the input data.
Create malformed inputs by focusing on the “hot bytes”.
Repair the checksum fields in those inputs to expose vulnerabilities.
Fully automatic.
Found 27 new vulnerabilities, in Acrobat Reader, Google Picasa and more.

How does it work?
1. Dynamic taint tracing
2. Detecting checksum
3. Directed fuzzing
4. Repairing crashed samples

How does it work?
[Architecture diagram. Components: Execution Monitor, Checksum Locator, Directed Fuzzer, Checksum Repairer. Artifacts passed between them: Instruction Profile, Hot Bytes Info, Modified Program, Crashed Samples, Reports.]

How does it work? 1. Dynamic taint tracing
Runs the program with well-formed input.
The execution monitor records:
Which input bytes flow into arguments of API functions (e.g. malloc, strcpy): the “hot bytes” report.
Which input bytes each conditional jump instruction (e.g. JZ, JE, JB) depends on: the checksum report.
Only data flow is considered (no control flow).

How does it work? 1. Dynamic taint tracing
Instruments data movement (e.g. MOV, PUSH), arithmetic (e.g. SUB, ADD) and logic (e.g. AND, XOR) instructions.
Taints all values written by an instruction with the union of all taint labels associated with the values used by that instruction.
The eflags register is also considered.
Example:
Before: eax {0x6, 0x7}, ebx {0x8, 0x9}
add eax, ebx
After: eax {0x6, 0x7, 0x8, 0x9}, eflags {0x6, 0x7, 0x8, 0x9}
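
The union rule can be written down as a small shadow-register model. This is a sketch, not the tool's implementation: it assumes taint labels are input offsets below 64 so that a set fits in one `uint64_t`, and the names `taint_set`, `shadow_regs`, `taint_of` and `shadow_add_eax_ebx` are hypothetical.

```c
#include <stdint.h>

/* One bit per input byte offset; only offsets 0..63 are representable
 * in this sketch (an assumption made to keep it short). */
typedef uint64_t taint_set;

/* Shadow state: the taint labels attached to each tracked register. */
typedef struct {
    taint_set eax;
    taint_set ebx;
    taint_set eflags;
} shadow_regs;

/* Taint set containing the single input offset `off` (off < 64). */
taint_set taint_of(unsigned off) {
    return (taint_set)1 << off;
}

/* Shadow semantics of `add eax, ebx`: eax is both read and written and
 * eflags is written, so both receive the union of the operands' labels.
 * ebx is only read and keeps its labels. */
void shadow_add_eax_ebx(shadow_regs *s) {
    taint_set u = s->eax | s->ebx;
    s->eax = u;
    s->eflags = u;
}
```

Starting from eax {0x6, 0x7} and ebx {0x8, 0x9}, one call leaves eax and eflags tainted by offsets 0x6 through 0x9, matching the example on the slide.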

How does it work? 1. Dynamic taint tracing: example
Input size is 1024 bytes.
8 int Width = get_width(input_file);
9 int Height = get_height(input_file);
10 int size = Width*Height*sizeof(int);
11 int* p = malloc(size);
“hot bytes” report:
…
0x8048d5b: invoking malloc: [0x8,0xf]

How does it work? 1. Dynamic taint tracing: example
Input size is 1024 bytes.
6 if(chksum_in_file != recomputed_chksum)
7 error();
Checksum report:
…
0x8048d4f: JZ: 1024: [0x0,0x3ff]

How does it work? 2. Detecting checksum
Checksum detector: identifies potential checksum check points.
Observation: the recomputed checksum value depends on many input bytes.
Instruments conditional jump instructions. Before each one executes, checks whether the number of taint marks associated with the eflags register exceeds a threshold.
Problem: decompressed bytes can also depend on many input bytes, so this test alone produces false candidates.
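
A minimal sketch of the threshold test follows, assuming a bitset taint representation sized for the 1024-byte running example and the threshold of 16 used in the evaluation. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define INPUT_SIZE 1024        /* input length of the running example */
#define CHECKSUM_THRESHOLD 16  /* threshold quoted in the evaluation */

/* One bit per input byte offset. */
typedef struct {
    uint8_t bits[INPUT_SIZE / 8];
} taint_set;

/* Mark input offsets first..last (inclusive) as tainting a value. */
void taint_add_range(taint_set *t, size_t first, size_t last) {
    for (size_t off = first; off <= last && off < INPUT_SIZE; off++)
        t->bits[off / 8] |= (uint8_t)(1u << (off % 8));
}

/* Number of distinct input bytes in the set. */
size_t taint_count(const taint_set *t) {
    size_t n = 0;
    for (size_t i = 0; i < sizeof t->bits; i++)
        for (uint8_t b = t->bits[i]; b != 0; b &= (uint8_t)(b - 1))
            n++;  /* each iteration clears the lowest set bit */
    return n;
}

/* Called before a conditional jump: is eflags tainted by more input
 * bytes than the threshold allows? */
int is_checksum_candidate(const taint_set *eflags_taint) {
    return taint_count(eflags_taint) > CHECKSUM_THRESHOLD;
}
```

For the JZ at 0x8048d4f, eflags depends on offsets [0x0,0x3ff], so the count is 1024 and the jump is flagged. A jump that depends only on the eight bytes [0x8,0xf] is not.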

How does it work? 2. Detecting checksum
Refinement:
Well-formed inputs can pass the checksum test, but most malformed inputs cannot.
Run well-formed inputs and identify the always-taken and always-not-taken instructions.
Run malformed inputs and also identify the always-taken and always-not-taken instructions.
Identify the conditional jump instructions that behave completely differently when processing well-formed and malformed inputs.
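
The refinement step amounts to comparing per-branch statistics from the two sets of runs. The sketch below assumes the monitor can report, for every executed conditional jump, whether it was taken. The struct layout and the names `record` and `classify` are hypothetical.

```c
#include <stddef.h>

/* Per-branch counters collected by the (hypothetical) monitor. */
typedef struct {
    unsigned long addr;       /* address of the conditional jump */
    unsigned good_taken;      /* taken, well-formed input */
    unsigned good_not_taken;  /* not taken, well-formed input */
    unsigned bad_taken;       /* taken, malformed input */
    unsigned bad_not_taken;   /* not taken, malformed input */
} branch_profile;

typedef enum { NOT_A_CHECK, ALWAYS_TAKEN, ALWAYS_NOT_TAKEN } bypass_rule;

/* Record one observed execution of the branch. */
void record(branch_profile *b, int well_formed, int taken) {
    if (well_formed) {
        if (taken) b->good_taken++; else b->good_not_taken++;
    } else {
        if (taken) b->bad_taken++; else b->bad_not_taken++;
    }
}

/* A branch is a checksum check point only if it behaves one way on
 * every well-formed run and the opposite way on every malformed run.
 * The returned rule is the behaviour seen on well-formed inputs. */
bypass_rule classify(const branch_profile *b) {
    int good_always = b->good_taken > 0 && b->good_not_taken == 0;
    int good_never  = b->good_not_taken > 0 && b->good_taken == 0;
    int bad_always  = b->bad_taken > 0 && b->bad_not_taken == 0;
    int bad_never   = b->bad_not_taken > 0 && b->bad_taken == 0;
    if (good_always && bad_never)
        return ALWAYS_TAKEN;
    if (good_never && bad_always)
        return ALWAYS_NOT_TAKEN;
    return NOT_A_CHECK;
}
```

A jump taken on every well-formed run and never on a malformed run is classified as ALWAYS_TAKEN. A jump that behaves the same on both kinds of input is discarded as a candidate.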

How does it work? 2. Detecting checksum
Checksum detector: creates bypass rules (always-taken, always-not-taken).
6 if(chksum_in_file != recomputed_chksum)
7 error();
Checksum report:
…
0x8048d4f: JZ: 1024: [0x0,0x3ff]
Bypass rule:
0x8048d4f: JZ: always-taken

How does it work? 2. Detecting checksum
Checksum detector: checksum field identification.
The input bytes that affect chksum_in_file form the checksum field.
6 if(chksum_in_file != recomputed_chksum)
7 error();

How does it work? 3. Directed fuzzing
Generates malformed test cases and feeds them to the original or instrumented program.
According to the bypass rules, alters the execution trace at the check points by setting the eflags register.

How does it work? 3. Directed fuzzing
All malformed test cases are constructed from the “hot bytes” information, using attack heuristics:
Bytes that influence memory allocation are set to small, large or negative values.
Bytes that flow into string functions are replaced by format specifiers such as %n, %p.
Output: test cases that cause the program to crash or consume 100% CPU.
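
The memory-allocation heuristic might look like the sketch below. The concrete boundary values, the 32-bit little-endian field width and the name `make_test_case` are assumptions for illustration. The slides only say "small, large or negative".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical boundary values: small, large, and negative when the
 * program reads the field as a signed 32-bit integer. */
static const uint32_t ATTACK_VALUES[] = {
    0x00000000u, 0x00000001u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
};
#define NUM_ATTACK_VALUES (sizeof ATTACK_VALUES / sizeof ATTACK_VALUES[0])

/* Copy `seed` into `out` (both `len` bytes) and overwrite the 32-bit
 * little-endian field at `offset` with attack value number `which`.
 * Returns 0 on success, -1 if the field does not fit in the input or
 * `which` is out of range. */
int make_test_case(const uint8_t *seed, size_t len, size_t offset,
                   size_t which, uint8_t *out) {
    if (which >= NUM_ATTACK_VALUES || len < 4 || offset > len - 4)
        return -1;
    memcpy(out, seed, len);
    uint32_t v = ATTACK_VALUES[which];
    for (int i = 0; i < 4; i++)
        out[offset + i] = (uint8_t)(v >> (8 * i));
    return 0;
}
```

In the running example the hot bytes for malloc are [0x8,0xf], so a fuzzer built this way would call `make_test_case` with offsets 0x8 and 0xc for each attack value and leave the remaining 1016 bytes untouched.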

How does it work? 3. Directed fuzzing
6 if(chksum_in_file != recomputed_chksum)
7 error();
8 int Width = get_width(input_file);
9 int Height = get_height(input_file);
10 int size = Width*Height*sizeof(int);
11 int* p = malloc(size);
Checksum report:
…
0x8048d4f: JZ: 1024: [0x0,0x3ff]
“hot bytes” report:
…
0x8048d5b: invoking malloc: [0x8,0xf]
Bypass info:
0x8048d4f: JZ: always-taken
Before executing 0x8048d4f, the fuzzer sets the ZF flag in eflags to the opposite value.
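
Forcing the branch comes down to one bit operation on the saved flags. The sketch below relies on two x86 facts: ZF is bit 6 of EFLAGS, and JZ jumps when ZF is 1. The names `bypass_rule` and `force_jz` are hypothetical, and how the instrumentation reads and writes the real eflags register is outside the sketch.

```c
#include <stdint.h>

#define EFLAGS_ZF (1u << 6)  /* zero flag: bit 6 of x86 EFLAGS */

typedef enum { ALWAYS_TAKEN, ALWAYS_NOT_TAKEN } bypass_rule;

/* Return the eflags value to install just before a JZ so that the jump
 * follows `rule`. JZ is taken exactly when ZF is set. */
uint32_t force_jz(uint32_t eflags, bypass_rule rule) {
    if (rule == ALWAYS_TAKEN)
        return eflags | EFLAGS_ZF;
    return eflags & ~EFLAGS_ZF;
}
```

For a malformed input the comparison at line 6 clears ZF. Applying the always-taken rule sets it again, so execution skips error() and reaches the malloc call with the mutated hot bytes.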

How does it work? 4. Repairing crashed samples
Fixing is expensive, so checksum fields are fixed only in the test cases that caused a crash.
How?
Cr: raw data in the checksum field
D: input data protected by the checksum field
Checksum(): the complete checksum algorithm
T: transformation
We want to satisfy the constraint: Checksum(D) == T(Cr)

How does it work? 4. Repairing crashed samples
Using symbolic execution to solve the constraint:
Checksum(D) is a runtime-determinable constant c, so only Cr is a symbolic value.
Checksum(D) == T(Cr) becomes c == T(Cr)
Common transformations (e.g. converting from hex/oct to decimal) can be solved by existing solvers (STP).
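
To make c == T(Cr) concrete, the sketch below assumes a format that stores the checksum as 8 ASCII hex digits, so T is "parse hex text into an integer". It does not use symbolic execution or STP: the transformation is simple enough to invert by hand, which stands in for what the solver would return. `FIELD_LEN`, `transform` and `repair_field` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_LEN 8  /* hypothetical: checksum stored as 8 hex digits */

/* T: the transformation the program applies to the raw field Cr
 * before comparing it with the recomputed checksum. */
uint32_t transform(const char cr[FIELD_LEN]) {
    char tmp[FIELD_LEN + 1];
    memcpy(tmp, cr, FIELD_LEN);
    tmp[FIELD_LEN] = '\0';
    return (uint32_t)strtoul(tmp, NULL, 16);
}

/* Solve c == T(Cr) for Cr by inverting T directly: print the observed
 * constant c as 8 lowercase hex digits into the raw field. */
void repair_field(uint32_t c, char cr[FIELD_LEN]) {
    char tmp[FIELD_LEN + 1];
    snprintf(tmp, sizeof tmp, "%08lx", (unsigned long)c);
    memcpy(cr, tmp, FIELD_LEN);
}
```

Here c would be the value of Checksum(D) observed while replaying the crashing test case. Writing the bytes produced by `repair_field` into the checksum field yields an input that passes the original, unmodified check.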

How does it work? 4. Repairing crashed samples
If the repaired test case causes the original program to crash, a potential vulnerability is detected!

Evaluation
[Table: an incomplete list of applications]

Evaluation
[Table: “hot bytes” identification results, memory allocation]

Evaluation
[Table: checksum identification results, threshold = 16]

Evaluation
[Table: correct checksum fields]

Evaluation
27 previously unknown vulnerabilities, in: MS Paint, ImageMagick, Adobe Acrobat, Google Picasa, IrfanView, GStreamer, Winamp, XEmacs, Amaya, dillo, wxWidgets, PDFlib

Evaluation
[Table: vulnerabilities detected by TaintScope]

Discussion
TaintScope cannot deal with secure integrity check schemes (e.g. cryptographic hash algorithms, digital signatures): it is impossible to generate valid test cases.
Effectiveness is limited when all input data are encrypted (tracking decrypted data).
Identification of checksum check points can be affected by the quality of the inputs.
Control flow propagation is not tracked.
Not all x86 instructions are instrumented by the execution monitor.

Conclusion
TaintScope can perform:
Directed fuzzing
  Identifies which bytes flow into system/library calls.
  Dramatically reduces the mutation space.
Checksum-aware fuzzing
  Disables checksum checks by control flow alteration.
  Generates correct checksum fields in invalid inputs.

Questions