Convicting Exploitable Software Vulnerabilities: An Efficient Input Provenance Based Approach
  Zhiqiang Lin, Xiangyu Zhang, Dongyan Xu
  Purdue University
  June 27th, 2008
  The 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Motivation
  Vulnerability in software
    Internet worms (CodeRed, Slammer)
    Denial of Service (DoS), user DoS
    Viruses, Trojan horses, bots (botnet)
    Accidental breaches in security
Related Work
  Dynamic analysis
    Program shepherding (V. Kiriansky et al.), TaintCheck (J. Newsome et al.), Control Flow Integrity (M. Abadi et al.), Data Flow Integrity (M. Castro et al.)…
    Limitation: run-time overhead, and waiting for an attack
  Static analysis
    BOON (D. Wagner et al.), Splint (D. Larochelle et al.), Archer (Y. Xie et al.), RATS, Flawfinder
    Limitation: false positives
  Recent automated multi-path exploration
    DART (P. Godefroid et al.), CUTE (K. Sen et al.), EXE (C. Cadar et al.), SAGE (P. Godefroid et al.)
    Limitation: low efficiency
Problem Statement and Our Technique
  Problem: how to discover and convict software vulnerabilities more efficiently
  An efficient input provenance based approach
    Conservative static analysis => suspect
    Dynamic analysis => convicting the suspect and pruning false positives
    Random mutation is avoided
    No symbolic execution (can handle long executions)
  Key idea
    Data lineage tracing (input provenance)
Basic Idea
  An image viewer: Zgv-5.8/readgif.c
    fread(&imagehed,sizeof(imagehed),1,in);
    ...
    width=(imagehed.wide_lo+256*imagehed.wide_hi);
    height=(imagehed.high_lo+256*imagehed.high_hi);
    ...
    if((...(byte *)malloc(width*height))...)
      {
      fclose(in);
      return(_PICERR_NOMEM);
      }
  Input a.gif (256x128): xx...0x00 0x01 0x80 0x00...
  Input data label (offset): each input byte is labeled with its offset in the input stream
  Vulnerability: integer overflow in malloc(width*height)
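
The integer overflow can be made concrete with a small sketch. It rebuilds a dimension from two header bytes the same way the readgif.c excerpt does, then checks whether width*height still fits in a 32-bit signed int. The helper names dim16 and area_overflows are hypothetical and do not come from zgv, and the sketch assumes a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a low and a high header byte as the
   readgif.c excerpt does (lo + 256*hi). */
static int dim16(uint8_t lo, uint8_t hi)
{
    return lo + 256 * hi;
}

/* Hypothetical helper: returns 1 if width*height would not fit in a
   32-bit signed int, 0 otherwise. The product is computed in 64 bits
   so the check itself cannot overflow. */
static int area_overflows(int width, int height)
{
    assert(width >= 0 && height >= 0);
    int64_t area = (int64_t)width * (int64_t)height;
    return area > INT32_MAX;
}
```

For the 256x128 input on the slide the product is small and the check passes. If all four dimension bytes are set to 0xff, both dimensions become 65535 and the product no longer fits in a 32-bit signed int.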
Architecture
  Components: Static front end, Input Lineage Tracer, Input Mutator, Run-time Detector
  Data exchanged between components: program/binary, program input, suspect, lineage, new input, evidence
  Suspect: an instruction that is exploitable to trigger the vulnerability
Component 1. Input Lineage Tracer
  Label the input stream (using the offset)
  Track the propagation of the labels
    mov 0xfffffffc(%ebp),%eax
    mov %eax,0xfffffff8(%ebp)
    add %eax,%ecx
    mov %ecx,%edx
Component 1. Input Lineage Tracer
  Key concept
    Data dependency (direct propagation)
      Source:
        b=a;
      Assembly:
        mov 0xfffffffc(%ebp),%eax
        mov %eax,0xfffffff8(%ebp)       # b=a
    Control dependency (indirect propagation)
      Source:
        if (a==1)
          b=1;
        else
          c=0;
      Assembly:
        cmpl $0x1,0xfffffffc(%ebp)      # a==1
        jne d
        movl $0x1,0xfffffff8(%ebp)      # b=1
        movl $0x0,0xfffffff4(%ebp)      # c=0
        jmp
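
The two kinds of propagation can be sketched in C as follows. This is an illustration only: lineage sets are represented as a 64-bit mask of input offsets, which is a simplification chosen for this sketch and not the representation used by the tool. The names lineage_t, label, propagate_data, and propagate_control are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified lineage set: bit k is set when input offset k is in the set.
   Only offsets 0..63 can be represented in this sketch. */
typedef uint64_t lineage_t;

/* Lineage of a single input byte at the given offset. */
static lineage_t label(unsigned offset)
{
    assert(offset < 64);
    return (lineage_t)1 << offset;
}

/* Data dependency (direct propagation): for b = a, the lineage of b is
   the lineage of a. */
static lineage_t propagate_data(lineage_t src)
{
    return src;
}

/* Control dependency (indirect propagation): a value assigned under a
   branch also inherits the lineage of the branch predicate. */
static lineage_t propagate_control(lineage_t assigned, lineage_t predicate)
{
    return assigned | predicate;
}
```

In the if (a==1) example, the constant 1 assigned to b has an empty lineage, so under this sketch the lineage of b comes entirely from the predicate a==1.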
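Component 1. Data Lineage Tracer
  Data lineage (DL) of a definition:
    \[ DL(\mathit{def}) = \begin{cases} \mathit{get\_new\_id}() & \text{if } \mathit{def} \text{ is an input value} \\ \bigcup_i DL(\mathit{use}_i) & \text{otherwise} \end{cases} \]
  Input data tracking: each input value is labeled with its offset in the input stream
  DL representation: reduced ordered Binary Decision Diagram (roBDD)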
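Component 1. Data Lineage Tracer
  An example
    fread(&imagehed,sizeof(imagehed),1,in);
    ...
    width=(imagehed.wide_lo+256*imagehed.wide_hi);
    height=(imagehed.high_lo+256*imagehed.high_hi);
    ...
    if((...(byte *)malloc(width*height))...)
      {
      fclose(in);
      return(_PICERR_NOMEM);
      }
  Lineage computed along the execution
    READ(buf,size,...): for 0 <= i < size, DL(buf[i]) = get_new_id()
    DL(imagehed.wide_lo) = {6}
    DL(imagehed.wide_hi) = {7}
    DL(width) = DL(imagehed.wide_lo) U DL(imagehed.wide_hi) = {6; 7}
    DL(height) = DL(imagehed.high_lo) U DL(imagehed.high_hi) = {8; 9}
    DL(width*height) = DL(width) U DL(height) = {6; 7; 8; 9}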
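
The set unions in this example can be reproduced with a short C sketch. As an assumption made purely for brevity, a lineage set is stored as a 64-bit mask of input offsets rather than as an roBDD. The names lineage_t, dl_input, dl_union, and dl_members are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified lineage set: bit k is set when input offset k is in the set. */
typedef uint64_t lineage_t;

/* Lineage of one input byte, labeled with its offset in the input stream. */
static lineage_t dl_input(unsigned offset)
{
    assert(offset < 64);
    return (lineage_t)1 << offset;
}

/* Lineage of a value computed from two operands: the union of both sets. */
static lineage_t dl_union(lineage_t a, lineage_t b)
{
    return a | b;
}

/* Write the offsets contained in s into out (ascending order) and
   return how many there are. */
static size_t dl_members(lineage_t s, unsigned out[64])
{
    size_t n = 0;
    for (unsigned k = 0; k < 64; k++) {
        if (s & ((lineage_t)1 << k)) {
            out[n++] = k;
        }
    }
    return n;
}
```

Taking the union of dl_input(6) and dl_input(7) for width, the union of dl_input(8) and dl_input(9) for height, and then the union of both for width*height gives the four offsets 6, 7, 8, and 9, matching the final set in the example.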
Component 2. Input Mutator
  Diagram labels: program input, data lineage, suspect, evidence
  Heuristic #1: Buffer overflow mutation (double the buffer size …)
  Heuristic #2: Format string mutation (replace %s in the format string argument)
  Heuristic #3: Integer overflow mutation (boundary integer values: 0xffffffff, 0, 0x0fffffff)
  …
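
The integer overflow heuristic could look roughly like the sketch below: only the input bytes named by the lineage set are overwritten with a boundary value, and the rest of the input is left alone. The function name mutate_at_lineage, its parameters, and the little-endian byte order are assumptions made for this illustration, not details taken from the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Boundary values listed on the slide for integer overflow mutation. */
static const uint32_t BOUNDARY_VALUES[] = { 0xffffffffu, 0u, 0x0fffffffu };

/* Hypothetical mutator: overwrite the input bytes at the given lineage
   offsets with the bytes of value, least significant byte first.
   Offsets outside the input are skipped. Returns the number of bytes
   written. */
static size_t mutate_at_lineage(uint8_t *input, size_t len,
                                const size_t *offsets, size_t n_offsets,
                                uint32_t value)
{
    assert(input != NULL && offsets != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n_offsets; i++) {
        if (offsets[i] >= len) {
            continue;
        }
        /* Byte i of the value, wrapping around after four bytes. */
        input[offsets[i]] = (uint8_t)(value >> (8 * (i % 4)));
        written++;
    }
    return written;
}
```

Because the lineage already identifies which input bytes reach the suspect, a mutator of this kind only needs to try a few boundary values at those offsets instead of mutating the whole input at random.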
Implementation
  Diablo: control flow graph
    Statically generates control dependencies to facilitate Valgrind instrumentation
  Valgrind: lineage tracing
    roBDD (reduced ordered Binary Decision Diagram) represents the data lineage
Evaluation - Effectiveness
  Known vulnerabilities
    CVE (ncompress 4.2.4, SO)
    CVE (gzip 1.2.4, SO)
    CVE (Nullhttpd 0.50, HO)
    CVE (lhttpd 0.1, SO)
    CVE (wu-ftpd-2.6.0, Format String)
    CVE (cfingerd-1.4.3, Format String)
    CVE (ngircd-0.8.2, Format String)
    CVE (xzgv-0.8, IO & HO)
    CVE (GnuPG 1.4.3, IO & HO)
  Unknown vulnerabilities
    Static detector: RATS, extended to catch buffer overflow and integer overflow
    ipgrab-0.99, epstool-3.3, dcraw-7.94
  Legend: SO = stack overflow, HO = heap overflow, IO = integer overflow
Evaluation - CVE (GnuPG 1.4.3)
  GnuPG Parse_User_ID Remote Buffer Overflow Vulnerability
  pktlen = in[2,3,4,5] = 0xffffffff
Evaluation - CVE (Cfingerd-1.4.3)
    syslog(LOG_NOTICE, "%s", (char *) syslog_str);
Evaluation - Ipgrab-0.99 (A New VUL)
Evaluation - Performance (Lineage Tracing)
  Platform: two 2.13 GHz Pentium processors and 2 GB RAM, running the Linux kernel
Evaluation - Performance
Evaluation - Space
Summary
  An input lineage tracing and mutation system:
    Capable of convicting known and unknown vulnerabilities.
    Has reasonable overhead for the scenario of offline vulnerability conviction.
  Architecture (diagram): Static front end, Data Lineage Tracer, Input Mutator, Run-time Detector
Thank you
  For more information: {zlin, xyzhang,
  Q & A