ASIACCS 2007 AutoPaG: Towards Automated Software Patch Generation with Source Code Root Cause Identification and Repair Zhiqiang Lin 1,3 Xuxian Jiang 2, Dongyan Xu 3, Bing Mao 1, Li Xie 1 1Nanjing University 2George Mason University 3Purdue University March 22nd, 2007
Agenda Motivation Design & Implementation Evaluation Related Work Conclusion
Lifecycle of a vulnerability time I. Vulnerability Introduced II. Vulnerability Discovered III. Official Patch released IV. Patch Installed A rather lengthy process
Manual process is too slow time I. Vulnerability Introduced II. Vulnerability Discovered III. Official Patch released IV. Patch Installed 28 days http://www.symantec.com/enterprise/threatreport/index.jsp 75 The timelines of 10 recent Microsoft patches (MS06-045 to MS06-054) released between August and September 2006
Goal of AutoPaG
For fast-spreading attacks (e.g., zero-day)
time: I. Vulnerability Introduced, II. Vulnerability Discovered, III. Official Patch released, IV. Patch Installed
Goal of AutoPaG
For fast-spreading attacks (e.g., zero-day), make the whole process automated:
(1) Find/identify the root cause of the vulnerability
(2) Fix/repair it automatically by generating a temporary source code patch
(3) Facilitate official patch development
time: I. Vulnerability Introduced, II (III) (IV)
Overview of AutoPaG Note: we currently focus on out-of-bound vulnerabilities, the most common and severe class, but our system is also applicable to other vulnerabilities, e.g., format string vulnerabilities
1. Out-of-Bound Detector (1/2)
Challenges:
  Detect exploitation
  Provide root cause context information
    Where is the direct root cause statement? (The statement in source code, or the instructions in binary code, that directly cause the attack or memory corruption.)
    Which variable or data is overflowed?
A toy example:
  1 #include <string.h>
  2 int main(int argc, char **argv) {
  3   char buf[4];
  4   char *p;
  5   p = buf;
  6   strcpy(p, argv[1]);    <- Root Cause
  7   return 0;
  8 }
1. Out-of-Bound Detector (2/2)
How: Modify CCured + Call Stack
  #0 0x0804b0fb in ccured_fail_str (str=0x805cc73 "Ubound", file=0x805cc12 "lib/ccuredlib.c", line=3941, function=0x805daa5 "__read_at_least_f") at lib/ccuredlib.c:909
  #1 0x0804b15d in ccured_fail (msgId=3, file=0x805cc12 "lib/ccuredlib.c", line=3941, function=0x805daa5 "__read_at_least_f") at lib/ccuredlib.c:923
  #2 0x0804fa0f in __read_at_least_f (ptr={_p = 0xbfaa9f90, _e = 0xbfaa9f94}, n=11) at lib/ccuredlib.c:3941
  #3 0x0804fa75 in __copytags_ff (dest={_p = 0xbfaa9f90, _e = 0xbfaa9f94}, src={_p = 0xbfaabed2, _e = 0xbfaabedd}, n=11) at lib/ccuredlib.c:3947
  #4 0x0804a0dc in strcpy_wrapper_sff (dest=0xbfaa9f90 "", dest_e=0xbfaa9f94, src=0xbfaabed2 "aaaaaaaaaa", src_e=0xbfaabedd) at string_wrappers.h:79
  #5 0x0804a006 in main (argc=2, __argv_input=0xbfaaa014) at test.c:6
Source (test.c):
  1 #include <string.h>
  2 int main(int argc, char **argv) {
  3   char buf[4];
  4   char *p;
  5   p = buf;
  6   strcpy(p, argv[1]);
  7   return 0;
  8 }
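To make the detection step concrete, here is a minimal sketch of a bounds-checked copy in the spirit of the `_p`/`_e` pointer pairs visible in the call stack. It is not CCured's implementation: the names `fat_ptr`, `bound_report`, and `checked_strcpy` are hypothetical and chosen only for illustration. The sketch refuses the copy and records where the violation happened and how many bytes were needed, which is roughly the kind of context the detector has to hand to the next stage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "fat" pointer: p is the start of the destination and e is
 * one past the end of the underlying buffer (compare _p/_e in the trace). */
typedef struct {
    char *p;
    char *e;
} fat_ptr;

typedef enum { COPY_OK = 0, COPY_UBOUND = 1 } copy_status;

/* Hypothetical record describing a detected out-of-bound write. */
typedef struct {
    const char *file;   /* source file of the offending call */
    int line;           /* source line of the offending call */
    size_t needed;      /* bytes the copy would write, including '\0' */
    size_t available;   /* bytes actually available in the destination */
} bound_report;

/* Copies src into dest only if it fits; otherwise fills in the report and
 * leaves the destination untouched. */
static copy_status checked_strcpy(fat_ptr dest, const char *src,
                                  const char *file, int line,
                                  bound_report *report)
{
    assert(dest.p != NULL && dest.e >= dest.p && src != NULL);

    size_t needed = strlen(src) + 1;
    size_t available = (size_t)(dest.e - dest.p);

    if (needed > available) {
        if (report != NULL) {
            report->file = file;
            report->line = line;
            report->needed = needed;
            report->available = available;
        }
        return COPY_UBOUND;
    }
    memcpy(dest.p, src, needed);
    return COPY_OK;
}
```

For the toy program, calling `checked_strcpy` with a 4-byte `buf` and the 10-character input from the trace reports 11 needed bytes against 4 available, matching the `n=11` argument seen in frames #2 and #3.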
2. Root Cause Locator
Challenge:
  Find the statements (in source code) that contribute to the computation of the overflow
  Capture the transitive closure of the overflowed data
How:
  Backward data dependency analysis
Example:
  1 #include <string.h>
  2 int main(int argc, char **argv) {
  3   char buf[4];
  4   char *p;
  5   p = buf;
  6   strcpy(p, argv[1]);
  7   return 0;
  8 }
  s0Set: strcpy(p, argv[1]);
  v0Set: main:p
  sSet: p = buf; char *p; char buf[4];
  vSet: main:argv[1] main:buf
Notes: For the given example, the algorithm works as follows. Based on the information provided by the detector, we know that the examination should start at line 6. Initially, s0Set contains the identified statement and v0Set contains the overflowed variable. When the algorithm examines line 6, p depends on argv[1], so it adds argv[1] to vSet. At line 5 (p = buf), p has a backward dependency on buf, so the statement and buf are added to the relevant sets. Line 4 is the declaration of p, so it is added as well. At the end we obtain the sSet shown above. Only this code is related to the detected vulnerability and needs to be examined by the patch generator. General, powerful, large scale … simple.
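The backward walk described above can be sketched as follows. This is a simplified illustration, not AutoPaG's analysis: the def/use records for the toy program are written by hand, the names `stmt`, `var_set`, and `backward_slice` are hypothetical, and a single backward pass is only sufficient here because the toy program is straight-line code (loops and branches would need a fixed-point iteration). The sketch also seeds its sets with the initial statement and variable, which is an assumption about how the sets are combined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_USES 4
#define MAX_VARS 16

/* Hand-written def/use summary of one source statement. */
typedef struct {
    int line;                   /* source line number */
    const char *def;            /* variable declared or written, or NULL */
    const char *uses[MAX_USES]; /* variables read; NULL marks the end */
} stmt;

typedef struct {
    const char *names[MAX_VARS];
    size_t count;
} var_set;

static int var_set_contains(const var_set *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return 1;
    return 0;
}

static void var_set_add(var_set *s, const char *name)
{
    if (!var_set_contains(s, name)) {
        assert(s->count < MAX_VARS);
        s->names[s->count++] = name;
    }
}

/* Walks backward from stmts[root]. Every statement that defines a variable
 * already in vset is recorded, and the variables it reads join vset.
 * lines_out must hold at least root + 1 entries; returns how many were
 * written. */
static size_t backward_slice(const stmt *stmts, size_t root,
                             const char *overflowed, var_set *vset,
                             int *lines_out)
{
    size_t n = 0;
    vset->count = 0;
    var_set_add(vset, overflowed);

    for (size_t i = root + 1; i-- > 0; ) {
        const stmt *s = &stmts[i];
        if (s->def == NULL || !var_set_contains(vset, s->def))
            continue;
        lines_out[n++] = s->line;
        for (size_t u = 0; u < MAX_USES && s->uses[u] != NULL; u++)
            var_set_add(vset, s->uses[u]);
    }
    return n;
}

/* Lines 3 to 7 of the toy example as def/use records. */
static const stmt toy_program[] = {
    { 3, "buf", { NULL } },          /* char buf[4];        */
    { 4, "p",   { NULL } },          /* char *p;            */
    { 5, "p",   { "buf" } },         /* p = buf;            */
    { 6, "p",   { "argv[1]" } },     /* strcpy(p, argv[1]); */
    { 7, NULL,  { NULL } },          /* return 0;           */
};
```

Starting from index 3 (line 6) with `p` as the overflowed variable, the walk visits lines 6, 5, 4, and 3 in that order and collects `p`, `argv[1]`, and `buf`, which lines up with the sets on the slide.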
3. Patch Generator
Attempt to automatically repair the vulnerability.
Challenges:
  Determining vulnerable buffer boundaries: keep track of the metadata associated with the identified variables
  Fixing the out-of-bound access
Generated Patch: An example
  #include <string.h>
  int main(int argc, char **argv) {
    char (__FSEQ buf)[4];
    char * __FSEQ p;
    unsigned int __cil_tmp6;
    char *__FSEQ __cil_tmp7;
    void *p_e14;
    void *__cil_tmp7_e15;
    p_e14 = (void*)0;
    p = (char*)0;
    __cil_tmp7 = buf;
    __cil_tmp7_e15 = buf + 4;
    p = __cil_tmp7;
    __cil_tmp6 = __cil_tmp7_e15 - __cil_tmp7;
    strncpy(p, argv[1], __cil_tmp6);
    return 0;
  }
Original line 5: p = buf; Transform…
Notes: Part of this code performs the fix and the metadata tracking; the rest preserves the original program semantics (e.g., p = buf). We implement the transformation via CIL, the C Intermediate Language.
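The core of the repair, deriving a length from the tracked end pointer and using it to bound the copy, can be isolated in a small helper. This is a sketch under assumptions, not the code AutoPaG emits: `bounded_copy` is a hypothetical name, and the explicit terminator is an addition of this sketch, since `strncpy` leaves the destination unterminated when the source is truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into [dest, dest_end) without writing at
 * or beyond dest_end. Returns the length of the string left in dest. */
static size_t bounded_copy(char *dest, const char *dest_end, const char *src)
{
    assert(dest != NULL && dest_end != NULL && src != NULL);
    if (dest_end <= dest)
        return 0;                       /* no room at all: write nothing */

    /* Same idea as __cil_tmp6 on the slide: end pointer minus base. */
    size_t capacity = (size_t)(dest_end - dest);

    strncpy(dest, src, capacity);
    dest[capacity - 1] = '\0';          /* added here: force termination */
    return strlen(dest);
}
```

With the toy program's 4-byte buffer, `bounded_copy(buf, buf + 4, argv[1])` keeps at most three characters of the input plus the terminator, so adjacent memory is left untouched.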
Effectiveness
Performance of generated patch
Related Work
Proactive Source Transformation: FOC [Rinard04], DIRA [Smirnov & Chiueh04]
Just-In-Time Execution Filtering: TaintCheck [Newsome & Song05], DACODA [Crandall05], VSEF [Newsome & Song06], Argos [Portokalidis06] …
Reactive Runtime Patching: DYBOC [Sidiroglou & Keromytis 04], STEM [Sidiroglou & Keromytis 05]
Notes: We divide the most closely related work into three categories: (1) proactive source transformation, (2) just-in-time execution filtering, and (3) reactive runtime patching.
Proactive source transformation: these systems instrument the program heavily and impose considerably high performance overhead (e.g., 1X-8X slowdown in FOC). They do not investigate the vulnerability behind the attack.
Just-in-time execution filtering: a number of systems have been developed in this category. AutoPaG takes a different approach. Instead of focusing on the detection and prevention of an attack at the machine instruction level, AutoPaG automatically walks through the program source code and then identifies and patches the relevant source statements that directly or indirectly "contribute" to the detected vulnerability.
Reactive runtime patching: AutoPaG has a different goal. Instead of patching the current execution at runtime to recover from an attack, AutoPaG focuses on the vulnerability exploited by the attack, locating the relevant source code statements and generating a patch at the source code level. Note that an existing software vulnerability will ultimately require a source patch to fix it, which is the intended goal of AutoPaG. AutoPaG generates source level patch…
Conclusion Towards automated source code patch generation AutoPaG Effective Fast Low overhead
Q & A Thank you For more information: {zlin, dxu}@cs.purdue.edu xjiang@gmu.edu Google: “AutoPaG”