
Error Checking with Client-Driven Pointer Analysis


1 Error Checking with Client-Driven Pointer Analysis
Original paper by Samuel Z. Guyer, Calvin Lin. Presentation for 2007/a Pointer Analysis seminar by Alex Shapiro

2 Plan Rationale Existing solutions Our client-driven algorithm
Applications to error checking Results

3 Rationale Pointer analysis is slow
Performance requires accuracy tradeoffs. Accurate results are desired. Existing approaches provide coarse granularity. Context & flow sensitivity. Lack of precision for specific scenarios.

4 Rationale – continued Software is complex
The 90/10 rule is often in effect: “10% of the program requires 90% of the effort to analyze”. A good algorithm would recognize this 10%. Different requirements need different levels of precision. Example:
    char* safe_string_copy(char* s) {
        if (s != 0) return strdup(s);
        else return 0;
    }
    p = safe_string_copy("Foo");
    q = safe_string_copy("Bar");
    r = safe_string_copy(NULL);

5 Iterative Flow Analysis
Adjusts precision based on the quality of results. Handles polymorphic functions:
    def power(x, y):
        if y > 0:
            return x * power(x, y - 1)
        else:
            return 1
    power(1, 2)    # int, int
    power(1.0, 2)  # float, int
Function splitting: context sensitivity. A similar approach applies to polymorphic object creation. Drawback: no flow sensitivity.

6 Demand-driven Pointer Analysis
Computes the points-to set for a subset of program variables. Only this subset is considered. Drawbacks: if nothing else is required, none; otherwise, nothing is learned about the rest of the variables.

7 Demand Interprocedural Dataflow Analysis
Produces precise results for a subset of dataflow analysis problems Solves liveness, reachability, constant propagation and others Drawback: does not solve the general case of pointer analysis

8 Combined Pointer Analysis
Combines different algorithms. All assignments are split into classes. A heuristic is used to pick the analysis algorithm: flow-sensitive, context-sensitive aliasing; flow-insensitive, context-insensitive aliasing; flow-insensitive, context-insensitive points-to. Slower, more precise algorithms are used for ‘easier’ parts. Faster algorithms are used for parts where slower algorithms fail. Drawback: heuristics do not mirror the real-world demands.

9 Client-Driven Pointer Analysis
Couple pointer analysis with the error-checking client. The first analysis pass produces coarse results. The client determines places where precision is needed. The second analysis pass focuses on these places. The client produces more precise results.

10 Client-Driven Pointer Analysis
[Diagram of the analysis framework. Labels: Initial Policy, Pointer analysis, Client analysis, Memory Accesses, Query, Monitor, Information Loss, Dependence Graph, Adaptor, Adjusted Policy, Results.]

11 Taster of Things to Come
A short program:
    x_0 = x_1 = … = x_n = &a;
    y = NULL;
    copy_0(x_0); … copy_0(x_n); copy_0(y);
    copy_1(x_0); … copy_1(x_n);
    copy_m(x_0); … copy_m(x_n);
Problem: copy should not be called on a NULL pointer, and copy_k is complex.
Context-insensitive algorithm: “copy_0 is unsafe”.
Context-sensitive algorithm: N*M copies of functions, “copy_0(y) is unsafe”.
Client-driven algorithm: “copy_0 is unsafe”. Client: “I need more information about copy_0, use CS”. Makes N copies of copy_0 only, returns “copy_0(y) is unsafe”.
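The three outcomes above can be imitated with a toy model. This is only a sketch under simplifying assumptions: each call passes a single abstract value ("valid" or "null"), and the names `analyze` and `client_driven` are hypothetical, not taken from the paper.

```python
# Toy model (illustrative only): a call is (call_site, callee, abstract_argument).
def analyze(calls, context_sensitive):
    """Return the set of warnings produced for the given calls."""
    warnings = set()
    if context_sensitive:
        # One analysis instance per call site: arguments are never merged.
        for site, _callee, value in calls:
            if value == "null":
                warnings.add(f"{site} is unsafe")
    else:
        # One analysis instance per callee: all arguments are merged.
        merged = {}
        for _site, callee, value in calls:
            merged.setdefault(callee, set()).add(value)
        for callee, values in merged.items():
            if "null" in values:
                warnings.add(f"{callee} is unsafe")
    return warnings

def client_driven(calls):
    """Coarse pass first, then context sensitivity only for flagged callees."""
    coarse = analyze(calls, context_sensitive=False)
    flagged = {warning.split()[0] for warning in coarse}
    refined = analyze([c for c in calls if c[1] in flagged], context_sensitive=True)
    return refined, flagged

calls = [
    ("copy_0(x_0)", "copy_0", "valid"),
    ("copy_0(y)", "copy_0", "null"),
    ("copy_1(x_0)", "copy_1", "valid"),
]
coarse_result = analyze(calls, context_sensitive=False)
refined_result, refined_callees = client_driven(calls)
```

Only `copy_0` is re-analyzed per call site in the second step; `copy_1` stays merged because the coarse pass reported nothing for it.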

12 Program Representation
Control-flow graph. Function context sensitivity through cloning. Example: remote execution vulnerability. [Figure: graphs over the nodes main, stdin, socket, read and execl.]

13 Use-Def Chains Short reminder (maybe) How to represent value flows?
Chain(s) of assignments stemming from a source:
    0: x = 5;      // x: {0}
    1: y = x + 3;  // x: {0->{1}}, y: {1}
    2: z = y + x;  // x: {0->{1,2}}, y: {1->{2}}
    3: x = 7;      // x: {0->{1,2}, 3}
    4: t = x;      // x: {0->{1,2}, 3->{4}}, t: {4}
What do we now know? The value of t must have come from the value of x at line 3. The value of y at line 4 is independent of the value of x at line 3. This is a simple way to track dependencies for flow sensitivity.
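For straight-line code, the chains in this example can be computed in one forward walk. The sketch below assumes no branches or loops, and the function name `use_def_chains` and the tuple encoding of statements are made up for illustration.

```python
def use_def_chains(statements):
    """statements: list of (target, used_variables) in program order.

    Returns {line: {variable: defining_line}} for every use on that line.
    Assumes straight-line code, so each use has exactly one reaching definition.
    """
    last_def = {}   # variable -> line of its most recent definition
    chains = {}
    for line, (target, uses) in enumerate(statements):
        chains[line] = {v: last_def[v] for v in uses if v in last_def}
        last_def[target] = line   # this definition hides earlier ones
    return chains

program = [
    ("x", []),          # 0: x = 5
    ("y", ["x"]),       # 1: y = x + 3
    ("z", ["y", "x"]),  # 2: z = y + x
    ("x", []),          # 3: x = 7
    ("t", ["x"]),       # 4: t = x
]
chains = use_def_chains(program)
```

Here `chains[4]` links the use of x in `t = x` to line 3, while `chains[1]` links the use of x in `y = x + 3` to line 0, which is why y does not depend on the later assignment.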

14 Object Representation
Each structure is a node. Each field in a structure is a node. Each instantiation contains a copy of all the nodes. Arrays are represented with a single node. If flow sensitivity is required, use-def chains are computed, with separate chains for each flow.

15 Context and Flow Sensitivity
Default to context-insensitivity and flow-insensitivity. The decision is made by client feedback. Context-sensitive procedures retain a separate copy of local variables. Flow-insensitive parts are still followed in program statement order to provide more precision.
Ignoring statement order:
    p = &x;  // p -> {x}
    q = p;   // q -> {x}
    p = &y;  // p -> {x,y}, q -> {x,y}
Following statement order:
    p = &x;  // p -> {x}
    q = p;   // q -> {x}
    p = &y;  // p -> {x,y}, q -> {x}
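The difference between the two results for q can be reproduced with a small sketch. Both functions below are flow-insensitive in the sense that points-to sets only grow; the statement encoding and the function names are assumptions made for this illustration.

```python
def new_targets(pts, kind, rhs):
    # "addr" models lhs = &rhs; "copy" models lhs = rhs.
    return {rhs} if kind == "addr" else set(pts.get(rhs, set()))

def order_independent(stmts):
    """Iterate to a fixed point, so statement order does not matter."""
    pts = {}
    changed = True
    while changed:
        changed = False
        for lhs, kind, rhs in stmts:
            new = new_targets(pts, kind, rhs)
            current = pts.setdefault(lhs, set())
            if not new <= current:
                current |= new
                changed = True
    return pts

def single_ordered_pass(stmts):
    """Visit each statement once, in program order, accumulating targets."""
    pts = {}
    for lhs, kind, rhs in stmts:
        pts.setdefault(lhs, set()).update(new_targets(pts, kind, rhs))
    return pts

stmts = [("p", "addr", "x"), ("q", "copy", "p"), ("p", "addr", "y")]
unordered = order_independent(stmts)
ordered = single_ordered_pass(stmts)
```

In the ordered pass, q is copied from p before p picks up y, so q keeps the smaller set. This shortcut is only sound for code without loops or back edges; that restriction is part of the sketch, not a claim about the paper's algorithm.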

16 Clients Client-specific requirements are described using a lattice
Lattice defines types and a combining (‘meet’) function:
    Colors = { "Red", "Green", "Blue", "Yellow", "Purple", "Aqua" }
    "Red" + "Blue" = "Purple"
The ‘meet’ function allows us to lose precision where required. Real-world example: file access errors.
    property FileState : { Open, Closed } initially Closed
    procedure fopen(path, mode) {
        on_exit { return --> new file_stream --> new file_handle }
        analyze FileState { file_handle <- Open }
    }
    procedure fgets(s, size, f) {
        on_entry { f --> file_stream --> handle }
        error if (FileState : handle could-be Closed) "Error: file might be closed";
    }
[Lattice diagram with elements Open and Closed.]
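One simple way to picture the FileState property is to treat a flow value as the set of states a handle might be in, with ‘meet’ as set union. That encoding is an assumption of this sketch, as are the names `meet` and `could_be`; the annotation language above may represent the lattice differently.

```python
OPEN, CLOSED = "Open", "Closed"

def meet(a, b):
    # Combining two flow values loses precision: the result may be either state.
    return a | b

def could_be(value, state):
    # Mirrors the "could-be" test in the error condition for fgets.
    return state in value

after_fopen = frozenset({OPEN})      # handle on a path where fopen was called
never_opened = frozenset({CLOSED})   # handle on a path that skipped fopen
merged = meet(after_fopen, never_opened)

report_precise = could_be(after_fopen, CLOSED)
report_merged = could_be(merged, CLOSED)
```

With the precise value no error is reported, while the merged value triggers "file might be closed". This is the kind of imprecision the client-driven approach tries to trace back to its source.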

17 Analysis Builds the graph as described before Computes flow values
Flow values come in two flavors: points-to sets, which are sets of nodes in the program graph, and client flow values, which are elements of a lattice of custom types. Pointers are represented using points-to sets. Other variables use client flow values.

18 Assignments and Procedure calls
For x = *p, compute the points-to set R of p. The flow value of x is the ‘meet’ of the flow values of the members of R. For *p = y, merge the flow value of y into the members of the points-to set of p. Flow sensitivity allows us to avoid using the ‘meet’ function. Handle a procedure call using a series of assignments. If the procedure is context-insensitive, this happens only once.
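The two assignment rules can be sketched as small transfer functions. Flow values are modelled as sets with union as ‘meet’, which is an assumption of this example. The `flow_sensitive` flag that overwrites an unambiguous target is a standard strong-update trick added for illustration, not something the slides spell out.

```python
def meet(a, b):
    # Assumption: flow values are sets of possible states; meet is set union.
    return a | b

def load(pts, flow, x, p):
    """x = *p: meet the flow values of everything p may point to."""
    result = frozenset()
    for target in pts[p]:
        result = meet(result, flow.get(target, frozenset()))
    flow[x] = result

def store(pts, flow, p, y, flow_sensitive=False):
    """*p = y: merge y into each possible target of p."""
    targets = pts[p]
    for target in targets:
        if flow_sensitive and len(targets) == 1:
            flow[target] = flow[y]   # unambiguous target: overwrite, no meet
        else:
            flow[target] = meet(flow.get(target, frozenset()), flow[y])

pts = {"p": {"a", "b"}, "q": {"a"}}
flow = {
    "a": frozenset({"Open"}),
    "b": frozenset({"Closed"}),
    "y": frozenset({"Closed"}),
}
load(pts, flow, "x", "p")                         # x = *p, p is ambiguous
loaded = flow["x"]
store(pts, flow, "q", "y")                        # *q = y without flow sensitivity
weak = flow["a"]
store(pts, flow, "q", "y", flow_sensitive=True)   # same store, overwriting
strong = flow["a"]
```

The ambiguous load yields both states for x, the merged store leaves a as either state, and the overwriting store narrows a to Closed.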

19 Determining Context and Flow Sensitivity
Monitor the analysis and track all uses of ‘meet’. Three scenarios of precision loss: context-insensitive call; flow-insensitive assignment; control-flow merge of use-def chains. Pointer ambiguity scenarios: x = *p where p is ambiguous; *p = y where p is ambiguous. Monitor the scenarios and create a dependency graph.

20 Dependency graph – polluting assignments
Sources of imprecision:
Code | Imprecision | Effect | Action
foo(5); foo(6); | Context-insensitive | Param to foo = ? | Add CS node ‘foo’
bar(&a); bar(&b); | Context-insensitive | Param to bar -> a,b | Add CS node ‘bar’
x = 5; x = 6; | Flow-insensitive | x = ? | Add FS node ‘x’
p = &a; p = &b; | Flow-insensitive | p -> a,b | Add FS node ‘p’
if(c) x = 5; else x=6; | Path-insensitive | | No action

Polluting assignments:
Code | Initially | Effect | Action
x = y; | y = ? | x = ? | Add node ‘x’; add edge x -> y
p = q; | q -> a,b | p -> a,b | Add node ‘p’; add edge p -> q

Helps when the cause of precision loss needs to be identified.
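The actions in these tables amount to building a small graph while the analysis runs. The class below is a hypothetical sketch of such a monitor; its name, method names and data layout are assumptions made for illustration.

```python
class Monitor:
    """Records where precision was lost and how imprecise values spread."""

    def __init__(self):
        self.nodes = {}   # name -> set of tags, "CS" and/or "FS"
        self.edges = {}   # node -> set of nodes its value was copied from

    def context_insensitive_merge(self, procedure):
        # e.g. foo(5); foo(6); analyzed with one shared copy of foo
        self.nodes.setdefault(procedure, set()).add("CS")

    def flow_insensitive_merge(self, variable):
        # e.g. x = 5; x = 6; analyzed with one flow value for x
        self.nodes.setdefault(variable, set()).add("FS")

    def polluting_assignment(self, lhs, rhs):
        # e.g. x = y; where y is already imprecise
        self.nodes.setdefault(lhs, set())
        self.nodes.setdefault(rhs, set())
        self.edges.setdefault(lhs, set()).add(rhs)

monitor = Monitor()
monitor.context_insensitive_merge("foo")   # foo(5); foo(6);
monitor.flow_insensitive_merge("y")        # y = 5; y = 6;
monitor.polluting_assignment("x", "y")     # x = y;
```

After these three events the graph has a CS node for foo, an FS node for y, and an edge from x to y that records where the imprecision in x came from.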

21 Client Query Dependency graph passes to adaptor
The client notifies the adaptor about values that contained ‘?’, but only the values that the client required for analysis. The adaptor constructs an updated context and flow sensitivity policy based on the dependency graph. Only the reachable nodes are included. Some optimizations are done to verify the need for sensitivity. All nodes along the path to imprecise values are considered.
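The adaptor step can be pictured as a reachability walk over the dependency graph, starting from the values the client complained about. The function name `adjusted_policy` and the graph encoding are assumptions of this sketch, and the optimizations mentioned above are not modelled.

```python
def adjusted_policy(nodes, edges, queried):
    """nodes: name -> set of tags ("CS"/"FS"); edges: node -> set of sources;
    queried: names whose imprecise values the client actually needed."""
    policy = {"CS": set(), "FS": set()}
    seen = set()
    stack = list(queried)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for tag in nodes.get(name, ()):
            policy[tag].add(name)          # this node asks for more sensitivity
        stack.extend(edges.get(name, ()))  # follow the imprecision to its sources
    return policy

nodes = {"x": set(), "y": {"FS"}, "foo": {"CS"}, "z": {"FS"}}
edges = {"x": {"y"}, "y": {"foo"}}
policy = adjusted_policy(nodes, edges, queried=["x"])
```

Node z is imprecise too, but nothing the client asked about depends on it, so it stays out of the new policy.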

22 First and Second Analysis Passes
Coarse first pass: flow- and context-insensitive. The client query determines a set of nodes ‘of interest to us’. How to run sensitive analysis on this small set of nodes? Solution: run a coarse pass again. When a node in the set is reached, switch to the required level of precision. Total cost: 2 runs of the FI/CI algorithm, plus the cost of FS/CS analysis on a small subset of nodes. This is probably less than full FS/CS. Client requirements are satisfied.

23 Error Detection Library routines affect flow values
Simple model of the problem domain. The analysis uses the model to propagate flow values; the client checks them. Example of more complex lattices: sockets and FTP behavior.
    property Trust : { Remote { External { Internal }}}
    procedure socket(domain, type, protocol) {
        on_exit { return --> new file_handle }
        analyze Trust {
            if (domain == AF_UNIX) file_handle <- External
            if (domain == AF_INET) file_handle <- Remote
        }
    }
    property FDKind : { File, Client, Server, Pipe, Command, StdIO }
    procedure write(fd, buffer_ptr, size) {
        on_entry { buffer_ptr --> buffer; fd --> file_handle }
        error if ((FDKind : buffer could-be File) &&
                  (Trust : buffer could-be Remote) &&
                  (FDKind : file_handle could-be Client) &&
                  (Trust : file_handle could-be Remote))
            "Error: possible FTP behavior";
    }

24 Results Only real-world open source programs used
Bugs reported to maintainer and fixed versions compared.
Program | Description | Priv | LOC | CFG nodes | Procedures
stunnel 3.8 | Secure TCP wrapper | yes | 2K / 13K | 2264 | 42
pfingerd 0.7.8 | Finger daemon | | 5K / 30K | 3638 | 47
muh 2.05c | IRC proxy | | 5K / 25K | 5191 | 84
muh 2.05d | | | | 5390 |
pure-ftpd | FTP server | | 13K / 45K | 11,239 | 116
crond (fcron-2.9.3) | cron daemon | | 9K / 40K | 11,310 | 100
apache (core only) | Web server | | 30K / 67K | 16,755 | 313
make 3.75 | make | | 21K / 50K | 18,581 | 167
BlackHole 1.0.9 | filter | | 12K / 244K | 21,370 | 71
wu-ftpd 2.6.0 | | | 21K / 64K | 22,196 | 183
openssh client 3.5p1 | Secure shell client | | 38K / 210K | 22,411 | 441
privoxy 3.0.0 | Web server proxy | | 27K / 48K | 22,608 | 223
wu-ftpd 2.6.2 | | | 22K / 66K | 23,107 | 205
named (BIND 4.9.4) | DNS server | | 26K / 84K | 25,452 | 210
openssh daemon 3.5p1 | Secure shell server | | 50K / 299K | 29,799 | 601
cfengine 1.5.4 | System admin tool | | 34K / 350K | 36,573 | 421
sqlite 2.7.6 | SQL database | | 36K / 67K | 43,333 | 387
nn 6.5.6 | News reader | | 36K / 116K | 46,336 | 494

25 Results – cont’d String format vulnerabilities that could be remotely exploited

26 Condensed Results – Client Driven
At least as accurate as the best fixed-precision algorithm. At least as fast as the best fixed-precision algorithm in most cases, with a small penalty factor in others. For smaller programs, at least as accurate as the full CS-FS algorithm. Adapts well to different scenarios. Only a small portion of the program requires sensitive analysis: below 0.5% in most cases. Found real errors in muh, wu-ftpd, sshd, apache and others.

27 Open Challenges
Function tables: indexed by strings, they pollute the call graph. Library wrappers: made context-sensitive, a source of overhead. Custom memory allocators: models differ from analysis. Internal state: a workaround describes ‘global’ variables in the client description, and the analysis uses flow sensitivity. Path sensitivity is not handled.

28 Questions?

