Segmented Symbolic Analysis Wei Le Rochester Institute of Technology.




1 Segmented Symbolic Analysis Wei Le Rochester Institute of Technology

2 Motivation
Symbolic analysis has many important applications in software tools [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12]
- Compared to testing with concrete inputs: better coverage
- Compared to other static techniques: more precise
- Will remain a powerful tool as its scalability improves [Chipounov, Kuznetsov, Candea ’12]

3 Challenges of Symbolic Analysis
- Loops: may have a statically unknown bound
- Library calls: the source code of a library is typically not available at compile time

4 Previous Solutions
Loops: only a very small part of the state space is covered
– Iterate once [Cadar, Dunbar, Engler ’08] [Chipounov, Kuznetsov, Candea ’12]
– Report unknown [Xie, Chou, Engler ’03]
– Pattern matching [Saxena, Poosankam, McCamant, Song ’09]
Library calls: imprecise, and require manual effort
– A concrete value [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05]
– Manually constructed models (e.g., a simplified C implementation) [Bush, Pincus, Sielaff ’00] [Chipounov, Kuznetsov, Candea ’12]

5 Segmented Symbolic Analysis: Insights
- Code is not uniformly easy to analyze.
- We should leverage the structural and semantic relations between statements to partition a program, and then apply a different analysis to each part.
- Static analysis has limited capabilities. Dynamic analysis can supply information that a purely static symbolic analyzer produces slowly or cannot produce at all.

6 Overall Approach
1. Perform symbolic analysis
2. When an unknown occurs, identify the code segments that cause it
3. Construct unit tests and automatically generate inputs
4. Run the tests and perform dynamic inference to generate symbolic rules and symbolic values (transfer functions)
5. Resume symbolic analysis using the inferred rules

7 Novelty of the Work
- Weaves static and dynamic analyses on demand on a concurrent framework
- Dynamic analysis is fully automatic (it runs code segments, not the entire program)
- Aggregates information from multiple runs through regression analysis, justified by:
1. Programs mostly consist of linear operations [Knuth ’71] [Halbwachs, Proy, Roumanoff ’97]
2. Determining program properties often requires only linear constraints [Halbwachs, Proy, Roumanoff ’97] [Xie, Chou, Engler ’03]
3. We assume that linear relations can characterize the relevant behavior of small code segments

8 Overview using an Example

9 Example: statements in the control-flow graph (nodes 1–10)
struct stat s; char filename[32]; char* temp = argv[1]; int i = 0;
Loop: while (*temp != '\0') { filename[i] = *temp++; i++; }
Library call: t = _stat64i32(filename, &s); branch on t == 0
strcat(filename, ","), where the query 32 > Len(filename)+1 is raised
- Traditional SA: stops at the library call (library unknown)
- Traditional SA with library models: passes the library call, then stops at the loop (loop unknown)
- Segmented SA: Len(filename') = Len(filename) across the library call; Len(filename') = Len(temp) across the loop; the query becomes 32 > Len(temp)+1, then 32 > Len(argv[1])+1, which reports a buffer overflow

10 Unit Test to Infer the Loop
//initialize with test inputs
char* temp = _GenChars(test_buf);
char* filename = _GenChars(test_buf);
//code segment for the loop
int i = 0;
while(*temp != '\0'){
    filename[i] = *temp++;
    i++;
}
//output Len(filename)
char* _result = _GenChars(g_buf);
int _rint = strlen(filename);
itoa(_rint, _result, 10);
fputs(_result, fp);
// cleanup …

11 Reduce to Regression Analysis
Test | Test input: temp | Test input: filename | Transformed for RA: Len(temp) | Transformed for RA: Len(filename) | Output: Len(filename')
1 | acde | piidaf | 4 | 6 | 4
2 | tazipad | qdd | 7 | 3 | 7
3 | ad | dafdalfll | 2 | 9 | 2
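Each test contributes one linear equation in the unknown coefficients. As an illustration, take the multiple linear model Len(filename') = a_0 + a_1 Len(temp) + a_2 Len(filename); this choice of model is ours. The three tests (4, 6 → 4), (7, 3 → 7) and (2, 9 → 2) then determine the coefficients uniquely, because the coefficient matrix has determinant 3:

```latex
\begin{aligned}
4 &= a_0 + 4a_1 + 6a_2\\
7 &= a_0 + 7a_1 + 3a_2\\
2 &= a_0 + 2a_1 + 9a_2
\end{aligned}
\;\Rightarrow\;
\begin{aligned}
a_1 - a_2 &= 1\\
-2a_1 + 3a_2 &= -2
\end{aligned}
\;\Rightarrow\; a_2 = 0,\; a_1 = 1,\; a_0 = 0
```

The middle step subtracts the first equation from the other two. The result is the rule Len(filename') = Len(temp). With only as many tests as unknowns, a fit like this is always exact. In practice, additional runs would be needed to gain confidence in the rule.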

12 Internal Design and Components

13 The Helium Framework
[figure: Symbolic Analysis & Partition Program for Unknown raises queries (q), each marked Solved, Solving, or Unknown. It sends requests to Dynamic Inference On Demand (Test Synthesizer, Inference Engine, Inference Repository), which responds with New Rules or Not Found.]

14 Components of the Helium Framework
Static component:
- Perform demand-driven, path-sensitive symbolic analysis
- Isolate the code segment that causes the unknown
- Determine the environment for the code segment

15 Interaction Protocol
[figure: Symbolic Analysis sends a request containing V (inquiry), E (environment), and C (code). Dynamic Inference turns it into a unit test (test input, code, output), runs inference, and responds with a transfer function.]

16 Test Synthesizer: Constructing a Unit Test from a Program Segment
[figure steps: Select Code Segment; Determine Test Input Variables; Determine Test Output Variables; Construct Runnable Code]

17 Inference via Regression
[figure: Input Transformation and Model Selection (simple, multiple, and polynomial linear; piecewise linear) turn data for explanatory variables and data for response variables into linear symbolic rules]
Dynamic inference as regression analysis: $Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n$

18 Explanatory Models for Representing Code Semantics (a: output variable; b, c, d: input variables)
Model | Example
Constant | a = 0
Simple linear | a = b
Multiple linear | a = 2*b + c
Polynomial linear | a = b^2 + c*d
Piecewise linear | if b > 0, a = b; else a = 3

19 Experimental Setup
1. Implementation: Phoenix and Disolver, analyzing C/C++/C#
- A traditional symbolic analysis that gives up on loops and library calls
- Segmented symbolic analysis
- Applications of both symbolic analyses to detect infeasible paths and buffer overflows
2. Research Questions:
- Can we find useful symbolic rules and values?
- Do we improve the detection of infeasible paths and buffer overflows?
- What are the capabilities of segmented symbolic analysis?
- Is the technique still scalable and practical?

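20 Experimental Results: Comparing the Two Analyses (SA: traditional symbolic analysis; S-SA: segmented symbolic analysis)
Columns: Program | Overflow (SA, S-SA) | Unknown (SA, S-SA) | Infeasible (SA, S-SA) | Unknown (SA, S-SA)
wu-ftpd 03751243
sendmail 0318161166
polymorph 27623454
gzip 1525219112422
grep 116614151917
tightvnc 001211553432
putty 01605430317270
snort 01353455967147124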

21 Dynamic Inference for Buffer Overflow
Columns: Program | Segments | Runnable | Analyzable | Inferred Rules, with Loop and Lib sub-columns
wu-ftpd 6350 033112
sendmail 72672571679
polymorph 01901801762
gzip 563361357197
grep 27050511
tightvnc 81105024
putty 18421029102282
snort 37471140925148

22 Performance
Columns: Program | size (kloc) | Symbolic Analysis: T-inf, T-buf | Segmented Symbolic Analysis: T-inf, Thread-inf, T-buf, Thread-buf
wu-ftpd 0.40.7 s1.5 s2.7 s6523.8 s206
sendmail 0.91.0 s1.8 s11.2 s26228.9 s166
polymorph 0.92.2 s1.0 s3.9 s6143.3 s96
gzip 5.1358.7 s3.0 s1679.3 s271508.6 s341
grep 16.921.1 s3.4 s470.1 s7179.9 s46
tightvnc 45.4490.5 s18.4 s1149.9 s126596.6 s96
putty 60.1331.4 s81.4 s508.4 s1011213.6 s281
snort 98.8124.6 s465.6 s2009.4 s6511472.3 s421

23 Experimental Summary
- Improved detection capabilities: 5 times more buffer overflows
- Inferred 1135 models
- Loops: 2/3 are eligible for size; 29.3% yield runnable unit tests; models are inferred for 23.8%
- Library calls: unit tests for 81.4% are runnable, and models are inferred for 70.4%
- Scalability remains practical
- We can handle loops that traditional symbolic analysis cannot

24 Capabilities of Segmented Symbolic Analysis
Library calls handled (Lib Yes):
- String: strcpy, strcat, strlen, strncpy, strdup
- File system: chdir, getcwd, rename, unlink, stat
- I/O: printf, fgets, fgetc, read
- Misc: perror, utime, inet_addr, atoi
Library calls not handled (Lib No):
- String content: strrchr, getenv
- Compiler unknown: malloc
- Network: recv, gethostbyname
- Interactive input: getchar
Loops not handled (Loop No):
- Complex loop: nested loop
- Network: recv
- Interactive input: getchar
- Invalid context: invalid loop index

25 Loops We Can Handle
//loop handled by segmented symbolic analysis
for (p = name; *p != '\0'; p++) {
    if (isascii((int)*p) && isupper((int)*p)) {
        *p = tolower(*p);
        tryagain = TRUE;
    }
}

26 Loops We Cannot Handle Yet
for (n = 7; n >= 8 - pfburh->r.w % 8; n--) {
    rcSource[i++] = rcolors[m_netbuf[y * bytesPerRow + x] >> n & 1];
}

27 Related Work
- Various symbolic analyses for bug finding and debugging [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12]
- Hybrid symbolic analysis [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Chipounov, Kuznetsov, Candea ’12]
- Dynamic invariant discovery [Ernst, Czeisler, Griswold, Notkin ’00]

28 Conclusions
- A novel hybrid technique that flexibly weaves static and dynamic analyses on demand, using each where it is most capable of discovering program semantics
- Addressed two key challenges: 1) partitioning a program to construct valid unit tests, and 2) mapping the problem of discovering symbolic relations between program variables onto regression analysis
- Fully automatic, and applicable to different program properties and different programs

29 Thank you and Questions?

