1 A Plethora of Paths Eric Larson May 18, 2009 Seattle University.

Presentation transcript:

1 A Plethora of Paths Eric Larson May 18, 2009 Seattle University

2 Paths
Paths are commonly used in static analysis techniques.
Symbolic path simulation: simulate each path with symbolic data values.
Issues:
  - Path explosion
  - Illegal paths
[Figure: control flow graph with nodes A through G]

3 Format of Talk
Research Questions
Implementation
  - Analysis framework
  - Program slicing
  - Path counting algorithm
  - Shortcomings
Results
  - Quantitative
  - Qualitative
Conclusion
  - Answers to the research questions
  - Future work

4 Research Questions: Single Run / Individual Operations
1. When employing high-quality static software bug detection techniques, is it better to analyze the entire program in a single run or to look at dangerous operations individually?
High-quality static software bug detection techniques:
  - catch most (ideally all) bugs
  - produce few (ideally no) false bug reports
Dangerous operation: any operation that needs to be checked for potential errors.
  - In this study, operations that access memory are considered dangerous operations.

5 Single Run / Individual Operations: Tradeoffs
Entire program:
  - Only one run
  - Most of the program is relevant
  - Big-O: 2^n
Individual operations:
  - Many runs
  - More of the program is irrelevant (can be ignored)
  - Big-O: s × 2^m
Key question: To what extent is m < n?
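The tradeoff can be made concrete with a little arithmetic. The sketch below reads the slide's bounds as 2^n for the whole-program run and s × 2^m for the individual runs. It assumes that s is the number of individual runs and that n and m count two-way branch decisions; the slide does not define these symbols, and the function names and sample values are illustrative only.

```python
import math

def whole_program_cost(n: int) -> int:
    """Path bound for a single run over the entire program: 2^n."""
    return 2 ** n

def individual_runs_cost(s: int, m: int) -> int:
    """Path bound for s separate runs, each exploring 2^m paths."""
    return s * 2 ** m

def break_even_m(s: int, n: int) -> float:
    """Individual runs are cheaper exactly when m < n - log2(s)."""
    return n - math.log2(s)

# Hypothetical values: 1,024 individual runs, 40 branch decisions in the whole program.
s, n = 1024, 40
print(break_even_m(s, n))                                    # 30.0
print(individual_runs_cost(s, 25) < whole_program_cost(n))   # True
print(individual_runs_cost(s, 35) < whole_program_cost(n))   # False
```

Under these assumptions, slicing has to remove at least log2(s) branch decisions from each run before the individual-operation approach wins.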

6 Research Questions: Program Slicing
2. What is the effectiveness of program slicing in reducing the number of paths?
Program slicing removes statements not relevant to the property.
Obtain path counts with different slicing criteria:
  - all statements (no slicing)
  - all dangerous statements
  - all dangerous statements within a function
  - one individual dangerous statement

7 Research Questions: Path Explosion
3. What types of tasks lead to path explosion? Is slicing more or less effective on particular tasks?
Quantitative and qualitative analysis across 15 different programs.

8 Analysis Framework
Uses a modified version of SUDS (SCAM 2007)
  - Operates on the whole program
  - Analyzes programs written in C
1. Performs traditional analyses
  - Simplification
  - Control flow graph / call graph
  - Pointer analysis (flow-sensitive)
  - Data flow analysis
2. Program slicing (next slide)
3. Path counting (slide after next)

9 Program Slicing
Backwards, context-insensitive slicing algorithm
  - Prevents the slice from propagating into a function that is clearly not in the slice
Indirect uses from control statements are not part of the slice
  - Path counting will follow both directions regardless of condition
No attempt to make slice executable
  - Used for analysis only
Slicing criterion varies by experiment:
  - No slicing
  - All dangerous statements
  - All dangerous statements in a function
  - One dangerous statement
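A backward slice can be sketched as a reachability walk over dependence edges, starting from the slicing criterion. The code below is a minimal illustration, not the SUDS implementation: it follows data dependences only, ignores function boundaries and context, and the `depends_on` map and statement numbers are hypothetical.

```python
from collections import deque

def backward_slice(depends_on, criterion):
    """Return every statement the criterion statements transitively depend on.

    depends_on maps a statement id to the ids of the statements whose values
    it uses (data dependences only; control dependences are left out here).
    """
    in_slice = set(criterion)
    worklist = deque(criterion)
    while worklist:
        stmt = worklist.popleft()
        for dep in depends_on.get(stmt, ()):
            if dep not in in_slice:
                in_slice.add(dep)
                worklist.append(dep)
    return in_slice

# Hypothetical five-statement program; statement 5 is the dangerous operation.
# 1: n = read()   2: i = 0   3: msg = "hi"   4: print(msg)   5: buf[i + n] = 0
deps = {4: {3}, 5: {1, 2}}
print(sorted(backward_slice(deps, {5})))      # [1, 2, 5]
print(sorted(backward_slice(deps, {4, 5})))   # [1, 2, 3, 4, 5]
```

Changing the criterion set is all it takes to move between the experiments listed on the slide: a single statement, the dangerous statements of one function, or every dangerous statement in the program.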

10 Path Counting
Control flow graph is collapsed after slicing
Path count is computed interprocedurally
  - Total path count is the sum of the path counts of each function
Loops introduce two new paths:
  - One for the loop not taken
  - One for the loop taken once
  - Assumes fixed-point analysis summarizes the loop
Goto statements end a path
  - Not too many gotos in the programs used
  - Functions with gotos have a lot of paths even with this simplification
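The counting scheme can be sketched as follows. This is an assumption-laden illustration rather than the tool's algorithm: each function's control flow graph is taken to be acyclic because every loop has already been collapsed into a two-way choice (body skipped or body taken once), gotos are not modeled, and the graphs and node names are made up.

```python
from functools import lru_cache

def count_paths(cfg, entry, exit_node):
    """Count entry-to-exit paths in an acyclic control flow graph.

    cfg maps a node to the list of its successors. Loops are assumed to be
    collapsed already, so the graph contains no back edges.
    """
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == exit_node:
            return 1
        return sum(paths_from(succ) for succ in cfg.get(node, ()))
    return paths_from(entry)

# Hypothetical function f: an if/else followed by one collapsed loop.
cfg_f = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop_body", "exit"],   # loop taken once / loop not taken
    "loop_body": ["exit"],
}
# Hypothetical function g: a single collapsed loop.
cfg_g = {"entry": ["loop_body", "exit"], "loop_body": ["exit"]}

per_function = [count_paths(cfg_f, "entry", "exit"),
                count_paths(cfg_g, "entry", "exit")]
print(per_function)        # [4, 2]
print(sum(per_function))   # 6, the program total as a sum over functions
```

Memoizing `paths_from` keeps the count linear in the size of the graph even when the number of paths is exponential, which is why counting paths stays cheap while simulating each of them does not.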

11 Shortcomings
Processing of loops and goto statements
Not all paths are equal
  - length of path
  - complexity of state
Intraprocedural path count depends on how the program is divided into functions
Amount of work to reduce the number of paths varies widely
  - Depends on factors such as loop depth

12 Results: Programs Used
Program | Description | Functions | Statements
bc | calculator | 105 | 14,491
betaftpd | file transfer daemon | 73 | 4,791
diff3 | compares three files | 32 | 4,016
find | file finder | 398 | 31,098
flex | lexical analyzer | 140 | 22,453
ft | spanning tree | 33 | 1,879
ghttpd | web server | 19 | 2,663
gnuchess | chess game | 243 | 39,443
gzip | compression utility | 106 | 11,380
indent | source code indenter | 114 | 19,605
ks | graph partitioning | 16 | 1,325
othello | othello game | 11 | 1,055
space | specialized interpreter | 127 | 11,652
thttpd | web server | 130 | 12,500
yacr2 | channel router | 59 | 5,606

13 Results: Single Run, No Slicing
Program | Total paths | Paths in worst function | Functions with ≤ 100 paths | Functions with > 100,000 paths
bc | 2,653,007 | 2,144,737 (80.8%) | 87 (82.9%) | 3 (2.9%)
betaftpd | 68,365 | 55,297 (80.9%) | 66 (90.4%) | 0 (0.0%)
diff3 | 2,067,345 | 1,558,324 (75.4%) | 23 (71.9%) | 3 (9.4%)
find | 22,453,011 | 21,748,720 (96.9%) | 366 (92.0%) | 3 (0.8%)
flex | 7.33E[…] | […]E+11 (98.4%) | 123 (87.9%) | 7 (5.0%)
ft | 10,498 | 10,082 (96.0%) | 31 (93.9%) | 0 (0.0%)
ghttpd | 91,580 | 91,082 (99.5%) | 16 (84.2%) | 0 (0.0%)
gnuchess | 2.35E[…] | […]E+16 (98.9%) | 202 (83.1%) | 12 (4.9%)
gzip | 3.49E[…] | […]E+11 (98.8%) | 80 (75.5%) | 9 (8.5%)
indent | 2.12E[…] | […]E+17 (100.0%) | 94 (82.5%) | 7 (6.1%)
ks | 25,371 | 23,100 (91.0%) | 14 (87.5%) | 0 (0.0%)
othello | 42,802 | 25,057 (58.5%) | 6 (54.5%) | 0 (0.0%)
space | 5,853 | 3,900 (66.6%) | 123 (96.9%) | 0 (0.0%)
thttpd | 1.57E[…] | […]E+14 (100.0%) | 108 (83.1%) | 3 (2.3%)
yacr2 | 3,666,900 | 2,991,744 (81.6%) | 40 (67.8%) | 2 (3.4%)

14 Results: Single Run, Slicing
Program | Total paths | % Decr. | Paths in worst function | Funcs with ≤ 100 paths | Funcs with > 100,000 paths
bc | 2,268,[…] | […]% | 2,144,736 (94.5%) | 91 (86.7%) | 1 (1.0%)
betaftpd | 5,[…] | […]% | 1,980 (38.0%) | 70 (95.9%) | 0 (0.0%)
diff3 | 40,[…] | […]% | 20,412 (50.5%) | 26 (81.3%) | 0 (0.0%)
find | 4,146,[…] | […]% | 4,057,361 (97.8%) | 382 (96.0%) | 1 (0.3%)
flex | 7.22E+11 | 1.6% | 7.22E+11 (100.0%) | 128 (91.4%) | 4 (2.9%)
ft | […] | […]% | 194 (75.5%) | 32 (97.0%) | 0 (0.0%)
ghttpd | 2,[…] | […]% | 2,520 (93.3%) | 18 (94.7%) | 0 (0.0%)
gnuchess | 3.41E[…] | […]% | 2.66E+14 (77.9%) | 214 (88.1%) | 11 (4.5%)
gzip | 8.26E[…] | […]% | 8.26E+08 (100.0%) | 91 (85.8%) | 1 (0.9%)
indent | 8.00E+13 | 100% | 8.00E+13 (100.0%) | 96 (84.2%) | 6 (5.3%)
ks | 1,[…] | […]% | 1,400 (92.2%) | 15 (93.8%) | 0 (0.0%)
othello | 3,[…] | […]% | 3,249 (93.8%) | 10 (90.9%) | 0 (0.0%)
space | 1,[…] | […]% | 346 (18.3%) | 124 (97.6%) | 0 (0.0%)
thttpd | 4.19E[…] | […]% | 4.19E+12 (100.0%) | 111 (85.4%) | 2 (1.5%)
yacr2 | 287,[…] | […]% | 259,328 (90.2%) | 46 (78.0%) | 1 (1.7%)

15 Results: Individual Statement Runs
One run for each dangerous operation
The runs are sorted by the number of paths from smallest to largest
Graphs show cumulative percentage of runs that have fewer than n paths
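The quantity plotted on such graphs can be computed as sketched below. The path counts are invented for illustration and are not data from the study; `cumulative_percent_below` is a hypothetical helper name.

```python
from bisect import bisect_left

def cumulative_percent_below(path_counts, thresholds):
    """For each threshold n, return the percentage of runs with fewer than n paths."""
    runs = sorted(path_counts)   # smallest to largest, as on the slide
    total = len(runs)
    # bisect_left gives the number of sorted entries strictly less than n.
    return {n: 100.0 * bisect_left(runs, n) / total for n in thresholds}

# Hypothetical per-operation path counts, one entry per individual run.
counts = [1, 3, 8, 40, 250, 9000, 120000, 5000000]
print(cumulative_percent_below(counts, [10, 100, 1000, 100000]))
# {10: 37.5, 100: 50.0, 1000: 62.5, 100000: 75.0}
```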

16 Results: Individual Statement Runs

17 Results: Individual Function Runs

18 Results: Worst Case Comparison
Program | Total paths (slicing - all) | Worst case run: total paths (slicing - stmt) | Worst case run: total paths (slicing - func)
bc | 2,268,432 | 617,992 | 1,106,152
betaftpd | 5, ,341
diff3 | 40,423 | 3,256 | 20,788
find | 4,146,604 | 171,394 | 4,058,603
flex | 7.22E E+10
ft |
ghttpd | 2, ,614
gnuchess | 3.41E E E+14
gzip | 8.26E E+08
indent | 8.00E E E+13
ks | 1,519 | 76 | 1,467
othello | 3,462 | 3,286 | 3,290
space | 1,892 1,231
thttpd | 4.19E E E+11
yacr2 | 287, ,518

19 Qualitative Analysis
Look deeper at each program
  - What tasks lead to path explosion?
  - What does slicing do?
Example analysis: find
  - Function quotearg_buffer_restyled has the most paths (21 million)
  - Modifies and buffers a string
  - Many options and special character processing
  - After slicing, 4 million paths remain
  - Function consider_visiting has the second most paths
  - Individual runs are effective for operations not in either of the above two functions
See the paper for analysis of the other 14 programs.

20 Qualitative Analysis
Common tasks for path explosion:
  - Input processing functions (often not sliced away)
  - Parsing functions (often not sliced away)
  - Stylized output functions (often sliced away)
Other program-specific tasks suffered from path explosion:
  - divide in bc
  - finite state automata conversion in flex
  - finding the best move in gnuchess

21 Conclusions
1. When employing high-quality static software bug detection techniques, is it better to attempt to use the entire program in a single run or to look at dangerous operations individually?
Worst case individual run ≈ single run
  - But there are exceptions
Individual runs were effective for many operations
  - Especially those that were not from a function that suffered from path explosion

22 Conclusion
2. What is the effectiveness of program slicing in reducing the number of paths?
  - Slicing did reduce the number of paths.
  - Not enough in the worst cases of path explosion.
3. What types of tasks lead to path explosion? Is slicing more or less effective on particular tasks?
  - Input processing, parsing, and stylized output functions often suffered from path explosion.
  - Path explosion still existed in these functions after slicing.
  - Slicing was helpful for stylized output functions since little to no code was dependent on their results.

23 Future Work
Use the results to improve static bug detection:
  - Look at task-specific techniques to address path explosion
  - Incorporate some level of guidance from the user
Extend the study
  - Address shortcomings: loops, interprocedural analysis
  - Programs in different languages

24 Questions