
1 STATIC CODE ANALYSIS

2 OUTLINE
- INTRODUCTION
- BACKGROUND
  o REGULAR EXPRESSIONS
  o SYNTAX TREES
  o CONTROL FLOW GRAPHS
- TOOLS AND THEIR WORKING
- ERROR EXAMPLES AND CASE STUDIES
- LIMITATIONS
- THE STYLISTIC MODULE

3 INTRODUCTION
- Static analysis is the process of analyzing a program's code, without executing it, to find out how the program will behave at runtime
- The term is usually applied to analysis performed by an automated tool
- Manual analysis is referred to as program understanding, program comprehension, or code review

4 REGULAR EXPRESSIONS
- Concise notation for specifying sets of strings
- Equivalent to Finite State Machines (FSMs)
- Expressions for matching phone numbers, words, email addresses, etc. can be defined
- Used in lexical analysis
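To make the link between regular expressions and lexical analysis concrete, here is a minimal sketch of a regex-driven tokenizer. The token classes and their patterns are assumptions chosen for illustration (a tiny C-like language); they are not taken from any tool named in these slides.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=<>!]=?|[(){};]"),
    ("SKIP", r"\s+"),
]

# One master pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("x = a + 42;") yields one (kind, text) pair per token; a real lexer would also handle comments, string literals and keywords.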

5 ANALOGY TO MATHEMATICAL EXPRESSIONS

Math Expression       Possible values for ‘x’
[rows not preserved in the transcript]

Regular Expression    Matches or denotes
[1-3]                 “1”, “2” or “3”
ab?c                  “ac” or “abc”
a*                    “”, “a”, “aa”, “aaa”, …

6 CRASH COURSE ON REGULAR EXPRESSIONS

A string counts as a match only if the whole string fits the expression.

Regular Expression    Matches                                   Does Not Match
[a-z0-9]              a, b, k, z, 9, 1, 5                       aa, b2, 5t, 05
ab*c                  ac, abc, abbc                             a, c, ab, bc
ab+c                  abc, abbc, abbbc                          a, c, ab, bc, ac
-?[0-9]+              1, -1, -273, 448                          a, 4a
[a-z]+|[0-9]+         apple, 34, 0, m                           4b, t5, Apple
[^ ]+                 a, a-b, -79.5, boY                        kung fu
(?:do|re|mi)*         (empty string), do, re, mido, midore      doo
\([a-z]*\)            (), (a), (word)                           ( a ), (a,b)
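The rows of this crash course can be checked mechanically. The sketch below assumes that "matches" means the whole string matches, which corresponds to Python's re.fullmatch; the TABLE constant and the check_table helper are names introduced here for illustration.

```python
import re

# (pattern, strings expected to match, strings expected not to match)
TABLE = [
    (r"[a-z0-9]", ["a", "b", "k", "z", "9", "1", "5"], ["aa", "b2", "5t", "05"]),
    (r"ab*c", ["ac", "abc", "abbc"], ["a", "c", "ab", "bc"]),
    (r"ab+c", ["abc", "abbc", "abbbc"], ["a", "c", "ab", "bc", "ac"]),
    (r"-?[0-9]+", ["1", "-1", "-273", "448"], ["a", "4a"]),
    (r"[a-z]+|[0-9]+", ["apple", "34", "0", "m"], ["4b", "t5", "Apple"]),
    (r"[^ ]+", ["a", "a-b", "-79.5", "boY"], ["kung fu"]),
    (r"(?:do|re|mi)*", ["", "do", "re", "mido", "midore"], ["doo"]),
    (r"\([a-z]*\)", ["()", "(a)", "(word)"], ["( a )", "(a,b)"]),
]


def check_table(table):
    """Return the (pattern, string) pairs whose behavior contradicts the table."""
    mismatches = []
    for pattern, should_match, should_not_match in table:
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                mismatches.append((pattern, text))
        for text in should_not_match:
            if re.fullmatch(pattern, text) is not None:
                mismatches.append((pattern, text))
    return mismatches
```

An empty result from check_table(TABLE) means every row behaves as listed under whole-string matching.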

7 ABSTRACT SYNTAX TREES
- A tree representation of the syntactic structure of source code written in a programming language
- Any ambiguity has been resolved
  o E.g., a + b + c produces the same AST as (a + b) + c
- They don’t contain all the information in the program
  o E.g., spacing, comments, brackets, parentheses
- Used in syntactic analysis
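Both properties can be observed with Python's standard ast module, used here as a stand-in for whichever parser a given tool relies on. The helper name same_tree is an assumption made for this sketch.

```python
import ast


def same_tree(left_source, right_source):
    """Compare two snippets by their AST, ignoring layout, comments and redundant parentheses."""
    # ast.dump omits line and column information by default,
    # so only the tree structure is compared.
    return ast.dump(ast.parse(left_source)) == ast.dump(ast.parse(right_source))
```

For example, same_tree("a + b + c", "(a + b) + c") holds because + is left-associative, while "a + (b + c)" groups differently and produces a different tree.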

8 ABSTRACT SYNTAX TREE EXAMPLE

    while (b != 0) {
        if (a > b) {
            a = a - b;
        } else {
            b = b - a;
        }
    }
    return a;
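The tree drawn on this slide is not preserved in the transcript. As a rough substitute, the sketch below parses a Python rendering of the same loop and lists the node types by depth. The wrapping function gcd and the helper outline are assumptions introduced here, and a Python AST will not match a C-like AST node for node.

```python
import ast

# Python rendering of the slide's loop, wrapped in a function so that
# the final return statement is valid.
SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""


def outline(node, depth=0):
    """Return one indented node-type name per AST node, parents before children."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.expr_context):
            continue  # skip Load/Store markers, which only add noise here
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ast.parse(SOURCE))))
```

The output shows While containing a Compare and an If, with one Assign under each branch; braces, semicolons and spacing leave no trace in the tree.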

9 CONTROL FLOW GRAPHS
- A representation, using graph notation, of all paths that might be traversed through a program during its execution
- A directed graph where
  o Each node represents a statement
  o Edges represent control flow
- Used in data flow analysis

10 CONTROL FLOW GRAPH EXAMPLE

    x := a + b;
    y := a * b;
    while (y > a) {
        a := a + 1;
        x := a + b
    }
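The graph for this example is not preserved in the transcript. One plausible encoding, sketched below, uses one node per statement plus an "exit" node; the numbering, the exit node and the helper names are assumptions made for illustration.

```python
from collections import deque

# One node per statement of the example; node 6 is a hypothetical exit node.
LABELS = {
    1: "x := a + b",
    2: "y := a * b",
    3: "while (y > a)",
    4: "a := a + 1",
    5: "x := a + b",
    6: "exit",
}

# Directed edges: node -> successors. Node 3 branches into the loop body
# (condition true) or to the exit (condition false); node 5 jumps back to 3.
EDGES = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}


def reachable(edges, start):
    """Return the set of nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in edges[node]:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def back_edges(edges, start):
    """Return edges that point to a node on the current DFS path; each one closes a loop."""
    found, visited, on_path = [], set(), set()

    def visit(node):
        visited.add(node)
        on_path.add(node)
        for successor in edges[node]:
            if successor in on_path:
                found.append((node, successor))
            elif successor not in visited:
                visit(successor)
        on_path.discard(node)

    visit(start)
    return found
```

Here every node is reachable from node 1, and the single back edge (5, 3) corresponds to the while loop. A node missing from reachable(...) would be unreachable code.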

11 STATIC ANALYSIS TOOLS

Language    Tool           Uses
C++         CppLint        Regex
            CppCheck       CFG
            Vera++         AST
Java        Checkstyle     AST
            PMD            AST, DFG
            FindBugs       CFG
Python      Flake8         Regex, AST
            PyLint         AST
PHP         CodeSniffer    Regex
            PHPMD          AST, DFG

12 HOW DOES PMD WORK?

13 SOME EXAMPLES OF ERRORS
- Excessive Method Length
- Excessive Parameter List
- Unused Variables
- Dead Code
- Object Creation in a Loop
- Short Variable Names
- Infinite Loops
- Too Many Blank Lines

14 CASE STUDY: SHORT METHOD NAMES

    import re

    # remove_comments() and MINIMUM_IDENTIFIER_LENGTH are not shown on this slide;
    # they are assumed to be defined elsewhere in the module.

    # Optional access modifier, optional 'static', return type, method name, parameter list
    METHOD_DECLARATION = re.compile(
        r'(?:(?:protected +internal|public|private|protected|internal) +)?'
        r'(?:static +)?[a-zA-Z0-9_]+ +[a-zA-Z0-9_]+ *\(.*\)')

    def short_method_names(source):
        lines = remove_comments(source).split("\n")   # Split the source into lines
        lines = [line.strip() for line in lines]      # Remove leading and trailing whitespace
        pairs = enumerate(lines, start=1)             # Pair the lines with line numbers
        method_declarations = [pair for pair in pairs
                               if METHOD_DECLARATION.match(pair[1]) is not None]
        violations = list()
        for line, declaration in method_declarations:
            left = declaration.split('(', 1)[0]           # Part before the first '('
            tokens = re.findall(r'[a-zA-Z0-9_]+', left)   # Split that part into tokens
            method_name = tokens[-1]                      # The last token before '(' is the method name
            if len(method_name) < MINIMUM_IDENTIFIER_LENGTH:   # The method name is too short
                violations.append((line, declaration))
        return violations

15 CASE STUDY: DUPLICATE IMPORTS

    import re

    # remove_comments() is not shown on this slide; it is assumed to be defined elsewhere.

    def duplicate_imports(source):
        lines = remove_comments(source).split("\n")   # Split the source into lines
        lines = [line.strip() for line in lines]      # Remove leading and trailing whitespace
        pairs = enumerate(lines, start=1)             # Pair the lines with line numbers
        # Keep only 'using' directives, with spaces removed so that formatting differences are ignored
        containing_using = [(number, re.sub(r' +', '', text)) for number, text in pairs
                            if re.match(r'using +[a-zA-Z0-9_.]+ *;', text) is not None]
        # Detect duplicates by comparing every pair of directives
        duplicates = list()
        for i in range(len(containing_using)):
            for j in range(i + 1, len(containing_using)):
                if containing_using[i][1] == containing_using[j][1]:
                    duplicates.append((containing_using[i][0], 'using ' + containing_using[i][1][5:]))
        return duplicates

16 CASE STUDY: SHORT VARIABLE NAMES

17 CASE STUDY: DEAD CODE

    int x = 2 + 1;
    if (x == 4) {
        do_something();
    }
    do_something_else();
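One way a tool might flag this case is to fold constants and then evaluate the branch condition. The sketch below does that for a Python rendering of the snippet, using the standard ast module. It handles only straight-line, top-level code, and the helper names fold and dead_branches are assumptions, not the method of any tool listed earlier.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
CMP_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
           ast.Lt: operator.lt, ast.Gt: operator.gt}


def fold(node, env):
    """Return the statically known value of an expression, or None if unknown."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env.get(node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = fold(node.left, env), fold(node.right, env)
        if left is not None and right is not None:
            return BIN_OPS[type(node.op)](left, right)
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in CMP_OPS):
        left, right = fold(node.left, env), fold(node.comparators[0], env)
        if left is not None and right is not None:
            return CMP_OPS[type(node.ops[0])](left, right)
    return None  # calls, attribute access, etc. are treated as unknown


def dead_branches(source):
    """Return line numbers of top-level `if` statements whose condition is always false."""
    env, dead = {}, []
    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            value = fold(stmt.value, env)
            if value is None:
                env.pop(stmt.targets[0].id, None)
            else:
                env[stmt.targets[0].id] = value
        else:
            if isinstance(stmt, ast.If) and fold(stmt.test, env) is False:
                dead.append(stmt.lineno)
            env.clear()  # conservative: any other statement may rebind names
    return dead
```

For the snippet above, x folds to 3, so x == 4 is always false and the if on line 2 is reported. If x were assigned the result of a function call, the sketch would know nothing about x and stay silent.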

18 CASE STUDY: INFINITE LOOP

    int x = a + b;
    while (1 == 1) {
        do_something();
    }
    do_something_else();
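A simple check for this pattern looks for a loop whose condition is a constant truth and whose body has no way out. The sketch below applies that idea to Python source through the standard ast module; the helper names are assumptions. It is deliberately conservative: any break, return or raise inside the body, even one belonging to a nested loop, suppresses the report.

```python
import ast


def is_constant_true(test):
    """True for truthy literals and for an equality between two equal literals."""
    if isinstance(test, ast.Constant):
        return bool(test.value)
    if (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.left, ast.Constant)
            and isinstance(test.comparators[0], ast.Constant)):
        return test.left.value == test.comparators[0].value
    return False


def has_exit(loop):
    """True if a break, return or raise appears anywhere inside the loop body."""
    for stmt in loop.body:
        for node in ast.walk(stmt):
            if isinstance(node, (ast.Break, ast.Return, ast.Raise)):
                return True
    return False


def infinite_loops(source):
    """Return line numbers of `while` loops that can only run forever under this check."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.While)
        and is_constant_true(node.test)
        and not has_exit(node)
    )
```

On a Python rendering of the snippet above, the while on line 2 is reported, which also makes the call after it unreachable. A function called inside the body could still raise or terminate the process, which this check cannot see.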

19 LIMITATIONS
- Nontrivial properties of programs are undecidable
  o E.g., the halting problem, semantic equivalence
- We can never determine all possible program behaviors

20 AN EXAMPLE OF THE SECOND LIMITATION

    int x = 2 + 1;
    if (x == 4) {
        do_something();
    }
    do_something_else();

    int x = sqrt(1);
    if (x == 4) {
        do_something();
    }
    do_something_else();

21 THE STYLISTIC MODULE (A BIRD’S EYE VIEW)
- Get the tools for a particular language
- Analyze the errors flagged by the tool
- Filter out unnecessary errors
- Categorize errors into their respective bins
- Run the tools on a sample of programs
- Determine the normalization factors
- Perform a face-value analysis of programs against normed scores
- Make changes to the error list if necessary
- Determine thresholds for each bin
- Score programs based on those thresholds

22 CONCLUSION
- Software is hard to get right
  o Complex library APIs
  o Difficult language features: e.g., threads
- Nobody is perfect 100% of the time
- Result: bugs
- The tools can never determine all possible bugs
- However, they are a useful first line of defense

23 THANK YOU

