Published by Arthur Carroll. Modified over 8 years ago.
1
STATIC CODE ANALYSIS
2
OUTLINE
INTRODUCTION
BACKGROUND
  o REGULAR EXPRESSIONS
  o SYNTAX TREES
  o CONTROL FLOW GRAPHS
TOOLS AND THEIR WORKING
ERROR EXAMPLES AND CASE STUDIES
LIMITATIONS
THE STYLISTIC MODULE
3
INTRODUCTION
Static analysis is the process of analyzing a program's code, without executing it, to find out how the program will behave at runtime
The term is usually applied to analysis performed by an automated tool
Manual analysis is referred to as program understanding, program comprehension, or code review
4
REGULAR EXPRESSIONS
Concise notation for specifying sets of strings
Equivalent to Finite State Machines (FSMs)
Expressions for matching phone numbers, words, email addresses, etc. can be defined
Used in lexical analysis
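As a minimal sketch of how regular expressions drive lexical analysis, the Python snippet below splits a line of C-like source into tokens. The token names and patterns in `TOKEN_SPEC` are illustrative assumptions and are not taken from any of the tools in this presentation.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=<>!]=?"),
    ("PUNCT", r"[(){};]"),
    ("SKIP", r"\s+"),
]
# One alternation of named groups; the group that matched names the token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs; raise ValueError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("a = a - 1;")` yields an identifier, an operator, an identifier, an operator, a number, and a punctuation token, in that order.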
5
ANALOGY TO MATHEMATICAL EXPRESSIONS
Math Expression      Possible values for ‘x’

Regular Expression   Matches or denotes
[1-3]                “1”, “2” or “3”
ab?c                 “ac” or “abc”
a*                   “”, “a”, “aa”, “aaa”, …
6
CRASH COURSE ON REGULAR EXPRESSIONS
Regular Expression   Matches                                  Does Not Match
[a-z0-9]             a, b, k, z, 9, 1, 5                      aa, b2, 5t, 05
ab*c                 ac, abc, abbc                            a, c, ab, bc
ab+c                 abc, abbc, abbbc                         a, c, ab, bc, ac
-?[0-9]+             1, -1, -273, 448                         a, 4a
[a-z]+|[0-9]+        apple, 34, 0, m                          4b, t5, Apple
[^ ]+                a, a-b, -79.5, boY                       kung fu
(?:do|re|mi)*        (empty string), do, re, mido, midore     doo
\([a-z]*\)           (), (a), (word)                          ( a ), (a,b)
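The rows above read most naturally as whole-string matches (for example, `aa` is listed as not matching `[a-z0-9]`). Under that assumption, a short sketch with Python's `re.fullmatch` can check a sample of the rows; the `ROWS` list and `failing_rows` helper are names introduced here for illustration.

```python
import re

# (pattern, strings expected to match, strings expected not to match)
ROWS = [
    (r"[a-z0-9]", ["a", "z", "9"], ["aa", "b2", "05"]),
    (r"ab*c", ["ac", "abc", "abbc"], ["a", "ab", "bc"]),
    (r"ab+c", ["abc", "abbbc"], ["ac", "ab"]),
    (r"-?[0-9]+", ["1", "-273"], ["a", "4a"]),
    (r"[a-z]+|[0-9]+", ["apple", "34"], ["4b", "Apple"]),
    (r"[^ ]+", ["a-b", "-79.5", "boY"], ["kung fu"]),
    (r"(?:do|re|mi)*", ["", "mido", "midore"], ["doo"]),
    (r"\([a-z]*\)", ["()", "(word)"], ["( a )", "(a,b)"]),
]


def failing_rows(rows):
    """Return (pattern, string) pairs that contradict the table under whole-string matching."""
    failures = []
    for pattern, should_match, should_not_match in rows:
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.fullmatch(pattern, text) is not None:
                failures.append((pattern, text))
    return failures
```

An empty result from `failing_rows(ROWS)` means every sampled row behaves as listed. With `re.search` instead of `re.fullmatch`, several "does not match" entries would match, because a substring is enough for a search.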
7
ABSTRACT SYNTAX TREES
A tree representation of the syntactic structure of source code written in a programming language
Any ambiguity has been resolved
  o E.g., a + b + c produces the same AST as (a + b) + c
They don’t contain all the information in the program
  o E.g., spacing, comments, brackets, parentheses
Used in syntactic analysis
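Both properties can be observed with Python's standard `ast` module, used here only as a convenient stand-in for the parsers inside the tools discussed later. The helper name `same_tree` is an assumption made for this sketch.

```python
import ast


def same_tree(source_a, source_b):
    """Return True if two source strings parse to structurally identical ASTs."""
    # ast.dump omits line and column attributes by default, so only structure is compared.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))
```

`same_tree("a + b + c", "(a + b) + c")` is true because addition is left-associative, while `same_tree("a + b + c", "a + (b + c)")` is false. Spacing and comments do not affect the result either: `same_tree("x = 1  # note", "x=1")` is true.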
8
ABSTRACT SYNTAX TREE EXAMPLE
while (b != 0) {
    if (a > b) {
        a = a - b;
    } else {
        b = b - a;
    }
}
return a;
9
CONTROL FLOW GRAPHS
A representation, using graph notation, of all paths that might be traversed through a program during its execution
A directed graph where
  o Each node represents a statement
  o Edges represent control flow
Used in data flow analysis
10
CONTROL FLOW GRAPH EXAMPLE
x := a + b;
y := a * b;
while (y > a) {
    a := a + 1;
    x := a + b
}
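One way to picture the graph for this example is as an adjacency list. The sketch below writes the graph out by hand, one node per statement. The node numbering, the extra exit node, and the helper names are assumptions made for illustration; a real tool would derive the graph from the parsed program.

```python
from collections import deque

# Node labels for the example program (node 6 is a synthetic exit node).
LABELS = {
    1: "x := a + b",
    2: "y := a * b",
    3: "while (y > a)",
    4: "a := a + 1",
    5: "x := a + b",
    6: "exit",
}

# Directed edges: node -> list of possible successors.
CFG = {
    1: [2],
    2: [3],
    3: [4, 6],  # loop condition: enter the body or leave the loop
    4: [5],
    5: [3],     # back edge to the loop condition
    6: [],
}


def reachable(cfg, start):
    """Return the set of nodes reachable from start (including start), breadth-first."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in cfg[node]:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def in_loop(cfg, node):
    """Return True if some path leads from node back to itself."""
    return any(node in reachable(cfg, successor) for successor in cfg[node])
```

Here `in_loop` holds for nodes 3, 4, and 5 only, which matches the `while` loop in the source. A node missing from `reachable(CFG, 1)` would be unreachable code.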
11
STATIC ANALYSIS TOOLS
Language   Tool          Uses
C++        CppLint       Regex
           CppCheck      CFG
           Vera++        AST
Java       Checkstyle    AST
           PMD           AST, DFG
           FindBugs      CFG
Python     Flake8        Regex, AST
           PyLint        AST
PHP        CodeSniffer   Regex
           PHPMD         AST, DFG
12
HOW DOES PMD WORK?
13
SOME EXAMPLES OF ERRORS
Excessive Method Length
Excessive Parameter List
Unused Variables
Dead Code
Object Creation in a Loop
Short Variable Names
Infinite Loops
Too Many Blank Lines
14
CASE STUDY: SHORT METHOD NAMES
import re

# remove_comments() and MINIMUM_IDENTIFIER_LENGTH are assumed to be defined elsewhere (not shown on the slide).

# Optional access modifier, optional 'static', return type, method name, parameter list
METHOD_DECLARATION = re.compile(
    r'(?:(?:protected +internal|public|private|protected|internal) +)?'
    r'(?:static +)?[a-zA-Z0-9_]+ +[a-zA-Z0-9_]+ *\(.*\)')

def short_method_names(source):
    lines = remove_comments(source).split("\n")  # Split the source into lines
    lines = [line.strip() for line in lines]  # Remove leading and trailing whitespace
    pairs = zip(range(1, len(lines) + 1), lines)  # Pair the lines with line numbers
    method_declarations = filter(lambda pair: METHOD_DECLARATION.match(pair[1]) is not None, pairs)
    violations = list()
    for line, declaration in method_declarations:
        left = declaration.split('(', 1)[0]  # Part before the first '('
        tokens = re.findall(r'[a-zA-Z0-9_]+', left)  # Split the method declaration into tokens
        method_name = tokens[-1]  # The last token before '(' is the method name
        if len(method_name) < MINIMUM_IDENTIFIER_LENGTH:  # Check if the method name is long enough
            violations.append((line, declaration))
    return violations
15
CASE STUDY: DUPLICATE IMPORTS
import re

# remove_comments() is assumed to be defined elsewhere (not shown on the slide).

def duplicate_imports(source):
    lines = remove_comments(source).split("\n")  # Split the source into lines
    lines = [line.strip() for line in lines]  # Remove leading and trailing whitespace
    pairs = zip(range(1, len(lines) + 1), lines)  # Pair the lines with line numbers
    containing_using = filter(lambda pair: re.match(r'using +[a-zA-Z0-9_.]+ *;', pair[1]) is not None, pairs)
    # Strip spaces so that differently spaced directives compare equal
    containing_using = [(pair[0], re.sub(r' +', '', pair[1])) for pair in containing_using]
    # Detect duplicates
    duplicates = list()
    for i in range(len(containing_using)):  # range replaces Python 2's xrange
        for j in range(i + 1, len(containing_using)):
            if containing_using[i][1] == containing_using[j][1]:
                duplicates.append((containing_using[i][0], 'using ' + containing_using[i][1][5:]))
    return duplicates
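Both case studies call `remove_comments`, which the slides do not show. The following is a hypothetical stand-in, assuming C#-style `//` and `/* ... */` comments. It keeps the newlines inside block comments so that the line numbers reported by the checks stay valid. It does not recognize comment markers inside string literals, which a real implementation would need to handle.

```python
import re

# Line comments run to the end of the line; block comments may span lines.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def remove_comments(source):
    """Strip // and /* */ comments while preserving the original line count."""
    def keep_newlines(match):
        # Replace the comment with only the newlines it contained.
        return "\n" * match.group().count("\n")

    return _COMMENT.sub(keep_newlines, source)
```

For example, a two-line block comment is replaced by a single newline, so a `using` directive that follows it keeps its original line number.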
16
CASE STUDY: SHORT VARIABLE NAMES
17
CASE STUDY: DEAD CODE
int x = 2 + 1;
if (x == 4) {
    do_something();
}
do_something_else();
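A check like this can be approximated by folding constants and testing whether a condition is always false. The sketch below does so for Python source using the standard `ast` module, as an illustration of the idea rather than a description of how any listed tool works. It handles only top-level statements and integer arithmetic, and it ignores function calls that rebind global variables; the function names are assumptions.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_CMP_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
            ast.Lt: operator.lt, ast.Gt: operator.gt}


def fold(node, env):
    """Return the constant value of an expression, or None if it is not known."""
    if isinstance(node, ast.Constant):
        return node.value if isinstance(node.value, int) else None
    if isinstance(node, ast.Name):
        return env.get(node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = fold(node.left, env), fold(node.right, env)
        if left is not None and right is not None:
            return _BIN_OPS[type(node.op)](left, right)
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS):
        left, right = fold(node.left, env), fold(node.comparators[0], env)
        if left is not None and right is not None:
            return _CMP_OPS[type(node.ops[0])](left, right)
    return None  # calls, attributes, and anything else are treated as unknown


def dead_branches(source):
    """Return line numbers of top-level `if` statements whose condition is always false."""
    env = {}
    dead = []
    for stmt in ast.parse(source).body:
        simple = (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                  and isinstance(stmt.targets[0], ast.Name))
        if simple:
            env[stmt.targets[0].id] = fold(stmt.value, env)
            continue
        if isinstance(stmt, ast.If) and fold(stmt.test, env) is False:
            dead.append(stmt.lineno)
        # Conservatively forget every variable the statement may reassign.
        for sub in ast.walk(stmt):
            if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                env.pop(sub.id, None)
    return dead
```

On the Python equivalent of the slide's code (`x = 2 + 1` followed by `if x == 4:`), the `if` on line 2 is reported. If `x` is assigned from a function call instead, its value is unknown to this sketch and nothing is reported.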
18
CASE STUDY: INFINITE LOOP
int x = a + b;
while (1 == 1) {
    do_something();
}
do_something_else();
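A simple version of this check looks for loops whose condition is a constant truth and whose body has no visible way out. The sketch below applies that idea to Python source with the standard `ast` module; it is an assumption-laden illustration, not the method of any listed tool. It treats any `break`, `return`, or `raise` inside the body as an exit, so it can miss some infinite loops, and it cannot see exits hidden inside called functions.

```python
import ast


def _always_true(test):
    """Return True for a truthy literal or an equality between two equal literals."""
    if isinstance(test, ast.Constant):
        return bool(test.value)
    if (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.left, ast.Constant)
            and isinstance(test.comparators[0], ast.Constant)):
        return test.left.value == test.comparators[0].value
    return False


def infinite_loops(source):
    """Return line numbers of `while` loops with an always-true test and no exit statement."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While) or not _always_true(node.test):
            continue
        has_exit = any(
            isinstance(sub, (ast.Break, ast.Return, ast.Raise))
            for child in node.body
            for sub in ast.walk(child)
        )
        if not has_exit:
            found.append(node.lineno)
    return sorted(found)
```

For the Python equivalent of the slide's code, the `while 1 == 1:` loop on line 2 is reported; adding a `break` to the body suppresses the report.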
19
LIMITATIONS
Nontrivial semantic properties of programs are undecidable (Rice's theorem)
  o E.g., the halting problem, semantic equivalence
We can never determine all possible program behaviors
20
AN EXAMPLE OF THE SECOND LIMITATION
int x = 2 + 1;
if (x == 4) {
    do_something();
}
do_something_else();

int x = sqrt(1);
if (x == 4) {
    do_something();
}
do_something_else();
21
THE STYLISTIC MODULE (A BIRD’S EYE VIEW)
Get the tools for a particular language
Analyze the errors flagged by the tool
Filter out unnecessary errors
Categorize errors into their respective bins
Run the tools on a sample of programs
Determine the normalization factors
Perform a face-value analysis of programs against normed scores
Make changes to the error list if necessary
Determine thresholds for each bin
Score programs based on those thresholds
22
CONCLUSION
Software is hard to get right
  o Complex library APIs
  o Difficult language features: e.g., threads
Nobody is perfect 100% of the time
Result: bugs
The tools can never determine all possible bugs
However, they are a useful first line of defense
23
THANK YOU