

STATIC CODE ANALYSIS

OUTLINE
• INTRODUCTION
• BACKGROUND
  o REGULAR EXPRESSIONS
  o SYNTAX TREES
  o CONTROL FLOW GRAPHS
• TOOLS AND THEIR WORKING
• ERROR EXAMPLES AND CASE STUDIES
• LIMITATIONS
• THE STYLISTIC MODULE

INTRODUCTION
• Static analysis is the process of analyzing a program's code without executing it, in order to predict how the program will behave at runtime
• The term usually refers to analysis performed by an automated tool
• Manual analysis is referred to as program understanding, program comprehension, or code review

REGULAR EXPRESSIONS
• Concise notation for specifying sets of strings
• Equivalent to Finite State Machines (FSMs)
• Expressions for matching phone numbers, words, addresses, etc. can be defined
• Used in lexical analysis
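As a rough illustration of how regular expressions drive lexical analysis, the sketch below splits a line of source text into tokens. The token classes (NUMBER, IDENT, OP, SKIP) and the `tokenize` helper are assumptions made for this example; they are not taken from any tool mentioned in the slides.

```python
import re

# Hypothetical token classes for a tiny arithmetic language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine the classes into one pattern with a named group per class.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs; raise ValueError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at offset %d" % (text[pos], pos))
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("x = a + 42")` yields one (kind, lexeme) pair per token, with whitespace dropped.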

ANALOGY TO MATHEMATICAL EXPRESSIONS

Math Expression       Possible values for ‘x’
[rows not preserved in the transcript]

Regular Expression    Matches or denotes
[1-3]                 “1”, “2” or “3”
ab?c                  “ac” or “abc”
a*                    “”, “a”, “aa”, “aaa”, …

CRASH COURSE ON REGULAR EXPRESSIONS

Regular Expression    Matches                     Does Not Match
[a-z0-9]              a, b, k, z, 9, 1, 5         aa, b2, 5t, 05
ab*c                  ac, abc, abbc               a, c, ab, bc
ab+c                  abc, abbc, abbbc            a, c, ab, bc, ac
-?[0-9]+              1, -1, -273, 448            a, 4a
[a-z]+|[0-9]+         apple, 34, 0, m             4b, t5, Apple
[^ ]+                 a, a-b, -79.5, boY          kung fu
(?:do|re|mi)*         (empty), do, re, mido,      doo
                      midore
\([a-z]*\)            (), (a), (word)             ( a ), (a,b)
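The patterns in the crash course describe whole-string matches, so they can be checked mechanically with `re.fullmatch`. The sketch below is one way to do that; the `CASES` table holds a sample of the slide's examples and the `check` helper is a name invented for this illustration.

```python
import re

# pattern -> (strings that should match, strings that should not match)
CASES = {
    r"[a-z0-9]": (["a", "9"], ["aa", "05"]),
    r"ab*c": (["ac", "abbc"], ["ab", "bc"]),
    r"ab+c": (["abc", "abbbc"], ["ac"]),
    r"-?[0-9]+": (["1", "-273"], ["a", "4a"]),
    r"[a-z]+|[0-9]+": (["apple", "34"], ["4b", "Apple"]),
    r"[^ ]+": (["a-b", "-79.5"], ["kung fu"]),
    r"(?:do|re|mi)*": (["", "midore"], ["doo"]),
    r"\([a-z]*\)": (["()", "(word)"], ["( a )", "(a,b)"]),
}


def check(cases):
    """Return the (pattern, text) pairs whose behavior disagrees with the table."""
    failures = []
    for pattern, (should_match, should_not_match) in cases.items():
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.fullmatch(pattern, text) is not None:
                failures.append((pattern, text))
    return failures
```

Note that `re.search` would give different answers for several rows (for example, `ab*c` is found inside `xabcx`), which is why the check uses `re.fullmatch`.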

ABSTRACT SYNTAX TREES
• A tree representation of the syntactic structure of source code written in a programming language
• Any ambiguity has been resolved
  o E.g., a + b + c produces the same AST as (a + b) + c
• They don’t contain all the information in the program
  o E.g., spacing, comments, brackets, parentheses
• Used in syntactic analysis
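Both properties can be observed with Python's standard `ast` module, used here only as a convenient stand-in for whatever parser a given tool ships with. The helper name `same_tree` is an assumption for this sketch.

```python
import ast


def same_tree(source_a, source_b):
    """True when both snippets parse to structurally identical ASTs."""
    # ast.dump() omits line and column attributes by default, so only the
    # tree structure is compared, not the layout of the source text.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))
```

Under this comparison, `a + b + c` and `(a + b) + c` are the same tree, a trailing comment or different spacing makes no difference, and `a + (b + c)` is a different tree because the grouping changes.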

ABSTRACT SYNTAX TREE EXAMPLE

    while (b != 0) {
        if (a > b) {
            a = a - b;
        } else {
            b = b - a;
        }
    }
    return a;

CONTROL FLOW GRAPHS
• A representation, using graph notation, of all paths that might be traversed through a program during its execution
• A directed graph where
  o Each node represents a statement
  o Edges represent control flow
• Used in data flow analysis

CONTROL FLOW GRAPH EXAMPLE

    x := a + b;
    y := a * b;
    while (y > a) {
        a := a + 1;
        x := a + b
    }
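One plausible encoding of this example's control flow graph is an adjacency list with one node per statement. The node numbering, the synthetic exit node, and the `reachable` helper below are assumptions made for illustration; real tools typically build the graph from the AST.

```python
# One node per statement of the example; node 6 is a synthetic exit node.
CFG = {
    1: [2],     # x := a + b
    2: [3],     # y := a * b
    3: [4, 6],  # while (y > a): true -> loop body, false -> exit
    4: [5],     # a := a + 1
    5: [3],     # x := a + b, then back to the loop test
    6: [],      # exit
}


def reachable(cfg, start):
    """Return the set of nodes reachable from start (depth-first search)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(cfg[node])
    return seen
```

The edge from node 5 back to node 3 is what makes the loop visible in the graph. Any node missing from `reachable(CFG, 1)` would be a candidate for a dead-code report.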

STATIC ANALYSIS TOOLS

Language    Tool           Uses
C++         CppLint        Regex
            CppCheck       CFG
            Vera++         AST
Java        Checkstyle     AST
            PMD            AST, DFG
            FindBugs       CFG
Python      Flake8         Regex, AST
            PyLint         AST
PHP         CodeSniffer    Regex
            PHPMD          AST, DFG

HOW DOES PMD WORK?

SOME EXAMPLES OF ERRORS
• Excessive Method Length
• Excessive Parameter List
• Unused Variables
• Dead Code
• Object Creation in a Loop
• Short Variable Names
• Infinite Loops
• Too Many Blank Lines
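To give a feel for how an AST-based checker might flag one of these, here is a minimal sketch of an "Excessive Parameter List" rule written against Python's `ast` module. The threshold `MAX_PARAMETERS` and the function name are hypothetical; tools such as PMD define their own rules and limits.

```python
import ast

MAX_PARAMETERS = 3  # hypothetical threshold chosen for this example


def excessive_parameter_lists(source, limit=MAX_PARAMETERS):
    """Return (line, function name, parameter count) for each offending function."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count positional-only, regular, and keyword-only parameters.
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if count > limit:
                violations.append((node.lineno, node.name, count))
    return sorted(violations)
```

Because the rule walks the tree instead of matching text, it is unaffected by how the parameter list is wrapped across lines.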

CASE STUDY: SHORT METHOD NAMES

    import re

    # remove_comments() and MINIMUM_IDENTIFIER_LENGTH are assumed to be
    # defined elsewhere in the module.

    def short_method_names(source):
        # Split the comment-free source into lines
        lines = remove_comments(source).split("\n")
        # Remove leading and trailing whitespace from each line
        lines = [line.strip() for line in lines]
        # Pair each line with its 1-based line number
        pairs = zip(range(1, len(lines) + 1), lines)
        # Keep only lines that look like method declarations. Each optional
        # modifier carries its own trailing spaces, so declarations without
        # 'static' (or without any modifier) also match.
        method_declarations = filter(
            lambda pair: re.match(
                r'(?:(?:public|private|protected|internal|protected +internal) +)?'
                r'(?:static +)?[a-zA-Z0-9_]+ +[a-zA-Z0-9_]+ *\(.*\)',
                pair[1]) is not None,
            pairs)
        violations = list()
        for line, declaration in method_declarations:
            # Everything before the first '(' holds the modifiers, return type and name
            left = declaration.split('(', 1)[0]
            # Split that part of the declaration into tokens
            tokens = re.findall(r'[a-zA-Z0-9_]+', left)
            # The last token before '(' is the method name
            method_name = tokens[-1]
            # Check if the method name is of appropriate length
            if len(method_name) < MINIMUM_IDENTIFIER_LENGTH:
                violations.append((line, declaration))
        return violations

CASE STUDY: DUPLICATE IMPORTS

    import re

    # remove_comments() is assumed to be defined elsewhere in the module.

    def duplicate_imports(source):
        # Split the comment-free source into lines
        lines = remove_comments(source).split("\n")
        # Remove leading and trailing whitespace from each line
        lines = [line.strip() for line in lines]
        # Pair each line with its 1-based line number
        pairs = zip(range(1, len(lines) + 1), lines)
        # Keep only the 'using' directives
        containing_using = filter(
            lambda pair: re.match(r'using +[a-zA-Z0-9_.]+ *;', pair[1]) is not None,
            pairs)
        # Strip all spaces so that differently spaced directives compare equal
        containing_using = [(pair[0], re.sub(r' +', '', pair[1])) for pair in containing_using]
        # Detect duplicates by comparing every pair of directives
        duplicates = list()
        for i in range(len(containing_using)):
            for j in range(i + 1, len(containing_using)):
                if containing_using[i][1] == containing_using[j][1]:
                    duplicates.append((containing_using[i][0],
                                       'using ' + containing_using[i][1][5:]))
        return duplicates
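The pairwise comparison above is quadratic in the number of directives. A possible single-pass alternative, sketched below, remembers the first line on which each namespace was seen. It is a variation, not the slide author's method: it takes already comment-free lines as input, reports the line of the repeated directive instead of the first one, and the name `duplicate_usings` is invented for this example.

```python
import re

USING = re.compile(r"using +([a-zA-Z0-9_.]+) *;")


def duplicate_usings(lines):
    """Return (line number, namespace, first line) for each repeated directive."""
    first_seen = {}   # namespace -> line number of its first occurrence
    duplicates = []
    for number, line in enumerate(lines, start=1):
        match = USING.match(line.strip())
        if match is None:
            continue
        namespace = match.group(1)
        if namespace in first_seen:
            duplicates.append((number, namespace, first_seen[namespace]))
        else:
            first_seen[namespace] = number
    return duplicates
```

Capturing the namespace with a group avoids the need to strip spaces before comparing directives.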

CASE STUDY: SHORT VARIABLE NAMES

CASE STUDY: DEAD CODE

    int x = 2 + 1;
    if (x == 4) {
        do_something();
    }
    do_something_else();
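A checker can catch this case by folding constants: it computes that `x` is 3, so `x == 4` is always false and the branch can never run. The sketch below does this for a Python rendering of the snippet using the `ast` module. It only handles top-level straight-line code with integer constants, and all names in it (`fold`, `dead_branches`, `UNKNOWN`) are assumptions for this illustration.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
CMP_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
           ast.Lt: operator.lt, ast.Gt: operator.gt}
UNKNOWN = object()  # marks a value the analysis cannot determine


def fold(node, env):
    """Evaluate an expression over known integer constants, or return UNKNOWN."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return env.get(node.id, UNKNOWN)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left, right = fold(node.left, env), fold(node.right, env)
        if left is UNKNOWN or right is UNKNOWN:
            return UNKNOWN
        return BIN_OPS[type(node.op)](left, right)
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in CMP_OPS):
        left, right = fold(node.left, env), fold(node.comparators[0], env)
        if left is UNKNOWN or right is UNKNOWN:
            return UNKNOWN
        return CMP_OPS[type(node.ops[0])](left, right)
    return UNKNOWN  # calls, attributes, etc. are not evaluated


def dead_branches(source):
    """Return line numbers of top-level 'if' statements whose test is always false."""
    env, dead = {}, []
    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            env[stmt.targets[0].id] = fold(stmt.value, env)
            continue
        if isinstance(stmt, ast.If) and fold(stmt.test, env) is False:
            dead.append(stmt.lineno)
        env.clear()  # conservative: forget constants after any other statement
    return dead
```

When the value of `x` comes from a function call, `fold` returns `UNKNOWN` and nothing is reported, since the sketch has no way to know what the call returns.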

CASE STUDY: INFINITE LOOP

    int x = a + b;
    while (1 == 1) {
        do_something();
    }
    do_something_else();
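One simple heuristic for this pattern is to look for loops whose condition is a constant truth and whose body contains no statement that could leave the loop. The sketch below applies it to Python source via the `ast` module. The helper names are invented, and the check is deliberately conservative: any `break`, `return`, or `raise` anywhere inside the loop suppresses the report, and calls that never return or that exit the process are not modeled.

```python
import ast


def always_true(test):
    """True for a truthy constant or an equality between two equal constants."""
    if isinstance(test, ast.Constant):
        return bool(test.value)
    if (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.left, ast.Constant)
            and isinstance(test.comparators[0], ast.Constant)):
        return test.left.value == test.comparators[0].value
    return False


def infinite_loops(source):
    """Return line numbers of 'while' loops that appear never to terminate."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While) and always_true(node.test):
            exits = any(isinstance(child, (ast.Break, ast.Return, ast.Raise))
                        for child in ast.walk(node))
            if not exits:
                found.append(node.lineno)
    return sorted(found)
```

For the slide's example, the statement after the loop is also unreachable, so the same finding could feed a dead-code report.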

LIMITATIONS
• Nontrivial semantic properties of programs are undecidable (Rice's theorem)
  o E.g., the halting problem, semantic equivalence
• We can never determine all possible program behaviors

AN EXAMPLE OF THE SECOND LIMITATION

    int x = 2 + 1;
    if (x == 4) {
        do_something();
    }
    do_something_else();

    int x = sqrt(1);
    if (x == 4) {
        do_something();
    }
    do_something_else();

THE STYLISTIC MODULE (A BIRD’S EYE VIEW)
• Get the tools for a particular language
• Analyze the errors flagged by the tool
• Filter out unnecessary errors
• Categorize errors into their respective bins
• Run the tools on a sample of programs
• Determine the normalization factors
• Perform a face-value analysis of programs against normed scores
• Make changes to the error list if necessary
• Determine thresholds for each bin
• Score programs based on those thresholds

CONCLUSION
• Software is hard to get right
  o Complex library APIs
  o Difficult language features: e.g., threads
• Nobody is perfect 100% of the time
• Result: bugs
• The tools can never determine all possible bugs
• However, they are a useful first line of defense

THANK YOU