Basic Program Analysis

Slides:



Advertisements
Similar presentations
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Advertisements

School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2011 More Control Flow John Cavazos University.
1 Code Optimization. 2 The Code Optimizer Control flow analysis: control flow graph Data-flow analysis Transformations Front end Code generator Code optimizer.
Control Flow Analysis (Chapter 7) Mooly Sagiv (with Contributions by Hanne Riis Nielson)
Program Representations. Representing programs Goals.
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Advanced Compilers CMPSCI 710 Spring 2003 Lecture 2 Emery Berger University of.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
PSUCS322 HM 1 Languages and Compiler Design II Basic Blocks Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring.
ECE1724F Compiler Primer Sept. 18, 2002.
1 Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
CS 412/413 Spring 2007Introduction to Compilers1 Lecture 29: Control Flow Analysis 9 Apr 07 CS412/413 Introduction to Compilers Tim Teitelbaum.
Intermediate Code. Local Optimizations
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
1 Code Optimization Chapter 9 (1 st ed. Ch.10) COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
Software (Program) Analysis. Automated Static Analysis Static analyzers are software tools for source text processing They parse the program text and.
Prof. Bodik CS 164 Lecture 51 Building a Parser I CS164 3:30-5:00 TT 10 Evans.
10/1/2015© Hal Perkins & UW CSEG-1 CSE P 501 – Compilers Intermediate Representations Hal Perkins Autumn 2009.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 10, 10/30/2003 Prof. Roy Levow.
1 Semantic Analysis Aaron Bloomfield CS 415 Fall 2005.
1 CS 201 Compiler Construction Introduction. 2 Instructor Information Rajiv Gupta Office: WCH Room Tel: (951) Office.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Synopsys University Courseware Copyright © 2012 Synopsys, Inc. All rights reserved. Compiler Optimization and Code Generation Lecture - 1 Developed By:
Introduction to Parsing
Intermediate Code Representations
1 Control Flow Analysis Topic today Representation and Analysis Paper (Sections 1, 2) For next class: Read Representation and Analysis Paper (Section 3)
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
CS 614: Theory and Construction of Compilers Lecture 15 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
CS412/413 Introduction to Compilers Radu Rugina Lecture 18: Control Flow Graphs 29 Feb 02.
1 Control Flow Graphs. 2 Optimizations Code transformations to improve program –Mainly: improve execution time –Also: reduce program size Can be done.
Control Flow Analysis Compiler Baojian Hua
LECTURE 3 Compiler Phases. COMPILER PHASES Compilation of a program proceeds through a fixed series of phases.  Each phase uses an (intermediate) form.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Loops Simone Campanoni
Lecture 9 Symbol Table and Attributed Grammars
Announcements/Reading
Chapter 3 – Describing Syntax
Compiler Design (40-414) Main Text Book:
Data Flow Analysis Suman Jana
A Simple Syntax-Directed Translator
Constructing Precedence Table
Introduction to Parsing
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Ch. 4 – Semantic Analysis Errors can arise in syntax, static semantics, dynamic semantics Some PL features are impossible or infeasible to specify in grammar.
Compiler Lecture 1 CS510.
CS 536 / Fall 2017 Introduction to programming languages and compilers
Basic Program Analysis: AST
Syntax-Directed Translation
Intermediate Representations Hal Perkins Autumn 2011
Program Slicing Baishakhi Ray University of Virginia
University Of Virginia
Control Flow Analysis CS 4501 Baishakhi Ray.
Code Optimization Chapter 10
Code Optimization Chapter 9 (1st ed. Ch.10)
Optimizing Compilers CISC 673 Spring 2009 More Control Flow
Chapter 6 Intermediate-Code Generation
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
CSE401 Introduction to Compiler Construction
COP4020 Programming Languages
Code Optimization Overview and Examples Control Flow Graph
Control Flow Analysis (Chapter 7)
Data Flow Analysis Compiler Design
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Faculty of Computer Science and Information System
Presentation transcript:

Basic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik

Our Goal Program Analyzer Security bugs Source code Program analyzer must be able to understand program properties (e.g., can a variable be NULL at a particular program point? ) Must perform control and data flow analysis

Do we need to implement control and data flow analysis from scratch? Most modern compilers already perform several types of such analysis for code optimization We can hook into different layers of analysis and customize them We still need to understand the details LLVM (http://llvm.org/) is a highly customizable and modular compiler framework Users can write LLVM passes to perform different types of analysis Clang static analyzer can find several types of bugs Can instrument code for dynamic analysis

Compiler Overview Abstract Syntax Tree : Source code parsed to produce AST Control Flow Graph: AST is transformed to CFG Data Flow Analysis: operates on CFG

The Structure of a Compiler Source code (stream of characters) scanner stream of tokens parser Abstract Syntax Tree (AST) checker AST with annotations (types, declarations) code gen Machine/byte code

Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5 Syntactic Analysis Input: sequence of tokens from scanner Output: abstract syntax tree Actually, parser first builds a parse tree AST is then built by translating the parse tree parse tree rarely built explicitly; only determined by, say, how parser pushes stuff to stack Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

Example Source Code 4*(2+3) Parser input Parser output (AST): * NUM(4) NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR Parser output (AST): * NUM(4) + NUM(2) NUM(3) Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

Parse tree for the example: 4*(2+3) EXPR EXPR EXPR NUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR leaves are tokens Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

IF LPAR ID EQ ID RPAR LBR ID AS INT SEMI RBR Another example Source Code if (x == y) { a=1; } Parser input IF LPAR ID EQ ID RPAR LBR ID AS INT SEMI RBR Parser output (AST): IF-THEN == ID = INT Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

Parse tree for example: if (x==y) {a=1;} STMT BLOCK STMT EXPR EXPR IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR leaves are tokens Adopted From UC Berkeley: Prof. Bodik CS 164 Lecture 5

Parse Tree Representation of grammars in a tree-like form. Is a one-to-one mapping from the grammar to a tree-form. A parse tree pictorially shows how the start symbol of a grammar derives a string in the language. … Dragon Book

C Statement: return a + 2 a very formal representation that strictly shows how the parser understands the statement return a + 2;

Abstract Syntax Tree (AST) Simplified syntactic representations of the source code, and they're most often expressed by the data structures of the language used for implementation Without showing the whole syntactic clutter, represents the parsed string in a structured way, discarding all information that may be important for parsing the string, but isn't needed for analyzing it. ASTs differ from parse trees because superficial distinctions of form, unimportant for translation, do not appear in syntax trees.. … Dragon Book

AST C Statement: return a + 2

Disadvantages of ASTs AST has many similar forms E.g., for, while, repeat...until E.g., if, ?:, switch Expressions in AST may be complex, nested (x * y) + (z > 5 ? 12 * z : z + 20) Want simpler representation for analysis ...at least, for dataflow analysis int x = 1 // what’s the value of x ? // AST traversal can give the answer, right? What about int x; x = 1; or int x= 0; x += 1; ?

Control Flow Graph & Analysis

Representing Control Flow High-level representation Control flow is implicit in an AST Low-level representation: Use a Control-flow graph (CFG) Nodes represent statements (low-level linear IR) Edges represent explicit flow of control Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

What Is Control-Flow Analysis? b := a * b e := b / c f : e + 1 g := f h := t – g If e > 0 ? goto return c := b / d c < x? 1 3 5 7 11 10 Yes No 1 2 a := 0 b := a * b 3 L1: c := b/d 4 5 6 if c < x goto L2 e := b / c f := e + 1 7 L2: g := f 8 9 h := t - g if e > 0 goto L3 10 goto L1 11 L3: return Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Basic Blocks A basic block is a sequence of straight line code that can be entered only at the beginning and exited only at the end g := f h := t – g If e > 0 ? Building basic blocks – Identify leaders The first instruction in a procedure, or The target of any branch, or An instruction immediately following a branch (implicit target) – Gobble all subsequent instructions until the next leader Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Basic Block Example 1 2 a := 0 b := a * b 3 L1: c := b/d 4 5 6 if c < x goto L2 e := b / c f := e + 1 7 L2: g := f 8 9 h := t - g if e > 0 goto L3 10 goto L1 11 L3: return Leaders? – {1, 3, 5, 7, 10, 11} Blocks? – {1, 2} – {3, 4} – {5, 6} – {7, 8, 9} – {10} – {11} Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Building a CFG From Basic Block Construction Each CFG node represents a basic block There is an edge from node i to j if Last statement of block i branches to the first statement of j, or Block i does not end with an unconditional branch and is immediately followed in program order by block j (fall through) a := 0 b := a * b e := b / c f : e + 1 g := f h := t – g If e > 0 ? goto return c := b / d c < x? 1 3 5 7 11 10 Yes No Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Looping backedge Why? backedges indicate that we might need to traverse the CFG more than once for data flow analysis preheader entry edge head Loop tail exit edge Exit edge Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Looping Not all loops have preheaders – Sometimes it is useful to create them backedge preheader Without preheader node – There can be multiple entry edges entry edge head Loop With single preheader node – There is only one entry edge tail exit edge Exit edge Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Dominators d dom i if all paths from entry to node i include d Strict Dominator (d sdom i) If d dom i, but d != i Immediate dominator (a idom b) a sdom b and there does not exist any node c such that a != c, c != b, a dom c, c dom b Post dominator (p pdom i) If every possible path from i to exit includes p Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Identifying Natural Loops and Dominators Back Edge A back edge of a natural loop is one whose target dominates its source Natural Loop The natural loop of a back edge (mn), where n dominates m, is the set of nodes x such that n dominates x and there is a path from x to m not containing n Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Reducibility A CFG is reducible (well-structured) if we can partition its edges into two disjoint sets, the forward edges and the back edges, such that The forward edges form an acyclic graph in which every node can be reached from the entry node The back edges consist only of edges whose targets dominate their sources Structured control-flow constructs give rise to reducible CFGs Value of reducibility: Dominance useful in identifying loops Simplifies code transformations (every loop has a single header) Permits interval analysis Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Handling Irreducible CFG’s Node splitting Can turn irreducible CFGs into reducible CFGs a b c d e d General idea Reduce graph (iteratively remove self edges, merge nodes with single pred) More than one node => irreducible – Split any multi-parent node and start over Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)

Why go through all this trouble? Modern languages provide structured control flow Shouldn’t the compiler remember this information rather than throw it away and then re-compute it? Answers? We may want to work on the binary code Most modern languages still provide a goto statement Languages typically provide multiple types of loops. This analysis lets us treat them all uniformly We may want a compiler with multiple front ends for multiple languages; rather than translating each language to a CFG, translate each language to a canonical IR and then to a CFG Adopted From U Penn CIS 570: Modern Programming Language Implementation (Autumn 2006)