ANALYSIS OF PROG. LANG. PROGRAM ANALYSIS Instructors: Crista Lopes Copyright © Instructors. 1.



Motivation(s)
- Where do you see program analysis (PA) in your everyday life?
- How does PA "work"?
- What is PA anyway?

Auto-completion

Pre-compilation error detection
- Ex: missing parenthesis

How do you know...

    int a;
    increment_a() { a++; }
    while (true) {
        String a = "hello";
        increment_a();
    }

This "a" is not that "a"

How do you remember...

    int a;
    increment_a() { a++; }
    while (true) {
        String a = "hello";
        increment_a();
    }

Wait, what's the type of "a" again?
"a" is of type int (FYI...)

Outline
- Introduction/motivations
- Program representation
  - AST
  - 3-address code
- Control flow analysis
- Data flow

Intermediate Representation (IR)
- Initial Point
- Abstract Syntax Tree
  - Abstract vs Concrete Syntax
  - Parse Tree vs Abstract Syntax Tree
- Three-address Codes

IR-1 Starting Point
Source code -> [Parsing, Lexical Analysis] -> Intermediate representation -> [Code Generation, Optimization] -> Target code -> [Code Execution]
- Analyze IR: perform analysis on the results
- Use this information for applications

IR-2. Abstract Syntax Tree (AST)
- Concrete vs Abstract Syntax
  - Concrete syntax shows structure and is language-specific
  - Abstract syntax shows structure, without the language-specific detail
- Representations
  - A parse tree represents concrete syntax
  - An abstract syntax tree represents abstract syntax

IR-2. Example: Grammar
- Example
  - a:= b+c (Language 1)
  - a = b+c; (Language 2)
- Grammar for 1

    stmtlist -> stmt | stmt stmtlist
    stmt     -> assign | if-then | …
    assign   -> ident ":=" ident binop ident
    binop    -> "+" | "-" | …

- Grammar for 2

    stmtlist -> stmt ";" | stmt ";" stmtlist
    stmt     -> assign | if-then | …
    assign   -> ident "=" ident binop ident
    binop    -> "+" | "-" | …

IR-2. Example: Parse Tree
- Parse tree for a:=b+c

    stmtlist
      stmt
        assign
          ident (a)
          ":="
          ident (b)
          binop ("+")
          ident (c)

- Parse tree for a=b+c;

    stmtlist
      stmt
        assign
          ident (a)
          "="
          ident (b)
          binop ("+")
          ident (c)
      ";"

IR-2. Example: Abstract Syntax Tree
- Example
  1. a:=b+c
  2. a=b+c;
- Abstract syntax tree for 1 and 2

    assign
      a
      add
        b
        c
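As a minimal sketch of the point above, the snippet below parses both concrete syntaxes and checks that they yield the same abstract syntax tree. The function `parse_assign`, the tuple encoding of the tree, and the operator name `sub` are assumptions made for illustration; only `assign` and `add` appear on the slide.

```python
import re

# Hypothetical mapping from concrete operator tokens to AST node names.
BINOPS = {"+": "add", "-": "sub"}

def parse_assign(text, assign_token, terminator=""):
    """Parse `ident <assign_token> ident binop ident <terminator>` into an AST tuple."""
    pattern = (
        r"\s*(\w+)\s*" + re.escape(assign_token)
        + r"\s*(\w+)\s*([+-])\s*(\w+)\s*" + re.escape(terminator) + r"\s*$"
    )
    match = re.match(pattern, text)
    if match is None:
        raise SyntaxError(f"cannot parse {text!r}")
    target, left, op, right = match.groups()
    # The assignment token and the terminator are dropped: they are concrete syntax only.
    return ("assign", target, (BINOPS[op], left, right))

ast1 = parse_assign("a:=b+c", ":=")        # Language 1
ast2 = parse_assign("a=b+c;", "=", ";")    # Language 2
print(ast1 == ast2)
```

Both calls return `("assign", "a", ("add", "b", "c"))`; the `:=`, `=` and `;` tokens leave no trace in the tree.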

IR-3. Three Address Code
- General form: x = y op z
- More generally: (operator, operand1, operand2, result)
  - (at most 3 spots besides the operator)
- May include temporary variables
- Examples

    Kind                 Statement             Quadruple
    Binary assignment    x := y op z           (op, y, z, x)
    Unary assignment     x := op y             (op, y, _, x)
    Copy                 x := y                (_, y, _, x)
    Unconditional jump   goto L                (goto, L, _, _)
    Conditional jump     if x relop y goto L   (relop, x, y, L)
    …
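The quadruple form can be illustrated with a small sketch that flattens a nested expression tree into (operator, operand1, operand2, result) tuples, introducing temporaries as it goes. The function name `to_three_address`, the temporary names `t1`, `t2`, ... and the tuple encoding of the input tree are assumptions for illustration, not something the slides prescribe.

```python
from itertools import count

def to_three_address(ast):
    """Flatten ("assign", target, expr) into a list of (op, arg1, arg2, result) quadruples."""
    quads = []
    temps = count(1)

    def emit(node):
        if isinstance(node, str):          # a plain variable needs no code
            return node
        if len(node) == 2:                 # unary: (op, operand)
            op, operand = node
            arg = emit(operand)
            result = f"t{next(temps)}"
            quads.append((op, arg, "_", result))
            return result
        op, left, right = node             # binary: (op, left, right)
        left_name = emit(left)
        right_name = emit(right)
        result = f"t{next(temps)}"
        quads.append((op, left_name, right_name, result))
        return result

    _, target, expr = ast
    value = emit(expr)
    quads.append(("_", value, "_", target))   # final copy into the target
    return quads

# x := a + b * (neg c)
quads = to_three_address(("assign", "x", ("+", "a", ("*", "b", ("neg", "c")))))
for quad in quads:
    print(quad)
```

Each emitted tuple has at most three addresses besides the operator, matching the binary, unary and copy rows of the table.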

IR-3. Example: Three Address Code

    if a>10 then x=y+z else x=y-z

    1. if a>10 goto 4
    2. x = y-z
    3. goto 5
    4. x = y+z
    5. …

Analysis Levels
- Local: within a single basic block or statement
- Intraprocedural: within a single procedure, function, or method
- Interprocedural: across procedure boundaries, procedure calls, shared globals, etc.
- Intraclass: within a single class
- Interclass: across class boundaries
- …

Outline
- Introduction/motivations
- Program representation
- Control flow analysis
  - Computing Control Flow (analysis and representation)
  - Search and Traversals
  - Applications
- Data flow

Computing Control flow (example)

    Procedure AVG
    S1    count = 0;
    S2    fread(fptr, n)
    S3    while (not EOF) do
    S4      if (n < 0)
    S5        return (error)
            else
    S6        nums[count] = n
    S7        count++
            endif
    S8      fread(fptr, n);
          endwhile
    S9    avg = mean(nums, count)
    S10   return (avg)

[Figure: control flow graph of AVG with nodes entry, S1 to S10, and EXIT]

CF1: Control Flow (Basic Blocks)
- A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or any possibility of branching except at the end
- A basic block may or may not be maximal
  - For compiler optimizations, maximal blocks are desirable
  - For software engineering tasks, basic blocks that represent one source code statement are often used


CF1: Computing Control Flow
- Input: a list of program statements in some form
- Output: a list of CFG nodes and edges
- Procedure:
  - Construct basic blocks
  - Create entry and exit nodes; create edge (entry, B1); create edge (Bk, exit) for each Bk that represents an exit from the program
  - Add a CFG edge from Bi to Bj if Bj can immediately follow Bi in some execution, i.e.,
    - there is a conditional or unconditional goto from the last statement of Bi to the first statement of Bj, or
    - Bj immediately follows Bi in the order of the program and Bi does not end in an unconditional goto statement
  - Label edges that represent conditional transfers of control
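The procedure above can be sketched as follows, applied to the three-address example for `if a>10 then x=y+z else x=y-z`. The instruction encoding, the name `build_cfg`, the labels `"true"`/`"false"` and the use of a `return` as a stand-in for the elided statement 5 are all assumptions made for illustration.

```python
JUMPS = ("goto", "if")

def build_cfg(instrs):
    """Build basic blocks and labelled CFG edges from numbered instructions.

    instrs is a list of tuples; instruction numbers are 1-based and a jump's
    target number is the last element of its tuple.
    """
    n = len(instrs)
    # Leaders: the first instruction, every jump target, and every
    # instruction that follows a jump or a return.
    leaders = {1}
    for i, ins in enumerate(instrs, start=1):
        if ins[0] in JUMPS:
            leaders.add(ins[-1])
        if ins[0] in JUMPS + ("return",) and i < n:
            leaders.add(i + 1)
    starts = sorted(leaders)
    ends = [s - 1 for s in starts[1:]] + [n]
    name_of = {s: f"B{k}" for k, s in enumerate(starts, start=1)}
    blocks = {name_of[s]: (s, e) for s, e in zip(starts, ends)}

    edges = {("entry", "B1"): None}          # edge -> label (None if unconditional)
    for s, e in zip(starts, ends):
        last = instrs[e - 1]
        here = name_of[s]
        after = name_of.get(e + 1, "exit")   # block that follows in program order
        if last[0] == "return":
            edges[(here, "exit")] = None
        elif last[0] == "goto":
            edges[(here, name_of[last[-1]])] = None
        elif last[0] == "if":
            edges[(here, name_of[last[-1]])] = "true"
            edges[(here, after)] = "false"
        else:
            edges[(here, after)] = None
    return blocks, edges

program = [
    ("if", "a>10", 4),        # 1. if a>10 goto 4
    ("assign", "x = y-z"),    # 2. x = y-z
    ("goto", 5),              # 3. goto 5
    ("assign", "x = y+z"),    # 4. x = y+z
    ("return",),              # 5. hypothetical stand-in for the elided statement
]
blocks, edges = build_cfg(program)
print(blocks)
print(edges)
```

The blocks here are maximal; using one block per statement instead, as the basic-block slide mentions for software engineering tasks, would only change how leaders are chosen.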

CF2: Search and Ordering
- Many ways to visit the nodes in the graph
  - Depth First Search: visits descendants of the node before visiting any of its siblings
  - Breadth First Search: all of the node's immediate descendants are processed before any of their unprocessed children
  - Preorder Traversal: a node is processed before its descendants
  - Postorder Traversal: a node is processed after its descendants

CF2: Search and Ordering (cont'd) (DFS)
- One DFS of the CFG: 1 -> 3 -> 4 -> 6 -> 7 -> 8 -> 10, back to 8, -> 9, back to 8, 7, 6, 4, -> 5, back to 4, 3, 1, -> 2, back to 1
- The number assigned to a node during DFS is its depth first number
- Depth first ordering of nodes is the reverse of the order in which nodes are last visited in DFS
- For the DFS, nodes are visited 1,3,4,6,7,8,10,8,9,8,7,6,4,5,4,3,1,2,1
- Depth first ordering is 1,2,3,4,5,6,7,8,9,10

CF: Types of Edges
- The depth first representation is the depth first spanning tree along with the other edges that are not part of the tree (tree edges, other edges)
- Three kinds of edges
  - Advancing (forward) edges: go from a node to one of its proper descendants in the tree; these include tree edges
  - Back edges: go from a node to one of its ancestors in the tree
  - Cross edges: connect nodes such that neither is an ancestor of the other
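A minimal sketch of a depth-first search that produces both the depth first ordering and the edge kinds is given below. The four-node graph is a made-up example, not the graph from the lost figure, and the name `depth_first` is an assumption. Unlike the slide, the sketch reports tree edges separately from the other advancing edges so that the spanning tree is visible.

```python
from itertools import count

def depth_first(graph, root):
    """Return (depth-first ordering, {edge: kind}) for the nodes reachable from root."""
    clock = count()
    pre, post = {}, {}       # discovery and finish times
    postorder = []           # nodes in the order they are last visited
    kinds = {}

    def visit(u):
        pre[u] = next(clock)
        for v in graph.get(u, []):
            if v not in pre:
                kinds[(u, v)] = "tree"
                visit(v)
            elif v not in post:            # v is still on the DFS stack: an ancestor
                kinds[(u, v)] = "back"
            elif pre[u] < pre[v]:          # v is a finished descendant of u
                kinds[(u, v)] = "forward"
            else:                          # neither is an ancestor of the other
                kinds[(u, v)] = "cross"
        post[u] = next(clock)
        postorder.append(u)

    visit(root)
    # Depth-first ordering is the reverse of the order of last visits.
    return postorder[::-1], kinds

# Hypothetical graph: node -> successors, in the order they are explored.
graph = {1: [2, 3, 4], 2: [4], 3: [4], 4: [1]}
order, kinds = depth_first(graph, 1)
print(order)
print(kinds)
```

In this graph the search last visits the nodes in the order 4, 2, 3, 1, so the depth first ordering is 1, 3, 2, 4.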

Applications of Control Flow
- Complexity: pointers to refactoring
- Testing
  - Branch, Path, Basis Path
  - Branch: must test 1-2, 1-3, 4-5, 4-8, 5-6, 5-7
  - Path: infinite, due to loop
  - Basis Path: set of paths which covers all the edges at least once, e.g. 1,2,4,8; 1,3,4,5,6,7,4,8
- Program Understanding
  - Recover program structure
- Impact analysis
- …

Outline
- Introduction/motivations
- Program representation
- Control flow
- Data flow
  - Introduction
  - Reaching definitions

Data flow - Introduction
- Flow of various data throughout the program
- Obtained from AST or CFG
- Used in software engineering tasks
- Computing exact solutions to most data flow problems is undecidable
  - May depend on input
  - May depend on the outcome of a conditional statement
  - May depend on termination of a loop
- Thus we compute approximations of the exact solution

Data flow - Introduction
- Some approximations "overestimate" the solution
  - They contain the actual information plus some spurious information, but do not omit any actual information
  - Conservative and safe approach
- Some approximations "underestimate" the solution
  - They may not contain all the information of the actual solution
  - Unsafe
- Research challenge: providing safe but precise information in an efficient way
- Uses of data flow:
  - Compiler optimization requires conservative analysis
  - Software engineering tasks may only need unsafe info

Data flow – Compiler Optimization
- Common subexpression elimination
  - Need to know available expressions: which expressions have already been computed at the point before this statement

    Before:  c = a+b           d = a+b           e = a+b
    After:   t = a+b; c = t    t = a+b; d = t    e = t
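A minimal sketch of the idea, restricted to a single basic block: track which expressions are available and replace a recomputation with a copy. This is a local variant that reuses the variable already holding the value rather than introducing the temporary `t` shown on the slide; the statement encoding and the name `local_cse` are assumptions for illustration.

```python
def local_cse(block):
    """Eliminate repeated expressions inside one basic block.

    block is a list of (dest, op, left, right); a copy is written
    (dest, "copy", source, None). Returns the rewritten list.
    """
    available = {}   # (op, left, right) -> variable currently holding that value
    rewritten = []
    for dest, op, left, right in block:
        key = (op, left, right)
        if op != "copy" and key in available:
            rewritten.append((dest, "copy", available[key], None))
        else:
            rewritten.append((dest, op, left, right))
        # Assigning dest kills expressions that read dest or are held in dest.
        available = {
            k: holder for k, holder in available.items()
            if dest not in (k[1], k[2]) and holder != dest
        }
        # The expression becomes available unless dest is one of its own operands.
        if op != "copy" and key not in available and dest not in (left, right):
            available[key] = dest
    return rewritten

straight = [("c", "+", "a", "b"), ("d", "+", "a", "b"), ("e", "+", "a", "b")]
optimized = local_cse(straight)

# Redefining a between the two computations makes a+b unavailable again.
killed = [("c", "+", "a", "b"), ("a", "copy", "1", None), ("d", "+", "a", "b")]
unchanged = local_cse(killed)
print(optimized)
print(unchanged)
```

Across several blocks the same decision needs a global available-expressions analysis, which is the data flow information the slide refers to.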

Data Flow - Compiler Optimization
- Register (de)allocation
  - When assigning memory locations to registers, if a value in a register (i.e., a memory location) is not used again, there is no need to keep it in a register
  - Is R2 needed after this statement?

    R1 = R2+10

  - Need to know "live variables": which variables are still used after the current line
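Within a single basic block, the live-variable question can be answered with one backward scan. The sketch below assumes each statement is given as (destination, list of used variables); the name `live_after_each`, the second statement `R3 = R1*2` and the choice of what is live at the end of the block are hypothetical.

```python
def live_after_each(block, live_out):
    """Return, for each statement, the set of variables live right after it.

    block is a list of (dest, uses); live_out is the set of variables
    still needed after the last statement of the block.
    """
    live = set(live_out)
    result = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        dest, uses = block[i]
        result[i] = set(live)
        live.discard(dest)     # the old value of dest is overwritten here
        live.update(uses)      # the operands must be live before this statement
    return result

block = [
    ("R1", ["R2"]),   # R1 = R2+10
    ("R3", ["R1"]),   # R3 = R1*2   (hypothetical follow-up statement)
]
live = live_after_each(block, live_out={"R3"})
print(live[0])   # variables live after R1 = R2+10
```

Here R2 is not in the set after the first statement, so its register could be reused; if R2 were live at the end of the block, it would stay in the set.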

Data Flow - Compiler Optimization
- Suppose every assignment that reaches this statement assigns 5 to c
  - Then the right-hand side c+10 can be replaced by 15
- But: need to know reaching definitions: which definition(s) of variable c reach this statement

    a = c+10    // need 3 registers
    a = 15      // need 2 registers

Data Flow - Sw Eng Tasks
- Data-flow testing
  - Suppose that a statement assigns a value but the use of that value is never executed under test
  - Need to know definition-use pairs: the link between definition(s) and use(s) of a variable (or a memory location)

[Figure: flow graph with a = c+10 followed by d = a+y; annotation: a never used on this path]

Data Flow - Sw Eng Tasks
- Debugging
  - Suppose that 'a' has an incorrect value in the statement (e.g., integer overflow)
  - Need data dependence information: some statements produce erroneous values, others are affected by those values

[Figure: flow graph with a = c+y followed by d = a+y]

Data flow - Example
- Compute the flow of data throughout the program
  - Where does the assignment to i in statement 1 reach?
  - Where does the expression computed in statement 2 reach?
  - Which uses of variable are reachable from the end of Block1?
  - Is the value of variable i live after statement 2?

    B1:  1. i = 2
         2. k = i+1
    B2:  3. i = 1
    B3:  4. k = k+1
    B4:  5. k = k-4

Reaching definitions analysis
- Definition = statement where a variable is assigned a value (e.g. input statement, assignment statement)
- A definition of 'a' reaches a point 'p' if there exists a control flow path in the CFG from the definition to 'p' with no other definitions of 'a' on the path
- Such a path may exist in the graph yet never be taken in any execution (an infeasible path)

Reaching definitions analysis
- What are the definitions in the program?
  - Of variable i:
  - Of variable k:
- Which basic blocks (before block) do these definitions reach?
  - Def 1 reaches:
  - Def 2 reaches:
  - Def 3 reaches:
  - Def 4 reaches:
  - Def 5 reaches:

Reaching definitions analysis
- What are the definitions in the program?
  - Of variable i: 1, 3
  - Of variable k: 2, 4, 5
- Which basic blocks (before block) do these definitions reach?
  - Def 1 reaches: B2
  - Def 2 reaches: B1, B2, B3
  - Def 3 reaches: B1, B3, B4
  - Def 4 reaches: B4
  - Def 5 reaches: exit

Reaching definitions analysis
- Method
  - Compute two kinds of basic information (within the block)
    - Gen[B]: set of definitions generated within B
    - Kill[B]: set of definitions that, if they reach the point before B, won't reach the end of B
  - Compute two other sets by propagation
    - IN[B]: set of definitions that reach the beginning of B
    - OUT[B]: set of definitions that reach the end of B

Reaching definitions analysis

    Block   Init GEN   Init KILL   Init IN   Init OUT   IN     OUT
    B1      1,2        3,4,5       --        1,2        2,3    1,2
    B2      3          1           --        3          1,2    2,3
    B3      4          2,5         --        4          2,3    3,4
    B4      5          2,4         --        5          3,4    3,5

Iterative Data-Flow analysis algorithm
- Algorithm for Reaching Definitions
  - Input: CFG with GEN[B], KILL[B] for all B
  - Output: IN[B], OUT[B] for all B

    Begin RD
      IN[B] = empty, OUT[B] = GEN[B] for all B
      change = true
      While change do begin
        change = false
        For each B do begin
          IN[B] = union of OUT[P] over every predecessor P of B
          OLDOUT = OUT[B]
          OUT[B] = GEN[B] union (IN[B] - KILL[B])
          if (OUT[B] != OLDOUT) then change = true
        End for
      End while
    End RD
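The algorithm translates almost line for line into Python. The sketch below runs it on the five-definition example with blocks B1 to B4. The GEN and KILL sets follow from the statements in each block; the predecessor lists are an assumption, inferred from the answers on the earlier slides (B1 and B2 form a loop, then B2 -> B3 -> B4), since the CFG figure itself is not available here.

```python
def reaching_definitions(blocks, preds, gen, kill):
    """Iterate IN/OUT to a fixed point; all set arguments are dicts of Python sets."""
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            # IN[B] is the union of OUT[P] over every predecessor P of B.
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                change = True
    return IN, OUT

blocks = ["B1", "B2", "B3", "B4"]
# Assumed CFG: entry -> B1 -> B2, B2 -> B1, B2 -> B3, B3 -> B4 -> exit.
preds = {"B1": ["B2"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}
gen = {"B1": {1, 2}, "B2": {3}, "B3": {4}, "B4": {5}}
kill = {"B1": {3, 4, 5}, "B2": {1}, "B3": {2, 5}, "B4": {2, 4}}

IN, OUT = reaching_definitions(blocks, preds, gen, kill)
for b in blocks:
    print(b, sorted(IN[b]), sorted(OUT[b]))
```

Under these assumptions the loop stabilizes after two passes, and definition 1 appears only in IN[B2], which agrees with "Def 1 reaches: B2".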

Tools
- Eclipse JDT/AST (APIs to construct, traverse and manipulate AST)
- Sourcerer
- Crystal (Data Analysis Framework, mostly for academic purposes)

Mandatory Reading List
- Representation and Analysis of Software – Rep-Analysis.pdf
- Crystal Notes – CrystalTutorialNotes.pdf, CrystalTutorial.ppt
- Eclipse JDT - AST -

More (optional) Reading List
- Principles of Program Analysis, Nielson, Nielson, and Hankin
- Invariant Detection using Daikon – daikon.pdf
- More optional readings available in the Program Analysis course material at CMU