Published by Galilea Hildreth. Modified over 9 years ago.
1
ANALYSIS OF PROG. LANG.: PROGRAM ANALYSIS
Instructors: Crista Lopes
Copyright © Instructors.
2
Motivation(s)
Where do you see program analysis (PA) in your everyday life?
How does PA “work”?
What is PA anyway?
3
Auto-completion
4
Pre-compilation error detection
Example: a missing parenthesis
5
How do you know...
  int a;
  increment_a() { a++; }
  while (true) {
    String a = "hello";
    increment_a();
  }
This “a” is not that “a”.
6
How do you remember...
  int a;
  increment_a() { a++; }
  while (true) {
    String a = "hello";
    increment_a();
  }
Wait, what’s the type of “a” again?
“a” is of type int (FYI...)
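Both questions are answered by a symbol table with nested scopes: a lookup walks outward from the innermost scope, so the `a` in the loop body resolves to the `String`, while the `a` in `increment_a` (whose body is lexically outside the loop) resolves to the global `int`. A minimal sketch; `SymbolTable`, `ScopeDemo`, and the scope layout are hypothetical, not how any particular IDE implements it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table: a stack of scopes, each mapping names to declared types.
class SymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() { scopes.push(new HashMap<>()); }
    void exitScope() { scopes.pop(); }
    void declare(String name, String type) { scopes.peek().put(name, type); }

    // ArrayDeque iterates from the most recently pushed scope outward,
    // so the innermost declaration wins (shadowing).
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        throw new IllegalStateException("undeclared: " + name);
    }
}

class ScopeDemo {
    // Returns {type of a inside the loop, type of a inside increment_a}.
    static String[] resolve() {
        SymbolTable table = new SymbolTable();
        table.enterScope();
        table.declare("a", "int");          // int a;

        table.enterScope();                  // body of while(true)
        table.declare("a", "String");       // String a = "hello";
        String inLoop = table.lookup("a");
        table.exitScope();

        table.enterScope();                  // body of increment_a(): no local a
        String inMethod = table.lookup("a"); // falls through to the global scope
        table.exitScope();
        return new String[] {inLoop, inMethod};
    }
}
```

`ScopeDemo.resolve()` yields `{"String", "int"}`: the same name, two declarations, told apart by scope.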
7
Outline
  Introduction/motivations
  Program representation
    AST
    3-address code
  Control flow analysis
  Data flow
8
Intermediate Representation (IR)
  Starting point
  Abstract Syntax Tree
    Abstract vs. concrete syntax
    Parse tree vs. abstract syntax tree
  Three-address code
9
IR-1. Starting Point
Source code → [lexical analysis, parsing] → intermediate representation → [code generation, optimization] → target code → [code execution]
Analyze the IR: perform analysis on it, then use the results in applications.
10
IR-2. Abstract Syntax Tree (AST)
Concrete vs. abstract syntax
  Concrete syntax shows structure and is language-specific.
  Abstract syntax shows only the structure, without language-specific details such as delimiters.
Representations
  A parse tree represents concrete syntax.
  An abstract syntax tree represents abstract syntax.
11
IR-2. Example: Grammar
Example: a:=b+c (Language 1); a=b+c; (Language 2)
Grammar for 1:
  stmtlist → stmt | stmt stmtlist
  stmt → assign | if-then | …
  assign → ident ":=" ident binop ident
  binop → "+" | "-" | …
Grammar for 2:
  stmtlist → stmt ";" | stmt ";" stmtlist
  stmt → assign | if-then | …
  assign → ident "=" ident binop ident
  binop → "+" | "-" | …
12
IR-2. Example: Parse Tree
Parse tree for a:=b+c
stmtlist
  stmt
    assign
      ident (a)
      ":="
      ident (b)
      binop ("+")
      ident (c)
Parse tree for a=b+c;
stmtlist
  stmt
    assign
      ident (a)
      "="
      ident (b)
      binop ("+")
      ident (c)
  ";"
13
IR-2. Example: Abstract Syntax Tree
Examples: 1. a:=b+c   2. a=b+c;
Abstract syntax tree for both 1 and 2:
assign
  a
  add
    b
    c
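A toy front end makes the point concrete: two different surface syntaxes, one tree. This is a minimal sketch that assumes statements of exactly the form `ident op ident binop ident` with `+` or `-`; the `AstNode` record and `ToyParsers` class are hypothetical, and a real front end would tokenize and parse from the grammar rather than split strings.

```java
import java.util.List;

// Hypothetical AST node: a label plus children (leaves have no children).
record AstNode(String label, List<AstNode> kids) {
    static AstNode leaf(String s) { return new AstNode(s, List.of()); }

    @Override public String toString() {
        return kids.isEmpty() ? label : label + kids;
    }
}

class ToyParsers {
    // Language 1: a:=b+c
    static AstNode parseLang1(String src) {
        String[] sides = src.strip().split(":=");
        if (sides.length != 2) throw new IllegalArgumentException("expected ident := expr");
        return assign(sides[0], sides[1]);
    }

    // Language 2: a=b+c;  (the ";" is concrete syntax and is dropped from the AST)
    static AstNode parseLang2(String src) {
        String body = src.strip();
        if (!body.endsWith(";")) throw new IllegalArgumentException("missing ;");
        String[] sides = body.substring(0, body.length() - 1).split("=");
        if (sides.length != 2) throw new IllegalArgumentException("expected ident = expr");
        return assign(sides[0], sides[1]);
    }

    // Shared AST construction: assign(target, add|sub(left, right)).
    private static AstNode assign(String target, String expr) {
        String e = expr.strip();
        int op = e.indexOf('+') >= 0 ? e.indexOf('+') : e.indexOf('-');
        if (op < 0) throw new IllegalArgumentException("expected + or -");
        String kind = e.charAt(op) == '+' ? "add" : "sub";
        AstNode rhs = new AstNode(kind, List.of(AstNode.leaf(e.substring(0, op).strip()),
                                                AstNode.leaf(e.substring(op + 1).strip())));
        return new AstNode("assign", List.of(AstNode.leaf(target.strip()), rhs));
    }
}
```

`ToyParsers.parseLang1("a:=b+c")` and `ToyParsers.parseLang2("a=b+c;")` both print as `assign[a, add[b, c]]` and compare equal, because record equality is structural.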
14
IR-3. Three-Address Code
General form: x = y op z
More generally, a quadruple: (operator, operand1, operand2, result), with at most 3 slots besides the operator
May include temporary variables
Examples
  Assignment
    Binary: x := y op z   (op, y, z, x)
    Unary: x := op y      (op, y, _, x)
    Copy: x := y          (_, y, _, x)
  Jumps
    Unconditional: goto L             (goto, L, _, _)
    Conditional: if x relop y goto L  (relop, x, y, L)
  …
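The quadruple form maps directly onto a small record. A minimal sketch; the `Quadruple` record, the `"copy"` operator name (the slide leaves the copy operator as `_`), and the use of `null` for an empty `_` slot are assumptions for illustration.

```java
// Hypothetical quadruple: (operator, operand1, operand2, result); null marks an unused slot.
record Quadruple(String op, String arg1, String arg2, String result) {

    // Render the quadruple back into three-address notation.
    String render() {
        switch (op) {
            case "goto":
                return "goto " + arg1;                              // (goto, L, _, _)
            case "copy":
                return result + " := " + arg1;                      // (_, y, _, x)
            case "<": case "<=": case ">": case ">=": case "==": case "!=":
                return "if " + arg1 + " " + op + " " + arg2 + " goto " + result; // (relop, x, y, L)
            default:
                return arg2 == null
                        ? result + " := " + op + " " + arg1                 // unary
                        : result + " := " + arg1 + " " + op + " " + arg2;  // binary
        }
    }
}
```

For instance, `new Quadruple(">", "a", "10", "4").render()` gives `if a > 10 goto 4`, and `new Quadruple("+", "y", "z", "x").render()` gives `x := y + z`.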
15
IR-3. Example: Three-Address Code
if a>10 then x=y+z else x=y-z
1. if a>10 goto 4
2. x = y - z
3. goto 5
4. x = y + z
5. …
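Running the translated code on concrete values is a quick check that the jumps reproduce the `if`/`else` semantics. A minimal interpreter sketch, assuming a hypothetical instruction encoding (`ifgt`, `goto`, `add`, `sub`, `nop`) whose jump targets are the statement numbers above; statement 5 is a no-op standing in for the elided rest of the program.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TacInterpreter {
    // Hypothetical encoding; target is a 1-based statement number (unused for non-jumps).
    record Instr(String kind, String a, String b, String dst, int target) {}

    static final List<Instr> PROGRAM = List.of(
            new Instr("ifgt", "a", "10", null, 4),  // 1. if a>10 goto 4
            new Instr("sub", "y", "z", "x", 0),     // 2. x = y-z
            new Instr("goto", null, null, null, 5), // 3. goto 5
            new Instr("add", "y", "z", "x", 0),     // 4. x = y+z
            new Instr("nop", null, null, null, 0)); // 5. (rest of the program)

    // An operand is either an integer literal or a variable name.
    static int val(Map<String, Integer> env, String operand) {
        return Character.isDigit(operand.charAt(0)) ? Integer.parseInt(operand) : env.get(operand);
    }

    static Map<String, Integer> run(Map<String, Integer> input) {
        Map<String, Integer> env = new HashMap<>(input);
        int pc = 1;                                  // 1-based, matching the numbering above
        while (pc <= PROGRAM.size()) {
            Instr in = PROGRAM.get(pc - 1);
            switch (in.kind()) {
                case "ifgt":
                    if (val(env, in.a()) > val(env, in.b())) { pc = in.target(); continue; }
                    break;                           // condition false: fall through
                case "goto":
                    pc = in.target();
                    continue;
                case "add":
                    env.put(in.dst(), val(env, in.a()) + val(env, in.b()));
                    break;
                case "sub":
                    env.put(in.dst(), val(env, in.a()) - val(env, in.b()));
                    break;
                default:
                    break;                           // nop
            }
            pc++;
        }
        return env;
    }
}
```

With `a=12, y=7, z=2` the run takes the jump at statement 1 and ends with `x=9`; with `a=3` it falls through to statement 2 and ends with `x=5`.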
16
Analysis Levels
  Local: within a single basic block or statement
  Intraprocedural: within a single procedure, function, or method
  Interprocedural: across procedure boundaries (procedure calls, shared globals, etc.)
  Intraclass: within a single class
  Interclass: across class boundaries
  …
17
Outline
  Introduction/motivations
  Program representation
  Control flow analysis
    Computing control flow (analysis and representation)
    Search and traversals
    Applications
  Data flow
18
Computing Control Flow (example)
Procedure AVG
S1   count = 0;
S2   fread(fptr, n)
S3   while (not EOF) do
S4     if (n < 0)
S5       return (error)
       else
S6       nums[count] = n
S7       count++
       endif
S8     fread(fptr, n);
     endwhile
S9   avg = mean(nums, count)
S10  return (avg)
[Figure: CFG of AVG with nodes entry, S1 to S10, and EXIT]
19
CF1: Control Flow (Basic Blocks)
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, with no halt or possibility of branching except at the end.
A basic block may or may not be maximal.
  For compiler optimizations, maximal blocks are desirable.
  For software engineering tasks, basic blocks that each represent one source statement are often used.
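The slide does not prescribe how to find maximal blocks; one standard technique is the leader rule: the first statement, every jump target, and every statement immediately after a jump each start a new block, and a block runs up to the next leader. A minimal sketch of that technique; the `Stmt` encoding (a jump target of 0 meaning "not a jump") is hypothetical. Applied to the three-address code for `if a>10 then x=y+z else x=y-z`, it yields the blocks {1}, {2, 3}, {4}, {5}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class BasicBlocks {
    // Hypothetical statement: its text plus a 1-based jump target (0 = not a jump).
    record Stmt(String text, int jumpTarget) {}

    // Returns the maximal basic blocks as lists of 1-based statement numbers.
    static List<List<Integer>> partition(List<Stmt> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        leaders.add(1);                                   // rule 1: the first statement
        for (int i = 1; i <= code.size(); i++) {
            int t = code.get(i - 1).jumpTarget();
            if (t != 0) {
                leaders.add(t);                           // rule 2: the target of a jump
                if (i < code.size()) leaders.add(i + 1);  // rule 3: the statement after a jump
            }
        }
        List<List<Integer>> blocks = new ArrayList<>();
        for (int i = 1; i <= code.size(); i++) {
            if (leaders.contains(i)) blocks.add(new ArrayList<>()); // a leader opens a block
            blocks.get(blocks.size() - 1).add(i);
        }
        return blocks;
    }
}
```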
21
CF1: Computing Control Flow
Input: a list of program statements in some form
Output: a list of CFG nodes and edges
Procedure:
  1. Construct basic blocks.
  2. Create entry and exit nodes; create edge (entry, B1); create edge (Bk, exit) for each Bk that represents an exit from the program.
  3. Add a CFG edge from Bi to Bj if Bj can immediately follow Bi in some execution, i.e., either
     there is a conditional or unconditional goto from the last statement of Bi to the first statement of Bj, or
     Bj immediately follows Bi in program order and Bi does not end in an unconditional goto.
  4. Label the edges that represent conditional transfers of control.
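The edge rules can be sketched over already-formed basic blocks. A minimal sketch; the `Block` summary (how a block ends plus its jump target, as a 1-based block number), the `ENTRY`/`EXIT` names, and the `[T]`/`[F]` labels on conditional edges are assumptions. The last block, or any block ending in a return, gets an edge to `EXIT`.

```java
import java.util.ArrayList;
import java.util.List;

class CfgBuilder {
    enum End { FALLTHROUGH, COND_JUMP, UNCOND_JUMP, RETURN }

    // Hypothetical block summary: how the block ends and, for jumps, the target block number.
    record Block(End end, int target) {}

    // Edges are rendered as "from->to", with "[T]"/"[F]" on conditional edges.
    static List<String> edges(List<Block> blocks) {
        List<String> out = new ArrayList<>();
        out.add("ENTRY->B1");
        int n = blocks.size();
        for (int i = 1; i <= n; i++) {
            Block b = blocks.get(i - 1);
            String from = "B" + i;
            String next = i < n ? "B" + (i + 1) : "EXIT";   // fall-through successor
            switch (b.end()) {
                case COND_JUMP:                              // jump edge plus fall-through edge
                    out.add(from + "->B" + b.target() + " [T]");
                    out.add(from + "->" + next + " [F]");
                    break;
                case UNCOND_JUMP:                            // no fall-through edge
                    out.add(from + "->B" + b.target());
                    break;
                case RETURN:                                 // exit from the program
                    out.add(from + "->EXIT");
                    break;
                default:
                    out.add(from + "->" + next);
            }
        }
        return out;
    }
}
```

For the four blocks of the `if a>10` code ({1}, {2, 3}, {4}, {5}), the result is `ENTRY->B1`, `B1->B3 [T]`, `B1->B2 [F]`, `B2->B4`, `B3->B4`, `B4->EXIT`.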
22
CF2: Search and Ordering
There are many ways to visit the nodes of a graph:
  Depth-first search: visits the descendants of a node before visiting any of its siblings.
  Breadth-first search: processes all of a node’s immediate descendants before any of their unprocessed children.
  Preorder traversal: a node is processed before its descendants.
  Postorder traversal: a node is processed after its descendants.
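Each order falls out of a single traversal over an adjacency list. A minimal sketch on a small hypothetical diamond graph (1 branches to 2 and 3, which both lead to 4); the order of each successor list decides which sibling is explored first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Traversals {
    // Recursive DFS that records both preorder and postorder.
    static void dfs(Map<Integer, List<Integer>> g, int n, Set<Integer> seen,
                    List<Integer> pre, List<Integer> post) {
        seen.add(n);
        pre.add(n);                                   // preorder: before its descendants
        for (int s : g.getOrDefault(n, List.of()))
            if (!seen.contains(s)) dfs(g, s, seen, pre, post);
        post.add(n);                                  // postorder: after its descendants
    }

    // BFS: all immediate successors are processed before their own successors.
    static List<Integer> bfs(Map<Integer, List<Integer>> g, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>(List.of(start));
        Deque<Integer> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            int n = queue.poll();
            order.add(n);
            for (int s : g.getOrDefault(n, List.of()))
                if (seen.add(s)) queue.add(s);        // enqueue each node only once
        }
        return order;
    }
}
```

On the diamond `{1: [2, 3], 2: [4], 3: [4]}` starting at 1, the DFS preorder is 1, 2, 4, 3, the postorder is 4, 2, 3, 1, and BFS gives 1, 2, 3, 4.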
23
CF2: Search and Ordering (cont’d) (DFS)
One DFS of the CFG: 1, 3, 4, 6, 7, 8, 10, back to 8, 9, back to 8, 7, 6, 4, 5, back to 4, 3, 1, 2, back to 1
The number assigned to a node during DFS is its depth-first number.
The depth-first ordering of the nodes is the reverse of the order in which the nodes are last visited in the DFS (i.e., reverse postorder).
For this DFS, nodes are visited 1, 3, 4, 6, 7, 8, 10, 8, 9, 8, 7, 6, 4, 5, 4, 3, 1, 2, 1
Depth-first ordering: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
[Figure: example CFG with nodes numbered 1 to 10]
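The orders on this slide can be checked mechanically. The figure's full edge set is not recoverable from the text, so the sketch assumes a graph made of only the tree edges implied by the walk (1→3, 3→4, 4→6, 6→7, 7→8, 8→10, 8→9, 4→5, 1→2), with successors listed in the order the walk takes them. Reversing the postorder, the order in which nodes are last visited, gives 1, 2, ..., 10.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DepthFirstOrdering {
    // Assumed graph: only the tree edges implied by the slide's walk, in walk order.
    static final Map<Integer, List<Integer>> SUCC = Map.of(
            1, List.of(3, 2),
            3, List.of(4),
            4, List.of(6, 5),
            6, List.of(7),
            7, List.of(8),
            8, List.of(10, 9));

    static void visit(int n, Set<Integer> seen, List<Integer> post) {
        seen.add(n);
        for (int s : SUCC.getOrDefault(n, List.of()))
            if (!seen.contains(s)) visit(s, seen, post);
        post.add(n);                     // n is visited for the last time here
    }

    static List<Integer> order(int root) {
        List<Integer> post = new ArrayList<>();
        visit(root, new HashSet<>(), post);
        Collections.reverse(post);       // reverse postorder = depth-first ordering
        return post;
    }
}
```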
24
CF: Types of Edges
A depth-first representation of a graph is its depth-first spanning tree (the tree edges) together with the other edges, which are not part of the tree.
Three kinds of edges:
  Advancing (forward) edges: go from a node to one of its proper descendants in the tree; these include the tree edges.
  Back edges: go from a node to one of its ancestors in the tree.
  Cross edges: connect nodes such that neither is an ancestor of the other.
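Edge kinds can be assigned during the DFS itself by recording each node's discovery number and whether it is still on the recursion stack. A minimal sketch of that standard rule, applied to the CFG of procedure AVG with edges read off its code (`entry`, `S1` to `S10`, `EXIT`); unlike the slide, it reports tree edges separately from other forward edges. With successors visited in the listed order, `S8->S3` comes out as a back edge (the loop) and `S10->EXIT` as a cross edge, because `EXIT` was already reached through `S5`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EdgeClassifier {
    // CFG of procedure AVG, read off the statement list.
    static final Map<String, List<String>> AVG = Map.ofEntries(
            Map.entry("entry", List.of("S1")), Map.entry("S1", List.of("S2")),
            Map.entry("S2", List.of("S3")), Map.entry("S3", List.of("S4", "S9")),
            Map.entry("S4", List.of("S5", "S6")), Map.entry("S5", List.of("EXIT")),
            Map.entry("S6", List.of("S7")), Map.entry("S7", List.of("S8")),
            Map.entry("S8", List.of("S3")), Map.entry("S9", List.of("S10")),
            Map.entry("S10", List.of("EXIT")));

    private final Map<String, List<String>> g;
    private final Map<String, Integer> pre = new HashMap<>();   // discovery numbers
    private final Set<String> onStack = new HashSet<>();        // nodes still being explored
    final Map<String, String> kinds = new LinkedHashMap<>();    // "u->v" -> edge kind
    private int clock = 0;

    EdgeClassifier(Map<String, List<String>> g) { this.g = g; }

    void dfs(String u) {
        pre.put(u, clock++);
        onStack.add(u);
        for (String v : g.getOrDefault(u, List.of())) {
            String e = u + "->" + v;
            if (!pre.containsKey(v)) { kinds.put(e, "tree"); dfs(v); }
            else if (onStack.contains(v)) kinds.put(e, "back");         // v is an ancestor of u
            else if (pre.get(u) < pre.get(v)) kinds.put(e, "forward");  // v is a finished descendant
            else kinds.put(e, "cross");                                   // neither is an ancestor
        }
        onStack.remove(u);
    }
}
```

Usage: `EdgeClassifier c = new EdgeClassifier(EdgeClassifier.AVG); c.dfs("entry");` then inspect `c.kinds`.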
25
Applications of Control Flow
  Complexity: pointers to refactoring
  Testing: branch, path, and basis-path coverage
    Branch: must test edges 1-2, 1-3, 4-5, 4-8, 5-6, 5-7
    Path: the number of paths is infinite, due to the loop
    Basis path: a set of paths that together cover every edge at least once, e.g., 1,2,4,8 and 1,3,4,5,6,7,4,8
  Program understanding: recover program structure
  Impact analysis
  …
[Figure: CFG with nodes 1 to 8]
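For the complexity and basis-path items, McCabe's cyclomatic complexity V(G) = E - N + 2 (for a single connected CFG) counts the paths in a basis set. The slide's figure is only partly recoverable, so the sketch applies the formula to the CFG of procedure AVG, with edges read off its code; the branch-coverage helper simply lists every edge leaving a node with more than one successor. For AVG, 13 edges and 12 nodes give V(G) = 3, matching its two decisions (`while` and `if`) plus one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Complexity {
    // CFG of procedure AVG as (from, to) pairs, read off the statement list.
    static final String[][] AVG_EDGES = {
        {"entry", "S1"}, {"S1", "S2"}, {"S2", "S3"}, {"S3", "S4"}, {"S3", "S9"},
        {"S4", "S5"}, {"S4", "S6"}, {"S5", "EXIT"}, {"S6", "S7"}, {"S7", "S8"},
        {"S8", "S3"}, {"S9", "S10"}, {"S10", "EXIT"}
    };

    // McCabe: V(G) = E - N + 2 for a single connected graph.
    static int cyclomatic(String[][] edges) {
        Set<String> nodes = new HashSet<>();
        for (String[] e : edges) { nodes.add(e[0]); nodes.add(e[1]); }
        return edges.length - nodes.size() + 2;
    }

    // Branch-coverage targets: every edge leaving a node with more than one successor.
    static List<String> branchEdges(String[][] edges) {
        Map<String, Integer> outDegree = new HashMap<>();
        for (String[] e : edges) outDegree.merge(e[0], 1, Integer::sum);
        List<String> result = new ArrayList<>();
        for (String[] e : edges)
            if (outDegree.get(e[0]) > 1) result.add(e[0] + "-" + e[1]);
        return result;
    }
}
```

For AVG, `branchEdges` returns `S3-S4`, `S3-S9`, `S4-S5`, `S4-S6`: the true and false sides of the `while` and the `if`.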
26
Outline
  Introduction/motivations
  Program representation
  Control flow
  Data flow
    Introduction
    Reaching definitions
27
Data flow - Introduction
The flow of various data throughout the program
Computed from the AST or the CFG
Used in software engineering tasks
Exact solutions to most data flow problems are undecidable; the answer may depend on
  the input,
  the outcome of a conditional statement,
  whether a loop terminates.
Thus we compute approximations of the exact solution.
28
Data flow - Introduction
Some approximations “overestimate” the solution:
  They contain the actual information plus some spurious information, but do not omit any actual information.
  This is the conservative, safe approach.
Some approximations “underestimate” the solution:
  They may not contain all the information of the actual solution.
  They are unsafe.
Research challenge: providing safe but precise information efficiently.
Uses of data flow:
  Compiler optimization requires conservative analysis.
  Software engineering tasks may need only unsafe information.
29
Data flow - Compiler Optimization
Common subexpression elimination
[Figure: c=a+b and d=a+b computed on two paths that join at e=a+b]
30
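Data flow - Compiler Optimization
Common subexpression elimination
Need to know available expressions: which expressions have already been computed on every path reaching this statement (with no intervening change to their operands).
Before: c=a+b and d=a+b on the two paths; e=a+b at the join
After: t=a+b; c=t and t=a+b; d=t on the two paths; e=t at the join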
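Available expressions is a forward analysis like reaching definitions, but an expression is available at a join only if it is available on every incoming path, so the meet is intersection instead of union. A minimal sketch on the slide's example, assuming block names `B1` (c=a+b), `B2` (d=a+b), and `B3` (e=a+b, the join) and a single expression `a+b`; no block assigns `a` or `b`, so nothing is killed and `a+b` is available on entry to `B3`. The `Block` record is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AvailableExpressions {
    // Hypothetical block summary: predecessors, expressions computed (gen), expressions killed.
    record Block(List<String> preds, Set<String> gen, Set<String> kill) {}

    // Returns, for each block, the expressions available on entry.
    static Map<String, Set<String>> availableAtEntry(Map<String, Block> blocks, Set<String> universe) {
        Map<String, Set<String>> out = new HashMap<>();
        for (String b : blocks.keySet()) out.put(b, new HashSet<>(universe)); // optimistic start
        Map<String, Set<String>> inSets = new HashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Block> e : blocks.entrySet()) {
                Block b = e.getValue();
                // Meet is intersection; a block with no predecessor has nothing available.
                Set<String> inB = b.preds().isEmpty() ? new HashSet<>() : new HashSet<>(universe);
                for (String p : b.preds()) inB.retainAll(out.get(p));
                Set<String> outB = new HashSet<>(inB);
                outB.removeAll(b.kill());
                outB.addAll(b.gen());
                inSets.put(e.getKey(), inB);
                if (!outB.equals(out.get(e.getKey()))) { out.put(e.getKey(), outB); changed = true; }
            }
        }
        return inSets;
    }
}
```

Because `a+b` is available on entry to `B3`, e=a+b can reuse a temporary. If `B2` also assigned `a` after computing `a+b`, its kill set would contain `a+b` and the intersection at `B3` would drop it.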
31
Data Flow - Compiler Optimization
Register (de)allocation
When assigning memory locations to registers, if the value in a register (i.e., of a memory location) is not used again, there is no need to keep it in the register.
Is R2 needed after this statement?
  R1=R2+10
Need to know “live variables”: which variables are still used after the current statement.
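Liveness is a backward analysis: walking from the last statement to the first, a variable is live before a statement if the statement uses it, or if it is live afterwards and the statement does not redefine it. A minimal sketch for straight-line code; only `R1=R2+10` comes from the slide, and the two statements after it (`R3=R1*2`, `R2=R3`) are hypothetical, chosen so that `R2` is redefined before any further use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Liveness {
    // Hypothetical statement summary: the variable it defines and the variables it uses.
    record Stmt(String text, String def, Set<String> uses) {}

    // For each statement, the set of variables live immediately AFTER it.
    static List<Set<String>> liveAfter(List<Stmt> code, Set<String> liveAtExit) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> live = new HashSet<>(liveAtExit);
        for (int i = code.size() - 1; i >= 0; i--) {
            result.add(Set.copyOf(live));  // live after statement i
            Stmt s = code.get(i);
            live.remove(s.def());          // the definition ends the old value's lifetime
            live.addAll(s.uses());         // a use makes the variable live before s
        }
        Collections.reverse(result);       // restore program order
        return result;
    }
}
```

With `R2` live at the end, the set live after `R1=R2+10` is {R1}: in this hypothetical code, R2's old value is not needed after that statement.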
32
Data Flow - Compiler Optimization
Suppose every assignment that reaches this statement assigns 5 to c; then a=c+10 can be replaced by a=15.
But we need to know reaching definitions: which definition(s) of variable c reach this statement.
Before: a=c+10 // needs 3 registers
After: a=15 // needs 2 registers
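Once reaching definitions are known, the test on this slide is direct: if every reaching definition of `c` assigns the same constant, the use of `c` can be replaced and `c+10` folded. A minimal sketch; the `Def` record and the convention that a non-constant right-hand side is recorded as `null` are assumptions.

```java
import java.util.List;
import java.util.OptionalInt;

class ConstantPropagation {
    // Hypothetical reaching definition: the variable and its constant value (null if not constant).
    record Def(String var, Integer constant) {}

    // The value of var, if every reaching definition of var assigns that same constant.
    static OptionalInt constantOf(String var, List<Def> reaching) {
        Integer value = null;
        boolean any = false;
        for (Def d : reaching) {
            if (!d.var().equals(var)) continue;
            if (d.constant() == null) return OptionalInt.empty();                    // not a constant
            if (any && !d.constant().equals(value)) return OptionalInt.empty();      // constants disagree
            value = d.constant();
            any = true;
        }
        return any ? OptionalInt.of(value) : OptionalInt.empty();
    }

    // Rewrites "target = var + addend" as "target = k" when var is the constant k - addend.
    static String fold(String target, String var, int addend, List<Def> reaching) {
        OptionalInt c = constantOf(var, reaching);
        return c.isPresent()
                ? target + " = " + (c.getAsInt() + addend)
                : target + " = " + var + " + " + addend;
    }
}
```

With two reaching definitions that both assign 5, `fold("a", "c", 10, ...)` returns `a = 15`; if one of them assigned 7 instead, the statement is left as `a = c + 10`.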
33
Data Flow - Sw Eng Tasks
Data-flow testing
Suppose that a statement assigns a value, but the use of that value is never executed under test.
  a=c+10 ... d=a+y (on the path tested, a is never used)
Need to know definition-use pairs: links between the definition(s) and use(s) of a variable (or memory location).
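For straight-line code, definition-use pairs can be found in one forward scan that remembers the most recent definition of each variable; across branches and loops, the same pairs come from reaching definitions instead. A minimal sketch on the slide's two statements (numbered 1 and 2 here); the `Stmt` encoding is hypothetical. Data-flow testing then asks whether some test executes the pair `a: 1->2`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DefUse {
    // Hypothetical statement summary: the variable it defines and the variables it uses.
    record Stmt(String def, Set<String> uses) {}

    // Pairs rendered as "var: defLine->useLine" with 1-based statement numbers.
    static List<String> pairs(List<Stmt> code) {
        Map<String, Integer> lastDef = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            Stmt s = code.get(i);
            for (String u : new TreeSet<>(s.uses())) {   // sorted for deterministic output
                Integer d = lastDef.get(u);              // uses read the value before this def
                if (d != null) out.add(u + ": " + d + "->" + (i + 1));
            }
            lastDef.put(s.def(), i + 1);
        }
        return out;
    }
}
```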
34
Data Flow - Sw Eng Tasks
Debugging
Suppose that ‘a’ has an incorrect value in a statement (e.g., because of integer overflow).
  a=c+y ... d=a+y
Need data dependence information: some statements produce erroneous values, and others are affected by those values.
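Data dependences let a debugger list every statement a suspect value can flow into: start from the statement that computes the bad `a` and follow definition-use links forward. A minimal sketch for straight-line code that follows data dependences only (control dependences are ignored); statements 3 (`e=d*2`) and 4 (`f=c+1`) are hypothetical additions that show a transitive effect and an unaffected statement.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ForwardDependence {
    // Hypothetical statement summary: the variable it defines and the variables it uses.
    record Stmt(String def, Set<String> uses) {}

    // 1-based numbers of later statements whose values may depend on statement `from`.
    static Set<Integer> affected(List<Stmt> code, int from) {
        Set<String> tainted = new HashSet<>(Set.of(code.get(from - 1).def()));
        Set<Integer> result = new TreeSet<>();
        for (int i = from; i < code.size(); i++) {       // 0-based index of the statements after `from`
            Stmt s = code.get(i);
            if (s.uses().stream().anyMatch(tainted::contains)) {
                result.add(i + 1);
                tainted.add(s.def());                    // its result is now suspect as well
            } else {
                tainted.remove(s.def());                 // redefined from unaffected values
            }
        }
        return result;
    }
}
```

For `a=c+y; d=a+y; e=d*2; f=c+1`, the statements affected by a bad value at statement 1 are {2, 3}.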
35
Data flow - Example
Compute the flow of data throughout the program:
  Where does the assignment to i in statement 1 reach?
  Where does the expression computed in statement 2 reach?
  Which uses of variables are reachable from the end of B1?
  Is the value of variable i live after statement 2?
[Figure: CFG of blocks B1 to B4; B1: 1. i=2, 2. k=i+1; B2: 3. i=1; B3: 4. k=k+1; B4: 5. k=k-4]
36
Reaching definitions analysis
Definition: a statement where a variable is assigned a value (e.g., an input statement or an assignment statement).
A definition of ‘a’ reaches a point ‘p’ if there exists a control flow path in the CFG from the definition to ‘p’ with no other definition of ‘a’ on the path.
Such a path may exist in the graph yet be impossible in any actual execution (an infeasible path).
[Figure: CFG of blocks B1 to B4; B1: 1. i=2, 2. k=i+1; B2: 3. i=1; B3: 4. k=k+1; B4: 5. k=k-4]
37
Reaching definitions analysis
What are the definitions in the program?
  Of variable i:
  Of variable k:
Which basic blocks (at the start of the block) do these definitions reach?
  Def 1 reaches:
  Def 2 reaches:
  Def 3 reaches:
  Def 4 reaches:
  Def 5 reaches:
[Figure: CFG of blocks B1 to B4; B1: 1. i=2, 2. k=i+1; B2: 3. i=1; B3: 4. k=k+1; B4: 5. k=k-4]
38
Reaching definitions analysis
What are the definitions in the program?
  Of variable i: 1, 3
  Of variable k: 2, 4, 5
Which basic blocks (at the start of the block) do these definitions reach?
  Def 1 reaches: B2
  Def 2 reaches: B1, B2, B3
  Def 3 reaches: B1, B3, B4, exit
  Def 4 reaches: B4
  Def 5 reaches: exit
[Figure: CFG of blocks B1 to B4; B1: 1. i=2, 2. k=i+1; B2: 3. i=1; B3: 4. k=k+1; B4: 5. k=k-4]
39
Reaching definitions analysis
Method
Compute two kinds of local information (within each block):
  GEN[B]: the set of definitions generated within B
  KILL[B]: the set of definitions that, if they reach the point before B, do not reach the end of B
Compute two other sets by propagation:
  IN[B]: the set of definitions that reach the beginning of B
  OUT[B]: the set of definitions that reach the end of B
[Figure: CFG of blocks B1 to B4; B1: 1. i=2, 2. k=i+1; B2: 3. i=1; B3: 4. k=k+1; B4: 5. k=k-4]
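GEN and KILL follow from the definitions in each block. A minimal sketch on the running example, with definitions numbered as on the slide; the `Def` record is hypothetical. GEN keeps only the last definition of each variable in a block, and KILL collects every other definition, anywhere in the program, of a variable the block defines. It reproduces the GEN and KILL columns of the table, e.g. KILL[B1] = {3, 4, 5} and KILL[B3] = {2, 5}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class GenKill {
    // Hypothetical definition: its number (as on the slide) and the variable it assigns.
    record Def(int id, String var) {}

    static Map<String, Set<Integer>> gen(Map<String, List<Def>> blocks) {
        Map<String, Set<Integer>> result = new LinkedHashMap<>();
        blocks.forEach((b, defs) -> {
            Map<String, Integer> last = new LinkedHashMap<>();
            for (Def d : defs) last.put(d.var(), d.id());  // a later def in B replaces an earlier one
            result.put(b, new TreeSet<>(last.values()));
        });
        return result;
    }

    static Map<String, Set<Integer>> kill(Map<String, List<Def>> blocks) {
        List<Def> all = new ArrayList<>();
        blocks.values().forEach(all::addAll);
        Map<String, Set<Integer>> genSets = gen(blocks);
        Map<String, Set<Integer>> result = new LinkedHashMap<>();
        blocks.forEach((b, defs) -> {
            Set<Integer> k = new TreeSet<>();
            for (Def mine : defs)
                for (Def other : all)
                    if (other.var().equals(mine.var())) k.add(other.id()); // every def of the same variable
            k.removeAll(genSets.get(b));                    // a block does not kill what it generates
            result.put(b, k);
        });
        return result;
    }
}
```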
40
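Reaching definitions analysis
Block | GEN | KILL  | initial IN | initial OUT | final IN | final OUT
B1    | 1,2 | 3,4,5 | -          | 1,2         | 2,3      | 1,2
B2    | 3   | 1     | -          | 3           | 1,2      | 2,3
B3    | 4   | 2,5   | -          | 4           | 2,3      | 3,4
B4    | 5   | 2,4   | -          | 5           | 3,4      | 3,5
[Figure: CFG of blocks B1 to B4; B1: 1. i=2, 2. k=i+1; B2: 3. i=1; B3: 4. k=k+1; B4: 5. k=k-4]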
41
Iterative Data-Flow Analysis Algorithm
Algorithm for Reaching Definitions
Input: CFG with GEN[B], KILL[B] for all B
Output: IN[B], OUT[B] for all B
begin RD
  IN[B] = empty; OUT[B] = GEN[B] for all B
  change = true
  while change do begin
    change = false
    for each B do begin
      IN[B] = union of OUT[P] over all predecessors P of B
      OLDOUT = OUT[B]
      OUT[B] = GEN[B] union (IN[B] - KILL[B])
      if OUT[B] != OLDOUT then change = true
    end for
  end while
end RD
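The pseudocode translates almost line for line into Java. The slide's figure edges are not in the text, so the sketch assumes the CFG B1→B2, B2→B1, B2→B3, B3→B4, an edge set consistent with the final IN/OUT values in the table; GEN and KILL are taken from the table. Under that assumption it converges after two passes to IN = {2,3}, {1,2}, {2,3}, {3,4} and OUT = {1,2}, {2,3}, {3,4}, {3,5} for B1 to B4. The `Block` record is hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachingDefinitions {
    // Hypothetical block summary: predecessors plus the GEN and KILL sets.
    record Block(List<String> preds, Set<Integer> gen, Set<Integer> kill) {}

    // Returns {IN, OUT}, each mapping a block name to a set of definition numbers.
    static List<Map<String, Set<Integer>>> solve(Map<String, Block> cfg) {
        Map<String, Set<Integer>> in = new LinkedHashMap<>();
        Map<String, Set<Integer>> out = new LinkedHashMap<>();
        cfg.forEach((b, blk) -> {                 // IN[B] = empty, OUT[B] = GEN[B]
            in.put(b, new HashSet<>());
            out.put(b, new HashSet<>(blk.gen()));
        });
        boolean change = true;
        while (change) {
            change = false;
            for (Map.Entry<String, Block> e : cfg.entrySet()) {
                String b = e.getKey();
                Block blk = e.getValue();
                Set<Integer> inB = new HashSet<>();
                for (String p : blk.preds()) inB.addAll(out.get(p)); // IN[B] = union of OUT[P]
                Set<Integer> outB = new HashSet<>(inB);
                outB.removeAll(blk.kill());                          // IN[B] - KILL[B]
                outB.addAll(blk.gen());                              // ... union GEN[B]
                in.put(b, inB);
                if (!outB.equals(out.get(b))) { out.put(b, outB); change = true; }
            }
        }
        return List.of(in, out);
    }
}
```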
42
Tools
  Eclipse JDT/AST (APIs to construct, traverse, and manipulate ASTs): http://www.vogella.de/articles/EclipseJDT/article.html
  Sourcerer: http://sourcerer.ics.uci.edu/index.html
  Crystal (data analysis framework, mostly for academic purposes): http://code.google.com/p/crystalsaf/wiki/Installation
43
Mandatory Reading List
  Representation and Analysis of Software: Rep-Analysis.pdf
  Crystal notes: CrystalTutorialNotes.pdf, CrystalTutorial.ppt
  Eclipse JDT AST: http://www.vogella.de/articles/EclipseJDT/article.html
44
More (optional) Reading List
  Principles of Program Analysis, Nielson and Hankin
  Invariant detection using Daikon: daikon.pdf
  More optional readings are available in the Program Analysis course material at CMU: http://www.cs.cmu.edu/~aldrich/courses/15-819M/