Lexical Analysis Uses formalism of Regular Languages


1 Lexical Analysis Uses formalism of Regular Languages
Regular Expressions, Deterministic Finite Automata (DFA), Non-deterministic Finite Automata (NDFA). RE → NDFA → DFA → minimal DFA. (F)Lex takes REs as input and builds a lexer.

2 Regular Expressions
Regular expression (over Σ): ∅, ε, a where a ∈ Σ, r + r′, r r′, r* where r, r′ are regular (over Σ). Notational shorthand: r^0 = ε, r^i = r r^(i-1), r^+ = r r*

3 DFAs: Formal Definition
DFA M = (Q, Σ, δ, q0, F)
Q = states, a finite set
Σ = alphabet, a finite set
δ = transition function, δ : Q × Σ → Q
q0 = initial/starting state, q0 ∈ Q
F = final states, F ⊆ Q

4 DFAs: Example
Strings over {a,b} with next-to-last symbol = a. [State diagram with states ε, a, b, …aa, …ab, …ba, …bb]
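A minimal sketch of this DFA in Python, assuming each state simply remembers the last two symbols read (the state names follow the labels on the slide; the function names `delta` and `accepts` are illustrative, not from the deck):

```python
# States: "e" (nothing read yet), "a", "b", "aa", "ab", "ba", "bb".
START = "e"
FINAL = {"aa", "ab"}  # exactly the states whose next-to-last symbol is a


def delta(state: str, symbol: str) -> str:
    """Transition function: remember only the last two symbols seen."""
    if state == "e":
        return symbol
    return (state + symbol)[-2:]


def accepts(word: str) -> bool:
    """Run the DFA over word and report whether it ends in a final state."""
    state = START
    for ch in word:
        if ch not in "ab":
            raise ValueError(f"symbol {ch!r} is not in the alphabet {{a,b}}")
        state = delta(state, ch)
    return state in FINAL
```

Because δ is a total function, every input has exactly one run; no search is needed.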

5 Nondeterministic Finite Automata
“Nondeterminism” implies having a choice: multiple possible transitions from a state on a symbol. δ(q,a) is a set of states, δ : Q × Σ → Pow(Q). The set can be empty, so there is no need for an error/nonsense state. Acceptance: does a path to a final state exist? I.e., try all choices. Also allow transitions on no input: δ : Q × (Σ ∪ {ε}) → Pow(Q)

6 NFAs: Example
Strings over {a,b} with next-to-last symbol = a. Loop until we “guess” which is the next-to-last a. [State diagram: a start state looping on Σ, an a-transition to state “…a”, then a Σ-transition to final state “…aΣ”]
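"Try all choices" can be simulated by tracking the set of states the NFA could be in. The sketch below assumes three hypothetical state names (`q0` loops, `q1` means "just guessed the next-to-last a", `q2` is final); it has no ε-transitions.

```python
# Transition relation: (state, symbol) -> set of possible next states.
# Missing entries mean the empty set, so no error state is needed.
DELTA = {
    ("q0", "a"): {"q0", "q1"},  # either keep looping or guess "this is the a"
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
START, FINAL = "q0", {"q2"}


def nfa_accepts(word: str) -> bool:
    """Accept iff some sequence of choices ends in a final state."""
    current = {START}
    for ch in word:
        nxt = set()
        for q in current:
            nxt |= DELTA.get((q, ch), set())
        current = nxt
    return bool(current & FINAL)
```

Tracking sets of states this way is the same idea that the RE → NDFA → DFA pipeline uses when it turns an NFA into a DFA.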

7 CFGs: Formal Definition
G = (V, Σ, P, S)
V = variables, a finite set
Σ = alphabet or terminals, a finite set
P = productions, a finite set
S = start variable, S ∈ V
Productions’ form, where A ∈ V, α ∈ (V ∪ Σ)*: A → α

8 CFGs: Derivations
Derivations in one step: βAγ ⇒G βαγ ⟺ A → α ∈ P, where x ∈ Σ*, α, β, γ ∈ (V ∪ Σ)*. Can choose any variable for a derivation step. Derivations in zero-or-more steps: ⇒G* is the reflexive and transitive closure of ⇒G. Language of a grammar: L(G) = {x ∈ Σ* | S ⇒G* x}

9 Parse Trees
Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
These two derivations use the same productions, but in different orders. [Parse tree figure with interior nodes S, A, B and leaves a, b] Root label = start variable. Each interior label = variable. Each parent/child relation = derivation step. Each leaf label = terminal or ε. All leaf labels together = derived string = yield.

10 Left- & Rightmost Derivations
Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
[Parse tree figure with interior nodes S, A, B and leaves a, b] These two derivations are special. The 1st derivation is leftmost: it always picks the leftmost variable. The 2nd derivation is rightmost: it always picks the rightmost variable.

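11 Disambiguation Example
Exp → n | Exp + Exp | Exp * Exp
What is an equivalent unambiguous grammar?
Exp → Term | Term + Exp
Term → n | n * Term
Uses operator precedence and associativity. Because both rules recurse on the right, this grammar groups repeated operators to the right; left-associativity would need left-recursive rules such as Exp → Exp + Term.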
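One way to see what the unambiguous grammar does is a small recursive-descent parser with one function per nonterminal. This is a sketch, taking the second operator to be multiplication written `*` (an assumption), with tokens given as a Python list of ints and operator strings and trees as nested tuples.

```python
def parse(tokens):
    """Parse Exp -> Term | Term + Exp ; Term -> n | n * Term."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError(f"number expected at position {pos}")
        n = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            return ("*", n, term())  # Term -> n * Term
        return n                     # Term -> n

    def exp():
        nonlocal pos
        left = term()
        if pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            return ("+", left, exp())  # Exp -> Term + Exp
        return left                    # Exp -> Term

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

For the tokens `[1, "+", 2, "*", 3]` the multiplication ends up deeper in the tree than the addition, which is how the grammar encodes precedence; for `[1, "+", 2, "+", 3]` the tree nests to the right.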

12 Parsing Designations
Major parsing algorithm classes are LL and LR. The first letter indicates the order in which the input is read: L means left to right. The second letter names the derivation the parser traces in the parse tree: L = leftmost derivation (top down), R = rightmost derivation in reverse (bottom up). The k of LL(k) or LR(k) is the number of input symbols of lookahead used during parsing. Power of parsing techniques: LL(k) < LR(k); LL(n) < LL(n+1); LR(n) < LR(n+1). Choice of LL or LR is largely religious.

13 Items and Itemsets
In LR parsing terminology, an item looks like a production with a ‘.’ in it. The ‘.’ indicates how far the parse has gone in recognizing a string that matches this production, e.g. A -> aAb.BcC suggests that we’ve “seen” input that could be reduced to aAb. If, by following the rules, we get A -> aAbBcC. we can reduce by A -> aAbBcC. An itemset is merely a set of items.

14 Building LR(0) Itemsets
Start with an augmented grammar: if S is the grammar start symbol, add S’ -> S. The first set of items is the closure of S’ -> .S. Itemset construction requires two functions: Closure and Goto.

15 Closure of LR(0) Itemset
If J is a set of items for Grammar G, then closure(J) is the set of items constructed from G by two rules: 1) Each item in J is added to closure(J). 2) If A → α.Bβ is in closure(J) and B → φ is a production, add B → .φ to closure(J).

16 Closure Example
Grammar: A → aBC, A → aA, B → bB, B → bC, C → cC, C → λ
J = { A → a.BC, A → a.A }
Closure(J) = { A → a.BC, A → a.A, A → .aBC, A → .aA, B → .bB, B → .bC }
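The two closure rules can be sketched with a worklist. The representation below is an assumption made for illustration: an item is a tuple `(lhs, rhs, dot)` where `dot` is the position of the ‘.’ in `rhs`, and λ is the empty tuple.

```python
# Grammar from the closure example.
GRAMMAR = [
    ("A", ("a", "B", "C")),
    ("A", ("a", "A")),
    ("B", ("b", "B")),
    ("B", ("b", "C")),
    ("C", ("c", "C")),
    ("C", ()),  # C -> lambda
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure: rule 1 keeps every item, rule 2 adds B -> .phi."""
    result = set(items)
    work = list(result)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return result


def show(item):
    """Render an item in the slide's notation, e.g. 'A -> a.BC'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"


J = {("A", ("a", "B", "C"), 1), ("A", ("a", "A"), 1)}
CLOSED = closure(J)
```

`sorted(show(i) for i in CLOSED)` lists the same six items as the example. No C items are added, because C never appears immediately after a ‘.’.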

17 GoTo
Goto(J,X), where J is a set of items and X is a grammar symbol (either terminal or non-terminal), is defined to be the closure of the set of items A → αX.β such that A → α.Xβ is in J. So, in English, Goto(J,X) is the closure of all items in J which have a ‘.’ immediately preceding X, with the ‘.’ moved past X.

18 Set of Items Construction
Procedure items(G’)
Begin
  C = {closure({[S’ → .S]})}
  repeat
    for each set of items J in C and each grammar symbol X such that GoTo(J,X) is not empty and not in C do
      add GoTo(J,X) to C
  until no more sets of items can be added to C
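A runnable sketch of the same procedure, applied to the augmented grammar S’ → S, S → (S), S → λ. Items are `(lhs, rhs, dot)` tuples and item sets are frozensets; this representation and the helper names are assumptions for illustration.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("(", "S", ")")), ("S", ())]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, rhs in GRAMMAR for sym in rhs})


def closure(items):
    """Add B -> .phi for every nonterminal B that follows a dot."""
    result = set(items)
    work = list(result)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, x):
    """Move the dot past x in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == x}
    return closure(moved)


def lr0_items():
    """Canonical collection of LR(0) item sets, in discovery order."""
    collection = [closure({("S'", ("S",), 0)})]
    for state in collection:  # the list grows while we iterate over it
        for x in SYMBOLS:
            target = goto(state, x)
            if target and target not in collection:
                collection.append(target)
    return collection
```

For this grammar the loop stops after finding five item sets; the index of each set in the returned list can serve as its state number when building a table.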

19 Build LR(0) Itemsets for:
{S → (S), S → λ}
{S → (S), S → SS, S → λ}

20 Building LR(0) Table from Itemsets
One row for each itemset. One column for each terminal or non-terminal symbol, and one for $. Table[J][X] is: Rn if J includes A → rhs. , A → rhs is rule number n, and X is a terminal; Sn if Goto(J,X) is itemset n

21 LR(0) Parse Table for: {S → (S), S → λ} and {S → (S), S → SS, S → λ}

22 Building SLR Table from Itemsets
One row for each itemset. One column for each terminal or non-terminal symbol, and one for $. Table[J][X] is: Rn if J includes A → rhs. , A → rhs is rule number n, X is a terminal, AND X is in Follow(A); Sn if Goto(J,X) is itemset n

23 LR(0) and LR(1) Items
An LR(0) item “is” a production with a ‘.’ in it. An LR(1) item has a “kernel” that looks like an LR(0) item, but also has a “lookahead”, e.g. A → α.Xβ, {terminals}. Items with the same kernel but different lookaheads are different items: A → α.Xβ, a/b/c ≠ A → α.Xβ, a/b/d

24 Closure of LR(1) Itemset
If J is a set of LR(1) items for Grammar G, then closure(J) includes: 1) Each LR(1) item in J. 2) If A → α.Bβ, a is in closure(J) and B → φ is a production, add B → .φ, First(β,a) to closure(J).

25 LR(1) Itemset Construction
Procedure items(G’)
Begin
  C = {closure({[S’ → .S, $]})}
  repeat
    for each set of items J in C and each grammar symbol X such that GoTo(J,X) is not empty and not in C do
      add GoTo(J,X) to C
  until no more sets of items can be added to C

26 Build LR(1) Itemsets for:
{S → (S), S → SS, S → λ}

27 {S → CC, S → cC, C → d}
Is this grammar LR(0)? SLR? LR(1)? How can we tell?

28 LR(1) Table from LR(1) Itemsets
One row for each itemset. One column for each terminal or non-terminal symbol, and one for $. Table[J][X] is: Rn if J includes A → rhs., a; A → rhs is rule number n; and X = a. Sn if Goto(J,X) is LR(1) itemset n

29 LALR(1) Parsing LookAhead LR (1) Start with LR(1) items
LALR(1) items --- combine LR(1) items with same kernel, different lookahead sets Build table just as LR(1) table but use LALR(1) items Same number of states (row) as LR(0)

30 Code Generation Pick three registers to be used throughout
Assuming stmt of form dest = s1 op s2 Generate code by: Load source 1 into r5 Load source 2 into r6 R7 = r5 op r6 Store r7 into destination

31 Three-Address Code section 6.2.1 (new), pp 467 (old)
Assembler for generic computer Types of statements 3-address (Dragon) Assignment statement x = y op z Unconditional jump br label Conditional jump if( cond ) goto label Parameter x Call statement call f

32 Example “Source” a = ((c-1) * b) + (-c * b)

33 Example 3-Address
t1 = c - 1
t2 = b * t1
t3 = -c
t4 = t3 * b
t5 = t2 + t4
a = t5
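Three-address code like this can be produced by a post-order walk of the expression tree, emitting one instruction per operator and naming each result with a fresh temporary. The sketch below assumes the tree is given as nested tuples `(op, operand, ...)`, with operands written in the order the slide uses; `gen` and `assign` are hypothetical helper names.

```python
import itertools


def gen(node, code, temps):
    """Emit code for node; return the name that holds its value."""
    if not isinstance(node, tuple):
        return str(node)  # a variable or constant is its own "address"
    op, *args = node
    operands = [gen(arg, code, temps) for arg in args]
    target = f"t{next(temps)}"
    if len(operands) == 1:
        code.append(f"{target} = {op}{operands[0]}")  # unary operator
    else:
        code.append(f"{target} = {operands[0]} {op} {operands[1]}")
    return target


def assign(dest, expr):
    """Translate dest = expr into a list of three-address statements."""
    code = []
    temps = itertools.count(1)
    result = gen(expr, code, temps)
    code.append(f"{dest} = {result}")
    return code


# a = ((c-1) * b) + (-c * b)
EXPR = ("+", ("*", "b", ("-", "c", 1)), ("*", ("-", "c"), "b"))
```

`assign("a", EXPR)` returns the six statements above, one string per line. Note that `-c * b` is treated as `(-c) * b`, matching `t3 = -c`.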

34 Three-Address Implementation (Quadruples, sec 6.2.2; pp 470-2)
    op      arg1  arg2  result
0   -       c     1     t1
1   *       b     t1    t2
2   uminus  c           t3
3   *       t3    b     t4
4   +       t2    t4    t5
5   =       t5          a

35 Three-Address Implementation (Triples, section 6.2.3)
    op      arg1  arg2
0   -       c     1
1   *       b     (0)
2   uminus  c
3   *       (2)   b
4   +       (1)   (3)
5   =       a     (4)

36 Three-Address Implementation
N-tuples (my choice – and yours ??) Lhs = oper(op1, op2, …, opn) Lhs = call(func, arg1, arg2, … argn) If condOper(op1, op2, Label) br Label

37 Three-Address Code 3-address operands Variable Constant Array Pointer

38 Variable Storage
Memory Locations (Logical): Stack, Heap, Program Code, Register
Variable Classes: Automatic (locals), Parameters, Globals

39 Variable Types Scalars Arrays Structs Unions Objects ?

40 Row Major Array Storage
char A[20][15][10];
Address  Element
1000     A[0][0][0]
. . .
1150     A[1][0][0]
1160     A[1][1][0]
1161     A[1][1][1]
3999     A[19][14][9]

41 Column Major Array Storage
char A[20][15][10];
Address  Element
1000     A[0][0][0]
1001     A[1][0][0]
. . .
1021     A[1][1][0]
1321     A[1][1][1]
3999     A[19][14][9]

42 OR (Row Major)
char A[20][15][10];
Address  Element
3999     A[0][0][0]
. . .
3849     A[1][0][0]
3839     A[1][1][0]
3838     A[1][1][1]
1000     A[19][14][9]
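The addresses in these three layouts follow from simple index arithmetic. A small sketch, assuming one byte per `char`, base address 1000 for the upward-growing layouts and top address 3999 for the downward-growing one (function names are illustrative):

```python
DIMS = (20, 15, 10)  # char A[20][15][10]
BASE = 1000          # address of the first byte of A
TOP = 3999           # address of the last byte of A


def row_major(i, j, k):
    """Last index varies fastest."""
    _, d1, d2 = DIMS
    return BASE + (i * d1 + j) * d2 + k


def column_major(i, j, k):
    """First index varies fastest."""
    d0, d1, _ = DIMS
    return BASE + i + d0 * (j + d1 * k)


def row_major_descending(i, j, k):
    """Row-major order laid out from the top address downward."""
    _, d1, d2 = DIMS
    return TOP - ((i * d1 + j) * d2 + k)
```

For example, `row_major(1, 1, 1)` is 1000 + (1·15 + 1)·10 + 1 = 1161, while `column_major(1, 1, 1)` is 1000 + 1 + 20·(1 + 15·1) = 1321.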

43 Array Declaration Algorithm
Dimension Node { int min; int max; int size; }

44 Declaration Algorithm (2)
Doubly linked list of dimension nodes Pass 1 – while parsing Build linked list from left to right Insert min, max Size = size of an element Append node to end of list min = max = size = 1

45 Declaration Algorithm (3) Pass 2
Traverse list from tail to head.
For each node, n, going “right” to “left”:
  Factor = n.max - n.min + 1
  For each node, m, right to left starting with n:
    m.size = m.size * factor
For each node, n, going left to right:
  N->right->max = max; N->right->min = min
Delete first element of list.

46 Array Declaration (Row Major)
int weight[ ][1..12][1..31]; list of “dimension” nodes, each holding int min, max, size, where size is the size of an element of this dimension. Sizes, left to right: 1488 (= 12 × 124), 124 (= 31 × 4), 4.

47 Array Offset (Row Major)
Traverse list summing (index - min) * size
int weight[ ][1..12][1..31]; x = weight[2002][5][31]
offset = (2002 - min) * 1488 + (5-1) * 124 + (31-1) * 4, where min is the lower bound of the first dimension (not given)
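The per-dimension sizes and the offset sum can be checked with a short sketch. It computes the sizes directly in one right-to-left pass rather than with the two-pass linked-list algorithm, and it assumes 4-byte ints and hypothetical bounds 2000..2009 for the open first dimension, since the declaration does not give them.

```python
from dataclasses import dataclass


@dataclass
class Dim:
    lo: int        # "min" in the dimension node
    hi: int        # "max" in the dimension node
    size: int = 0  # bytes spanned by one step along this dimension


def fill_sizes(dims, elem_size):
    """Walk right to left: each size is the product of the extents to its right."""
    size = elem_size
    for d in reversed(dims):
        d.size = size
        size *= d.hi - d.lo + 1
    return dims


def offset(dims, index):
    """Byte offset from the start of the array: sum of (index - min) * size."""
    return sum((i - d.lo) * d.size for d, i in zip(dims, index))


# First-dimension bounds are an assumption made only for this example.
weight = fill_sizes([Dim(2000, 2009), Dim(1, 12), Dim(1, 31)], elem_size=4)
```

Here the sizes come out as 1488, 124 and 4, and `offset(weight, (2002, 5, 31))` is 2·1488 + 4·124 + 30·4 = 3592 bytes.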

48 Your Turn
Assume int A[10][20][30]; in row major order. “Show” A’s dimension list. Show hypothetical 3-addr code for X = A[2][3][4]; A[3][4][5] = 9

49 Your Turn 2
Assume int A[10][20][30]; in column major order. “Show” A’s dimension list. Show hypothetical 3-addr code for X = A[2][3][4]; A[3][4][5] = 9

50 Road Map Regular Exprs, Context-Free Grammars LR parsing algorithm
Building LR parse tables Compiling Expressions Build control flow intermediates Generate target code Optimize intermediate

51 Control Constructs Can be cumbersome, but not difficult
“Write” control construct in 3-addr pseudo code using labels and gotos. Map that “control construct” to grammar rule action(s).

52 Semantic Hooks
Selection_statement : IF '(' comma_expr ')' stmt
    | IF '(' comma_expr ')' stmt ELSE stmt
    ;
1 shift/reduce conflict

53 Add actions (1)
Selection_statement : IF '(' comma_expr ')' {printf("start IF \n");} stmt {printf("IF Body \n");}

54 Add actions (2)
    | IF '(' comma_expr ')' {printf("start IF \n");} stmt {printf("Then Body \n");} ELSE stmt {printf("ELSE Body \n");}
    ;
31 reduce/reduce conflicts!

55 Solution (1)
Selection_statement : if_start
    | if_start ELSE stmt {printf("ELSE body");}
    ;

56 Solution (2)
if_start : IF '(' comma_expr ')' {printf("start IF \n");} stmt {printf("Then Body \n");}
    ;
1 shift/reduce conflict

57 Control Flow Graph sec 8.4 (new); sec 9.4 (old)
Nodes are Basic Blocks: single entry, single exit, no branch except (possibly) at the bottom. Edges represent one possible flow of execution between two basic blocks. The whole CFG represents a function.

58 Bubble Sort
begin;
int A[10];
main(){
  int i,j;
  Do 10 i = 0, 9, 1
    A[i] = random();
  Do 20 i = 1, 9, 1
    Do 20 j = 1, 9, 1
      if( A[j] > A[j+1]) swap(j);
}

59 Bubble Sort (cont.)
int swap(int i) {
  int temp;
  temp = A[i];
  A[i] = A[i+1];
  A[i+1] = temp;
}
end;

60 Example Generate 3-addr code for BubbleSort

61 Building CFG alg 8.5 (pp 526-7); alg 9.1(p 529)
Starting with 3-addr code, identify “leaders” (heads of basic blocks):
First statement is a leader
Any statement that is the target of a conditional or unconditional goto is a leader
Any statement immediately following a goto or conditional goto is a leader
Each leader starts a basic block, which ends immediately before the next leader
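The leader rules translate almost directly into code. The sketch below assumes a toy textual format for three-address statements: a label is written as a `Name:` prefix and every jump, conditional or not, contains `goto Name`. The sample `CODE` is made up for illustration.

```python
import re


def find_leaders(code):
    """Return sorted indices of the statements that start basic blocks."""
    labels = {}
    for i, stmt in enumerate(code):
        m = re.match(r"(\w+):", stmt)
        if m:
            labels[m.group(1)] = i
    leaders = {0} if code else set()          # rule 1: first statement
    for i, stmt in enumerate(code):
        m = re.search(r"\bgoto (\w+)", stmt)
        if m:
            leaders.add(labels[m.group(1)])   # rule 2: target of a goto
            if i + 1 < len(code):
                leaders.add(i + 1)            # rule 3: statement after a goto
    return sorted(leaders)


def basic_blocks(code):
    """Split code into blocks, each running up to the next leader."""
    bounds = find_leaders(code) + [len(code)]
    return [code[start:end] for start, end in zip(bounds, bounds[1:])]


CODE = [
    "i = 0",
    "L1: t1 = i * 4",
    "t2 = &A + t1",
    "*t2 = 0",
    "i = i + 1",
    "if (i < 10) goto L1",
    "x = 1",
]
```

For `CODE`, the leaders are statements 0, 1 and 6, giving three blocks: the initialization, the loop body ending in the conditional jump, and the code after the loop. Adding CFG edges would then be a matter of linking each block to its jump target and, unless it ends in an unconditional jump, to the block that follows it.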

62 Example Build control flow graphs for BubbleSort

63 “Simple Optimizations”
Once called “Dragon Book” optimizations Now often called “Machine Independent Optimizations” (chapter 9 of text) Common subexpression elimination (CSE) Copy propagation Dead code elimination Partial redundancy elimination

64 Machine Independent Optimization (cont.)
Code motion Induction variable simplification Constant propagation Local vs. Global optimization Interprocedural optimization

65 Loop Optimization “Programs spend 90% of time in loops”
Loop optimizations are well studied: “simple” optimizations and “loop mangling”

66 Loop Invariant Code Motion
Identify code that computes same value during each iteration Move loop invariant code to above loop “Standard” optimization in most compilers

67 Loop Invariant Example
for (i = 0; i < N; i++) for(j=0; j < N; j++) { c[i][j] = 0; for(k=0; k < N; k++) c[i][j] += a[i][k] * b[k][j]; }

68 Example (cont.)
“Assembler” for innermost (k) loop:
L1: t1 = i * N
    t2 = t1 + j
    t3 = t2 * 4
    t4 = &c + t3
    t12 = t1 + k
    t13 = t12 * 4
    t14 = &a + t13
    t21 = k * N
    t22 = t21 + j
    t23 = t22 * 4
    t24 = &b + t23
    t31 = *t14 * *t24
    *t4 = *t4 + t31
    k = k + 1
    if( k < N) goto L1

69 Example (cont.)
    t1 = i * N
    t2 = t1 + j
    t3 = t2 * 4
    t4 = &c + t3
L1: t12 = t1 + k
    t13 = t12 * 4
    t14 = &a + t13
    t21 = k * N
    t22 = t21 + j
    t23 = t22 * 4
    t24 = &b + t23
    t31 = *t14 * *t24
    *t4 = *t4 + t31
    k = k + 1
    if( k < N) goto L1

70 Induction Variables Changes by constant amount per iteration
Often used in array address computation Simplification of induction variables Strength reduction --- convert * to +

71 Example (cont.)
    t1 = i * N
    t2 = t1 + j
    t3 = t2 * 4
    t4 = &c + t3
    t14 = &a
    t24 = &b
    t32 = N * 4
    t33 = t32 + &a
L1: t31 = *t14 * *t24
    *t4 = *t4 + t31
    t14 = t14 + 4
    t24 = t24 + t32
    if(t14 < t33) goto L1
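The effect of hoisting and strength reduction can be checked on flat arrays. This is a sketch in element indices rather than byte addresses, with the matrices stored row-major in Python lists; the function names are illustrative, and unlike the simplified slide code the induction variables here start at row i of a and column j of b.

```python
def inner_naive(a, b, c, i, j, n):
    """k loop with every index recomputed by multiplication each iteration."""
    for k in range(n):
        c[i * n + j] += a[i * n + k] * b[k * n + j]


def inner_reduced(a, b, c, i, j, n):
    """Same loop after code motion and induction-variable simplification."""
    ci = i * n + j      # loop invariant: index of c[i][j], hoisted
    pa = i * n          # induction variable: index of a[i][k], steps by 1
    pb = j              # induction variable: index of b[k][j], steps by n
    end = pa + n        # loop bound rewritten in terms of pa, so k disappears
    while pa < end:
        c[ci] += a[pa] * b[pb]
        pa += 1         # replaces i * n + k
        pb += n         # replaces k * n + j
```

Both versions perform the same updates to `c`; the second does no multiplications for addressing inside the loop.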

72 Loop Transformations More sophisticated
Relatively few compilers include them Loop Interchange – for nested loops Unroll and Jam – for nested loops Loop fusion Loop distribution Loop unrolling

73 Register Usage Keep as many values in registers as possible
Register assignment Register allocation Popular techniques Local vs. global Graph coloring Bin packing

74 Local Register Assignment
Given Control-flow graph of basic blocks List of 3-addr statements per BB Set of “live” scalar values per stmt Sets of scalar values used, defined per stmt Design a local register assignment/allocation algorithm

75 Graph Coloring Assign a color to each node in graph
Two nodes connected by an edge must have different colors Classic problem in graph theory NP complete But good heuristics exist for register allocation

76 Live Ranges
[Diagram: live ranges of x and y across a sequence of definitions (def x, def y) and uses (use x, use y)]

77 Graph Coloring Register Assign
Each value is allocated a (symbolic) register. “Variables” interfere iff their live ranges overlap. Two interfering values cannot share a register. How can we tell if two values interfere? [Diagram: values s1, s2, s3, s4]

78 Interference Graph
Values and interference: nodes are the values; there is an edge between two nodes iff they interfere. [Diagram: nodes s1, s2, s3, s4]

79 Graph Coloring Example

80 Graph Coloring Example
3 Colors

81 Heuristics for Register Coloring
Coloring a graph with N colors For each node, m If degree(m) < N Node can always be colored, because After coloring adjacent nodes, at least one color left for current node If degree(m) >= N Still may be colorable with N colors

82 Heuristics for Register Coloring
Remove nodes that have degree < N and push the removed nodes onto a stack.
When all the remaining nodes have degree >= N, find a node to spill (no color for that node) and remove it.
When the graph is empty, start to color: pop a node from the stack and color it differently from its adjacent (colored) nodes.
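The simplify-and-color heuristic can be sketched as follows. The graph is a dict mapping each node to the set of its neighbours (assumed symmetric). When no node has degree < N, this sketch spills the node with the lowest entry in an optional `spill_cost` dict; that selection rule and all names are assumptions made for illustration.

```python
def color_graph(graph, n_colors, spill_cost=None):
    """Return (coloring, spilled) for an interference graph."""
    cost = spill_cost or {}
    work = {v: set(ns) for v, ns in graph.items()}  # shrinking copy
    stack, spilled = [], []
    while work:
        low = [v for v in sorted(work) if len(work[v]) < n_colors]
        if low:
            v = low[0]
            stack.append(v)       # guaranteed colorable later
        else:
            v = min(sorted(work), key=lambda u: cost.get(u, 0))
            spilled.append(v)     # no color for this node
        for u in work.pop(v):     # remove v and its edges from the graph
            work[u].discard(v)
    coloring = {}
    while stack:
        v = stack.pop()
        used = {coloring[u] for u in graph[v] if u in coloring}
        coloring[v] = next(c for c in range(n_colors) if c not in used)
    return coloring, spilled
```

A node pushed with degree < N always finds a free color when popped, because only the neighbours that were still in the graph when it was removed can already be colored. For instance, a four-node graph with every pair of nodes connected cannot be colored with 3 colors, so one node is spilled and the remaining triangle is colored.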

83-94 Another Coloring Example
[Animation frames over an interference graph with nodes s0, s1, s2, s3, s4: s4, s3 and s2 appear to be removed and pushed onto the stack in turn, then popped back in reverse order.]

95 Which value to pick? One with interference degree >= N
One with minimal spill cost (cost of placing value in memory rather than in register) What is spill cost? Cost of extra load and store instructions

96 One Way to Compute Spill Cost
Goal: give priority to values used in loops, so assume loops execute 10 times. Spill cost = defCost + useCost. defCost = sum over all definitions of (cost of a store × 10^nestingDepthOfLoop). useCost = sum over all uses of (cost of a load × 10^nestingDepthOfLoop). Choose the value with the lowest spill cost.
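A small sketch of this cost estimate, assuming each definition and use is described only by its loop nesting depth and that a load and a store each cost 1 unit by default (both the representation and the unit costs are assumptions):

```python
def spill_cost(def_depths, use_depths, store_cost=1, load_cost=1):
    """defCost + useCost, weighting each occurrence by 10 ** nesting depth."""
    def_cost = sum(store_cost * 10 ** d for d in def_depths)
    use_cost = sum(load_cost * 10 ** d for d in use_depths)
    return def_cost + use_cost


def cheapest_to_spill(values):
    """values: name -> (def_depths, use_depths). Pick the lowest spill cost."""
    return min(sorted(values), key=lambda v: spill_cost(*values[v]))


# x: defined outside any loop, used twice inside a doubly nested loop.
# y: defined and used once inside a single loop.
VALUES = {"x": ([0], [2, 2]), "y": ([1], [1])}
```

Here x costs 1 + 100 + 100 = 201 and y costs 10 + 10 = 20, so y is the better spill candidate even though it is defined inside a loop.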

