1 COMP3190: Principle of Programming Languages Formal Language Syntax

2 - 1 - Motivation The problem of parsing structured text is very common Consider the structure of email addresses (using a grammar): := @ := := |. Describe and recognize email addresses in arbitrary text.

3 - 2 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser

4 - 3 - Deterministic Finite Automata (DFA) v Q: finite set of states v Σ: finite set of “letters” (alphabet) v δ: Q × Σ -> Q (transition function) v q0: start state (in Q) v F: set of accept states (subset of Q) v Acceptance: the input is consumed with the automaton in an accept state.

5 - 4 - Example of DFA
States: q1 (start), q2 (accept)
δ  | 0  | 1
q1 | q1 | q2
q2 | q1 | q2
Accepts all strings that end in 1
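A minimal sketch of running this DFA in Python. The transition table is taken from the slide; treating q1 as the start state and q2 as the only accept state is an assumption read off the stated language ("ends in 1"), and the name `dfa_accepts` is illustrative.

```python
# Transition function δ as a dictionary: (state, symbol) -> next state.
DELTA = {
    ("q1", "0"): "q1",
    ("q1", "1"): "q2",
    ("q2", "0"): "q1",
    ("q2", "1"): "q2",
}
START = "q1"      # assumed start state
ACCEPT = {"q2"}   # assumed accept states


def dfa_accepts(text):
    """Consume the input one symbol at a time and test the final state."""
    state = START
    for ch in text:
        state = DELTA[(state, ch)]  # raises KeyError for symbols outside {0, 1}
    return state in ACCEPT


print(dfa_accepts("0101"))  # ends in 1
print(dfa_accepts("10"))    # ends in 0
```

Because δ is a total function, there is exactly one current state at every step, which is what makes the automaton deterministic.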

6 - 5 - Another Example of a DFA S q1 q2 r1 r2 a b a ab b b ab a Accepts all strings that start and end with “a” OR start and end with “b”

7 - 6 - Non-deterministic Finite Automata (NFA) Transition function is different v δ: Q × Σ_ε -> P(Q) v P(Q) is the power set of Q (set of all subsets) v Σ_ε is the union of Σ and the special symbol ε (denoting the empty string) A string is accepted if at least one path consumes all the input and ends in an accept state.

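8 - 7 - Example of an NFA
States q1, q2, q3, q4. Edges: q1 -> q1 on 0, 1; q1 -> q2 on 1; q2 -> q3 on 0, ε; q3 -> q4 on 1; q4 -> q4 on 0, 1
δ  | 0    | 1        | ε
q1 | {q1} | {q1, q2} | ∅
q2 | {q3} | ∅        | {q3}
q3 | ∅    | {q4}     | ∅
q4 | {q4} | {q4}     | ∅
What strings does this NFA accept?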
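One way to explore the question is to simulate the NFA by tracking the set of states it could be in. This is a sketch under assumptions: the transitions are reconstructed from the edge labels on the slide, and q1 as start state and q4 as accept state are assumed because the transcript does not show them. The helper names are illustrative.

```python
EPS = ""  # key used for ε-moves; never matches a real input character

DELTA = {
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"},
    ("q2", EPS): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"},
    ("q4", "1"): {"q4"},
}


def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(text, start="q1", accept=frozenset({"q4"})):
    """Accept if at least one path consumes the input and ends in an accept state."""
    current = eps_closure({start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= DELTA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & accept)


for sample in ["101", "11", "100", "0110"]:
    print(sample, nfa_accepts(sample))
```

Under these assumptions the machine accepts exactly the strings that contain 101 or 11 as a substring.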

9 - 8 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser

10 - 9 - Regular Expressions R is a regular expression if R is v “a” for some a in Σ. v ε (the empty string). v ∅ (the empty language). v the union of two regular expressions. v the concatenation of two regular expressions. v R1* (Kleene closure: zero or more repetitions of R1).

11 - 10 - Regular Expression Notation v a: an ordinary letter v ε: the empty string v M | N: choosing from M or N v MN: concatenation of M and N v M*: zero or more times (Kleene star) v M+: one or more times v M?: zero or one occurrence v [a-zA-Z]: character set alternation (choice) v . (period): any single character except newline

12 - 11 - Examples of Regular Expressions {0, 1}* 0: all strings that end in 0. {0, 1} 0*: strings that start with 1 or 0, followed by zero or more 0s. {0, 1}*: all strings. {0^n 1^n, n >= 0}: not a regular language, so no regular expression describes it!

13 - 12 - Converting a Regular Expression to an NFA [Figure: NFA fragments, joined by ε-transitions, for a, ε, M|N, MN, and M*]

14 - 13 - Regular expression->NFA Language: Strings of 0s and 1s in which the number of 0s is even Regular expression: (1*01*0)*1*

15 - 14 - Converting an NFA to a DFA v For a set of states S, closure(S) is the set of states that can be reached from S without consuming any input. v For a set of states S, DFAedge(S, c) is the set of states that can be reached from S by consuming input symbol c. v Each set of NFA states corresponds to one DFA state (hence at most 2^n DFA states for an NFA with n states).
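The subset construction can be sketched in Python as follows. `closure` and `dfa_edge` mirror the two definitions on the slide; the example NFA (strings over {a, b} ending in "ab") and all function names are hypothetical illustrations, not taken from the deck.

```python
EPS = ""  # key used for ε-moves


def closure(states, delta):
    """States reachable from `states` without consuming input."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return frozenset(result)


def dfa_edge(states, c, delta):
    """States reachable from `states` by consuming c, then closing over ε."""
    moved = set()
    for s in states:
        moved |= set(delta.get((s, c), ()))
    return closure(moved, delta)


def nfa_to_dfa(delta, start, accept, alphabet):
    """Each reachable set of NFA states becomes one DFA state."""
    start_set = closure({start}, delta)
    dfa_delta, seen, work = {}, {start_set}, [start_set]
    while work:
        current = work.pop()
        for c in alphabet:
            target = dfa_edge(current, c, delta)
            dfa_delta[(current, c)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    dfa_accept = {s for s in seen if s & set(accept)}
    return start_set, dfa_delta, dfa_accept


def dfa_run(start, dfa_delta, dfa_accept, text):
    state = start
    for ch in text:
        state = dfa_delta[(state, ch)]
    return state in dfa_accept


# Hypothetical NFA: strings over {a, b} that end in "ab".
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
START, DFA_DELTA, DFA_ACCEPT = nfa_to_dfa(NFA, 0, {2}, "ab")
print(dfa_run(START, DFA_DELTA, DFA_ACCEPT, "aab"))
```

Only reachable subsets are generated, so the result is usually far smaller than the 2^n worst case.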

16 - 15 - NFA -> DFA Initial classes: {A, B, E}, {C, D} No class requires partitioning! Hence a two-state DFA is obtained.

17 - 16 - Obtaining the minimal equivalent DFA v Initially two equivalence classes: final and nonfinal states. v Search for an equivalence class C and an input letter a such that, with a as input, the states in C make transitions to states in k > 1 different equivalence classes. v Partition C into k classes accordingly. v Repeat until unable to find a class to partition.
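The partitioning loop above can be sketched as below. This is a simple, unoptimized version (not Hopcroft's algorithm). It assumes a complete DFA; the five-state example is hypothetical, chosen so that the initial classes {A, B, E} and {C, D} need no further splitting.

```python
def minimize(states, alphabet, delta, accept):
    """Return the equivalence classes (frozensets) of a complete DFA."""
    partition = [frozenset(accept), frozenset(set(states) - set(accept))]
    partition = [block for block in partition if block]  # drop empty classes
    changed = True
    while changed:
        changed = False
        for block in partition:
            for a in alphabet:
                # Group the states of this class by the class their a-successor is in.
                groups = {}
                for s in block:
                    target = next(
                        i for i, p in enumerate(partition) if delta[(s, a)] in p
                    )
                    groups.setdefault(target, set()).add(s)
                if len(groups) > 1:  # transitions lead to k > 1 classes: split
                    partition.remove(block)
                    partition.extend(frozenset(g) for g in groups.values())
                    changed = True
                    break
            if changed:
                break  # restart the scan with the refined partition
    return partition


# Hypothetical DFA whose initial classes are already stable.
DELTA = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "E", ("B", "1"): "D",
    ("E", "0"): "A", ("E", "1"): "C",
    ("C", "0"): "D", ("C", "1"): "A",
    ("D", "0"): "C", ("D", "1"): "B",
}
CLASSES = minimize("ABCDE", "01", DELTA, {"C", "D"})
print(sorted(sorted(c) for c in CLASSES))
```

Each resulting class becomes one state of the minimal DFA.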

18 - 17 - Example (cont.)

19 - 18 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser

20 - 19 - Regular Grammar v Later definitions build on earlier ones v Nothing defined in terms of itself (no recursion) Regular grammar for numeric literals in Pascal:
digit -> 0 | 1 | 2 | ... | 8 | 9
unsigned_integer -> digit digit*
unsigned_number -> unsigned_integer ((. unsigned_integer) | ε) ((e (+ | - | ε) unsigned_integer) | ε)

21 - 20 - Languages and Automata in Programming Languages v Regular languages »Recognized(accepted) by finite automata »Useful for tokenizing program text (lexical analysis) v Context-free languages »Recognized(accepted) by pushdown automata »Useful for parsing the syntax of a program

22 - 21 - Important Theorems v A language is regular if and only if a regular expression describes it. v A language is regular if and only if a finite automaton recognizes it. v DFAs and NFAs are equally powerful.

23 - 22 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser

24 - 23 - Context-free Grammars Context-free grammars are defined by substitution rules:
S -> PVP
P -> N
P -> AP
A -> big | green
N -> cheese | Jim
V -> ate
Example sentences: big Jim ate green cheese; green Jim ate green cheese; Jim ate cheese; cheese ate Jim

25 - 24 - Context-free Grammars v Context-free grammars are used to formally describe the syntax of programming languages. v Every syntactically correct program is derived using the context-free grammar of the language. v Parsing a program involves tracing such derivation, given the context-free grammar and the program.

26 - 25 - Context-free Grammars A context-free grammar consists of v V: a finite set of variables v Σ: a finite set of terminals v R: a finite set of rules of the form variable -> {variable, terminal}* v S: the start variable

27 - 26 - Pushdown Automata (PDA) v A pushdown automaton consists of v Q: a set of states v Σ: input alphabet (of terminals) v Γ: stack alphabet v δ: a set of transition rules Q × Σ_ε × Γ_ε -> P(Q × Γ_ε) (currentState, inputSymbol, topOfStack -> newState, pushSymbolOnStack) v q0: the start state v F: the set of accept states (subset of Q) Deterministic: at most one move is possible from any configuration

28 - 27 - How does a PDA accept? v By final state: »Consume all the input while »Reaching a final state v By empty stack: »Consume all the input while »Having an empty stack »Set of final states is irrelevant

29 - 28 - Example of a PDA q1 q2 q3 q4 ε, ε ->$ 0, ε->0 1, 0->ε ε, $->ε Notation: a, b->c: when PDA reads “a” from input, it replaces “b” at the top of stack with “c”. What does this PDA accept?
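A hand-written simulation can help answer the question. The sketch below assumes the usual layout for these four labels: q1 -> q2 pushes $, q2 loops pushing 0s, q2 -> q3 and the q3 loop pop a 0 for each 1, and q3 -> q4 pops $. It also assumes q1 and q4 are the accept states. The transcript shows only the labels, so that layout is an assumption.

```python
def pda_accepts(text):
    """Simulate the (deterministic) PDA with an explicit stack."""
    if text == "":
        return True  # assumption: the start state q1 is itself accepting
    stack = ["$"]    # q1 -> q2: ε, ε -> $ pushes the bottom-of-stack marker
    state = "q2"
    for ch in text:
        if state == "q2" and ch == "0":
            stack.append("0")                 # 0, ε -> 0
        elif state in ("q2", "q3") and ch == "1" and stack[-1] == "0":
            stack.pop()                       # 1, 0 -> ε
            state = "q3"
        else:
            return False                      # no transition applies
    # q3 -> q4: ε, $ -> ε is possible only when every 0 has been matched.
    return state == "q3" and stack == ["$"]


for sample in ["0011", "01", "001", "0101"]:
    print(sample, pda_accepts(sample))
```

Under these assumptions the language is {0^n 1^n | n >= 0}, which a finite automaton cannot recognize because it has no stack to count with.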

30 - 29 - Important Theorems v A language is context-free iff a pushdown automaton recognizes it v Non-deterministic PDAs are more powerful than deterministic ones

31 - 30 - Example of Context-free Language That Requires a Non-deterministic PDA {w w^R | w ∈ {0, 1}*}, where w^R is w written backwards. Idea: non-deterministically guess the middle of the input string

32 - 31 - The Solution [Figure: PDA with states q1, q2, q3, q4 and transition labels ε, ε -> $; 0, ε -> 0; 1, ε -> 1; ε, ε -> ε; 1, 1 -> ε; 0, 0 -> ε; ε, $ -> ε]

33 - 32 - Derivations and Parse Trees Nested constructs require recursion, i.e. context-free grammars. CFG for arithmetic expressions:
expression -> identifier | number | - expression | ( expression ) | expression operator expression
operator -> + | - | * | /

34 - 33 - Parse Tree for Slope*x + Intercept Is this the only parse tree for this expression and grammar?

35 - 34 - A Better Expression Grammar 1. expression -> term | expression add_op term 2. term -> factor | term mult_op factor 3. factor -> identifier | number | - factor | (expression) 4. add_op -> + | - 5. mult_op -> * | / A good grammar reflects the internal structure of programs. This grammar is unambiguous and captures (HOW?): - operator precedence (*,/ bind tighter than +,- ) - associativity (ops group left to right)

36 - 35 - And Better Parse Trees... 3 + 4 * 5 10 - 4 - 3

37 - 36 - Syntax-directed Compilation v Parser calls scanner to obtain tokens. v Assembles tokens into parse tree. v Passes tree to later phases of compilation. v Scanner: deterministic finite automaton. v Parser: pushdown automaton. v Scanners and parsers can be generated automatically from regular expressions and CFGs (e.g., lex/yacc).

38 - 37 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser

39 - 38 - Scanning v Accept the longest possible token in each invocation of the scanner. v Implementation. »Capture finite automata. »Case (switch) statements. »Table and driver.
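The longest-match rule can be illustrated with a small sketch. It uses Python's `re` module instead of a hand-built automaton, and the token names and patterns are hypothetical stand-ins, not the Pascal token set from the slides.

```python
import re

# Hypothetical token definitions; every pattern is tried at each position.
TOKEN_SPECS = [
    ("NUMBER", re.compile(r"\d+(\.\d+)?")),
    ("IDENT", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("ASSIGN", re.compile(r":=")),
    ("COLON", re.compile(r":")),
    ("OP", re.compile(r"[+\-*/]")),
]


def scan(text):
    """Return (kind, lexeme) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None  # (kind, end position) of the longest match so far
        for kind, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (kind, m.end())
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


print(scan("x := 3.14+y1"))
```

Here ":=" is scanned as one ASSIGN token rather than COLON followed by an error, because the longer match wins.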

40 - 39 - Scanner for Pascal

41 - 40 - Scanner for Pascal(case Statements)

42 - 41 - Scanner (Table&driver)

43 - 42 - Scanner Generators v Start with a regular expression. v Construct an NFA from it. v Use a set of subsets construction to obtain an equivalent DFA. v Construct the minimal equivalent DFA.

44 - 43 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser »Top-down parsing »Bottom-up Parsing »Comparison

45 - 44 - Parsing approaches v Parsing in general has O(n^3) cost. v Need classes of grammars that can be parsed in linear time »Top-down or predictive parsing or recursive descent parsing or LL parsing (Left-to-right Left-most) »Bottom-up or shift-reduce parsing or LR parsing (Left-to-right Right-most)

46 - 45 - A Simple Grammar for a Comma-separated List of Identifiers
id_list -> id id_list_tail
id_list_tail -> , id id_list_tail
id_list_tail -> ;
String to be parsed: A, B, C;

47 - 46 - Top-down/bottom-up Parsing

48 - 47 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser »Top-down parsing »Bottom-up Parsing »Comparison

49 - 48 - Top-down Parsing v Predicts a derivation v Matches non-terminal against token observed in input

50 - 49 - LL(1) Grammar v A grammar for which a top-down deterministic parser can be produced with one token of look-ahead. v LL(1) grammar: »For a given non-terminal, the lookahead symbol uniquely determines the production to apply »Top-down parsing = predictive parsing »Driven by predictive parsing table of non-terminals × terminals -> productions

51 - 50 - From Last Time: Parsing with Table
Grammar: S -> ES’   S’ -> ε | +S   E -> num | (S)
Input string (parsed part followed by unparsed part, the same in every row): (1+2+(3+4))+5
Partly-derived string | Lookahead
-> ES’                | (
-> (S)S’              | 1
-> (ES’)S’            | 1
-> (1S’)S’            | +
-> (1+ES’)S’          | 2
-> (1+2S’)S’          | +
Parsing table:
   | num    | +     | (      | )    | $
S  | -> ES’ |       | -> ES’ |      |
S’ |        | -> +S |        | -> ε | -> ε
E  | -> num |       | -> (S) |      |

52 - 51 - How to Construct Parsing Tables? Needed: Algorithm for automatically generating a predictive parse table from a grammar
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
   | num    | +     | (      | )    | $
S  | -> ES’ |       | -> ES’ |      |
S’ |        | -> +S |        | -> ε | -> ε
E  | -> num |       | -> (S) |      |
??

53 - 52 - Constructing Parse Tables v Can construct predictive parser if: »For every non-terminal, every lookahead symbol can be handled by at most 1 production v FIRST(γ) for an arbitrary string of terminals and non-terminals γ is: »Set of symbols that might begin the fully expanded version of γ v FOLLOW(X) for a non-terminal X is: »Set of symbols that might follow the derivation of X in the input stream [Figure: FIRST and FOLLOW shown relative to X]

54 - 53 - Parse Table Entries v Consider a production X -> γ v Add -> γ to the X row for each symbol in FIRST(γ) v If γ can derive ε (γ is nullable), add -> γ for each symbol in FOLLOW(X) v Grammar is LL(1) if no conflicting entries
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
   | num    | +     | (      | )    | $
S  | -> ES’ |       | -> ES’ |      |
S’ |        | -> +S |        | -> ε | -> ε
E  | -> num |       | -> (S) |      |

55 - 54 - Computing Nullable v X is nullable if it can derive the empty string: »If it derives ε directly (X -> ε) »If it has a production X -> YZ... where all RHS symbols (Y, Z, ...) are nullable v Algorithm: assume all non-terminals are non-nullable, apply rules repeatedly until no change Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S) Only S’ is nullable

56 - 55 - Computing FIRST v Determining FIRST(X) 1. if X is a terminal, then add X to FIRST(X) 2. if X -> ε then add ε to FIRST(X) 3. if X is a nonterminal and X -> Y1Y2...Yk then a is in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for j = 1...i-1 (i.e., it’s possible to have an empty prefix Y1...Yi-1) 4. if ε is in FIRST(Y1Y2...Yk) then ε is in FIRST(X)

57 - 56 - FIRST Example
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
Apply rule 1: FIRST(num) = {num}, FIRST(+) = {+}, etc.
Apply rule 2: FIRST(S’) = {ε}
Apply rule 3: FIRST(S) = FIRST(E) = {}   FIRST(S’) = FIRST(‘+’) ∪ {ε} = {ε, +}   FIRST(E) = FIRST(num) ∪ FIRST(‘(’) = {num, ( }
Rule 3 again: FIRST(S) = FIRST(E) = {num, ( }   FIRST(S’) = {ε, +}   FIRST(E) = {num, ( }

58 - 57 - Computing FOLLOW v Determining FOLLOW(X) 1. if S is the start symbol then $ is in FOLLOW(S) 2. if A -> αBβ then add everything in FIRST(β) except ε to FOLLOW(B) 3. if A -> αB, or A -> αBβ and ε is in FIRST(β), then add FOLLOW(A) to FOLLOW(B)

59 - 58 - FOLLOW Example
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
FIRST(S) = {num, ( }   FIRST(S’) = {ε, +}   FIRST(E) = {num, ( }
Apply rule 1: FOL(S) = {$}
Apply rule 2: S -> ES’: FOL(E) += {FIRST(S’) - ε} = {+}   S’ -> ε | +S: nothing to add   E -> num | (S): FOL(S) += {FIRST(‘)’) - ε} = {$, )}
Apply rule 3: S -> ES’: FOL(E) += FOL(S) = {+, $, )} (because S’ is nullable)   FOL(S’) += FOL(S) = {$, )}

60 - 59 - Putting it all Together
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
FOLLOW(S) = { $, ) }   FOLLOW(S’) = { $, ) }   FOLLOW(E) = { +, ), $ }
FIRST(S) = {num, ( }   FIRST(S’) = {ε, +}   FIRST(E) = {num, ( }
v Consider a production X -> γ v Add -> γ to the X row for each symbol in FIRST(γ) v If γ can derive ε (γ is nullable), add -> γ for each symbol in FOLLOW(X)
   | num    | +     | (      | )    | $
S  | -> ES’ |       | -> ES’ |      |
S’ |        | -> +S |        | -> ε | -> ε
E  | -> num |       | -> (S) |      |
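The whole pipeline (FIRST, FOLLOW, then the table) can be sketched as a fixed-point computation in Python. The grammar is the one from the slides; the data layout (a dictionary of right-hand-side lists, with the empty list standing for ε) and the function names are choices made for this illustration.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "S": [["E", "S'"]],
    "S'": [[], ["+", "S"]],          # [] is the ε-production
    "E": [["num"], ["(", "S", ")"]],
}
START = "S"


def first_of(seq, first):
    """FIRST of a symbol string; contains EPS when the whole string is nullable."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # apply the rules repeatedly until nothing changes
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)                             # rule 1
    changed = True
    while changed:
        changed = False
        for a, prods in GRAMMAR.items():
            for rhs in prods:
                for i, b in enumerate(rhs):
                    if b not in GRAMMAR:
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}                 # rule 2
                    if EPS in tail:
                        new |= follow[a]               # rule 3
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


def build_table(first, follow):
    table = {}
    for nt, prods in GRAMMAR.items():
        for rhs in prods:
            f = first_of(rhs, first)
            lookaheads = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for t in lookaheads:
                if (nt, t) in table:
                    raise ValueError(f"LL(1) conflict at {(nt, t)}")
                table[(nt, t)] = rhs
    return table


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
TABLE = build_table(FIRST, FOLLOW)
for key in sorted(TABLE):
    print(key, "->", " ".join(TABLE[key]) or EPS)
```

A conflict raised by `build_table` would mean the grammar is not LL(1); this grammar produces seven entries and no conflicts.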

61 - 60 - Ambiguous Grammars Construction of predictive parse table for ambiguous grammar results in conflicts in the table (i.e., 2 or more productions to apply in the same cell) S -> S + S | S * S | num FIRST(S+S) = FIRST(S*S) = FIRST(num) = { num }

62 - 61 - Class Problem E -> E + T | T   T -> T * F | F   F -> (E) | num | ε   1. Compute FIRST and FOLLOW sets for this grammar 2. Compute parse table entries

63 - 62 - Top-Down Parsing Up to This Point v Now we know »How to build parsing table for an LL(1) grammar (ie FIRST/FOLLOW) »How to construct recursive-descent parser from parsing table »Call tree = parse tree v Open question – Can we generate the AST?

64 - 63 - Creating the Abstract Syntax Tree v Some class definitions to assist with AST construction
    class Expr {}
    class Add extends Expr {
        Expr left, right;
        Add(Expr L, Expr R) { left = L; right = R; }
    }
    class Num extends Expr {
        int value;
        Num(int v) { value = v; }
    }
Class hierarchy: Num and Add are subclasses of Expr

65 - 64 - Creating the AST [Figure: parse tree and AST for (1 + 2 + (3 + 4)) + 5] We got the parse tree from the call tree. Just add code to each parsing routine to create the appropriate nodes. Works because parse tree and call tree are the same shape, and AST is just a compressed form of the parse tree

66 - 65 - AST Creation: parse_E
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
    Expr parse_E() {
        switch (token) {                  // remember, this is the lookahead token
            case num: {                   // E -> number
                Expr result = new Num(token.value);
                token = input.read();
                return result;
            }
            case '(': {                   // E -> (S)
                token = input.read();
                Expr result = parse_S();
                if (token != ')') ParseError();
                token = input.read();
                return result;
            }
            default: ParseError();
        }
    }

67 - 66 - AST Creation: parse_S
Grammar: S -> ES’   S’ -> ε | +S   E -> number | (S)
    Expr parse_S() {
        switch (token) {
            case num:
            case '(':                     // S -> ES’
                Expr left = parse_E();
                Expr right = parse_S’();
                if (right == NULL) return left;
                else return new Add(left, right);
            default: ParseError();
        }
    }
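For readers who want something runnable, here is a Python sketch of the same recursive-descent scheme. It follows the slides' routines (parse_S, parse_E, and parse_S_tail standing in for parse_S’), but the tokenizer, the class layout, and the names are assumptions made for this illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


class Parser:
    def __init__(self, text):
        # Simplistic tokenizer: digits, '+', '(' and ')'; other characters are ignored.
        self.tokens = re.findall(r"\d+|[+()]", text) + ["$"]
        self.pos = 0

    @property
    def token(self):
        """The lookahead token."""
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def parse_S(self):                      # S -> E S'
        left = self.parse_E()
        right = self.parse_S_tail()
        return left if right is None else Add(left, right)

    def parse_S_tail(self):                 # S' -> ε | + S
        if self.token == "+":
            self.advance()
            return self.parse_S()
        if self.token in (")", "$"):        # FOLLOW(S'): take the ε-production
            return None
        raise SyntaxError(f"unexpected token {self.token!r}")

    def parse_E(self):                      # E -> num | ( S )
        if self.token.isdigit():
            node = Num(int(self.token))
            self.advance()
            return node
        if self.token == "(":
            self.advance()
            node = self.parse_S()
            if self.token != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return node
        raise SyntaxError(f"unexpected token {self.token!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.parse_S()
    if parser.token != "$":
        raise SyntaxError(f"trailing input at {parser.token!r}")
    return tree


print(parse("(1+2+(3+4))+5"))
```

Because the grammar is right-recursive, `parse("1+2+3")` groups as 1 + (2 + 3); parentheses disappear from the AST and survive only as tree shape.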

68 - 67 - Grammars v Have been using grammar for language “sums with parentheses”: (1+2+(3+4))+5 v Started with simple, right-associative grammar »S -> E + S | E »E -> num | (S) v Transformed it to an LL(1) grammar by left factoring: »S -> ES’ »S’ -> ε | +S »E -> num | (S) v What if we start with a left-associative grammar? »S -> S + E | E »E -> num | (S)

69 - 68 - Reminder: Left vs Right Associativity Consider a simpler string on a simpler grammar: “1 + 2 + 3 + 4” Right recursion (S -> E + S, S -> E, E -> num): right associative, grouping as 1 + (2 + (3 + 4)) Left recursion (S -> S + E, S -> E, E -> num): left associative, grouping as ((1 + 2) + 3) + 4

70 - 69 - Left Recursion
Grammar: S -> S + E   S -> E   E -> num
Input (read part followed by unread part, the same in every row): “1 + 2 + 3 + 4”
derived string | lookahead
S              | 1
S+E            | 1
S+E+E          | 1
S+E+E+E        | 1
E+E+E+E        | 1
1+E+E+E        | 2
1+2+E+E        | 3
1+2+3+E        | 4
1+2+3+4        | $
Is this right? If not, what’s the problem?

71 - 70 - Left-Recursive Grammars v Left-recursive grammars don’t work with top-down parsers: we don’t know when to stop the recursion v Left-recursive grammars are NOT LL(1)! »S -> S α »S -> β v In parse table »Both productions will appear in the predictive table at row S in all the columns corresponding to FIRST(β)

72 - 71 - Eliminate Left Recursion v Replace »X -> X α1 | ... | X αm »X -> β1 | ... | βn v With »X -> β1 X’ | ... | βn X’ »X’ -> α1 X’ | ... | αm X’ | ε v See complete algorithm in Dragon book
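The replacement rule can be sketched as a small function. It handles only immediate left recursion for a single non-terminal (the complete algorithm also deals with indirect recursion); the list-of-symbols representation and the function name are assumptions for this illustration.

```python
def eliminate_left_recursion(nt, productions):
    """Rewrite X -> X α | β as X -> β X' and X' -> α X' | ε.

    `productions` is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: productions}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | ε
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

The transformed grammar generates the same strings, but the recursion now happens on the right, so a top-down parser consumes a token before recursing.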

73 - 72 - Class Problem Transform the following grammar to eliminate left recursion: E -> E + T | T   T -> T * F | F   F -> (E) | num

74 - 73 - Creating an LL(1) Grammar v Start with a left-recursive grammar »S -> S + E »S -> E »and apply left-recursion elimination algorithm »S -> ES’ »S’ -> +ES’ | ε v Start with a right-recursive grammar »S -> E + S »S -> E »and apply left-factoring to eliminate common prefixes »S -> ES’ »S’ -> +S | ε

75 - 74 - Top-Down Parsing Summary [Figure: flow from language grammar, via left-recursion elimination and left factoring, to LL(1) grammar; via FIRST and FOLLOW to predictive parsing table; then to recursive-descent parser; then to parser with AST generation]

76 - 75 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser »Top-down parsing »Bottom-up Parsing »Comparison

77 - 76 - New Topic: Bottom-Up Parsing v A more powerful parsing technology v LR grammars – more expressive than LL »Construct right-most derivation of program »Handle left-recursive grammars; virtually all programming-language grammars are left-recursive »Easier to express syntax v Shift-reduce parsers »Parsers for LR grammars »Automatic parser generators (yacc, bison)

78 - 77 - Bottom-Up Parsing (2) v Right-most derivation – Backward »Start with the tokens »End with the start symbol »Match substring on RHS of production, replace by LHS
Grammar: S -> S + E | E   E -> num | (S)
(1+2+(3+4))+5 <- (E+2+(3+4))+5 <- (S+2+(3+4))+5 <- (S+E+(3+4))+5 <- (S+(3+4))+5 <- (S+(E+4))+5 <- (S+(S+4))+5 <- (S+(S+E))+5 <- (S+(S))+5 <- (S+E)+5 <- (S)+5 <- E+5 <- S+E <- S

79 - 78 - Shift-Reduce Parsing v Parsing actions: A sequence of shift and reduce operations v Parser state: A stack of terminals and non-terminals (grows to the right) v Current derivation step = stack + input
Derivation step | stack | Unconsumed input
(1+2+(3+4))+5   |       | (1+2+(3+4))+5
(E+2+(3+4))+5   | (E    | +2+(3+4))+5
(S+2+(3+4))+5   | (S    | +2+(3+4))+5
(S+E+(3+4))+5   | (S+E  | +(3+4))+5
...

80 - 79 - Shift-Reduce Actions v Parsing is a sequence of shifts and reduces v Shift: move look-ahead token to stack v Reduce: Replace symbols γ from top of stack with non-terminal symbol X corresponding to the production X -> γ (i.e., pop γ, push X)
stack | input          | action
(     | 1+2+(3+4))+5   | shift 1
(1    | +2+(3+4))+5    |
stack | input          | action
(S+E  | +(3+4))+5      | reduce S -> S+E
(S    | +(3+4))+5      |

81 - 80 - Shift-Reduce Parsing
Grammar: S -> S + E | E   E -> num | (S)
derivation      | stack  | input stream    | action
(1+2+(3+4))+5   |        | (1+2+(3+4))+5   | shift
(1+2+(3+4))+5   | (1     | +2+(3+4))+5     | reduce E -> num
(E+2+(3+4))+5   | (E     | +2+(3+4))+5     | reduce S -> E
(S+2+(3+4))+5   | (S     | +2+(3+4))+5     | shift
(S+2+(3+4))+5   | (S+2   | +(3+4))+5       | reduce E -> num
(S+E+(3+4))+5   | (S+E   | +(3+4))+5       | reduce S -> S+E
(S+(3+4))+5     | (S     | +(3+4))+5       | shift
(S+(3+4))+5     | (S+(3  | +4))+5          | reduce E -> num
...

82 - 81 - Potential Problems v How do we know which action to take: whether to shift or reduce, and which production to apply v Issues »Sometimes can reduce but should not »Sometimes can reduce in different ways

83 - 82 - Action Selection Problem v Given stack σ and look-ahead symbol b, should parser: »Shift b onto the stack, making it σb? »Reduce X -> γ, assuming that the stack has the form σ = αγ, making it αX? v If stack has the form αγ, should apply reduction X -> γ (or shift) depending on stack prefix α »α is different for different possible reductions, since γ’s have different lengths

84 - 83 - LR Parsing Engine v Basic mechanism »Use a set of parser states »Use stack with alternating symbols and states  E.g., 1 ( 6 S 10 + 5(blue = state numbers) »Use parsing table to:  Determine what action to apply (shift/reduce)  Determine next state v The parser actions can be precisely determined from the table

85 - 84 - LR Parsing Table v Algorithm: look at entry for current state S and input terminal C »If Table[S,C] = s(S’) then shift: push(C), push(S’) »If Table[S,C] = X -> α then reduce: pop(2*|α|), S’ = top(), push(X), push(Table[S’,X]) [Table layout: one row per state; the action table has one column per terminal and gives the next action and next state; the goto table has one column per non-terminal and gives the next state]

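86 - 85 - LR Parsing Table Example
Columns ( ) id , $ are input terminals; columns S and L are non-terminals.
State | (        | )        | id       | ,        | $        | S  | L
1     | s3       |          | s2       |          |          | g4 |
2     | S -> id  | S -> id  | S -> id  | S -> id  | S -> id  |    |
3     | s3       |          | s2       |          |          | g7 | g5
4     |          |          |          |          | accept   |    |
5     |          | s6       |          | s8       |          |    |
6     | S -> (L) | S -> (L) | S -> (L) | S -> (L) | S -> (L) |    |
7     | L -> S   | L -> S   | L -> S   | L -> S   | L -> S   |    |
8     | s3       |          | s2       |          |          | g9 |
9     | L -> L,S | L -> L,S | L -> L,S | L -> L,S | L -> L,S |    |
We want to derive this in an algorithmic fashion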

87 - 86 - Parsing Example ((a),b)
Grammar: S -> (L) | id   L -> S | L,S
derivation | stack             | input   | action
((a),b)    | 1                 | ((a),b) | shift, goto 3
((a),b)    | 1 ( 3             | (a),b)  | shift, goto 3
((a),b)    | 1 ( 3 ( 3         | a),b)   | shift, goto 2
((a),b)    | 1 ( 3 ( 3 a 2     | ),b)    | reduce S -> id
((S),b)    | 1 ( 3 ( 3 S 7     | ),b)    | reduce L -> S
((L),b)    | 1 ( 3 ( 3 L 5     | ),b)    | shift, goto 6
((L),b)    | 1 ( 3 ( 3 L 5 ) 6 | ,b)     | reduce S -> (L)
(S,b)      | 1 ( 3 S 7         | ,b)     | reduce L -> S
(L,b)      | 1 ( 3 L 5         | ,b)     | shift, goto 8
(L,b)      | 1 ( 3 L 5 , 8     | b)      | shift, goto 2
(L,b)      | 1 ( 3 L 5 , 8 b 2 | )       | reduce S -> id
(L,S)      | 1 ( 3 L 5 , 8 S 9 | )       | reduce L -> L,S
(L)        | 1 ( 3 L 5         | )       | shift, goto 6
(L)        | 1 ( 3 L 5 ) 6     | $       | reduce S -> (L)
S          | 1 S 4             | $       | done
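The table-driven loop behind this trace can be sketched in Python. The shift, goto, and reduce entries are transcribed from the example parsing table for S -> (L) | id, L -> S | L,S. As a simplification, the stack here holds only state numbers (the grammar symbols on the slide's stack are implied by the states), so a reduce pops |RHS| entries instead of 2*|RHS|. The names are illustrative.

```python
SHIFT = {
    (1, "("): 3, (1, "id"): 2,
    (3, "("): 3, (3, "id"): 2,
    (5, ")"): 6, (5, ","): 8,
    (8, "("): 3, (8, "id"): 2,
}
GOTO = {(1, "S"): 4, (3, "S"): 7, (3, "L"): 5, (8, "S"): 9}
# LR(0) reduce states: the reduction applies whatever the lookahead is.
REDUCE = {
    2: ("S", ["id"]),
    6: ("S", ["(", "L", ")"]),
    7: ("L", ["S"]),
    9: ("L", ["L", ",", "S"]),
}
ACCEPT_STATE = 4


def lr_parse(tokens):
    """Return the list of reductions applied, in order; raise on a syntax error."""
    tokens = list(tokens) + ["$"]
    states = [1]
    pos = 0
    applied = []
    while True:
        state, tok = states[-1], tokens[pos]
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            del states[-len(rhs):]                   # pop one state per RHS symbol
            states.append(GOTO[(states[-1], lhs)])   # goto on the LHS non-terminal
            applied.append(f"{lhs} -> {' '.join(rhs)}")
        elif state == ACCEPT_STATE and tok == "$":
            return applied
        elif (state, tok) in SHIFT:
            states.append(SHIFT[(state, tok)])
            pos += 1
        else:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")


# ((a),b) with both identifiers tokenized as "id"
for step in lr_parse(["(", "(", "id", ")", ",", "id", ")"]):
    print(step)
```

Read bottom to top, the printed reductions form the right-most derivation of the input.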

88 - 87 - LR(k) Grammars v LR(k) = Left-to-right scanning, right-most derivation, k lookahead chars v Main cases »LR(0), LR(1) »Some variations SLR and LALR(1) v Parsers for LR(0) Grammars: »Determine the actions without any lookahead »Will help us understand shift-reduce parsing

89 - 88 - Building LR(0) Parsing Tables v To build the parsing table: »Define states of the parser »Build a DFA to describe transitions between states »Use the DFA to build the parsing table v Each LR(0) state is a set of LR(0) items »An LR(0) item: X -> α . β where X -> αβ is a production in the grammar »The LR(0) items keep track of the progress on all of the possible upcoming productions »The item X -> α . β abstracts the fact that the parser already matched the string α at the top of the stack

90 - 89 - Example LR(0) State v An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production v Sub-string before “.” is already on the stack (beginnings of possible γ’s to be reduced) v Sub-string after “.”: what we might see next Example state containing two items: E -> num .   E -> ( . S )

91 - 90 - Class Problem For the production E -> num | (S), two items are: E -> num .   E -> ( . S ) Are there any others? If so, what are they? If not, why?

92 - 91 - LR(0) Grammar v Nested lists »S -> (L) | id »L -> S | L,S v Examples »(a,b,c) »((a,b), (c,d), (e,f)) »(a, (b,c,d), ((f,g))) [Figure: parse tree for (a, (b,c), d)]

93 - 92 - Start State and Closure v Start state »Augment grammar with production: S’ -> S $ »Start state of DFA has empty stack: S’ -> . S $ v Closure of a parser state: »Start with Closure(S) = S »Then for each item in S of the form X -> α . Y β, add items for all the productions Y -> γ to the closure of S: Y -> . γ

94 - 93 - Closure Example Grammar: S -> (L) | id   L -> S | L,S DFA start state: S’ -> . S $ Its closure: S’ -> . S $   S -> . (L)   S -> . id - Set of possible productions to be reduced next - Added items have the “.” located at the beginning: no symbols for these items on the stack yet

95 - 94 - The Goto Operation v Goto operation = describes transitions between parser states, which are sets of items v Algorithm: for a state I and a symbol Y »If the item [X -> α . Y β] is in I, then »Goto(I, Y) = Closure( [X -> α Y . β] ) Example: for I = { S’ -> . S $   S -> . (L)   S -> . id }, Goto(I, ‘(’) = Closure( { S -> ( . L) } )
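Closure and goto translate almost directly into set operations. In this sketch an LR(0) item is a tuple (lhs, rhs, dot position); that encoding and the function names are assumptions for illustration, while the grammar is the nested-list grammar from the slides.

```python
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}


def closure(items):
    """Add Y -> . γ for every item with the dot immediately before a non-terminal Y."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot])} . {' '.join(rhs[dot:])}"


START = closure({("S'", ("S", "$"), 0)})
for item in sorted(goto(START, "("), key=show):
    print(show(item))
```

Applying `goto` repeatedly from `START` until no new item sets appear yields the states of the LR(0) DFA.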

96 - 95 - Class Problem Grammar: E’ -> E   E -> E + T | T   T -> T * F | F   F -> (E) | id 1. If I = { [E’ -> . E] }, then Closure(I) = ?? 2. If I = { [E’ -> E . ], [E -> E . + T] }, then Goto(I, +) = ??

97 - 96 - Applying Reduce Actions
Grammar: S -> (L) | id   L -> S | L,S
[Figure: part of the LR(0) DFA]
Start state: S’ -> . S $   S -> . (L)   S -> . id
On ( : S -> ( . L)   L -> . S   L -> . L , S   S -> . (L)   S -> . id   (this state loops to itself on ( )
On id : S -> id .
On L : S -> (L . )   L -> L . , S
On S : L -> S .
States causing reductions: the dot has reached the end! Pop RHS off stack, replace with LHS X (X -> β), then rerun DFA (e.g., (x))

98 - 97 - Reductions v On reducing X -> β with stack αβ »Pop β off stack, revealing prefix α and state »Take single step in DFA from top state »Push X onto stack with new DFA state v Example
derivation | stack           | input | action
((a),b)    | 1 ( 3 ( 3       | a),b) | shift, goto 2
((a),b)    | 1 ( 3 ( 3 a 2   | ),b)  | reduce S -> id
((S),b)    | 1 ( 3 ( 3 S 7   | ),b)  | reduce L -> S

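99 - 98 - Full DFA
Grammar: S -> (L) | id   L -> S | L,S
State 1: S’ -> . S $   S -> . (L)   S -> . id
State 2: S -> id .
State 3: S -> ( . L)   L -> . S   L -> . L , S   S -> . (L)   S -> . id
State 4: S’ -> S . $   (final state)
State 5: S -> (L . )   L -> L . , S
State 6: S -> (L) .
State 7: L -> S .
State 8: L -> L , . S   S -> . (L)   S -> . id
State 9: L -> L , S .
Transitions: 1 -> 3 on (; 1 -> 2 on id; 1 -> 4 on S; 3 -> 3 on (; 3 -> 2 on id; 3 -> 7 on S; 3 -> 5 on L; 5 -> 6 on ); 5 -> 8 on ,; 8 -> 3 on (; 8 -> 2 on id; 8 -> 9 on S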

100 - 99 - Building the Parsing Table v States in the table = states in the DFA v For transition S -> S’ on terminal C: »Table[S,C] += Shift(S’) v For transition S -> S’ on non-terminal N: »Table[S,N] += Goto(S’) v If S is a reduction state X -> β then: »Table[S,*] += Reduce(X -> β)

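101 - 100 - Computed LR Parsing Table
Columns ( ) id , $ are input terminals; columns S and L are non-terminals. Entries sN are shifts, gN are gotos, and productions are reductions.
State | (        | )        | id       | ,        | $        | S  | L
1     | s3       |          | s2       |          |          | g4 |
2     | S -> id  | S -> id  | S -> id  | S -> id  | S -> id  |    |
3     | s3       |          | s2       |          |          | g7 | g5
4     |          |          |          |          | accept   |    |
5     |          | s6       |          | s8       |          |    |
6     | S -> (L) | S -> (L) | S -> (L) | S -> (L) | S -> (L) |    |
7     | L -> S   | L -> S   | L -> S   | L -> S   | L -> S   |    |
8     | s3       |          | s2       |          |          | g9 |
9     | L -> L,S | L -> L,S | L -> L,S | L -> L,S | L -> L,S |    |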

102 - 101 - LR(0) Summary v LR(0) parsing recipe: »Start with LR(0) grammar »Compute LR(0) states and build DFA:  Use the closure operation to compute states  Use the goto operation to compute transitions »Build the LR(0) parsing table from the DFA v This can be done automatically

103 - 102 - Class Problem Generate the DFA for the following grammar: S -> E + S | E   E -> num

104 - 103 - LR(0) Limitations v An LR(0) machine only works if states with reduce actions have a single reduce action »Always reduce regardless of lookahead v With a more complex grammar, construction gives states with shift/reduce or reduce/reduce conflicts v Need to use lookahead to choose [Figure: example states. { L -> L , S . } is OK; { L -> L , S .   S -> S . , L } has a shift/reduce conflict; { L -> S , L .   L -> S . } has a reduce/reduce conflict]

105 - 104 - A Non-LR(0) Grammar v Grammar for addition of numbers »S -> S + E | E »E -> num v Left-associative version is LR(0) v Right-associative version is not LR(0), as you saw with the previous class problem »S -> E + S | E »E -> num

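106 - 105 - LR(0) Parsing Table
Grammar: S -> E + S | E   E -> num
State 1: S’ -> . S $   S -> . E + S   S -> . E   E -> . num
State 2: S -> E . + S   S -> E .
State 3: S -> E + . S   S -> . E + S   S -> . E   E -> . num
State 4: E -> num .
State 5: S -> E + S .
State 6: S’ -> S . $
State 7: S’ -> S $ .
Transitions: 1 -> 2 on E; 1 -> 4 on num; 1 -> 6 on S; 2 -> 3 on +; 3 -> 2 on E; 3 -> 4 on num; 3 -> 5 on S; 6 -> 7 on $
State | num    | +           | $      | E  | S
1     | s4     |             |        | g2 | g6
2     | S -> E | s3 / S -> E | S -> E |    |
Shift or reduce in state 2?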

107 - 106 - Solve Conflict With Lookahead v 3 popular techniques for employing lookahead of 1 symbol with bottom-up parsing »SLR – Simple LR »LALR – LookAhead LR »LR(1) v Each has a different means of utilizing the lookahead »Results in different processing capabilities

108 - 107 - SLR Parsing v SLR Parsing = Easy extension of LR(0) »For each reduction X -> β, look at next symbol C »Apply reduction only if C is in FOLLOW(X) v SLR parsing table eliminates some conflicts »Same as LR(0) table except reduction rows »Adds reductions X -> β only in the columns of symbols in FOLLOW(X)
Grammar: S -> E + S | E   E -> num   Example: FOLLOW(S) = {$}
State | num | +  | $      | E  | S
1     | s4  |    |        | g2 | g6
2     |     | s3 | S -> E |    |

109 - 108 - SLR Parsing Table v Reductions do not fill entire rows as before v Otherwise, same as LR(0)
Grammar: S -> E + S | E   E -> num
State | num | +        | $        | E  | S
1     | s4  |          |          | g2 | g6
2     |     | s3       | S -> E   |    |
3     | s4  |          |          | g2 | g5
4     |     | E -> num | E -> num |    |
5     |     |          | S -> E+S |    |
6     |     |          | s7       |    |
7     | accept

110 - 109 - Class Problem Consider: S -> L = R   S -> R   L -> *R   L -> ident   R -> L Think of L as l-value, R as r-value, and * as a pointer dereference. When you create the states in the SLR(1) DFA, 2 of the states are the following: { S -> L . = R   R -> L . } and { S -> R . } Do you have any shift/reduce conflicts? (Not as easy as it looks)

111 - 110 - LR(1) Parsing v Get as much as possible out of 1 lookahead symbol parsing table v LR(1) grammar = recognizable by a shift/reduce parser with 1 lookahead v LR(1) parsing uses similar concepts as LR(0) »Parser states = set of items »LR(1) item = LR(0) item + lookahead symbol possibly following production »LR(0) item: S -> . S + E »LR(1) item: S -> . S + E, + »Lookahead only has impact upon REDUCE operations, apply when lookahead = next input

112 - 111 - LR(1) States v LR(1) state = set of LR(1) items v LR(1) item = (X -> α . β, y) »Meaning: α already matched at top of the stack, next expect to see βy v Shorthand notation »(X -> α . β, {x1, ..., xn}) »means: (X -> α . β, x1) ... (X -> α . β, xn) v Need to extend closure and goto operations Example state: S -> S . + E, {+, $}   S -> S + . E, num

113 - 112 - LR(1) Closure v LR(1) closure operation: »Start with Closure(S) = S »For each item in S of the form (X -> α . Y β, z) and for each production Y -> γ, add the following item to the closure of S: (Y -> . γ, FIRST(βz)) »Repeat until nothing changes v Similar to LR(0) closure, but also keeps track of lookahead symbol

114 - 113 - LR(1) Start State v Initial state: start with (S’ -> . S, $), then apply closure operation v Example: sum grammar S’ -> S $   S -> E + S | E   E -> num Closure of the initial item: S’ -> . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $}

115 - 114 - LR(1) Goto Operation v LR(1) goto operation = describes transitions between LR(1) states v Algorithm: for a state I and a symbol Y (as before, with the lookahead carried along unchanged) »If the item [X -> α . Y β, z] is in I, then »Goto(I, Y) = Closure( [X -> α Y . β, z] ) Grammar: S’ -> S $   S -> E + S | E   E -> num Example: for S1 = { S -> E . + S, $   S -> E ., $ }, Goto(S1, ‘+’) = S2 = Closure( { S -> E + . S, $ } )

116 - 115 - Class Problem Grammar: S’ -> S $   S -> E + S | E   E -> num 1. Compute: Closure(I = {S -> E + . S, $}) 2. Compute: Goto(I, num) 3. Compute: Goto(I, E)

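117 - 116 - LR(1) DFA Construction
Grammar: S’ -> S $   S -> E + S | E   E -> num
[Figure: LR(1) DFA with edges labeled E, num, +, and S, and the following item sets]
{ S’ -> . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $} }
{ E -> num ., {+, $} }
{ S’ -> S ., $ }
{ S -> E + S ., {+, $} }
{ S -> E + . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $} }
{ S -> E . + S, $   S -> E ., $ }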

118 - 117 - LR(1) Reductions
Grammar: S’ -> S $   S -> E + S | E   E -> num
[Figure: the same LR(1) DFA, with edges labeled E, num, +, and S, and the following item sets]
{ S’ -> . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $} }
{ E -> num ., {+, $} }
{ S’ -> S ., $ }
{ S -> E ., {+, $} }
{ S -> E + . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $} }
{ S -> E . + S, $   S -> E ., $ }
Reductions correspond to LR(1) items of the form (X -> γ ., y)

119 - 118 - LR(1) Parsing Table Construction v Same as construction of LR(0), except for reductions v For a transition S -> S’ on terminal x: »Table[S,x] += Shift(S’) v For a transition S -> S’ on non-terminal N: »Table[S,N] += Goto(S’) v If I contains {(X -> γ ., y)} then: »Table[I,y] += Reduce(X -> γ)

120 - 119 - LR(1) Parsing Table Example
Grammar: S’ -> S $   S -> E + S | E   E -> num
State 1: S’ -> . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $}
State 2 (from state 1 on E): S -> E . + S, $   S -> E ., $
State 3 (from state 2 on +): S -> E + . S, $   S -> . E + S, $   S -> . E, $   E -> . num, {+, $}
Fragment of the parsing table:
State | +  | $      | E
1     |    |        | g2
2     | s3 | S -> E |

121 - 120 - Class Problem Compute the LR(1) DFA for the following grammar: E -> E + T | T   T -> TF | F   F -> F* | a | b

122 - 121 - LALR(1) Grammars v Problem with LR(1): too many states v LALR(1) parsing (aka LookAhead LR) »Constructs LR(1) DFA and then merges any 2 LR(1) states whose items are identical except for lookahead »Results in smaller parser tables »Theoretically less powerful than LR(1) v LALR(1) grammar = a grammar whose LALR(1) parsing table has no conflicts Example: merging { S -> id ., +   S -> E ., $ } with { S -> id ., $   S -> E ., + } gives ??

123 - 122 - LALR Parsers v LALR(1) »Generally same number of states as SLR (much less than LR(1)) »But, with same lookahead capability of LR(1) (much better than SLR) »Example: Pascal programming language  In SLR, several hundred states  In LR(1), several thousand states

124 - 123 - Automate the Parsing Process v Can automate: »The construction of LR parsing tables »The construction of shift-reduce parsers based on these parsing tables v LALR(1) parser generators »yacc, bison »Not much difference compared to LR(1) in practice »Smaller parsing tables than LR(1) »Augment LALR(1) grammar specification with declarations of precedence, associativity »Output: LALR(1) parser program

125 - 124 - Associativity Grammar so far: S -> S + E | E   E -> num Alternative grammar: E -> E + E   E -> num What happens if we run this alternative grammar through LALR construction? The state { E -> E + E ., +   E -> E . + E, {+, $} } has a shift/reduce conflict on +. For 1 + 2 + 3: shift gives 1 + (2 + 3); reduce gives (1 + 2) + 3

126 - 125 - Associativity (2) v If an operator is left associative »Assign a slightly higher value to its precedence if it is on the parse stack than if it is in the input stream »Since stack precedence is higher, reduce will take priority (which is correct for left associative) v If operator is right associative »Assign a slightly higher value if it is in the input stream »Since input stream is higher, shift will take priority (which is correct for right associative)
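
One way to realize the "slightly higher value" idea is to give every operator two numbers, one used while it sits on the parse stack and one used while it is the next input token. The sketch below is an assumption-laden illustration: the doubling scheme, the function names, and the right-associative operator "^" are all hypothetical and are not taken from the slides.

```python
def stack_and_input_value(level, assoc):
    """Return (value on the stack, value in the input) for one operator."""
    if assoc == "left":
        return 2 * level + 1, 2 * level   # stack wins ties, so reduce
    if assoc == "right":
        return 2 * level, 2 * level + 1   # input wins ties, so shift
    raise ValueError(f"unknown associativity: {assoc}")


def decide(stack_op, input_op, ops):
    """Resolve a shift/reduce conflict between two operators."""
    stack_value, _ = stack_and_input_value(*ops[stack_op])
    _, input_value = stack_and_input_value(*ops[input_op])
    return "reduce" if stack_value > input_value else "shift"


# Hypothetical operator table: (precedence level, associativity).
OPS = {"+": (1, "left"), "^": (3, "right")}

print(decide("+", "+", OPS))  # reduce
print(decide("^", "^", OPS))  # shift
```

Because the two values of one operator differ by one while different levels differ by at least two, the scheme only changes the outcome when both operators have the same precedence level.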

127 - 126 - Precedence v E → E + E | T, T → T x T | num | (E) v E → E + E | E x E | num | (E) v What happens if we run this grammar through LALR construction? »Shift/reduce conflict results »One state contains E → E . + E, ... and E → E x E ., + »Another state contains E → E + E ., x and E → E . x E, ... v Precedence: attach precedence indicators to terminals v Shift/reduce conflict resolved by: »1. If the precedence of the input token is greater than that of the last terminal on the parse stack, favor shift over reduce »2. If the precedence of the input token is less than or equal to that of the last terminal on the parse stack, favor reduce over shift
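
The two resolution rules can be exercised with a small hand-written shift-reduce loop. This is a simplified sketch, not an LALR parser: it assumes parenthesis-free input, keeps operands and operators on two separate stacks instead of one parse stack, and uses made-up names and precedence numbers.

```python
PRECEDENCE = {"+": 1, "x": 2}  # assumed levels: x binds tighter than +


def parse(tokens):
    """Parse num/+/x tokens into nested (operator, left, right) tuples."""
    operands, operators = [], []

    def reduce_top():
        # Reduce E -> E op E using the topmost operator and two operands.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            # Rule 2: input precedence <= last operator on the stack, so reduce.
            while operators and PRECEDENCE[tok] <= PRECEDENCE[operators[-1]]:
                reduce_top()
            operators.append(tok)  # Rule 1 (or empty stack): shift
        else:
            operands.append(int(tok))
    while operators:  # end of input: reduce whatever is left
        reduce_top()
    return operands[0]


print(parse("1 + 2 x 3".split()))  # ('+', 1, ('x', 2, 3))
print(parse("1 + 2 + 3".split()))  # ('+', ('+', 1, 2), 3)
```

Since rule 2 also reduces on equal precedence, both operators come out left associative in this sketch.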

128 - 127 - Abstract Syntax Tree (AST) - Review v Derivation = sequence of applied productions »S ⇒ E+S ⇒ 1+S ⇒ 1+E ⇒ 1+2 v Parse tree = graph representation of a derivation »Doesn’t capture the order of applying the productions v AST discards unnecessary information from the parse tree [Figure: parse tree and AST for (1 + 2 + (3 + 4)) + 5; the AST is +(+(1, +(2, +(3, 4))), 5)]

129 - 128 - Implicit AST Construction v LL/LR parsing techniques implicitly build AST v The parse tree is captured in the derivation »LL parsing: AST represented by applied productions »LR parsing: AST represented by applied reductions v We want to explicitly construct the AST during the parsing phase

130 - 129 - AST Construction - LL v LL parsing: extend procedures for non-terminals v Grammar: S → E S’, S’ → ε | + S, E → num | (S)
Recognizer only (parse_S_prime stands for the procedure of S’):
void parse_S() {
  switch (token) {
    case num: case '(':
      parse_E(); parse_S_prime(); return;
    default: ParseError();
  }
}
Extended to return the AST:
Expr parse_S() {
  switch (token) {
    case num: case '(': {
      Expr left = parse_E();
      Expr right = parse_S_prime();  // NULL when S’ → ε was applied
      if (right == NULL) return left;
      else return new Add(left, right);
    }
    default: ParseError();
  }
}
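
For readers who want to run the idea, here is a minimal Python sketch of a complete recursive-descent parser for the grammar S → E S’, S’ → ε | + S, E → num | (S). The tokenizer, the class layout, and the tuple encoding of AST nodes are assumptions made for this example; only the structure of parse_S follows the slide.

```python
import re


def tokenize(text):
    """Split into numbers, '+', '(' and ')'; other characters are ignored."""
    return re.findall(r"\d+|[+()]", text) + ["$"]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def parse_S(self):  # S -> E S'
        if self.token.isdigit() or self.token == "(":
            left = self.parse_E()
            right = self.parse_S_prime()
            return left if right is None else ("Add", left, right)
        raise SyntaxError(f"unexpected token {self.token!r}")

    def parse_S_prime(self):  # S' -> epsilon | + S
        if self.token == "+":
            self.advance()
            return self.parse_S()
        if self.token in (")", "$"):  # FOLLOW(S'): choose the epsilon rule
            return None
        raise SyntaxError(f"unexpected token {self.token!r}")

    def parse_E(self):  # E -> num | ( S )
        if self.token.isdigit():
            node = ("Num", int(self.token))
            self.advance()
            return node
        if self.token == "(":
            self.advance()
            node = self.parse_S()
            if self.token != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return node
        raise SyntaxError(f"unexpected token {self.token!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.parse_S()
    if parser.token != "$":
        raise SyntaxError("trailing input")
    return tree


print(parse("1+2+3"))
print(parse("(1+2)+3"))
```

Because S’ → + S is right recursive, an unparenthesized sum comes out nested to the right; parentheses produce the left-nested tree.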

131 - 130 - AST Construction - LR v We again need to add code for explicit AST construction v AST construction mechanism »Store parts of the tree on the stack »For each nonterminal symbol X on stack, also store the sub-tree rooted at X on stack »Whenever the parser performs a reduce operation for a production X → γ, create an AST node for X

132 - 131 - AST Construction for LR - Example v Grammar: S → E + S | E, E → num | (S) v Input string: “1 + 2 + 3” v Before reduction S → E + S: the top of the stack holds E + S, where E carries Num(1) and S carries Add(Num(2), Num(3)) v After reduction S → E + S: the top of the stack holds S, which carries Add(Num(1), Add(Num(2), Num(3)))
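
The stack discipline of this example can be replayed in Python. The sketch below is a simplification: it hard-codes the shift and reduce decisions for parenthesis-free input instead of consulting an LR table, does no error checking, and its function name and tuple encoding are assumptions. It records a snapshot of the stack before every S → E + S reduction.

```python
def shift_reduce(tokens):
    """Parse num (+ num)* with S -> E + S | E, E -> num; return (AST, snapshots)."""
    stack = []      # entries are (grammar symbol, subtree or None)
    snapshots = []  # stack contents just before each S -> E + S reduction
    for tok in tokens:
        stack.append((tok, None))                 # shift
        if tok != "+":
            stack[-1] = ("E", ("Num", int(tok)))  # reduce E -> num
    _, tree = stack.pop()
    stack.append(("S", tree))                     # end of input: reduce S -> E
    while len(stack) >= 3:
        snapshots.append(list(stack))
        _, right = stack.pop()                    # subtree stored with S
        stack.pop()                               # the '+' token
        _, left = stack.pop()                     # subtree stored with E
        stack.append(("S", ("Add", left, right)))  # reduce S -> E + S
    return stack[0][1], snapshots


tree, snapshots = shift_reduce("1 + 2 + 3".split())
print(snapshots[-1])
print(tree)
```

The last snapshot is the "before reduction" stack of the slide, and the returned tree is the "after reduction" subtree stored with S.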

133 - 132 - Problems v Unstructured code: mixing parsing code with AST construction code v Automatic parser generators »The generated parser needs to contain AST construction code »How to construct a customized AST data structure using an automatic parser generator? v May want to perform other actions concurrently with parsing phase »E.g., semantic checks »This can reduce the number of compiler passes

134 - 133 - Syntax-Directed Definition v Solution: Syntax-directed definition »Extends each grammar production with an associated semantic action (code): S → E + S {action} »The parser generator adds these actions into the generated parser »Each action is executed when the corresponding production is reduced

135 - 134 - Semantic Actions v Actions = C code (for bison/yacc) v The actions access the parser stack »Parser generators extend the stack of symbols with entries for user-defined structures (e.g., parse trees) v The action code should be able to refer to the grammar symbols in the productions »Need to refer to multiple occurrences of the same non-terminal symbol and to distinguish RHS vs LHS occurrences, e.g. E → E + E »Use dollar variables in yacc/bison ($$, $1, $2, etc.): expr ::= expr PLUS expr {$$ = $1 + $3;}

136 - 135 - Building the AST v Use semantic actions to build the AST v AST is built bottom-up along with parsing »expr ::= NUM {$$ = new Num($1.val); } »expr ::= expr PLUS expr {$$ = new Add($1, $3); } »expr ::= expr MULT expr {$$ = new Mul($1, $3); } »expr ::= LPAR expr RPAR {$$ = $2; } v Recall: user-defined type for objects on the stack (%union)
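
The role of the dollar variables can be imitated in Python. This is only an analogy to what a yacc/bison-generated parser does, not its actual mechanism: the production names, the `apply_reduce` and `run` helpers, and the scripted sequence of shifts and reductions are assumptions made for the example. In each action, `v[0]` plays the role of $1, `v[2]` of $3, and the returned value plays the role of $$.

```python
from dataclasses import dataclass


@dataclass
class Num:
    val: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


# name -> (left-hand side, right-hand side, semantic action)
PRODUCTIONS = {
    "num":   ("expr", ["NUM"], lambda v: Num(v[0])),
    "plus":  ("expr", ["expr", "PLUS", "expr"], lambda v: Add(v[0], v[2])),
    "mult":  ("expr", ["expr", "MULT", "expr"], lambda v: Mul(v[0], v[2])),
    "paren": ("expr", ["LPAR", "expr", "RPAR"], lambda v: v[1]),
}


def apply_reduce(stack, name):
    """Pop the right-hand side, run the action, push the left-hand side."""
    lhs, rhs, action = PRODUCTIONS[name]
    popped = stack[-len(rhs):]
    if [sym for sym, _ in popped] != rhs:
        raise ValueError(f"stack top does not match production {name!r}")
    del stack[-len(rhs):]
    stack.append((lhs, action([val for _, val in popped])))


def run(steps):
    """Replay a script: tuples are shifted, strings name a production to reduce by."""
    stack = []
    for step in steps:
        if isinstance(step, tuple):
            stack.append(step)
        else:
            apply_reduce(stack, step)
    return stack


# Scripted parse of "1 + 2 * 3" with MULT binding tighter than PLUS.
steps = [("NUM", 1), "num", ("PLUS", None), ("NUM", 2), "num",
         ("MULT", None), ("NUM", 3), "num", "mult", "plus"]
print(run(steps))
```

The semantic values travel on the same stack as the grammar symbols, so the AST is complete as soon as the last reduction has run.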

137 - 136 - Outline v DFA & NFA v Regular expression v Regular languages v Context free languages &PDA v Scanner v Parser »Top-down parsing »Bottom-up Parsing »Comparison

138 - 137 - LL/LR Grammar Summary v LL parsing tables »Non-terminals x terminals → productions »Computed using FIRST/FOLLOW v LR parsing tables »LR states x terminals → {shift/reduce} »LR states x non-terminals → goto »Computed using closure/goto operations on LR states v A grammar is: »LL(1) if its LL(1) parsing table has no conflicts »same for LR(0), SLR, LALR(1), LR(1)
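
As an illustration of "computed using FIRST/FOLLOW" and of the no-conflict test, the following Python sketch builds the LL(1) table for the grammar S → E S’, S’ → ε | + S, E → num | (S) used earlier in the deck. The grammar encoding, the function names, and the end marker "$" are choices made for this example, not part of the slides.

```python
EPS = "ε"
GRAMMAR = {
    "S":  [["E", "S'"]],
    "S'": [[], ["+", "S"]],            # the empty list is the epsilon rule
    "E":  [["num"], ["(", "S", ")"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string; contains EPS if the whole string can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="S"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def ll1_table(first, follow):
    """Map (non-terminal, terminal) to the list of applicable right-hand sides."""
    table = {}
    for nt, alternatives in GRAMMAR.items():
        for rhs in alternatives:
            rhs_first = first_of_string(rhs, first)
            lookaheads = rhs_first - {EPS}
            if EPS in rhs_first:
                lookaheads |= follow[nt]
            for terminal in lookaheads:
                table.setdefault((nt, terminal), []).append(rhs)
    return table


first = compute_first()
follow = compute_follow(first)
table = ll1_table(first, follow)
print(all(len(entries) == 1 for entries in table.values()))  # True
```

A cell with more than one right-hand side would be a conflict; since every cell here has exactly one, the grammar is LL(1) under this construction.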

139 - 138 - Top-Down Parsing v Grammar: S → S + E | E, E → num | (S) v Left-most derivation: S ⇒ S+E ⇒ E+E ⇒ (S)+E ⇒ (S+E)+E ⇒ (S+E+E)+E ⇒ (E+E+E)+E ⇒ (1+E+E)+E ⇒ (1+2+E)+E ... v In a left-most derivation, the entire tree above a token (here 2) has been expanded by the time the token is encountered [Figure: parse tree for (1+2+(3+4))+5]

140 - 139 - Top-Down vs Bottom-Up [Figure: scanned and unscanned input for top-down and for bottom-up parsing] v Bottom-up: don’t need to figure out as much of the parse tree for a given amount of input → more time to decide what rules to apply

141 - 140 - Terminology: LL vs LR v LL(k) »Left-to-right scan of input »Left-most derivation »k symbol lookahead »[Top-down or predictive] parsing or LL parser »Performs pre-order traversal of parse tree v LR(k) »Left-to-right scan of input »Right-most derivation (traced in reverse) »k symbol lookahead »[Bottom-up or shift-reduce] parsing or LR parser »Performs post-order traversal of parse tree
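
The pre-order and post-order claims can be checked on a small parse tree. The sketch below hand-builds the parse tree of "1 + 2" under S → E + S | E, E → num; the tuple encoding and function names are assumptions made for the example.

```python
# An interior node is (production, children); a leaf is a token string.
TREE = ("S -> E + S", [
    ("E -> num", ["1"]),
    "+",
    ("S -> E", [("E -> num", ["2"])]),
])


def pre_order(node):
    """Productions in the order an LL parser predicts them."""
    if isinstance(node, str):
        return []
    production, children = node
    result = [production]
    for child in children:
        result += pre_order(child)
    return result


def post_order(node):
    """Productions in the order an LR parser reduces by them."""
    if isinstance(node, str):
        return []
    production, children = node
    result = []
    for child in children:
        result += post_order(child)
    return result + [production]


print(pre_order(TREE))
print(post_order(TREE))
```

The pre-order list matches the left-most derivation S ⇒ E+S ⇒ 1+S ⇒ 1+E ⇒ 1+2, while the post-order list is the sequence of reductions a shift-reduce parser performs on the same input.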

142 - 141 - Classification of Grammars v LR(k) ⊂ LR(k+1) v LL(k) ⊂ LL(k+1) v LL(k) ⊂ LR(k) v LR(0) ⊂ SLR v LALR(1) ⊂ LR(1) [Figure: diagram relating the classes LR(0), SLR, LALR(1), LR(1) and LL(1); not to scale]

143 - 142 - Bottom-Up Parsing v Grammar: S → S + E | E, E → num | (S) v Reductions: (1+2+(3+4))+5 ⇐ (E+2+(3+4))+5 ⇐ (S+2+(3+4))+5 ⇐ (S+E+(3+4))+5 v Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned [Figure: parse tree for (1+2+(3+4))+5]

