Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007.

Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007

CS 471 – Fall 2007 1 Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if ( b == 0 ) a = “Hi” ; Abstract Syntax Tree (AST) Lexical Analysis Syntactic Analysis Semantic Analysis if == b 0 = a “Hi” ;

CS 471 – Fall 2007 2 Last Time … DFA S ’ →. S $ S →. ( L ) S →. id S ’ → S. $ final state S → (. L ) L →. S L →. L, S S →. ( L ) S →. id S → id. L → L,. S S →. ( L ) S →. id S → ( L. ) L → L., S L → S.S → ( L ). L → L, S. S $ ( id S L ), S ( 1 4 2 3 7 8 5 6 9 S → ( L ) | id L → S | L, S (

CS 471 – Fall 2007 3 Grammar Parsing Table ()id,$SL 1s3s2g4 2S → id 3s3s2g7g5 4accept 5s6s8 6S → (L) 7L→SL→SL→SL→SL→SL→SL→SL→SL→SL→S 8s3s2g9 9L → L,S

CS 471 – Fall 2007 4 Practical Issues We use an LR parser generator… Question: how many DFA states are there? Does it matter? What does it affect? –Answer: Even simple languages have 1000s of states Most LR parser generators don’t construct the DFA as described

CS 471 – Fall 2007 5 LR(1) Parsing Tables Many states are similar, e.g. and How can we exploit this? Same reduction, different lookahead tokens Idea: merge the states… E  int, )/+ E  int on ), + 5 E  int on $, + E  int, $/+ 1 E  int on $, +, ) E  int, $/+/) 1’

CS 471 – Fall 2007 6 The Core of a Set of LR Items When can states be merged? Def: the core of a set of LR items is: Just the production parts of the items Without the lookahead terminals Example: the core of { [X  , b], [Y  , d] } is ANSWER:

CS 471 – Fall 2007 7 Merging States Consider for example the LR(1) states {[X  , a], [Y  , c]} {[X  , b], [Y  , d]} They have the same core and can be merged Resulting state is: {[X  , a/b], [Y  , c/d]} These are called LALR(1) states Stands for LookAhead LR Typically 10X fewer LALR(1) states than LR(1)

CS 471 – Fall 2007 8 The LALR(1) DFA Algorithm: repeat Choose two states with same core Merge the states by combining the items Point edges from predecessors to new state New state points to all the previous successors until all states have distinct core A ED CB F A BE D C F

CS 471 – Fall 2007 9 int E  int on $, + E  int on ), + E  E + (E) on $, + E  E + (E) on ), + ( + E int 10 9 11 01 234 56 8 7 + E + ) ( int E ) accept on $ int E  int on $, +, ) E  E + (E) on $, +, ) ( E int 01,5 23,84,9 6,107,11 + + ) E accept on $ Conversion LR(1) to LALR(1)

CS 471 – Fall 2007 10 Notes on Parsing Parsing A solid foundation: context-free grammars A simple parser: LL(1) A more powerful parser: LR(1) An efficiency hack: LALR(1) LALR(1) parser generators

CS 471 – Fall 2007 11 Issues with LR Parsers What happens if a state contains: [X   a, b] and [Y  , a] Then on input “a” we could either Shift into state [X  a , b], or Reduce with Y   This is called a shift-reduce conflict Typically due to ambiguity

CS 471 – Fall 2007 12 Shift/Reduce Conflicts Classic example: the dangling else S  if E then S | if E then S else S | OTHER Will have DFA state containing [S  if E then S, else] [S  if E then S else S, x] Practical solutions: Modify grammar to reflect the precedence of else Many LR parsers default to “shift” Often have a precedence declaration

CS 471 – Fall 2007 13 Another Example Consider the ambiguous grammar E E + E | E * E | int Part of the DFA: We have a shift/reduce on input + What do we want to happen? Consider: x * y + z We need to reduce (* binds more tightly than +) Default action is shift [E  E * E, +] [E  E + E, +] … [E  E * E, +] [E  E + E, +] … E

CS 471 – Fall 2007 14 Declare relative precedence Explicitly resolve conflict Tell parser: we prefer the action involving * over + In practice: Parser generators support a precedence declaration for operators Precedence [E  E * E, +] [E  E + E, +] [E  E * E, +] [E  E + E, +] E

CS 471 – Fall 2007 15 More… Still a problem? Shift/reduce conflict on + Do we care? Maybe: we want left associativity parse: “a+b+c” as “((a+b)+c)” Which rule should we choose? Also handled by a declaration “+ is left-associative” [E  E + E, +] [E  E + E, +] E

CS 471 – Fall 2007 16 Other Problems If a DFA state contains both [X  , a] and [Y  , a] What’s the problem here? Two reductions to choose from when next token is a This is called a reduce/reduce conflict Usually a serious ambiguity in the grammar Must be fixed in order to generate parser

CS 471 – Fall 2007 17 Reduce/Reduce Conflicts Example: a sequence of identifiers S   | id | id S There are two parse trees for the string id S  id S  id S  id How does this confuse the parser?

CS 471 – Fall 2007 18 Reduce/Reduce conflicts Consider the DFA states: Reduce/reduce conflict on input $ G  S  id G  S  id S id Instead rewrite the grammar: ____________ [G  S, $] [S , $] [S  id, $] [S  id S, $] [S  id, $] [S  id S, $] [S , $] [S  id, $] [S  id S, $] id

CS 471 – Fall 2007 19 LR Parsing LR Parsing Engine a 1 a 2 … a i … a n $ Input: Scanner s m X m s m-1 X m-1 … s 0 Stack: Parser Generator ActionGoto Grammar Compiler construction LR tables

CS 471 – Fall 2007 20 Yacc Yet Another Compiler Compiler Automatically constructs an LALR(1) parsing table from a set of grammar rules Yacc/Bison specification: parser declarations % grammar rules % auxiliary code bison –vd file.y -or- yacc –vd file.y y.tab.c y.tab.h y.output file.y

CS 471 – Fall 2007 21 Yacc/Bison Input: A CFG and a translation scheme – file.y Output: A parser file.tab.c (bison) or y.tab.c (yacc) An output file file.output containing the parsing tables (when invoked with –v option) A file file.tab.h containing declarations (if invoked with –d option) The parser called yyparse() Parser expects to use a function called yylex() to get tokens

CS 471 – Fall 2007 22 Yacc Declaration Section % { c code % } % token PLUS MULTIPLY DIVIDE % left PLUS MINUS % left MULT DIV % nonassoc EQ NEQ LT GT % prec UMINUS Terminal symbols Assigned enum Placed in f.tab.h

CS 471 – Fall 2007 23 Yacc Grammar Rules Section exp : exp PLUS exp{ semantic action } Non-terminal Terminal C code. Executed when parser reduces this rule

CS 471 – Fall 2007 24 Example Grammar P  L S  id := id S  while id do S S  begin L end S  if id then S S  if id then S else S L  S L  L ; S

CS 471 – Fall 2007 25 Corresponding Yacc Specification %{ int yylex(void); void yyerror(char *s) {EM_error(EM_tokPos,”%s”,s);} %} % token ID WHILE BEGIN END DO … % start prog % [please fill in your solution] P  L S  id := id S  while id do S S  begin L end S  if id then S S  if id then S else S L  S L  L ; S

CS 471 – Fall 2007 26 Conflicts Yacc reports shift-reduce and reduce-reduce conflicts Default behavior: shift/reduce: choose shift reduce/reduce: uses earlier rule State 17: shift/reduce conflict (shift ELSE, reduce 4) stm: IF ID THEN stm. stm: IF ID THEN stm. ELSE stm ELSE shift 19. reduce by rule 4 Resolve all conflicts!! (Use precedence rules)

CS 471 – Fall 2007 27 Must Manage Conflicts % left PLUS; % left TIMES; // TIMES > PLUS E : E PLUS E | E TIMES E |... E → E. + E … E → E  E. + E → E + E.  E → E.  E … Rule: in conflict, choose reduce if production symbol higher precedence than shifted symbol; choose shift if vice-versa

CS 471 – Fall 2007 28 Precedence Directives E  E * E. + E  E. + E (any) E EE EE+ * E +E EE* E shiftreduce %nonassoc EQ NEQ %left PLUS MINUS %left TIMES DIV %right EXP %left prefers reducing %right prefers shifting %nonassoc error

CS 471 – Fall 2007 29 The %prec Directive %{ declarations of yylex and yyerror %} % token INT PLUS MINUS TIMES UMINUS % start exp % left PLUS MINUS % left TIMES % left UMINUS % exp : INT | exp PLUS exp | exp MINUS exp | exp TIMES exp | MINUS exp %prec UMINUS

CS 471 – Fall 2007 30 Syntax vs. Semantics Consider: arithmetic expressions (x+y) boolean expressions (x+y=z) arithmetic operators > boolean operators boolean expression cannot be added to an arithmetic expression

CS 471 – Fall 2007 31 Syntax vs. Semantics %{ declarations of yylex and yyerror %} % token ID ASSIGN PLUS MINUS AND EQUAL % start stm % left OR % left AND % left PLUS % stm : ID ASSIGN ae | ID ASSIGN be be : be OR be | be AND be | ae EQUAL ae | ID ae : ae PLUS ae | ID Solution: Defer to semantic analysis Problem: Given an ID (a) Is it arithmetic or boolean?

CS 471 – Fall 2007 32 Error Recovery Ideally, report all errors, not just first Approaches: Local error recovery – Find synch point, resume Global error recovery – Change input, resume

CS 471 – Fall 2007 33 Local Error Recovery Informally, skip to next synchronizing token Example exp  ID exp  exp + exp exp  ( exps ) exps  exp exps  exps ; exp Skip to next SEMICOLON or RPAREN Resume parsing

CS 471 – Fall 2007 34 How Does it Work? Introduces new terminal symbols and rules: exp  ( error ) exps  error ; exp Shift operations entered as if it were an ordinary token!

CS 471 – Fall 2007 35 Caution… Unfortunately, error states can cause problems when semantic actions are attached (unless we’re careful). Consider: statements : statements exp SEMICOLON | statements error SEMICOLON | /* empty */ exp : increment exp decrement | ID increment: LPAREN{nest=nest+1;} increment: RPAREN{nest=nest-1;}

CS 471 – Fall 2007 36 Global Error Repair Another way to recover: insert/delete tokens before the point of the error let type a := intArray [ 10 ] of 0 in … ^ (real error: type should be var) Global repair: Find smallest set of insertions/deletions that would convert string to a syntactically correct string (not always at error point) Outcome: type  var (1 replacement)

CS 471 – Fall 2007 37 Burke-Fisher Error Repair Tries every possible single-token insertion, deletion, replacement up to K tokens before error. Chooses: Whichever replacement, etc. allows it to progress farthest after error Realistic limit: 4 tokens beyond error is “good enough” K error

CS 471 – Fall 2007 38 Benefits and Drawbacks Benefit: parse tables are not updated Drawback: must maintain “history” –Maintains two parse stacks: current, old Overhead: –N tokens, K window size –K + K * N + K * N possible changes Only happens on an error! Clearly cannot apply semantic actions!

CS 471 – Fall 2007 39 Yacc Support for Error Correction The %value directive (used during insertions) %value ID (“bogus”) %value INT (1) %value STRING (“”) Programmer specified substitutions %changeEQ -> ASSIGN | ASSIGN -> EQ | SEMICOLON ELSE -> ELSE | -> IN INT END Scope closer let … in … end

CS 471 – Fall 2007 40 Parsing Summary Look-ahead information makes SLR(1), LALR(1), LR(1) grammars expressive Automatic parser generators support LALR(1) Precedence, associativity declarations simplify grammar writing Yacc/bison simplify the tedious task of parsing! Error recovery – local, global options Supported in yacc, bison!

CS 471 – Fall 2007 41 Now What? Parsing Tells us if input is syntactically correct Gives us derivation or parse tree But we want to do more: –Build some data structure – the IR –Perform other checks and computations IR Lexical analyzer Parser tokens text chars Errors

Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007.

Similar presentations

Presentation on theme: "Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007.

Similar presentations

Presentation on theme: "Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007."— Presentation transcript:

Similar presentations

About project

Feedback