Lecture 03 – Syntax analysis: top-down parsing Eran Yahav 1.

Slides:



Advertisements
Similar presentations
Compiler Construction
Advertisements

Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Mooly Sagiv and Roman Manevich School of Computer Science
Lecture 04 – Syntax analysis: top-down and bottom-up parsing
Theory of Compilation Erez Petrank Lecture 2: Syntax Analysis, Top-Down Parsing 1.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Parsing III (Eliminating left recursion, recursive descent parsing)
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 1 Mayer Goldberg and Roman Manevich Ben-Gurion University.
Syntax and Semantics Structure of programming languages.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
Compilation Lecture 4 Syntax Analysis Noam Rinetzky 1 Zijian Xu, Hong Chen, Song-Chun Zhu and Jiebo Luo, "A hierarchical compositional model.
Top-Down Parsing - recursive descent - predictive parsing
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Chapter 5 Top-Down Parsing.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
Theory of Compilation Erez Petrank Lecture 2: Syntax Analysis, Top-Down Parsing 1.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Syntax and Semantics Structure of programming languages.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial) 1.
Compilation (Semester A, 2013/14) Lecture 3: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:
Compiler Principles Fall Compiler Principles Lecture 3: Parsing part 2 Roman Manevich Ben-Gurion University.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
Compiler Principles Fall Compiler Principles Lecture 2: Parsing part 1 Roman Manevich Ben-Gurion University 1.
COMP 3438 – Part II-Lecture 5 Syntax Analysis II Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Bernd Fischer RW713: Compiler and Software Language Engineering.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Lecture 02b – Syntax analysis: top-down parsing Eran Yahav 1.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 2 Mayer Goldberg and Roman Manevich Ben-Gurion University.
Compilation (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax and Semantics Structure of programming languages.
Fall Compiler Principles Lecture 2: LL parsing
Programming Languages Translator
Lecture 3: Syntax Analysis: CFLs, PDAs, Top-Down parsing Noam Rinetzky
Table-driven parsing Parsing performed by a finite state machine.
Top-down parsing cannot be performed on left recursive grammars.
Lecture 3: Syntax Analysis: Top-Down parsing Noam Rinetzky
Bottom-Up Syntax Analysis
4 (c) parsing.
Top-Down Parsing CS 671 January 29, 2008.
Compiler Design 7. Top-Down Table-Driven Parsing
Fall Compiler Principles Lecture 2: LL parsing
LL and Recursive-Descent Parsing Hal Perkins Autumn 2009
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Predictive Parsing Program
LL and Recursive-Descent Parsing Hal Perkins Winter 2008
Presentation transcript:

Lecture 03 – Syntax analysis: top-down parsing Eran Yahav 1

2 You are here Executable code exe Source text txt Compiler Lexical Analysis Syntax Analysis Parsing Semantic Analysis Inter. Rep. (IR) Code Gen.

Last Week: from characters to tokens 3 x = b*b – 4*a*c txt Token Stream

Last Week: Regular Expressions 4 Basic PatternsMatching xThe character x.Any character, usually except a new line [xyz]Any of the characters x,y,z Repetition Operators R?An R or nothing (=optionally an R) R*Zero or more occurrences of R R+One or more occurrences of R Composition Operators R1R2An R1 followed by R2 R1|R2Either an R1 or R2 Grouping (R)R itself

Today: from tokens to AST 5 Lexical Analysis Syntax Analysis Sem. Analysis Inter. Rep. Code Gen. ‘b’ ‘4’ ‘b’ ‘a’ ‘c’ ID factor term factor MULT term expression factor term factor MULT term expression term MULTfactor MINUS Syntax Tree

Parsing  Goals  Is a sequence of tokens a valid program in the language?  Construct a structured representation of the input text  Error detection and reporting  Challenges  How do you describe the programming language?  How do you check validity of an input?  Where do you report an error? 6

Context free grammars  V – non terminals  T – terminals (tokens)  P – derivation rules  Each rule of the form V  (T  V)*  S – initial symbol 7 G = (V,T,P,S)

Why do we need context free grammars? 8 S  SS S  (S) S  ()

Example S  S;S S  id := E E  id | E + E | E * E | ( E ) 9 V = { S, E } T = { id, ‘+’, ‘*’, ‘(‘, ‘)’}

Derivation 10 S SS ; id := ES ; id := idS ; id := E ; id := idid := E + E ; id := idid := E + id ; id := idid := id + id ; x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) S  S;S S  id := E E  id S  id := E E  E + E E  id x:= z;y := x + z inputgrammar

Parse Tree 11 S SS ; id := ES ; id := idS ; id := E ; id := idid := E + E ; id := idid := E + id ; id := idid := id + id ; x:= z;y := x + z S S ; S id:= E id :=E E + E id

Questions  How did we know which rule to apply on every step?  Does it matter?  Would we always get the same result? 12

Ambiguity 13 x := y+z*w S  S;S S  id := E E  id | E + E | E * E | ( E ) S id:=E E+E id E*E S :=E E* E id E+E

Leftmost/rightmost Derivation  Leftmost derivation  always expand leftmost non-terminal  Rightmost derivation  Always expand rightmost non-terminal  Allows us to describe derivation by listing the sequence of rules  always know what a rule is applied to  Orders of derivation applied in our parsers (coming soon) 14

Leftmost Derivation 15 x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) S SS ; id := ES ; id := idS ; id := E ; id := idid := E + E ; id := idid := id + E ; id := idid := id + id ; S  S;S S  id := E E  id S  id := E E  E + E E  id x:= z;y := x + z

Rightmost Derivation 16 S SS ; Sid := E ; Sid := E + E ; Sid := E + id ; Sid := id + id ; id := Eid := id + id ; id := idid := id + id ; x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) S  S;S S  id := E E  E + E E  id S  id := E E  id x:= z;y := x + z

Bottom-up Example 17 x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) id := id ; id := id + id id := Eid := id + id ; S ; Sid := E + id ; Sid := E + E ; Sid := E ; SS ; S E  id S  id := E E  id E  E + E S  id := E S  S;S Bottom-up picking left alternative on every step  Rightmost derivation when going top-down

Parsing  A context free language can be recognized by a non- deterministic pushdown automaton  Parsing can be seen as a search problem  Can you find a derivation from the start symbol to the input word?  Easy (but very expensive) to solve with backtracking  CYK parser can be used to parse any context-free language but has complexity O(n 3 )  We want efficient parsers  Linear in input size  Deterministic pushdown automata  We will sacrifice generality for efficiency 18

“Brute-force” Parsing 19 x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) id := id ; id := id + id id := Eid := id + id ; id := idid := E+ id ; … E  id (not a parse tree… a search for the parse tree by exhaustively applying all rules) id := Eid := id + id ; id := Eid := id + id ;

Efficient Parsers  Top-down (predictive)  Construct the leftmost derivation  Apply rules “from left to right”  Predict what rule to apply based on nonterminal and token  Bottom up (shift reduce)  Construct the rightmost derivation  Apply rules “from right to left”  Reduce a right-hand side of a production to its non-terminal 20

Efficient Parsers  Top-down (predictive parsing) 21  Bottom-up (shift reduce) to be read… already read…

22 Non-ambiguous CFG CLR(1) LALR(1) SLR(1) LL(1) LR(0) Grammar Hierarchy

Top-down Parsing  Given a grammar G=(V,T,P,S) and a word w  Goal: derive w using G  Idea  Apply production to leftmost nonterminal  Pick production rule based on next input token  General grammar  More than one option for choosing the next production based on a token  Restricted grammars (LL)  Know exactly which single rule to apply  May require some lookahead to decide 23

Boolean Expressions Example 24 E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor not (not true or false) E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) Production to apply is known from next input token E not E EOP E LIT true not LITor ( ( false

Recursive Descent Parsing  Define a function for every nonterminal  Every function work as follows  Find applicable production rule  Terminal function checks match with next input token  Nonterminal function calls (recursively) other functions  If there are several applicable productions for a nonterminal, use lookahead 25

Matching tokens  Variable current holds the current input token 26 void match(token t) { if (current == t) current = next_token(); else error; }

functions for nonterminals 27 E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor void E() { if (current  {TRUE, FALSE}) // E → LIT LIT(); else if (current == LPAREN) // E → ( E OP E ) match(LPARENT); E(); OP(); E(); match(RPAREN); else if (current == NOT)// E → not E match(NOT); E(); else error; } void LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; }

functions for nonterminals 28 E → LIT | ( E OP E ) | not E LIT → true | false OP → and | or | xor void E() { if (current  {TRUE, FALSE})LIT(); else if (current == LPAREN)match(LPARENT); E(); OP(); E(); match(RPAREN); else if (current == NOT)match(NOT); E(); else error; } void LIT() { if (current == TRUE)match(TRUE); else if (current == FALSE)match(FALSE); elseerror; } void OP() { if (current == AND)match(AND); else if (current == OR)match(OR); else if (current == XOR)match(XOR); elseerror; }

Adding semantic actions  Can add an action to perform on each production rule  Can build the parse tree  Every function returns an object of type Node  Every Node maintains a list of children  Function calls can add new children 29

Building the parse tree 30 Node E() { result = new Node(); result.name = “E”; if (current  {TRUE, FALSE}) // E → LIT result.addChild(LIT()); else if (current == LPAREN) // E → ( E OP E ) result.addChild(match(LPARENT)); result.addChild(E()); result.addChild(OP()); result.addChild(E()); result.addChild(match(RPAREN)); else if (current == NOT) // E → not E result.addChild(match(NOT)); result.addChild(E()); else error; return result; }

Recursive Descent  How do you pick the right A-production?  Generally – try them all and use backtracking  In our case – use lookahead 31 void A() { choose an A-production, A -> X 1 X 2 …X k ; for (i=1; i≤ k; i++) { if (X i is a nonterminal) call procedure X i (); elseif (X i == current) advance input; else report error; }

Recursive descent: are we done?  The function for indexed_elem will never be tried…  What happens for input of the form  ID [ expr ] 32 term  ID | indexed_elem indexed_elem  ID [ expr ]

Recursive descent: are we done? int S() { return A() && match(token(‘a’)) && match(token(‘b’)); } int A() { return match(token(‘a’)) || 1; } 33 S  A a b A  a |   What happens for input “ab” ?  What happens if you flip order of alternatives and try “aab”?

Recursive descent: are we done? int E() { return E() && match(token(‘-’)) && term(); } 34 E  E - term  What happens with this procedure?  Recursive descent parsers cannot handle left-recursive grammars

Figuring out when it works… 35 term  ID | indexed_elem indexed_elem  ID [ expr ] S  A a b A  a |  E  E - term 3 examples where we got into trouble with our recursive descent approach

FIRST sets  For every production rule A  α  FIRST(α) = all terminals that α can start with  i.e., every token that can appear as first in α under some derivation for α  In our Boolean expressions example  FIRST(LIT) = { true, false }  FIRST( ( E OP E ) ) = { ‘(‘ }  FIRST ( not E ) = { not }  No intersection between FIRST sets => can always pick a single rule  If the FIRST sets intersect, may need longer lookahead  LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens  LL(1) is an important and useful class 36

FOLLOW Sets  What do we do with nullable alternatives?  Use what comes afterwards to predict the right production  For every production rule A  α  FOLLOW(A) = set of tokens that can immediately follow A  Can predict the alternative A k for a non-terminal N when the lookahead token is in the set  FIRST(A k )  (if A k is nullable then FOLLOW(N)) 37

LL(k) Grammars  A grammar is in the class LL(K) when it can be derived via:  Top down derivation  Scanning the input from left to right (L)  Producing the leftmost derivation (L)  With lookahead of k tokens (k)  A language is said to be LL(k) when it has an LL(k) grammar 38

Back to our 1 st example  FIRST(ID) = { ID }  FIRST(indexed_elem) = { ID }  FIRST/FIRST conflict 39 term  ID | indexed_elem indexed_elem  ID [ expr ]

Left factoring  Rewrite the grammar to be in LL(1) 40 term  ID | indexed_elem indexed_elem  ID [ expr ] term  ID after_ID after_ID  [ expr ] |  Intuition: just like factoring x*y + x*z into x*(y+z)

Left factoring – another example 41 S  if E then S else S | if E then S | T S  if E then S S’ | T S’  else S | 

Back to our 2 nd example  FIRST(S) = { ‘a’ }, FOLLOW(S) = { }  FIRST(A) = { ‘a’  }, FOLLOW(A) = { ‘a’ }  FIRST/FOLLOW conflict 42 S  A a b A  a | 

Substitution 43 S  A a b A  a |  S  a a b | a b Substitute A in S S  a after_A after_A  a b | b Left factoring

Back to our 3 rd example  Left recursion cannot be handled with a bounded lookahead  What can we do? 44 E  E - term

Left recursion removal  L(G1) = β, βα, βαα, βααα, …  L(G2) = same 45 N  Nα | β N  βN’ N’  αN’ |  G1 G2 E  E - term E  term TE TE  - term TE |   For our 3 rd example:

LL(k) Parsers  Recursive Descent  Manual construction  Uses recursion  Wanted  A parser that can be generated automatically  Does not use recursion 46

LL(k) parsing with pushdown automata  Pushdown automaton uses  Prediction stack  Input stream  Transition table  nonterminals x tokens -> production alternative  Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t 47

LL(k) parsing with pushdown automata  Two possible moves  Prediction  When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error  Match  When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t. If (t ≠ T) syntax error  Parsing terminates when prediction stack is empty. If input is empty at that point, success. Otherwise, syntax error 48

Example transition table 49 ()nottruefalseandorxor$ E2311 LIT45 OP678 (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Nonterminals Input tokens Which rule should be used

Simple Example 50 abc AA  aAbA  c A  aAb | c aacbb$ Input suffixStack contentMove aacbb$A$predict(A,a) = A  aAb aacbb$aAb$match(a,a) acbb$Ab$predict(A,a) = A  aAb acbb$aAbb$match(a,a) cbb$Abb$predict(A,c) = A  c cbb$ match(c,c) bb$ match(b,b) b$ match(b,b) $$match($,$) – success

Simple Example 51 abc AA  aAbA  c A  aAb | c abcbb$ Input suffixStack contentMove abcbb$A$predict(A,a) = A  aAb abcbb$aAb$match(a,a) bcbb$Ab$predict(A,b) = ERROR

Error Handling  Mentioned last time  Lexical errors  Syntax errors  Semantic errors (e.g., type mismatch) 52

Error Handling and Recovery x = a * (p+q * ( -b * (r-s); 53  Where should we report the error?  The valid prefix property  Recovery is tricky  Heuristics for dropping tokens, skipping to semicolon, etc.

Error Handling in LL Parsers  Now what?  Predict bS anyway “missing token b inserted in line XXX” 54 S  a c | b S c$ abc SS  a cS  bS Input suffixStack contentMove c$S$predict(S,c) = ERROR

Error Handling in LL Parsers  Result: infinite loop 55 S  a c | b S c$ abc SS  a cS  bS Input suffixStack contentMove bc$S$predict(b,c) = S  bS bc$bS$match(b,b) c$S$Looks familiar?

Error Handling  Requires more systematic treatment  Enrichment  Acceptable-set method  Not part of course material 56

Summary  Parsing  Top-down or bottom-up  Top-down parsing  Recursive descent  LL(k) grammars  LL(k) parsing with pushdown automata  LL(K) parsers  Cannot deal with left recursion  Left-recursion removal might result with complicated grammar 57

Coming up next time  More syntax analysis 58