
CS 153 A little bit about LR Parsing

Background
We’ve seen three ways to write parsers:
- By hand, typically recursive descent
- Using parsing combinators
  - In both of these, left-recursion was a problem.
- Using tools like Lex and Yacc
  - Lex: limited to regular expressions; compiles a regexp to an NFA, then to a DFA
  - Yacc: limited to LALR(1) grammars
Read the details in the book! This is just intuition.

Various Kinds of Grammars
- General context-free grammars
- Deterministic context-free grammars
- Regular expressions (no recursion)
- LL(k): left-to-right scan, leftmost derivation, k symbols of lookahead
- LR(k): left-to-right scan, rightmost derivation, k symbols of lookahead
  - SLR(k) and LALR(k) are restricted variants that are almost as good, but much more space-efficient.

Intuition Behind Predictive Parsing
Our combinators did lots of back-tracking:
  alt p1 p2 = fun cs -> (p1 cs) @ (p2 cs)
We run p1 on the string cs, then back up and run p2 on the same input. It would be much better if we could predict whether to select p1 or p2 and avoid running both of them.
- Given a procedure first(p), which computes the set of characters that strings matching p can start with:
  - Check whether the next input character is in first(p1) — if not, then we can skip p1. Similarly with p2.
  - Ideally, it’s in only one of the two, so we never have to backtrack.

Computing First Sets
  e → t | t + e
  t → INT | ( e )
first(e) = first(t)
first(t) = { INT, ( }
Seems pretty easy.

But consider…
  N → a | M P Q
  M → b | ε
  P → c*
  Q → d
first(N) = {a} + first(M)?

But consider…
  N → a | M P Q
  M → b | ε
  P → c*
  Q → d
first(N) = {a} + first(M)?
Since M can match the empty string, N might start with a character in first(P). And since P can match the empty string, it must also include first(Q)!

But consider…
  N → a | M P Q
  M → b | ε
  P → c*
  Q → d
first(N) = {a} + first(M) + first(P) + first(Q) = {a, b, c, d}
Since M can match the empty string, N might start with a character in first(P). And since P can match the empty string, it must also include first(Q)!

In General
Given a parsing expression (X1 X2 … Xn), to compute its first set:
- Add first(X1).
- Check whether X1 is nullable (i.e., derives the empty string).
- If so, add first(X2); and if X2 is also nullable, continue with X3, and so on.
So we need a procedure to check whether a non-terminal is nullable.
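The nullable and first-set computations above are mutually dependent, so in practice they are computed together by iterating to a fixed point. Here is a minimal sketch (not from the course materials — the encoding of the grammar is my own) run on the N, M, P, Q grammar from the previous slides; P → c* is encoded with the two productions P → ε and P → c P.

```python
# Fixed-point computation of nullable and first sets, as described above.
grammar = {
    "N": [["a"], ["M", "P", "Q"]],
    "M": [["b"], []],             # [] encodes the empty (epsilon) production
    "P": [[], ["c", "P"]],        # P -> c*  as  P -> epsilon | c P
    "Q": [["d"]],
}
terminals = {"a", "b", "c", "d"}

def analyze(grammar):
    nullable = set()
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                          # iterate until nothing changes
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                # nt is nullable if every symbol of some alternative is nullable
                if nt not in nullable and all(s in nullable for s in alt):
                    nullable.add(nt)
                    changed = True
                # add first(X1); if X1 is nullable, add first(X2); and so on
                for s in alt:
                    f = {s} if s in terminals else first[s]
                    if not f <= first[nt]:
                        first[nt] |= f
                        changed = True
                    if s not in nullable:
                        break               # stop at first non-nullable symbol
    return nullable, first

nullable, first = analyze(grammar)
print(sorted(first["N"]))   # ['a', 'b', 'c', 'd'], as computed on the slide
```

Note that first(N) picks up first(P) and first(Q) exactly because M and P are nullable, matching the reasoning above.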

Not Enough
In general, our alternatives will not have disjoint first(·) sets.
- We could try to transform the grammar so that this is the case.
- We effectively did this for regular expressions by compiling to NFAs and then to DFAs.
So a tool like Yacc does a lot more…

LR(1) Parsing a la Yacc
Use a stack and a DFA:
- The states correspond to parsing items:
  - roughly, a set of productions marked with a position;
  - this lets us effectively remember where we came from.
- The DFA tells us what to do when we peek at the next symbol in the input.
  - For LR(k), we look at k symbols of input.
  - By using the first sets (and other computations, like follow sets), we can predict which production(s) we should work with and avoid back-tracking.
- Possible actions:
  - Shift the input symbol, pushing it onto the stack.
  - Reduce symbols on the top of the stack to a non-terminal.
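The shift/reduce loop in the worked example that follows can be sketched in code. This is not Yacc’s table-driven LR automaton — real LR parsers consult a DFA to decide between shifting and reducing — but a hypothetical greedy recognizer that reduces whenever the top of the stack matches a right-hand side, which happens to reproduce the trace for this particular grammar (eager reduction corresponds to treating + as left-associative):

```python
# Greedy shift-reduce recognizer for the slide grammar (a sketch, not Yacc):
#   1. e -> INT    2. e -> ( e )    3. e -> e + e
RULES = [
    (["INT"], "e"),             # rule 1
    (["(", "e", ")"], "e"),     # rule 2
    (["e", "+", "e"], "e"),     # rule 3
]

def tokenize(s):
    # Map digit runs to the token INT; '(' ')' '+' stand for themselves.
    spaced = s.replace("(", " ( ").replace(")", " ) ").replace("+", " + ")
    return ["INT" if t.isdigit() else t for t in spaced.split()]

def parse(s):
    stack, tokens = [], tokenize(s)
    while True:
        for rhs, lhs in RULES:                      # try to reduce first
            if stack[len(stack) - len(rhs):] == rhs and len(stack) >= len(rhs):
                del stack[len(stack) - len(rhs):]   # pop the right-hand side
                stack.append(lhs)                   # push the non-terminal
                break
        else:
            if tokens:
                stack.append(tokens.pop(0))         # otherwise shift
            else:
                break                               # no reduce, no input: done
    return stack == ["e"]                           # accept iff stack is [e]

print(parse("(3 + 4) + (5 + 6)"))  # True
print(parse("(3 +"))               # False
```

Yacc avoids the guesswork in this sketch: its DFA states and lookahead tell it, at each step, whether shifting or reducing is correct.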

Example
1. e → INT    2. e → ( e )    3. e → e + e

  Stack                         Input                  Action
  []                            (3 + 4) + (5 + 6)      shift the '(' onto the stack
  ['(']                         3 + 4) + (5 + 6)       can't reduce, so shift the 3
  ['(', INT]                    + 4) + (5 + 6)         reduce by production 1
  ['(', e]                      + 4) + (5 + 6)         shift the '+' onto the stack
  ['(', e, '+']                 4) + (5 + 6)           shift 4 onto the stack
  ['(', e, '+', INT]            ) + (5 + 6)            reduce using rule 1
  ['(', e, '+', e]              ) + (5 + 6)            reduce again with rule 3
  ['(', e]                      ) + (5 + 6)            shift ')' onto the stack
  ['(', e, ')']                 + (5 + 6)              reduce by rule 2
  [e]                           + (5 + 6)              continuing on…
  [e, '+']                      (5 + 6)                shift '('
  [e, '+', '(']                 5 + 6)                 shift 5
  [e, '+', '(', INT]            + 6)                   reduce by rule 1
  [e, '+', '(', e]              + 6)                   shift '+'
  [e, '+', '(', e, '+']         6)                     shift 6
  [e, '+', '(', e, '+', INT]    )                      reduce by rule 1
  [e, '+', '(', e, '+', e]      )                      reduce by rule 3
  [e, '+', '(', e]              )                      shift ')'
  [e, '+', '(', e, ')']                                reduce by rule 2
  [e, '+', e]                                          reduce by rule 3
  [e]                                                  accept

Right-Most Derivation
If we read the sequence of stacks backwards, we are always expanding the right-most non-terminal. (Hence the “R” in LR parsing.)
  [e, '+', '(', e, '+', e, ')']
  [e, '+', '(', e, ')']
  [e, '+', e]
  [e]

The Tricky Part
When do we shift and when do we reduce?
- In general, whether to shift or reduce can be hard to figure out, even when the grammar is unambiguous.
- When the grammar is ambiguous, we get a conflict:
  - Reduce-reduce: Yacc can’t figure out which of two productions to use to collapse the stack.
  - Shift-reduce: Yacc can’t figure out whether it should shift or whether it should reduce.
  - Look at the generated parse.grm.desc file for details.
- Two fixes:
  - Use associativity & precedence directives.
  - Rewrite the grammar.

For Our Simple Grammar
  Stack: [e, '+', e]     Input: + 5
Should I shift or should I reduce now?

In this case:
The fix is to tell Yacc that we want + to be left-associative (in which case it will reduce instead of shifting). Alternatively, rewrite:
  e → t
  e → t + e
  t → INT
  t → ( e )
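As a sketch of the directive approach (the token names INT, LPAREN, RPAREN, and PLUS are placeholders, not taken from the course handout), a Yacc-style grammar file would declare the associativity up front:

```yacc
%token INT LPAREN RPAREN PLUS
%left PLUS            /* + is left-associative: reduce wins over shift */

%%

e : INT
  | LPAREN e RPAREN
  | e PLUS e          /* %left resolves the shift-reduce conflict here */
  ;
```

With %left PLUS, the conflict on the stack [e, '+', e] with '+' in the input is resolved in favor of reducing, so 3 + 4 + 5 parses as (3 + 4) + 5.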

Another Example
  s → var := e ;
  s → if e then s else s
  s → if e then s

Conflict
  if e then (if e then x := 3; else x := 4;)
  if e then (if e then x := 3;) else x := 4;
By default, Yacc will shift instead of reducing. (So we get the first parse by default.) Better to engineer this away…
- Ideally, put in an ending delimiter (end-if).
- Or refactor the grammar…

Refactoring
  s  → os
  os → var := e ; | if e then os | if e then cs else os
  cs → var := e ; | if e then cs else cs
  if e then if e then x:=3; else x:=4;

What You Need to Know
I’m less concerned that you understand the details of constructing predictive parsing tables.
- Read the chapters in Appel.
- They will help you understand what Yacc is doing.
You should be able to build parsers using Yacc for standard linguistic constructs.
- That means understanding what conflicts are, and how to avoid them.
- Hence the Fish front-end.
- Real grammars are, unfortunately, rather ugly.