Parsing Wrap Up Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.


Parsing Wrap Up Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. COMP 412 FALL 2010

LR(k) versus LL(k)

Finding the next step in a derivation

LR(k): each reduction in the parse is detectable with
- the complete left context,
- the reducible phrase itself, and
- the k terminal symbols to its right.

LL(k): the parser must select the expansion based on
- the complete left context, and
- the next k terminals.

Thus, LR(k) examines more context.
(LR(k) and LL(k) are the generalizations of LR(1) and LL(1) to longer lookaheads.)

The question is: do languages fall in the gap between LR(k) and LL(k)?

LR(1) versus LL(1)

The following LR(1) grammar has no LL(1) counterpart:

    Goal → S
    S    → A | B
    A    → ( A ) | a
    B    → ( B > | b

- The canonical collection has 18 sets of LR(1) items.
  - It is not a simple grammar.
  - It is, however, LR(1).
- A top-down parser would need arbitrary lookahead to choose between A and B.
- An LR(1) parser can carry the left context (the '('s) until it sees a or b.
- The table construction will handle it.
- In contrast, an LL(1) parser cannot decide whether to expand S by A or by B.
  → No amount of massaging the grammar will resolve this problem.

More precisely, the language described by this LR(1) grammar cannot be described with an LL(1) grammar. In fact, the language has no LL(k) grammar, for finite k.

Example due to Bill Waite, UC Boulder.
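To make the "carry the left context" point concrete, here is a minimal hand-written recognizer for the language of Waite's grammar, assuming the productions Goal → S, S → A | B, A → ( A ) | a, B → ( B > | b (as read off the LR(1) items on the next slide). It is not an LR(1) parser, only a sketch of the same idea: count the '(' symbols first, and commit to A or B only when the a or b finally appears. The function name `recognize` is an illustrative choice.

```python
def recognize(text):
    """Accept strings of the form (^n a )^n or (^n b >^n, for n >= 0."""
    tokens = list(text)
    depth = 0
    pos = 0
    # Carry the left context: remember how many '(' were seen,
    # without yet knowing whether this is an A or a B.
    while pos < len(tokens) and tokens[pos] == "(":
        depth += 1
        pos += 1
    # Only now does the input reveal which nonterminal applies.
    if pos == len(tokens) or tokens[pos] not in ("a", "b"):
        return False
    closer = ")" if tokens[pos] == "a" else ">"
    pos += 1
    # The remaining input must be exactly `depth` matching closers.
    return tokens[pos:] == [closer] * depth


print(recognize("((a))"))   # balanced with ')'
print(recognize("((b>>"))   # balanced with '>'
print(recognize("((a>>"))   # wrong closer for 'a'
```

A predictive parser would have to pick A or B before consuming the first '(', which is why no fixed lookahead k is enough: the a or b can sit arbitrarily far to the right.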

Building LR(1) Tables for Waite's Language

The canonical collection of sets of LR(1) items:

cc0:  [Goal → • S, EOF], [S → • A, EOF], [S → • B, EOF],
      [A → • ( A ), EOF], [A → • a, EOF],
      [B → • ( B >, EOF], [B → • b, EOF]
cc1:  [Goal → S •, EOF]
cc2:  [S → A •, EOF]
cc3:  [S → B •, EOF]
cc4:  [A → • ( A ), )], [A → ( • A ), EOF], [A → • a, )],
      [B → • ( B >, >], [B → ( • B >, EOF], [B → • b, >]
cc5:  [A → a •, EOF]
cc6:  [B → b •, EOF]
cc7:  [A → ( A • ), EOF]
cc8:  [B → ( B • >, EOF]

cc4 is goto(cc0, '(').

Building LR(1) Tables for Waite's Language

And, the rest of it:

cc9:  [A → • ( A ), )], [A → ( • A ), )], [A → • a, )],
      [B → • ( B >, >], [B → ( • B >, >], [B → • b, >]
cc10: [A → a •, )]
cc11: [B → b •, >]
cc12: [A → ( A ) •, EOF]
cc13: [B → ( B > •, EOF]
cc14: [A → ( A • ), )]
cc15: [B → ( B • >, >]
cc16: [A → ( A ) •, )]
cc17: [B → ( B > •, >]

Building LR(1) Tables for Waite's Language

Notice how sparse the table is:
- Goto has 7 of 54 entries filled
- Action has 23 of 108 entries filled

Notice rows and columns that might be combined.
Notice |CC| > |P|.
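The set counts can be checked mechanically. The following is a minimal sketch of the canonical-collection construction, assuming the grammar Goal → S, S → A | B, A → ( A ) | a, B → ( B > | b implied by the items in cc0. The helper names (`first`, `closure`, `goto`, `canonical_collection`) are illustrative, and `first` is deliberately simplified: it works only because this grammar has no empty productions and no left recursion. States are numbered in discovery order, which need not match the cc numbering on the slides.

```python
from collections import deque

# Waite's grammar, as read off the LR(1) items (an assumption of this sketch).
GRAMMAR = {
    "Goal": [("S",)],
    "S": [("A",), ("B",)],
    "A": [("(", "A", ")"), ("a",)],
    "B": [("(", "B", ">"), ("b",)],
}
START_ITEM = ("Goal", ("S",), 0, "EOF")  # (lhs, rhs, dot position, lookahead)


def first(symbol):
    """FIRST of one symbol; adequate only for grammars without empty productions."""
    if symbol not in GRAMMAR:
        return {symbol}
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs[0])
    return result


def closure(items):
    items = set(items)
    work = deque(items)
    while work:
        lhs, rhs, dot, look = work.popleft()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue  # dot at the end, or in front of a terminal
        rest = rhs[dot + 1:]
        lookaheads = first(rest[0]) if rest else {look}
        for production in GRAMMAR[rhs[dot]]:
            for la in lookaheads:
                item = (rhs[dot], production, 0, la)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)


def goto(state, symbol):
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)


def canonical_collection():
    states = [closure({START_ITEM})]
    transitions = {}
    work = deque(states)
    while work:
        state = work.popleft()
        symbols = {r[d] for (_, r, d, _) in state if d < len(r)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions


states, transitions = canonical_collection()
goto_entries = [key for key in transitions if key[1] in GRAMMAR]
print(len(states), "sets;", len(goto_entries), "Goto entries")
```

With 18 states, three nonterminal columns (S, A, B) give the 54 Goto slots and six terminal columns ( '(', ')', '>', a, b, EOF ) give the 108 Action slots mentioned on the slide.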

LR(k) versus LL(k)

Other Non-LL Grammars

- Example from D.E. Knuth, "Top-Down Syntactic Analysis," Acta Informatica, 1:2 (1971).
- Example from the book by Lewis, Rosenkrantz, & Stearns, "Compiler Design Theory" (1976), Figure 13.1. This grammar is actually LR(0).

Comp 412, Fall LR(k ) versus LL(k ) Finding the next step in a derivation LR(k)  Each reduction in the parse is detectable with  the complete left context,  the reducible phrase, itself, and  the k terminal symbols to its right LL(k)  Parser must select the expansion based on  The complete left context  The next k terminals Thus, LR(k) examines more context “… in practice, programming languages do not actually seem to fall in the gap between LL(1) languages and deterministic languages” J.J. Horning, “LR Grammars and Analysers”, in Compiler Construction, An Advanced Course, Springer-Verlag, 1976 generalizations of LR(1) and LL(1) to longer lookaheads

Left Recursion versus Right Recursion

Right recursion
- Required for termination in top-down parsers
- Uses (on average) more stack space
- Naïve right recursion produces right-associativity: w * ( x * ( y * z ) )

Left recursion
- Works fine in bottom-up parsers
- Limits required stack space
- Naïve left recursion produces left-associativity: ( ( w * x ) * y ) * z

Rule of thumb
- Left recursion for bottom-up parsers
- Right recursion for top-down parsers
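The two tree shapes can be reproduced with a pair of folds. This is only an illustrative sketch (the helper names are not from the lecture): a left fold mimics the grouping a naïve left-recursive grammar produces, a right fold mimics the right-recursive one. Subtraction is used for the numeric version because, unlike multiplication of small integers, it makes the two groupings give visibly different answers.

```python
from functools import reduce


def left_assoc(operands, op="*"):
    """Group as a left-recursive grammar would: ((w * x) * y) * z."""
    return reduce(lambda acc, x: f"({acc} {op} {x})", operands)


def right_assoc(operands, op="*"):
    """Group as a right-recursive grammar would: w * (x * (y * z))."""
    return reduce(lambda acc, x: f"({x} {op} {acc})", reversed(operands))


def eval_left(numbers):
    """Evaluate a - b - c - ... with left associativity."""
    return reduce(lambda acc, x: acc - x, numbers)


def eval_right(numbers):
    """Evaluate a - b - c - ... with right associativity."""
    return reduce(lambda acc, x: x - acc, reversed(numbers))


print(left_assoc(["w", "x", "y", "z"]))
print(right_assoc(["w", "x", "y", "z"]))
print(eval_left([10, 4, 3]), eval_right([10, 4, 3]))
```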

Associativity

What difference does it make?
- Can change answers in floating-point arithmetic
- Can change opportunities for optimization
- Consider x+y+z
  - What if y+z occurs elsewhere? Or x+y? Or x+z?
  - What if x = 2 & z = 17? Neither left nor right association exposes 19.

The compiler may want to change the "shape" of expressions.
The best shape is a function of the surrounding context.

[Figure: three tree shapes for x+y+z. Ideal operator: a single + with children x, y, z. Left association: (x + y) + z. Right association: x + (y + z).]
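A small floating-point illustration of why the grouping matters. The values are chosen for this sketch (they are not from the lecture) so that one addition loses a low-order bit in IEEE 754 double precision, which is what Python's `float` uses.

```python
x, y, z = 1e16, -1e16, 1.0

left = (x + y) + z    # x and y cancel exactly, then 1.0 is added
right = x + (y + z)   # y + z rounds back to -1e16, so the 1.0 is lost

print(left, right)
```

The same reassociation freedom that helps an optimizer expose a common subexpression can therefore change a program's numeric result, which is one reason compilers treat floating-point reordering conservatively.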

Error Detection and Recovery

Error Detection

Recursive descent
- Parser takes the last else clause in a routine
- Compiler writer can code almost any arbitrary action

Table-driven LL(1)
- In state s_i facing word x, entry is an error
- Report the error; valid entries in row for s_i encode possibilities

Table-driven LR(1)
- In state s_i facing word x, entry is an error
- Report the error; shift states in row encode possibilities
- Can precompute better messages from LR(1) items
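As a sketch of how the valid entries in a row "encode possibilities", the function below builds an error message from the non-error columns of the current row. The table fragment, its production numbers, and the name `describe_error` are all hypothetical; a real table would come from the generator.

```python
# Hypothetical LL(1) table fragment: rows are nonterminals, columns are words,
# values are production numbers. A missing column means "error".
TABLE = {
    "Expr'": {"+": 2, "-": 3, ")": 4, "EOF": 4},
    "Factor": {"(": 9, "num": 10, "name": 11},
}


def describe_error(table, row, word):
    """Return None if (row, word) is a valid entry, else a diagnostic string."""
    if word in table[row]:
        return None
    expected = sorted(table[row])  # every valid column in this row
    return f"unexpected {word!r}; expected one of: {', '.join(expected)}"


print(describe_error(TABLE, "Factor", "+"))
```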

Error Detection and Recovery

Error Recovery

Table-driven LL(1)
- Treat as missing token, e.g. ')' ⇒ expand by desired symbol
- Treat as extra token, e.g. 'x - + y' ⇒ pop stack and move ahead

Table-driven LR(1)
- Treat as missing token, e.g. ')' ⇒ shift the token
- Treat as extra token, e.g. 'x - + y' ⇒ don't shift the token

Can pre-compute sets of states with appropriate entries…

Error Detection and Recovery

One common strategy is "hard token" recovery.

Skip ahead in the input until we find some "hard" token, e.g. ';'
- ';' separates statements, makes a logical break in the parse
- Resynchronize state, stack, and input to point after hard token
  - LL(1): pop stack until we find a row with entry for ';'
  - LR(1): pop stack until we find a state with a reduction on ';'
- Does not correct the input; rather, it allows the parse to proceed

Resynchronizing an LL(1) parser:

    NT ← pop()
    repeat until Table[NT, ';'] ≠ error
        NT ← pop()
    token ← NextToken()
    repeat until token = ';'
        token ← NextToken()

Resynchronizing an LR(1) parser:

    repeat until token = ';'
        shift token
        shift s_e
        token ← NextToken()
    reduce by error production    // pops all that state off stack
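The LL(1) resynchronization loop might look like the following in a table-driven parser. This is a sketch under stated assumptions, not the course's implementation: the stack is a Python list with its top at the end, the table is a dictionary keyed by (nonterminal, word), and, unlike the pseudocode, the nonterminal that has an entry for ';' is left on the stack and the input position is left at the ';' so the normal parse loop can resume from there.

```python
def resynchronize(stack, tokens, pos, table, hard=";"):
    """Pop the stack and skip input until parsing can resume at `hard`.

    Returns the new input position; `stack` is modified in place.
    """
    # Pop until the top of the stack has a table entry for the hard token.
    while stack and (stack[-1], hard) not in table:
        stack.pop()
    # Discard input up to (but not including) the hard token.
    while pos < len(tokens) and tokens[pos] != hard:
        pos += 1
    return pos


# Hypothetical table fragment and parser state for the input 'x - + y ; z'.
table = {("StmtList", ";"): ["Stmt", ";", "StmtList"]}
stack = ["EOF", "StmtList", "Expr'", "Term'"]
tokens = ["x", "-", "+", "y", ";", "z"]

pos = resynchronize(stack, tokens, 2, table)   # error detected at the '+'
print(stack, tokens[pos])
```

As the slide notes, this does not repair the input; it only discards enough state and input for the parse to continue and report later errors.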

Shrinking the ACTION and GOTO Tables

Three options:

Combine terminals such as number & identifier, + & -, * & /
- Directly removes a column, may remove a row
- For expression grammar, 198 (vs. 384) table entries
  (left-recursive expression grammar with precedence, see §3.7.2 in EAC)

Combine rows or columns (classic space-time tradeoff)
- Implement identical rows once & remap states
- Requires extra indirection on each lookup
- Use separate mapping for ACTION & for GOTO

Use another construction algorithm (fewer grammars, same languages)
- Both LALR(1) and SLR(1) produce smaller tables
  - LALR(1) represents each state with its "core" items
  - SLR(1) uses LR(0) items and the FOLLOW set
- Implementations are readily available
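The "implement identical rows once & remap states" option can be sketched as follows. The function names and the tiny table are made up for illustration; the point is the extra indirection through `remap` on every lookup.

```python
def compress_rows(table):
    """Store each distinct row once; return (unique_rows, state -> row index)."""
    unique_rows = []
    index_of = {}
    remap = []
    for row in table:
        key = tuple(row)
        if key not in index_of:
            index_of[key] = len(unique_rows)
            unique_rows.append(list(row))
        remap.append(index_of[key])
    return unique_rows, remap


def lookup(unique_rows, remap, state, column):
    """One extra indirection compared with table[state][column]."""
    return unique_rows[remap[state]][column]


# Hypothetical 5-state table in which states 0/2 and 1/3 have identical rows.
ACTION = [
    [1, 0, 2],
    [0, 0, 3],
    [1, 0, 2],
    [0, 0, 3],
    [4, 4, 4],
]
rows, remap = compress_rows(ACTION)
print(len(ACTION), "rows stored as", len(rows), "with remap", remap)
```

Keeping one remap per table, as the slide suggests, matters because two states with identical ACTION rows need not have identical GOTO rows.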

Hierarchy of Context-Free Languages

[Figure: the inclusion hierarchy for context-free languages, outermost first]

    Context-free languages
      Deterministic languages (LR(k))
        LL(k) languages          |  Simple precedence languages
          LL(1) languages        |    Operator precedence languages

As languages, LR(k) ≡ LR(1).

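Hierarchy of Context-Free Grammars

[Figure: the inclusion hierarchy for context-free grammars. Labeled classes: Context-free grammars, Unambiguous CFGs, Operator Precedence, Floyd-Evans Parsable, LR(k), LR(1), LALR(1), SLR(1), LR(0), LL(k), LL(1)]

- LR(k) ⊃ LR(1) ⊃ LALR(1) ⊃ SLR(1) ⊃ LR(0); note the sub-categories of LR(1).
- LL(k) ⊃ LL(1).
- Operator precedence includes some ambiguous grammars.
- LL(1) is a subset of LR(1). (It is not a subset of SLR(1): the grammar S → A a A b | B b B a, A → ε, B → ε is LL(1) but not SLR(1).)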

Summary

                        Advantages               Disadvantages

    Top-down            Fast                     Hand-coded
    (recursive          Good locality            High maintenance
    descent, LL(1))     Simplicity               Right associativity
                        Good error detection

    LR(1)               Fast                     Large working sets
                        Deterministic langs.     Poor error messages
                        Automatable              Large table sizes
                        Left associativity

Lab 2 Overview: An LL(1) Parser Generator

You will build a program to generate LL(1) parse tables.
- Input is a modified form of Backus-Naur Form (MBNF)
- Output is a parse table and some auxiliary data structures
  - Both a pretty (human-readable) version and the C declarations

You will write:
- A hand-coded scanner for MBNF
- A hand-coded, recursive descent parser for MBNF
- A table generator (FIRST, FOLLOW, and FIRST+ sets)

Interim deadlines to keep you on track
- Want you to make progress at a steady rate

Documents will be online by Monday, October 4, 2010.
Skeleton parser ready in about a week, with its own docs.
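For orientation, the table-generator step can be sketched as a pair of fixed-point computations. This is not the lab's required design: the grammar below is a made-up four-rule example, the data layout (a list of (lhs, rhs) pairs, "ε" as the empty-string marker, "EOF" as the end marker) is an assumption of this sketch, and FIRST+ follows the usual definition of FIRST(β), extended with FOLLOW(A) when β can derive ε.

```python
EPS, EOF = "ε", "EOF"

# Hypothetical right-recursive grammar; an empty rhs list is an ε-production.
GRAMMAR = [
    ("Goal", ["Expr"]),
    ("Expr", ["Term", "Expr'"]),
    ("Expr'", ["+", "Term", "Expr'"]),
    ("Expr'", []),
    ("Term", ["id"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol (or none at all) can derive ε
    return result


def compute_first():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of_sequence(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="Goal"):
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of_sequence(rhs[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def first_plus(lhs, rhs, first, follow):
    result = first_of_sequence(rhs, first)
    return result | follow[lhs] if EPS in result else result


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for lhs, rhs in GRAMMAR:
    print(lhs, "→", " ".join(rhs) or EPS, sorted(first_plus(lhs, rhs, FIRST, FOLLOW)))
```

A grammar is LL(1) when, for each nonterminal, the FIRST+ sets of its alternative productions are pairwise disjoint; those sets are also exactly the columns in which each production is entered into the parse table.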

Beyond Syntax

There is a level of correctness that is deeper than grammar.

    fie(a,b,c,d)
      int a, b, c, d;
    { … }

    fee() {
      int f[3], g[0], h, i, j, k;
      char *p;
      fie(h, i, "ab", j, k);
      k = f * i + j;
      h = g[17];
      printf(".\n", p, q);
      p = 10;
    }

What is wrong with this C program? (let me count the ways …)
- declared g[0], used g[17]
- wrong number of args to fie()
- "ab" is not an int
- wrong dimension on use of f
- undeclared variable q
- 10 is not a character string

All of these are "deeper than syntax".
To generate code, we need to understand its meaning!

Beyond Syntax

To generate code, the compiler needs to answer many questions:
- Is "x" a scalar, an array, or a function? Is "x" declared?
- Are there names that are not declared? Declared but not used?
- Which declaration of "x" does each use reference?
- Is the expression "x * y + z" type-consistent?
- In "a[i,j,k]", does a have three dimensions?
- Where can "z" be stored? (register, local, global, heap, static)
- In "f ← 15", how should 15 be represented?
- How many arguments does "fie()" take? What about "printf()"?
- Does "*p" reference the result of a "malloc()"?
- Do "p" & "q" refer to the same memory location?
- Is "x" defined before it is used?

These cannot be expressed in a CFG.
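Two of these questions ("Are there names that are not declared? Declared but not used?") reduce to set operations once the parser has recorded declarations and uses. The sketch below hard-codes those two lists for the variables of the fee() routine shown earlier; in a real compiler they would be collected in a symbol table during parsing, and scoping would complicate the picture. The function name `check_names` is illustrative.

```python
def check_names(declared, used):
    """Return (undeclared names, declared-but-unused names), each sorted."""
    declared, used = set(declared), set(used)
    return sorted(used - declared), sorted(declared - used)


# Variables of fee(), written out by hand for this sketch.
declared = ["f", "g", "h", "i", "j", "k", "p"]
used = ["h", "i", "j", "k", "f", "g", "p", "q"]

undeclared, unused = check_names(declared, used)
print("undeclared:", undeclared, "unused:", unused)
```

The check needs information that spans the whole routine, which is exactly why it cannot be folded into a context-free grammar.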