LALR Parsing
Adapted from notes by Profs. Aiken and Necula (UCB) and Prof. Saman Amarasinghe (MIT).

LALR Parsing
Two bottom-up parsing methods: SLR and LR. Which one do we use? Neither.
SLR is not powerful enough.
LR parsing tables are too big (thousands of states vs. hundreds of states for SLR).
In practice, use LALR(1):
It stands for Look-Ahead LR.
It is a compromise between SLR(1) and LR(1).

LALR Parsing (Cont.)
Rough intuition: an LALR(1) parser for G has
the same number of states as an SLR parser, and
some of the lookahead discrimination of LR(1).
Idea: construct the DFA for LR(1), then merge the DFA states whose items differ only in the lookahead tokens. We say that such states have the same core.

The Core of a Set of LR Items
Definition: the core of a set of LR items is the set of first components.
Example: the core of { [X → α.β, b], [Y → γ.δ, d] } is { X → α.β, Y → γ.δ }.
The core of an LR item is an LR(0) item.

An LALR(1) DFA
Repeat until all states have distinct cores:
Choose two distinct states with the same core.
Merge the states by creating a new one with the union of all the items.
Point edges from predecessors to the new state.
The new state points to all the previous successors.
(Figure: two states B and E that share a core are merged into a single state BE; the predecessors A and D now point to BE, which keeps the edges to the successors C and F.)
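To make the merging step concrete, here is a minimal Python sketch (not from the slides): it assumes LR(1) items are represented as (production, dot position, lookahead) tuples, states as frozensets of items, and transitions as a dict; the names core and merge_by_core are illustrative only.

from collections import defaultdict

def core(state):
    """Core of an LR(1) state: the same items with the lookaheads dropped."""
    return frozenset((prod, dot) for (prod, dot, lookahead) in state)

def merge_by_core(lr1_states, transitions):
    """Merge LR(1) states that share a core, yielding the LALR(1) states.

    lr1_states  : list of frozensets of (production, dot, lookahead) items
    transitions : dict mapping (state index, grammar symbol) -> state index
    """
    # Group the LR(1) states by their core.
    groups = defaultdict(list)
    for i, state in enumerate(lr1_states):
        groups[core(state)].append(i)

    # One LALR(1) state per core: the union of the items of the whole group.
    lalr_states, old_to_new = [], {}
    for group in groups.values():
        new_index = len(lalr_states)
        lalr_states.append(frozenset().union(*(lr1_states[i] for i in group)))
        for i in group:
            old_to_new[i] = new_index

    # Re-point the edges: predecessors now enter the merged state, and the
    # merged state inherits its group's successors (which agree by construction).
    lalr_transitions = {(old_to_new[src], sym): old_to_new[dst]
                        for (src, sym), dst in transitions.items()}
    return lalr_states, lalr_transitions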

The LALR Parser Can Have Conflicts
Consider, for example, the LR(1) states
{ [X → α., a], [Y → β., b] }
{ [X → α., b], [Y → β., a] }
and the merged LALR(1) state
{ [X → α., a/b], [Y → β., a/b] }
which has a new reduce-reduce conflict. In practice such cases are rare.
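Continuing the sketch above (same assumed item representation, writing a production as an (lhs, rhs) pair), a small check shows how the merged state acquires a reduce-reduce conflict that neither LR(1) state had on its own:

def reduce_reduce_conflicts(state):
    """Pairs of distinct complete items that call for a reduce on the same lookahead."""
    complete = [(prod, la) for (prod, dot, la) in state if dot == len(prod[1])]
    return [(p1, p2, la1)
            for i, (p1, la1) in enumerate(complete)
            for (p2, la2) in complete[i + 1:]
            if la1 == la2 and p1 != p2]

# The two LR(1) states from this slide, with X -> alpha and Y -> beta complete:
X_alpha = ("X", ("alpha",))
Y_beta = ("Y", ("beta",))
state1 = frozenset({(X_alpha, 1, "a"), (Y_beta, 1, "b")})
state2 = frozenset({(X_alpha, 1, "b"), (Y_beta, 1, "a")})

print(reduce_reduce_conflicts(state1))           # [] -- no conflict
print(reduce_reduce_conflicts(state2))           # [] -- no conflict
print(reduce_reduce_conflicts(state1 | state2))  # conflicts on both 'a' and 'b'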

LALR vs. LR Parsing
LALR languages are not natural; they are an efficiency hack on LR languages.
Any reasonable programming language has an LALR(1) grammar.
LALR(1) has become a standard for programming languages and for parser generators.

Example -- LR(0)/SLR DFA
Grammar:
<S> → <X> $
<X> → <Y>
<X> → (
<Y> → ( <Y> )
<Y> → ε
(Figure: the LR(0)/SLR DFA for this grammar, states s0 through s6. State s1, reached from s0 on '(', contains the complete items <X> → ( • and <Y> → • together with the shift item for '(', which is where the interesting decisions arise.)

(Figure: the same LR(0)/SLR DFA, repeated. In state s1 an SLR parser has a reduce-reduce conflict: on lookahead $ it could reduce by either <X> → ( or <Y> → ε, since $ is in both Follow(<X>) and Follow(<Y>).)

Example -- LR(1) DFA
Grammar: <S> → <X> $, <X> → <Y>, <X> → (, <Y> → ( <Y> ), <Y> → ε.
(Figure: the LR(1) DFA for the same grammar, states s0 through s8. Items now carry lookaheads; in s1 the complete items appear as [<X> → ( •, $] and [<Y> → •, )], with distinct lookaheads. The extra states s7 and s8 duplicate s3 and s4 with lookahead ) in place of $.)

(Figure: the LR(1) DFA repeated, highlighting the items of state s1.)
For input (|$, that is, lookahead $: reduce in s1 using <X> → (; do not reduce using <Y> → ε.
For input (|)$, that is, lookahead ): reduce in s1 using <Y> → ε; do not reduce using <X> → (.
For input (|())$, that is, lookahead (: shift in s1; do not reduce.
(^n)^n is recognized similarly for n >= 2.

Example -- LALR(1) DFA
Grammar: <S> → <X> $, <X> → <Y>, <X> → (, <Y> → ( <Y> ), <Y> → ε.
(Figure: the LALR(1) DFA, obtained by merging the LR(1) states with equal cores: s3/s7 merge into a state with item [<Y> → ( <Y> • ), {$,)}], and s4/s8 merge into a state with item [<Y> → ( <Y> ) •, {$,)}]; states s0, s1, and s2 keep their LR(1) lookaheads.)

Example -- LALR(1) DFA
(Figure: the same LALR(1) DFA, annotated with the observations below.)
1) The number of states in the LALR(1) DFA is the same as in the SLR(1) DFA.
2) The discrimination power of LALR(1) is similar to that of LR(1), enabling removal of the reduce-reduce conflict in s1. In general, parallel investigation of multiple possibilities is disambiguated using the extra context (rather than being waylaid by the follow set). In other situations, the follow-set interpretation is regenerated (for a compact representation) by merging the LR(1) states with a common core.

(Figure: the LALR(1) DFA with its reduce actions; state s4 now reduces by rule 4, <Y> → ( <Y> ), on both $ and ).)
As before, in s1: lookahead $ reduces by <X> → (, lookahead ) reduces by <Y> → ε, and lookahead ( shifts.
In the LR(1) table, the entry (s4, ")") was an ERROR. In LALR(1) it is REDUCE by rule 4, which leads to s0, which then triggers an ERROR on ")". So error detection is postponed by a reduction, but happens before any additional shifts.

(Figure: side-by-side comparison of the LR(1) DFA, with its separate states s7 and s8, and the LALR(1) DFA, in which those states are merged and (s4, ")") becomes reduce by rule 4.)

(Figure: the corresponding comparison between the LALR(1) DFA and the SLR(1) DFA, which has the same states but without the lookahead discrimination.)

A Hierarchy of Grammar Classes
1) To show LR(k) is more powerful than LL(k), use a left-recursive grammar.
2) To show that there are grammars (for regular languages) that are not LR(k) for any k, consider:
S → Aa | Bb
A → Ad | d
B → Bd | d
To choose the first reduction, we need to know whether the input ends in an "a" or a "b".
3) On the other hand, the existence of a DFA for every regular language implies that all regular languages are LL(1) parsable.
4) To show that an LL(2) grammar can be LR(1):
S → Aa | Bb
A → d
B → d
(Shift "d"; decide on the reduction based on whether the lookahead is an "a" or a "b".)
5) To show that an LL(2) grammar can fail to be LR(1):

Semantic Actions
We can now illustrate how semantic actions are implemented for LR parsing. Keep attributes on the stack:
On shift a, push the attribute for a onto the stack.
On reduce X → α, pop the attributes for α, compute the attribute for X, and push it onto the stack.

Performing Semantic Actions: Example
Recall the example from an earlier lecture:
E → T + E1   { E.val = T.val + E1.val }
  | T        { E.val = T.val }
T → int * T1 { T.val = int.val * T1.val }
  | int      { T.val = int.val }
Consider the parsing of the string 3 * 5 + 8.

Performing Semantic Actions: Example
              | int * int + int     shift
int3          | * int + int         shift
int3 *        | int + int           shift
int3 * int5   | + int               reduce T → int
int3 * T5     | + int               reduce T → int * T
T15           | + int               shift
T15 +         | int                 shift
T15 + int8    |                     reduce T → int
T15 + T8      |                     reduce E → T
T15 + E8      |                     reduce E → T + E
E23           |                     accept
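As a minimal Python sketch (not from the slides) of the value-stack discipline for this grammar: the parser table itself is omitted, and the shift/reduce sequence is copied from the trace above, so only the attribute handling is shown; all names here are illustrative.

def reduce_T_int(stack):            # T -> int        { T.val = int.val }
    stack.append(stack.pop())

def reduce_T_int_mul_T(stack):      # T -> int * T1   { T.val = int.val * T1.val }
    t1 = stack.pop(); stack.pop(); i = stack.pop(); stack.append(i * t1)

def reduce_E_T(stack):              # E -> T          { E.val = T.val }
    stack.append(stack.pop())

def reduce_E_T_plus_E(stack):       # E -> T + E1     { E.val = T.val + E1.val }
    e1 = stack.pop(); stack.pop(); t = stack.pop(); stack.append(t + e1)

stack = []                          # the attribute (value) stack
steps = [("shift", 3), ("shift", "*"), ("shift", 5),
         ("reduce", reduce_T_int), ("reduce", reduce_T_int_mul_T),
         ("shift", "+"), ("shift", 8),
         ("reduce", reduce_T_int), ("reduce", reduce_E_T),
         ("reduce", reduce_E_T_plus_E)]

for kind, arg in steps:
    if kind == "shift":
        stack.append(arg)           # on shift, push the token's attribute
    else:
        arg(stack)                  # on reduce, pop RHS attributes, push the LHS attribute
    print(kind, stack)

assert stack == [23]                # E.val for "3 * 5 + 8"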

Notes
The previous discussion shows how synthesized attributes are computed by LR parsers.
It is also possible to compute inherited attributes in an LR parser.

Using Parser Generators
The most common parser generators are LALR(1). A parser generator constructs an LALR(1) table and reports an error when a table entry is multiply defined:
A shift and a reduce: called a shift/reduce conflict.
Multiple reduces: called a reduce/reduce conflict.
An ambiguous grammar will generate conflicts. What do we do in that case?

Shift/Reduce Conflicts
Typically due to ambiguities in the grammar. The classic example: the dangling else.
S → if E then S | if E then S else S | OTHER
The DFA will have a state containing
[S → if E then S . , else]
[S → if E then S . else S, x]
If else follows, then we can either shift or reduce. The default (bison, CUP, etc.) is to shift, and the default behavior is what is needed in this case.

More Shift/Reduce Conflicts
Consider the ambiguous grammar E → E + E | E * E | int.
We will have states containing
{ [E → E * . E, +], [E → . E + E, +], … }  --E-->  { [E → E * E . , +], [E → E . + E, +], … }
Again there is a shift/reduce conflict on input +. We need to reduce (* binds more tightly than +). Recall the solution: declare the precedence of * and +.

Bison Approach
In bison, declare precedence and associativity:
%left +
%left *
Precedence of a rule = that of its last terminal. See the bison manual for ways to override this default.
Resolve a shift/reduce conflict with a shift if:
no precedence is declared for either the rule or the terminal;
the input terminal has higher precedence than the rule;
the precedences are the same and right-associative.
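A minimal sketch of this tie-breaking rule (an assumed encoding, not bison's actual internals; PREC, ASSOC, and resolve are made-up names, and higher numbers here mean tighter binding, matching %left + declared before %left *):

PREC = {"+": 1, "*": 2}             # %left +  then  %left *  (later declaration binds tighter)
ASSOC = {"+": "left", "*": "left"}

def rule_precedence(rhs):
    """Precedence of a rule = that of its last terminal in the right-hand side."""
    terminals = [sym for sym in rhs if sym in PREC]
    return PREC[terminals[-1]] if terminals else None

def resolve(rule_rhs, lookahead):
    """Decide a shift/reduce conflict between reducing rule_rhs and shifting lookahead."""
    rp, tp = rule_precedence(rule_rhs), PREC.get(lookahead)
    if rp is None or tp is None:
        return "shift"              # no precedence declared: default to shift
    if tp != rp:
        return "shift" if tp > rp else "reduce"
    return "reduce" if ASSOC[lookahead] == "left" else "shift"  # tie: use associativity

print(resolve(("E", "*", "E"), "+"))   # reduce -- * binds tighter than +
print(resolve(("E", "+", "E"), "+"))   # reduce -- equal precedence, + is left-associative
print(resolve(("E", "+", "E"), "*"))   # shift  -- * binds tighter than the rule

The first two outcomes correspond to the precedence and associativity examples on the next slides.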

Using Precedence to Solve S/R Conflicts
Back to our example:
{ [E → E * . E, +], [E → . E + E, +], … }  --E-->  { [E → E * E . , +], [E → E . + E, +], … }
The parser will choose to reduce, because the precedence of the rule E → E * E is higher than that of the terminal +.

Using Associativity to Solve S/R Conflicts
Same grammar as before: E → E + E | E * E | int.
We will also have the states
{ [E → E + . E, +], [E → . E + E, +], … }  --E-->  { [E → E + E . , +], [E → E . + E, +], … }
Now we also have an S/R conflict on input +. We choose to reduce because E → E + E and + have the same precedence and + is left-associative.

Using Precedence to Solve S/R Conflicts
Back to our dangling else example:
[S → if E then S . , else]
[S → if E then S . else S, x]
We can eliminate the conflict by declaring else with higher precedence than then. But this starts to look like "hacking the tables". It is best to avoid overuse of precedence declarations, or you'll end up with unexpected parse trees.

Reduce/Reduce Conflicts
Usually due to gross ambiguity in the grammar. Example: a sequence of identifiers
S → ε | id | id S
There are two parse trees for the string id:
S ⇒ id
S ⇒ id S ⇒ id
How does this confuse the parser?

More on Reduce/Reduce Conflicts
Consider the states
{ [S' → . S, $], [S → . , $], [S → . id, $], [S → . id S, $] }  --id-->  { [S → id . , $], [S → id . S, $], [S → . , $], [S → . id, $], [S → . id S, $] }
There is a reduce/reduce conflict on input $:
S' ⇒ S ⇒ id
S' ⇒ S ⇒ id S ⇒ id
Better to rewrite the grammar: S → ε | id S.

Strange Reduce/Reduce Conflicts
Consider the grammar
S → P R ,
NL → N | N , NL
P → T | NL : T
R → T | N : T
N → id
T → id
where P is a parameter specification, R is a result specification, N is a parameter or result name, T is a type name, and NL is a list of names. For example:
S ⇒* i,j,k: int p:real ,
S ⇒* int real ,

Strange Reduce/Reduce Conflicts
In P, an id is an N when followed by , or : and a T when followed by id.
In R, an id is an N when followed by : and a T when followed by ,.
This is an LR(1) grammar, but it is not LALR(1). Why? For obscure reasons.

A Few LR(1) States
State 1: [P → . T, id]  [P → . NL : T, id]  [NL → . N, :]  [NL → . N , NL, :]  [N → . id, :]  [N → . id, ,]  [T → . id, id]
State 3 (from state 1 on id): [T → id . , id]  [N → id . , :]  [N → id . , ,]
State 2: [R → . T, ,]  [R → . N : T, ,]  [T → . id, ,]  [N → . id, :]
State 4 (from state 2 on id): [T → id . , ,]  [N → id . , :]
LALR merge of states 3 and 4 gives [T → id . , id/,] and [N → id . , :/,]: an LALR reduce/reduce conflict on ','.

What Happened?
Two distinct states were confused because they have the same core.
Fix: add dummy productions to distinguish the two confused states. E.g., add R → id bogus, where bogus is a terminal not used by the lexer. This production will never be used during parsing, but it distinguishes R from P.

A Few LR(1) States After Fix
State 1: [P → . T, id]  [P → . NL : T, id]  [NL → . N, :]  [NL → . N , NL, :]  [N → . id, :]  [N → . id, ,]  [T → . id, id]
State 3 (from state 1 on id): [T → id . , id]  [N → id . , :]  [N → id . , ,]
State 2: [R → . T, ,]  [R → . N : T, ,]  [R → . id bogus, ,]  [T → . id, ,]  [N → . id, :]
State 4 (from state 2 on id): [T → id . , ,]  [N → id . , :]  [R → id . bogus, ,]
Different cores ⇒ no LALR merging.

Notes on Parsing
Parsing:
A solid foundation: context-free grammars.
A simple parser: LL(1).
A more powerful parser: LR(1).
An efficiency hack: LALR(1).
LALR(1) parser generators.