Presentation is loading. Please wait.

Presentation is loading. Please wait.

LALR Parsing Adapted from Notes by Profs Aiken and Necula (UCB) and

Similar presentations


Presentation on theme: "LALR Parsing Adapted from Notes by Profs Aiken and Necula (UCB) and"— Presentation transcript:

1 LALR Parsing Adapted from Notes by Profs Aiken and Necula (UCB) and
Prof. Saman Amarasinghe (MIT) CS780(Prasad) L15LALR

2 Two bottom-up parsing methods: SLR and LR Which one do we use? Neither
LALR Parsing Two bottom-up parsing methods: SLR and LR Which one do we use? Neither SLR is not powerful enough. LR parsing tables are too big (1000’s of states vs. 100’s of states for SLR). In practice, use LALR(1) Stands for Look-Ahead LR A compromise between SLR(1) and LR(1) CS780(Prasad) L15LALR

3 Rough intuition: A LALR(1) parser for G has
LALR Parsing (Cont.) Rough intuition: A LALR(1) parser for G has The same number of states as an SLR parser. Some of the lookahead discrimination of LR(1). Idea: Construct the DFA for the LR(1). Then merge the DFA states whose items differ only in the lookahead tokens We say that such states have the same core. CS780(Prasad) L15LALR

4 The Core of a Set of LR Item
Definition: The core of a set of LR items is the set of first components. Example: the core of { [X ® a.b, b], [Y ® g.d, d]} is {X ® a.b, Y ® g.d} The core of an LR item is an LR(0) item. CS780(Prasad) L15LALR

5 Repeat until all states have distinct core.
A LALR(1) DFA Repeat until all states have distinct core. Choose two distinct states with same core. Merge the states by creating a new one with the union of all the items. Point edges from predecessors to new state. New state points to all the previous successors. A B C A C BE D E F D F CS780(Prasad) L15LALR

6 The LALR Parser Can Have Conflicts
Consider for example the LR(1) states {[X ® a., a], [Y ® b., b]} {[X ® a., b], [Y ® b., a]} And the merged LALR(1) state {[X ® a., a/b], [Y ® b., a/b]} Has a new reduce-reduce conflict. In practice such cases are rare. CS780(Prasad) L15LALR

7 LALR languages are not natural.
LALR vs. LR Parsing LALR languages are not natural. They are an efficiency hack on LR languages Any reasonable programming language has an LALR(1) grammar. LALR(1) has become a standard for programming languages and for parser generators. CS780(Prasad) L15LALR

8 Example -- LR(0)/SLR DFA
26 Example -- LR(0)/SLR DFA <S>  <X> $ <X>  <Y> <X>  ( <Y>  ( <Y> ) <Y>   ( <Y>  ( • <Y> ) <Y>  • ( <Y> ) <Y>  • s2 <S>  • <X> $ <X>  • <Y> <X>  • ( <Y>  • ( <Y> ) <Y>  • s0 ( <X>  ( • <Y>  ( • <Y> ) <Y>  • ( <Y> ) <Y>  • s1 ( Y Y <Y>  ( <Y> • ) s3 Y X ) <X>  <Y> • s6 <S>  <X> • $ s5 s4 <Y>  ( <Y> ) • L15LALR

9 ( ( ( Y Y Y X ) 31 s0 s2 s1 s3 s6 s5 s4 <S>  • <X> $
<X>  • <Y> <X>  • ( <Y>  • ( <Y> ) <Y>  • s0 ( <X>  ( • <Y>  ( • <Y> ) <Y>  • ( <Y> ) <Y>  • s1 ( Y Y <Y>  ( <Y> • ) s3 Y X ) <X>  <Y> • s6 <S>  <X> • $ s5 s4 <Y>  ( <Y> ) • L15LALR

10 Example -- LR(1) DFA <S>  <X> $ <X>  <Y>
46 Example -- LR(1) DFA <S>  <X> $ <X>  <Y> <X>  ( <Y>  ( <Y> ) <Y>   Y [<Y>  (<Y> • ) )] s7 ) s8 [<Y>  (<Y>) • )] ( [<Y>  ( • <Y>) )] [<Y>  • ( <Y> ) )] [<Y>  • )] s2 [<S>  • <X> $ ?] [<X>  • <Y> $] [<X>  • ( $] [<Y>  • (<Y>) $] [<Y>  • $] s0 ( [<X>  ( • $] [<Y>  ( • <Y> ) $] [<Y>  • ( <Y>) )] [<Y>  • )] s1 ( Y [<Y>  (<Y> • ) $] s3 Y X ) [<X>  <Y> • $] s6 [<S>  <X> •$ ?] s5 s4 [<Y>  (<Y>) • $] L15LALR

11 s7 … s8 => Y ) ( ( ( Y Y X ) 51 s7 s8 s0 s2 s1 s3 s6 s5 s4 s7
[<S>  • <X> $ ?] [<X>  • <Y> $] [<X>  • ( $] [<Y>  • (<Y>) $] [<Y>  • $] [<Y>  ( • <Y>) )] [<Y>  • ( <Y> ) )] [<Y>  • )] s1 For input (|$ , that is lookahead $, reduce in s using X -> ( do not reduce using Y -> e For input (|)$, that is lookahead ), reduce in s1 using Y -> e. do not reduce using X -> ( For input (|())$, that is lookahead (, shift ) in s1. do not reduce (^n)^n is recognized similarly for n>=2 ( [<X>  ( • $] [<Y>  ( • <Y> ) $] [<Y>  • ( <Y>) )] [<Y>  • )] Y s3 Y X [<Y>  (<Y> • ) $] ) s6 s5 s4 [<X>  <Y> • $] [<S>  <X> •$ ?] [<Y>  (<Y>) • $] L15LALR

12 Example -- LALR(1) DFA <S>  <X> $ <X>  <Y>
46 Example -- LALR(1) DFA <S>  <X> $ <X>  <Y> <X>  ( <Y>  ( <Y> ) <Y>   Y [<Y>  (<Y> • ) )] s7 ) s8 [<Y>  (<Y>) • )] ( [<Y>  ( • <Y>) )] [<Y>  • ( <Y> ) )] [<Y>  • )] s2 [<S>  • <X> $ ?] [<X>  • <Y> $] [<X>  • ( $] [<Y>  • (<Y>) $] [<Y>  • $] s0 ( [<X>  ( • $] [<Y>  ( • <Y> ) $] [<Y>  • ( <Y>) )] [<Y>  • )] s1 ( Y Y [<Y>  (<Y> • ) {$,)}] s3 Y X ) [<X>  <Y> • $] s6 [<S>  <X> •$ ?] s5 s4 [<Y>  (<Y>) • {$,)}] L15LALR

13 Example -- LALR(1) DFA <S>  <X> $ <X>  <Y>
46 Example -- LALR(1) DFA <S>  <X> $ <X>  <Y> <X>  ( <Y>  ( <Y> ) <Y>   ( [<Y>  ( • <Y>) )] [<Y>  • ( <Y> ) )] [<Y>  • )] s2 [<S>  • <X> $ ?] [<X>  • <Y> $] [<X>  • ( $] [<Y>  • (<Y>) $] [<Y>  • $] s0 ( [<X>  ( • $] [<Y>  ( • <Y> ) $] [<Y>  • ( <Y>) )] [<Y>  • )] s1 1) The number of states in LALR(1) DFA same as that of SLR(1). 2) Discrimination power for LALR(1) similar to LR(1), enabling removal of reduce-reduce conflict in s1. In general, parallel investigation of multiple possibilities disambiguated using the extra context (rather than being waylaid by the follow set). In other situations, follow set interpretation regenerated (for compact representation) by merging the LR(1) states with common core. ( Y Y [<Y>  (<Y> • ) {$,)}] s3 Y X ) [<X>  <Y> • $] s6 [<S>  <X> •$ ?] s5 s4 [<Y>  (<Y>) • {$,)}] L15LALR

14 ( ( ( Y Y Y X ) 51 s0 s2 s1 s3 s6 s5 s4 reduce(4)
[<S>  • <X> $ ?] [<X>  • <Y> $] [<X>  • ( $] [<Y>  • (<Y>) $] [<Y>  • $] [<Y>  ( • <Y>) )] [<Y>  • ( <Y> ) )] [<Y>  • )] s1 For input (|$ , that is lookahead $, reduce in s using X -> ( do not reduce using Y -> e For input (|)$, that is lookahead ), reduce in s1 using Y -> e. do not reduce using X -> ( For input (|())$, that is lookahead (, shift ) in s1. do not reduce In LR(1) table, entry(s4,”)”) was an ERROR. In LALR(1), REDUCE by rule 4 leads to s0, which triggers ERROR on “)”. So error detection postponed by a reduction but before any additional shifts. ( [<X>  ( • $] [<Y>  ( • <Y> ) $] [<Y>  • ( <Y>) )] [<Y>  • )] Y Y [<Y>  (<Y> • ) {$,)}] s3 Y X ) s6 s5 s4 [<Y>  (<Y>) • {$,)}] [<X>  <Y> • $] [<S>  <X> •$ ?] L15LALR

15 52 LALR(1) reduce(4) s7 LR(1) s7 … s8 => L15LALR

16 52 LALR(1) reduce(4) SLR(1) L15LALR

17 A Hierarchy of Grammar Classes
1) To show LR(k) more powerful than LL(k) use left recursive grammar. 2) To show grammars (for regular languages) that are not LR(K), for any k, consider: S -> Aa | Bb A -> Ad | d B -> Bd | d To choose the first reduction, we need to know if the input ends in an “a” or a “b”. 3) On the otherhand, the existance of a DFA for every regular language implies that all regular languages are LL(1) parsable. 4) To show LL(2) can be LR(1): S -> Aa | Bb A -> d B -> d (Shift “d”; decide on the reduction based in whether the lookahead is an “a” or a “b”.) 4) Show LL(2) can fail to be LR(1): CS780(Prasad) L15LALR

18 Keep attributes on the stack.
Semantic Actions We can now illustrate how semantic actions are implemented for LR parsing. Keep attributes on the stack. On shift a, push attribute for a on stack. On reduce X ® a pop attributes for a compute attribute for X and push it on the stack CS780(Prasad) L15LALR

19 Performing Semantic Actions. Example
Recall the example from earlier lecture E ® T + E { E.val = T.val + E1.val } | T { E.val = T.val } T ® int * T1 { T.val = int.val * T1.val } | int { T.val = int.val } Consider the parsing of the string 3 * 5 + 8 CS780(Prasad) L15LALR

20 Performing Semantic Actions. Example
| int * int + int shift int3 | * int + int shift int3 * | int + int shift int3 * int5 | + int reduce T ® int int3 * T5 | + int reduce T ® int * T T15 | + int shift T15 + | int shift T15 + int8 | reduce T ® int T15 + T8 | reduce E ® T T15 + E8 | reduce E ® T + E E23 | accept CS780(Prasad) L15LALR

21 It is also possible to compute inherited attributes in an LR parser.
Notes The previous discussion shows how synthesized attributes are computed by LR parsers. It is also possible to compute inherited attributes in an LR parser. CS780(Prasad) L15LALR

22 Using Parser Generators
Most common parser generators are LALR(1). A parser generator constructs a LALR(1) table. And reports an error when a table entry is multiply defined: A shift and a reduce. Called shift/reduce conflict Multiple reduces. Called reduce/reduce conflict An ambiguous grammar will generate conflicts. What do we do in that case? CS780(Prasad) L15LALR

23 Shift/Reduce Conflicts
Typically due to ambiguities in the grammar. Classic example: the dangling else S ® if E then S | if E then S else S | OTHER Will have DFA state containing [S ® if E then S., else] [S ® if E then S. else S, x] if else follows, then we can shift or reduce Default (bison, CUP, etc.) is to shift Default behavior is as needed in this case. CS780(Prasad) L15LALR

24 More Shift/Reduce Conflicts
Consider the ambiguous grammar E ® E + E | E * E | int We will have the states containing [E ® E * . E, +] [E ® E * E., +] [E ® . E + E, +] ÞE [E ® E . + E, +] … … Again a shift/reduce conflict on input + We need to reduce (* binds more tightly that +) Recall solution: declare the precedence of * and + CS780(Prasad) L15LALR

25 In bison, declare precedence and associativity:
Bison Approach In bison, declare precedence and associativity: %left + %left * Precedence of a rule = that of its last terminal See bison manual for ways to override this default. Resolve shift/reduce conflict with a shift if: no precedence declared for either rule or terminal input terminal has higher precedence than the rule the precedences are the same and right associative CS780(Prasad) L15LALR

26 Using Precedence to Solve S/R Conflicts
Back to our example: [E ® E * . E, +] [E ®E * E., +] [E ® . E + E, +] ÞE [E ®E . + E, +] … … Will choose reduce because precedence of rule E ® E * E is higher than of terminal + CS780(Prasad) L15LALR

27 Using Associativity to Solve S/R Conflicts
Same grammar as before E ® E + E | E * E | int We will also have the states [E ® E + . E, +] [E ® E + E., +] [E ® . E + E, +] ÞE [E ® E . + E, +] … … Now we also have an S/R conflict on input + We choose reduce because E ® E + E and + have the same precedence and + is left-associative. CS780(Prasad) L15LALR

28 Using Precedence to Solve S/R Conflicts
Back to our dangling else example [S ® if E then S., else] [S ® if E then S. else S, x] Can eliminate conflict by declaring else with higher precedence than then. But this starts to look like “hacking the tables”. Best to avoid overuse of precedence declarations, or you’ll end with unexpected parse trees. CS780(Prasad) L15LALR

29 Reduce/Reduce Conflicts
Usually due to gross ambiguity in the grammar Example: a sequence of identifiers S ® e | id | id S There are two parse trees for the string id S ® id S ® id S ® id How does this confuse the parser? CS780(Prasad) L15LALR

30 More on Reduce/Reduce Conflicts
Consider the states [S ® id ., $] [S’ ® . S, $] [S ® id . S, $] [S ® ., $] Þid [S ® ., $] [S ® . id, $] [S ® . id, $] [S ® . id S, $] [S ® . id S, $] Reduce/reduce conflict on input $ S’ ® S ® id S’ ® S ® id S ® id Better rewrite the grammar: S ® e | id S CS780(Prasad) L15LALR

31 Strange Reduce/Reduce Conflicts
Consider the grammar S ® P R , NL ® N | N , NL P ® T | NL : T R ® T | N : T N ® id T ® id P - parameters specification R - result specification N - a parameter or result name T - a type name NL - a list of names S => i,j,k: int p:real, S => int real, CS780(Prasad) L15LALR

32 Strange Reduce/Reduce Conflicts
In P an id is a N when followed by , or : T when followed by id In R an id is a N when followed by : T when followed by , This is an LR(1) grammar. But it is not LALR(1). Why? For obscure reasons CS780(Prasad) L15LALR

33 A Few LR(1) States LALR reduce/reduce conflict on “,” P ® . T id
P ® . NL : T id NL ® . N : NL ® . N , NL : N ® . id : N ® . id , T ® . id id 1 T ® id id N ® id : N ® id , id 3 LALR reduce/reduce conflict on “,” T ® id id/, N ® id :/, LALR merge R ® . T , R ® . N : T , T ® . id , N ® . id : T ® id , N ® id : id 4 2 L15LALR

34 Two distinct states were confused because they have the same core.
What Happened? Two distinct states were confused because they have the same core. Fix: add dummy productions to distinguish the two confused states. E.g., add R ® id bogus bogus is a terminal not used by the lexer. This production will never be used during parsing. But it distinguishes R from P. CS780(Prasad) L15LALR

35 A Few LR(1) States After Fix
P ® . T id P ® . NL : T id NL ® . N : NL ® . N , NL : N ® . id : N ® . id , T ® . id id 1 T ® id id N ® id : N ® id , 3 id Different cores Þ no LALR merging T ® id , N ® id : R ® id . bogus , 4 R ® . T , R ® . N : T , R ® . id bogus , T ® . id , N ® . id : 2 id L15LALR

36 Notes on Parsing Parsing A solid foundation: context-free grammars
A simple parser: LL(1) A more powerful parser: LR(1) An efficiency hack: LALR(1) LALR(1) parser generators CS780(Prasad) L15LALR


Download ppt "LALR Parsing Adapted from Notes by Profs Aiken and Necula (UCB) and"

Similar presentations


Ads by Google