Compiler Construction Sohail Aslam Lecture 24 compiler: Bottom-up Parsing
LR(1) Items An LR(1) item is a pair [X → α•β, a], where X → αβ is a production and a ∈ T (a terminal) is the lookahead symbol.
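One way to hold such an item in a program is as a small immutable record. The sketch below is only illustrative: the class name `Item` and its fields (`lhs`, `rhs`, `dot`, `lookahead`) are assumptions, not names from the lecture, and the dot is stored as an index into the right-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical record for an LR(1) item [lhs -> alpha . beta, lookahead]."""
    lhs: str          # left-hand side X
    rhs: tuple        # the whole right-hand side (alpha followed by beta)
    dot: int          # number of rhs symbols to the left of the dot
    lookahead: str    # terminal a

    def next_symbol(self):
        """Return the symbol just after the dot, or None if the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self):
        body = list(self.rhs)
        body.insert(self.dot, ".")   # ASCII "." stands in for the dot marker
        return f"[{self.lhs} -> {' '.join(body)}, {self.lookahead}]"


item = Item("E", ("E", "+", "(", "E", ")"), 0, "$")
print(item)
```

Because the record is frozen, items are hashable and can be stored directly in a Python `set`, which is convenient when a parser state is a set of items.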
Building LR(1) Tables The construction uses a set of LR(1) items to represent each parser state. The resulting model is called the canonical collection (CC) of sets of LR(1) items.
Canonical Collection Each set in CC represents a state in the eventual parser DFA. The construction of CC begins by building a model of the parser’s initial state.
Canonical Collection The initial state consists of the set of LR(1) items that represent the parser’s initial state, along with any items that must also hold in the initial state.
Canonical Collection To simplify the task of building this initial state, the construction requires that the grammar have a unique goal symbol
Convention Add a new start symbol S to the grammar and a production S → E. Augmented grammar: S → E; E → E + (E) | int
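The augmentation step can be sketched in a few lines. The representation here is an assumption made for illustration: a grammar is a dict from each nonterminal to a list of right-hand sides (tuples of symbols), and `augment` is a hypothetical helper name.

```python
def augment(grammar, start, new_start="S"):
    """Return a new dict that adds the single production new_start -> start."""
    if new_start in grammar:
        raise ValueError(f"{new_start!r} is already a nonterminal")
    augmented = {new_start: [(start,)]}   # the unique goal production
    augmented.update(grammar)             # original productions are reused as-is
    return augmented


# The expression grammar from the slide: E -> E + (E) | int
GRAMMAR = {"E": [("E", "+", "(", "E", ")"), ("int",)]}
AUGMENTED = augment(GRAMMAR, "E")

for lhs, alternatives in AUGMENTED.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs))
```

The new goal symbol has exactly one production, so the initial state can be seeded from a single item.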
The Closure Procedure The item [S → •E, $] describes the parser’s initial state. It represents a configuration in which recognizing S followed by $ would be a valid parse.
The Closure Procedure This item, [S → •E, $], becomes the core of the first state in CC, labelled I0.
The Closure Procedure If the grammar has several distinct productions for the start symbol, each of them generates an item in this initial core of I0. The procedure closure then completes the state by adding every item implied by the items in the core.
closure(s) =
  repeat
    for each [X → α•Yβ, a] ∈ s
      for each production Y → γ
        for each b ∈ FIRST(βa)
          s ← s ∪ { [Y → •γ, b] }
  until s is unchanged
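The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: items are plain tuples `(lhs, rhs, dot, lookahead)`, the grammar has no ε-productions (so FIRST(βa) is just FIRST of the first symbol of βa), and `FIRST_NT` is a hand-computed table for this one grammar.

```python
# Augmented grammar: S -> E ; E -> E + (E) | int
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "(", "E", ")"), ("int",)],
}

# Assumption: FIRST of each nonterminal, worked out by hand for this grammar.
FIRST_NT = {"S": {"int"}, "E": {"int"}}


def first_of(sequence):
    """FIRST of a non-empty symbol string; valid only without epsilon-productions."""
    head = sequence[0]
    return FIRST_NT.get(head, {head})    # a terminal's FIRST set is itself


def closure(items):
    s = set(items)
    changed = True
    while changed:                        # repeat ... until s is unchanged
        changed = False
        for lhs, rhs, dot, a in list(s):  # each [X -> alpha . Y beta, a] in s
            if dot >= len(rhs) or rhs[dot] not in GRAMMAR:
                continue                  # dot at the end, or before a terminal
            y = rhs[dot]
            beta_a = rhs[dot + 1:] + (a,)
            for gamma in GRAMMAR[y]:      # each production Y -> gamma
                for b in first_of(beta_a):
                    new_item = (y, gamma, 0, b)
                    if new_item not in s:
                        s.add(new_item)
                        changed = True
    return s


I0 = closure({("S", ("E",), 0, "$")})
for item in sorted(I0):
    print(item)
```

Iterating over `list(s)` takes a snapshot of the set, so new items can be added during the pass; the outer loop then repeats until a full pass adds nothing.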
I0 = closure({ [S → •E, $] })
Start with s = { [S → •E, $] }.
Match [X → α•Yβ, a] against [S → •E, $]: X = S, α = ε, Y = E, β = ε, a = $.
The productions Y → γ are E → E + (E) and E → int.
FIRST(βa) = FIRST($) = {$}, so s = { [S → •E, $] } ∪ { [E → •E + (E), $] } ∪ { [E → •int, $] }
s = { [S → •E, $], [E → •E + (E), $], [E → •int, $] }
[S → •E, $] has already been processed.
Match [X → α•Yβ, a] against [E → •E + (E), $]: X = E, α = ε, Y = E, β = + (E), a = $.
The productions Y → γ are again E → E + (E) and E → int.
FIRST(βa) = FIRST(+ (E) $) = {+}, so s = s ∪ { [E → •E + (E), +] } ∪ { [E → •int, +] }
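The two FIRST computations used in this walkthrough can be checked mechanically. The sketch below uses the usual fixed-point computation of FIRST sets; the function names, the dict-of-tuples grammar encoding, and the use of the empty string as the ε marker are all assumptions made for illustration.

```python
EPS = ""  # assumed marker for the empty string (epsilon)

GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "(", "E", ")"), ("int",)],
}


def first_of_sequence(sequence, first):
    """FIRST of a symbol string; symbols missing from `first` count as terminals."""
    result = set()
    for symbol in sequence:
        symbol_first = first.get(symbol, {symbol})
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result                 # this symbol cannot vanish, so stop
    result.add(EPS)                       # every symbol was nullable, or sequence empty
    return result


def first_sets(grammar):
    """Grow FIRST for each nonterminal until no set changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first) - first[nonterminal]
                if new:
                    first[nonterminal] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
print(first_of_sequence(("$",), FIRST))                      # beta is empty, a = $
print(first_of_sequence(("+", "(", "E", ")", "$"), FIRST))   # beta = + (E), a = $
```

Since βa always ends in the terminal lookahead a, FIRST(βa) never contains ε, which is why the closure procedure can use each of its members directly as a lookahead.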