1
2/20/2008 Prof. Hilfinger CS164 Lecture 12
Earley’s Algorithm: General Context-Free Parsing
Lecture 12
P. N. Hilfinger
2
Parsing General Context-Free Grammars
Shift-reduce parsing can work for most practical applications. However, one must sometimes munge the grammar, though not as much as for LL(1). It cannot handle ambiguity, nor situations where resolving ambiguities requires looking far ahead. Today, we’ll look at a method that can: Earley’s Algorithm. In fact, shift-reduce parsing is a highly optimized special case of this algorithm.
3
Earley’s Algorithm: Basic Idea
Scan tokens left-to-right. At each point, keep track of all possible subtrees that could include the current point in the input, based on everything seen so far. At the end of the input, if there is a tree that is rooted at the start symbol, we’ve found a parse (possibly many).
4
Some Notation
If the input is s = s_1 s_2 … s_n, then “position k” in the input is just after s_k and before s_{k+1}, with position 0 at the beginning and position n at the end.
At each input position k, compute a set of items, where each item has the form A → α • β, m, where A → α β is a production and 0 ≤ m ≤ k.
Together, the items in the set describe all subtrees of possible parse trees that begin or end at position k or have a child that does.
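As an illustration of this notation, an item can be held in a small record. The following is a minimal sketch in Python. The class name `Item` and its field and method names are hypothetical and do not come from the lecture.

```python
from typing import NamedTuple, Optional, Tuple


class Item(NamedTuple):
    """An item A → α • β, m (names here are illustrative assumptions)."""
    lhs: str                # A, the left-hand side of the production
    rhs: Tuple[str, ...]    # the symbols of α followed by the symbols of β
    dot: int                # length of α: how many rhs symbols are matched
    start: int              # m, the position where the match of α begins

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None if β is empty."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        """True when β is empty, i.e. the dot is at the right end."""
        return self.dot == len(self.rhs)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, {self.start}"


item = Item("E", ("E", "+", "T"), 1, 0)
print(item)                # the item E → E • + T, 0
print(item.next_symbol())  # the symbol after the dot
```

Because `Item` is a tuple, two items with the same production, dot position and start position compare equal, which is what is needed to keep item sets free of duplicates.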
5
Meaning of an Item
An item A → α • β, m at position k means:
1. The input between positions m and k matches α.
2. Depending on what s_{k+1} … s_n is, there might be a subtree formed from production A → α β in the (or a) parse tree for the entire string.
3. So when β is empty, the item means that there is a possible handle for A → α that ends at k.
So that leaves the problem of figuring out what items to put in each set.
6
Example
Grammar:
  E → E + T
  E → T
  T → T * int
  T → int
Input, with positions marked between tokens: 0 int 1 + 2 int 3 * 4 int 5
At position 0, we expect to see an E to our right, formed from one of E’s productions. Plus, since an E can start with a T, we won’t be surprised by a T formed from one of its productions.
7
Example: Getting Started
Item set 0 (position 0, before the first int):
Start with items for start symbol E:
  E → • T, 0
  E → • E + T, 0
and (since E can start with T), also add items for T:
  T → • int, 0
  T → • T * int, 0
8
Closure Items
Whenever we have an item B → α • A β, j in item set m, it indicates that a substring producing A might start at this position. That’s what the item A → • γ, m means, so we also add those items (for each production A → γ) to item set m. These are called closure items. Other items are kernel items.
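The closure rule can be sketched as a small function. This is an illustrative Python sketch, not the lecture's code: items are written as plain tuples `(lhs, rhs, dot, start)`, and the names `GRAMMAR` and `close` are assumptions made for the example.

```python
# Example grammar from the lecture. The dict layout is an assumption:
# each nonterminal maps to a list of right-hand sides.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


def close(items, position, grammar):
    """Return `items` plus the closure items for the set at `position`."""
    result = list(items)
    for lhs, rhs, dot, start in result:      # result grows while we iterate
        if dot < len(rhs) and rhs[dot] in grammar:
            nonterminal = rhs[dot]           # the A just after the dot
            for production in grammar[nonterminal]:
                new_item = (nonterminal, production, 0, position)
                if new_item not in result:   # item sets hold no duplicates
                    result.append(new_item)
    return result


# Kernel of item set 0: the productions of the start symbol E.
kernel = [("E", ("E", "+", "T"), 0, 0), ("E", ("T",), 0, 0)]
set0 = close(kernel, 0, GRAMMAR)
for item in set0:
    print(item)
```

Applied to the two start items for E, this adds the two items for T with the dot at the left end, giving the four items of item set 0.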
9
Example: Computing next item set
Item set 0 (position 0):
  E → • T, 0
  E → • E + T, 0
  T → • int, 0
  T → • T * int, 0
Scanning int gives item set 1 (position 1):
  T → int •, 0
  T → T • * int, 0
  E → T •, 0
  E → E • + T, 0
Next token: +
10
Computing next item set
For each item of the form A → α • c β, k in item set m, where c = s_{m+1} is the next input symbol, insert A → α c • β, k in item set m+1.
For each complete item A → γ •, k in item set m+1, and each item B → α • A β, j back in item set k, add item B → α A • β, j to item set m+1. (When creating a parse tree, the A in this new item will have children γ, as denoted by dashed red arrows in our examples.)
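The two rules above can be sketched as one step that builds item set m+1 from the sets computed so far. This is a hedged Python sketch: items are tuples `(lhs, rhs, dot, start)`, the function name `scan_and_complete` is hypothetical, and closure items are deliberately left out (they are a separate rule). It also assumes there are no empty productions, so a complete item always started in an earlier set.

```python
def scan_and_complete(sets, token):
    """Build the kernel of item set m+1 from sets[0..m] and the next token."""
    m = len(sets) - 1
    new_set = []

    def add(item):
        if item not in new_set:
            new_set.append(item)

    # Rule 1: move the dot over the next input symbol.
    for lhs, rhs, dot, start in sets[m]:
        if dot < len(rhs) and rhs[dot] == token:
            add((lhs, rhs, dot + 1, start))

    # Rule 2: for each complete item A → γ •, k, advance the items back in
    # set k that have the dot just before A. new_set grows while we iterate.
    for lhs, rhs, dot, start in new_set:
        if dot == len(rhs):
            for b_lhs, b_rhs, b_dot, b_start in sets[start]:
                if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                    add((b_lhs, b_rhs, b_dot + 1, b_start))
    return new_set


# Item set 0 of the lecture's example (E → E + T | T, T → T * int | int).
set0 = [
    ("E", ("T",), 0, 0),
    ("E", ("E", "+", "T"), 0, 0),
    ("T", ("int",), 0, 0),
    ("T", ("T", "*", "int"), 0, 0),
]
set1 = scan_and_complete([set0], "int")
for item in set1:
    print(item)
```

On the example, scanning int from item set 0 yields the four items of item set 1: one scanned item for T and three obtained by completion.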
11
Continuing the Example, Set 2
Item set 1 (position 1):
  T → int •, 0
  T → T • * int, 0
  E → T •, 0
  E → E • + T, 0
Scanning + gives item set 2 (position 2):
  E → E + • T, 0
closure items:
  T → • T * int, 2
  T → • int, 2
Next token: int
12
Continuing the Example, Set 3
Item set 2 (position 2):
  E → E + • T, 0
  T → • T * int, 2
  T → • int, 2
Scanning int gives item set 3 (position 3):
  T → int •, 2
  T → T • * int, 2
  E → E + T •, 0
  E → E • + T, 0 (from item set 0)
Next token: *
13
Continuing the Example, Sets 4 & 5
Item set 3 (position 3):
  T → int •, 2
  T → T • * int, 2
  E → E + T •, 0
  E → E • + T, 0
Scanning * gives item set 4 (position 4):
  T → T * • int, 2
Scanning int gives item set 5 (position 5):
  T → T * int •, 2
  T → T • * int, 2
  E → E + T •, 0
  E → E • + T, 0
ACCEPT!
14
Accepting the String
In the last item set, we have a completed item for the start symbol that started in set 0. That means “the input between 0 and end matches an entire production for the start symbol,” so the string parses correctly.
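Putting the closure, scanning and completion rules together with this acceptance test gives a small recognizer. The following Python sketch is one possible arrangement, not the lecture's code: the names `GRAMMAR` and `earley_recognize` are hypothetical, items are tuples `(lhs, rhs, dot, start)`, and the grammar is assumed to have no empty productions (those need extra care in the completion step).

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


def earley_recognize(tokens, grammar, start_symbol):
    """Return True if `tokens` can be derived from `start_symbol`."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in grammar[start_symbol]:
        add(0, (start_symbol, rhs, 0, 0))

    for k in range(n + 1):
        for lhs, rhs, dot, start in sets[k]:       # sets[k] grows as we go
            if dot == len(rhs):
                # Complete item: advance items in set `start` waiting for lhs.
                for b_lhs, b_rhs, b_dot, b_start in list(sets[start]):
                    if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                        add(k, (b_lhs, b_rhs, b_dot + 1, b_start))
            elif rhs[dot] in grammar:
                # Dot before a nonterminal: add its closure items.
                for production in grammar[rhs[dot]]:
                    add(k, (rhs[dot], production, 0, k))
            elif k < n and rhs[dot] == tokens[k]:
                # Dot before the next input token: scan it into set k+1.
                add(k + 1, (lhs, rhs, dot + 1, start))

    # Accept if the last set has a completed start-symbol item from set 0.
    return any(
        lhs == start_symbol and dot == len(rhs) and start == 0
        for lhs, rhs, dot, start in sets[n]
    )


print(earley_recognize("int + int * int".split(), GRAMMAR, "E"))
print(earley_recognize("int + * int".split(), GRAMMAR, "E"))
```

The first call is the lecture's example input and is accepted. The second input has no item that can scan the `*` after `+`, so the later item sets are empty and the string is rejected.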
15
Retrieving a Parse Tree or Derivation
Start with a completed item in the last set that produces the whole input (has form S → … •, 0 for start symbol S). Follow the red arrows to find how to expand that symbol. Work backwards through the sets to find the expansions of the other nonterminals.
16
Getting a Tree from our Example (I)
Item set 5 (position 5, after the last int):
  T → T * int •, 2
  T → T • * int, 2
  E → E + T •, 0 (start here)
  E → E • + T, 0
Tree so far: the root E expands to E + T, and that T expands to T * int.
To find out how to expand this T, go back to chart 3 (before * int).
17
Getting a Tree from our Example (II)
Item set 3 (position 3, after the second int):
  T → int •, 2
  T → T • * int, 2
  E → E + T •, 0
  E → E • + T, 0
Tree so far: the root E expands to E + T, that T expands to T * int, and the inner T expands to int.
To find out how to expand this E, go back to chart 1 (before +).
18
Figuring out Where to Look
In the last slide, we had to figure out where to look for the derivation of the E in E + T. We used the items T → T * int •, 2 and T → int •, 2 to get the T in E + T, both of which tell us that the T started at item set #2. And since + is a terminal, which covers exactly one token, we then have to go back one more, to item set #1.
19
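Getting a Tree from our Example (III)
Item set 1 (position 1, after the first int):
  T → int •, 0
  T → T • * int, 0
  E → T •, 0 (start here)
  E → E • + T, 0
Complete tree: the root E expands to E + T; the left E expands to T, which expands to int; the right T expands to T * int, and its inner T expands to int.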
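One way to make tree retrieval concrete is sketched below in Python. Note that it swaps in a simpler technique than the backwards walk shown in the slides: instead of following links back through the sets afterwards, it records the child subtrees of each item at the moment the item is created and keeps only the first derivation found. The names `GRAMMAR` and `earley_parse` are hypothetical, trees are nested tuples `(nonterminal, child, ...)`, and the grammar is assumed to have no empty productions.

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}


def earley_parse(tokens, grammar, start_symbol):
    """Return one parse tree as nested tuples, or None if there is none."""
    n = len(tokens)
    children_of = [dict() for _ in range(n + 1)]  # item -> subtrees left of dot
    order = [[] for _ in range(n + 1)]            # items in insertion order

    def add(k, item, children):
        if item not in children_of[k]:            # keep the first derivation
            children_of[k][item] = children
            order[k].append(item)

    for rhs in grammar[start_symbol]:
        add(0, (start_symbol, rhs, 0, 0), ())

    for k in range(n + 1):
        for item in order[k]:                     # order[k] grows as we go
            lhs, rhs, dot, start = item
            children = children_of[k][item]
            if dot == len(rhs):
                subtree = (lhs,) + children       # finished subtree for lhs
                for waiting in list(order[start]):
                    w_lhs, w_rhs, w_dot, w_start = waiting
                    if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
                        add(k, (w_lhs, w_rhs, w_dot + 1, w_start),
                            children_of[start][waiting] + (subtree,))
            elif rhs[dot] in grammar:
                for production in grammar[rhs[dot]]:
                    add(k, (rhs[dot], production, 0, k), ())
            elif k < n and rhs[dot] == tokens[k]:
                add(k + 1, (lhs, rhs, dot + 1, start), children + (tokens[k],))

    for item in order[n]:
        lhs, rhs, dot, start = item
        if lhs == start_symbol and dot == len(rhs) and start == 0:
            return (lhs,) + children_of[n][item]
    return None


tree = earley_parse("int + int * int".split(), GRAMMAR, "E")
print(tree)
# → ('E', ('E', ('T', 'int')), '+', ('T', ('T', 'int'), '*', 'int'))
```

The printed tree matches the one assembled step by step in the slides. Keeping only the first derivation is a simplification: for an ambiguous grammar this sketch returns just one of the possible trees.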
20
An Ambiguous Grammar (I)
Grammar:
  E → E + E
  E → E * E
  E → int
Input, with positions marked between tokens: 0 int 1 + 2 int 3 * 4 int 5
Item set 0 (position 0):
  E → • int, 0
  E → • E + E, 0
  E → • E * E, 0
Scanning int gives item set 1 (position 1):
  E → int •, 0
  E → E • + E, 0
  E → E • * E, 0
21
An Ambiguous Grammar (II)
Item set 1 (position 1):
  E → int •, 0
  E → E • + E, 0
  E → E • * E, 0
Scanning + gives item set 2 (position 2):
  E → E + • E, 0
  E → • int, 2
  E → • E + E, 2
  E → • E * E, 2
Scanning int gives item set 3 (position 3):
  E → int •, 2
  E → E • + E, 2
  E → E • * E, 2
  E → E + E •, 0
  E → E • + E, 0
  E → E • * E, 0
22
An Ambiguous Grammar (III)
Item set 3 (position 3):
  E → int •, 2
  E → E • + E, 2
  E → E • * E, 2
  E → E + E •, 0
  E → E • + E, 0
  E → E • * E, 0
Scanning * gives item set 4 (position 4):
  E → E * • E, 2
  E → E * • E, 0
  E → • int, 4
  E → • E + E, 4
  E → • E * E, 4
Scanning int gives item set 5 (position 5):
  E → int •, 4
  E → E * E •, 2
  E → E * E •, 0
  E → E • + E, 4
  E → E • * E, 4
  E → E + E •, 0
There are two ways to produce the E starting at 0, reflecting ambiguity.
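The ambiguity can be observed directly by building the item sets for this grammar and listing the completed start-symbol items in the last set. The following Python sketch assumes the same tuple representation `(lhs, rhs, dot, start)` for items. The names `AMBIGUOUS`, `build_item_sets` and `completed_starts` are hypothetical, and empty productions are assumed absent.

```python
AMBIGUOUS = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("int",)],
}


def build_item_sets(tokens, grammar, start_symbol):
    """Return the list of item sets, one per input position."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in grammar[start_symbol]:
        add(0, (start_symbol, rhs, 0, 0))
    for k in range(n + 1):
        for lhs, rhs, dot, start in sets[k]:          # sets[k] grows as we go
            if dot == len(rhs):                       # completion
                for b_lhs, b_rhs, b_dot, b_start in list(sets[start]):
                    if b_dot < len(b_rhs) and b_rhs[b_dot] == lhs:
                        add(k, (b_lhs, b_rhs, b_dot + 1, b_start))
            elif rhs[dot] in grammar:                 # closure
                for production in grammar[rhs[dot]]:
                    add(k, (rhs[dot], production, 0, k))
            elif k < n and rhs[dot] == tokens[k]:     # scanning
                add(k + 1, (lhs, rhs, dot + 1, start))
    return sets


sets = build_item_sets("int + int * int".split(), AMBIGUOUS, "E")
# Completed items for the start symbol that began at position 0.
completed_starts = sorted(
    rhs for lhs, rhs, dot, start in sets[-1]
    if lhs == "E" and dot == len(rhs) and start == 0
)
for rhs in completed_starts:
    print("E →", " ".join(rhs), "•, 0")
```

Two productions show up as completed from position 0 in the last set, one with `+` at the top and one with `*` at the top, which corresponds to the two parse trees for the input.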
23
Just for Fun…
Grammar:
  E → E E
  E → ε
Grammar is ferociously ambiguous: it produces the empty string in an infinite number of ways!
Item set 0 (position 0, empty input):
  E → •, 0
  E → • E E, 0
  E → E • E, 0
  E → E E •, 0
24
Relationship to LR Shift-Reduce Parsing
With an LR(1) grammar, we never have item sets where two items have the same production, with the dot in the same place, but different starting positions. So, ignoring the starting positions, there is a finite number of possible item sets. These are the states in the shift-reduce parser.