Presentation on theme: "Parsing context-free grammars Context-free grammars specify structure, not process. There are many different ways to parse input in accordance with a given."— Presentation transcript:

1 Parsing context-free grammars
Context-free grammars specify structure, not process. There are many different ways to parse input in accordance with a given context-free grammar.
We will review
– a top-down parsing algorithm
– a bottom-up parsing algorithm
We will present the Earley algorithm.

2 A simple grammar (Figure 10.2)
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nominal
NP → ProperNoun
Nominal → Noun
Nominal → Noun Nominal
Nominal → Nominal PP
VP → Verb
VP → Verb NP
PP → Prep NP
Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Aux → does
Prep → from | to | on
ProperNoun → Houston | TWA
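
The grammar above can be written down directly as a data structure. A minimal Python sketch (the dict encoding is our own choice, not from the slides):

```python
# Figure 10.2 grammar as plain Python dicts: each non-terminal maps to
# its alternative right-hand sides; the lexicon maps POS tags to words.
# (The encoding itself is illustrative, not from the slides.)
GRAMMAR = {
    "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
    "PP":      [["Prep", "NP"]],
    "Nominal": [["Nominal", "PP"], ["Noun"], ["Noun", "Nominal"]],
}
LEXICON = {
    "Det":        {"that", "this", "a"},
    "Noun":       {"book", "flight", "meal", "money"},
    "Verb":       {"book", "include", "prefer"},
    "Aux":        {"does"},
    "Prep":       {"from", "to", "on"},
    "ProperNoun": {"Houston", "TWA"},
}
```

Both parsing strategies below can be read as different orders of traversing this same table.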

3 Bottom-up parsing
Yngve (1955) presented a bottom-up algorithm. Example (Figure 10.4): "Book that flight."

4 "Book" is ambiguous – there are two possible POS tags for the word "Book". The two candidate taggings of "Book that flight" are Noun Det Noun and Verb Det Noun.
Look up words in lexicon.
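
The lexical lookup can be sketched as a one-line function over a lexicon (the dict encoding is illustrative, drawn from the Figure 10.2 word lists):

```python
# Look up every POS tag the lexicon allows for a word; "book" comes
# back ambiguous between Noun and Verb.
LEXICON = {
    "Det":  {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
}

def pos_tags(word):
    """Return the set of POS tags the lexicon permits for `word`."""
    return {tag for tag, words in LEXICON.items() if word in words}
```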

5 A Nominal (NOM) is built over each Noun in both candidate taggings of "Book that flight".
Build structure from bottom up.

6 Now we have three possible structures: one builds an NP in the Noun reading of "Book", and the other two build a VP in the Verb reading.
Build structure from bottom up.

7 The Noun interpretation of "Book" leads to a dead end, so only two parse trees survive: one in which the VP (over the Verb) and the NP (over Det NOM) remain separate, and one in which the VP spans the whole input (VP → Verb NP).
Build structure from bottom up.

8 There is no way to combine a VP and an NP to form an S, so only one parse tree survives:
(S (VP (Verb Book) (NP (Det that) (NOM (Noun flight)))))
Build structure from bottom up.
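
The surviving bottom-up derivation can be mimicked by a naive reduce loop over the POS sequence Verb Det Noun. The rules are from the slides; the exhaustive-replacement strategy below is only an illustration, not the algorithm the slides describe:

```python
# Repeatedly replace any substring matching a rule's RHS with its LHS,
# restarting after each reduction, until no rule applies.
RULES = [("Nominal", ("Noun",)),
         ("NP", ("Det", "Nominal")),
         ("VP", ("Verb", "NP")),
         ("S", ("VP",))]

def reduce_all(symbols):
    symbols = list(symbols)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if tuple(symbols[i:i + n]) == rhs:
                    symbols[i:i + n] = [lhs]   # one bottom-up reduction
                    changed = True
                    break
            if changed:
                break
    return symbols
```

Starting from the Verb reading of "Book that flight", the reductions run Noun → Nominal, Det Nominal → NP, Verb NP → VP, VP → S, ending in a single S.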

9 When parsing top-down, we start with the grammar's start symbol S and apply productions to try to match the input "Book that flight".
Build structure from top down.

10 Here we show only the successful choices: S is expanded with S → VP.
Build structure from top down.

11 Here we show only the successful choices: the VP is expanded with VP → Verb NP.
Build structure from top down.

12 Here we show only the successful choices: the Verb is matched against "Book".
Build structure from top down.

13 Here we show only the successful choices: the NP is expanded with NP → Det NOM.
Build structure from top down.

14 Here we show only the successful choices: the Det is matched against "that".
Build structure from top down.

15 Here we show only the successful choices: the NOM is expanded with NOM → Noun.
Build structure from top down.

16 Here we show only the successful choices: the Noun is matched against "flight", completing the tree.
Build structure from top down.

17 Top-down versus bottom-up approaches
Top-down advantages
– Doesn't explore trees which cannot be S
– Subtrees fit under S
Top-down disadvantages
– Many fruitless trees are explored: trees explored may have no hope of matching the input
Bottom-up advantages
– All trees explored are consistent with the input
Bottom-up disadvantages
– Builds structure even if S cannot be formed
– Builds neighboring structures which can never combine

18 Approaches to dealing with ambiguity
– parallel exploration
– depth-first strategy with backtracking

19 Improving top-down parsing
Make the top-down parser pay attention to the input with bottom-up filtering (left-corner parsing): "The parser should not consider any grammar rule if the current input cannot serve as the first word along the left edge of some derivation from this rule." [pg. 369]
Left corners are pre-compiled.
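
Pre-compiling left corners amounts to a transitive closure over "first symbol of some right-hand side". A sketch under our own dict encoding of the Figure 10.2 grammar:

```python
# Compute the left-corner relation: for each non-terminal, every symbol
# that can begin some derivation from it. (Encoding is illustrative.)
GRAMMAR = {
    "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
    "PP":      [["Prep", "NP"]],
    "Nominal": [["Nominal", "PP"], ["Noun"], ["Noun", "Nominal"]],
}

def left_corners(grammar):
    # Start from the direct left corners (first RHS symbols)...
    lc = {nt: {rhs[0] for rhs in alts} for nt, alts in grammar.items()}
    changed = True
    while changed:                      # ...then close transitively
        changed = False
        for nt in lc:
            for sym in list(lc[nt]):
                if sym in lc and not lc[sym] <= lc[nt]:
                    lc[nt] |= lc[sym]
                    changed = True
    return lc
```

A filtering top-down parser can then skip any rule X → α whenever the current word's POS tag is not in left_corners[X].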

20 Problems with top-down parsers
left-recursion
– X ⇒* X β gives an infinite loop in the derivation!
ambiguity
– not efficiently handled
recomputation
– subtrees can be built multiple times (built, then thrown away during backtracking)
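
Direct left recursion (like the grammar's Nominal → Nominal PP rule) is easy to detect mechanically; a sketch:

```python
# Flag non-terminals whose own name begins one of their right-hand
# sides; a naive top-down parser would expand these forever.
def directly_left_recursive(grammar):
    return {nt for nt, alts in grammar.items()
            if any(rhs and rhs[0] == nt for rhs in alts)}

GRAMMAR = {
    "S":       [["NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"]],
    "Nominal": [["Nominal", "PP"], ["Noun"]],
}
```

Note this catches only direct left recursion; indirect cases (X ⇒* X β through several rules) would need a closure like the left-corner computation.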

21 Earley's algorithm
Earley's algorithm employs the dynamic programming technique to address the weaknesses of general top-down parsing. Dynamic programming involves storing results so they never need to be recomputed. Dynamic programming reduces an exponential time requirement to a polynomial one: O(N³), where N is the length of the input in words.

22 Data structure
Earley's algorithm uses a data structure called a chart to store information about the progress of the parse. A chart contains an entry for each position in the input. A position occurs before the first word, between words, and after the last word:
• word1 • word2 • … • wordN •
A position is represented by a number; positions in the input are numbered from 0 (at the left) to N (at the right).
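
This numbering means a chart for an N-word input has N+1 entries, and the words spanned between positions i and j are simply a slice of the input; a small illustration:

```python
# Positions 0..N sit between the words; the span [i, j] covers
# exactly words[i:j] of the input.
words = "book that flight".split()
N = len(words)
positions = list(range(N + 1))   # one chart entry per position
```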

23 Chart details
A chart entry consists of a sequence of states. A state represents
– a subtree corresponding to a single grammar rule
– information about how much of a rule has been processed
– information about the span of the subtree w.r.t. the input
A state is represented by an annotated grammar rule:
– a dot (•) is used to show how much of the rule has been processed
– a pair of positions, [x,y], indicates the span of the subtree w.r.t. the input; x is the position of the left edge of the subtree, and y is the position of the dot.
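
A dotted-rule state can be captured directly in code; a minimal sketch (field and method names are our own):

```python
from dataclasses import dataclass

# A state: one grammar rule, a dot marking progress through its RHS,
# and the span [start, end] of the subtree w.r.t. the input.
@dataclass(frozen=True)
class State:
    lhs: str
    rhs: tuple
    dot: int        # how many RHS symbols have been processed
    start: int      # x: left edge of the subtree
    end: int        # y: position of the dot

    def next_symbol(self):
        """Symbol just right of the dot, or None if the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)
```

For example, State("NP", ("Det", "Nominal"), 1, 1, 2) represents NP → Det • Nominal [1,2].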

24 Three operators on a chart
Predictor
– applies when the NonTerminal to the right of • in a state is not a POS category (i.e. is not a pre-terminal)
– adds states to the current chart entry
Scanner
– applies when the NonTerminal to the right of • in a state is a POS category (i.e. is a pre-terminal)
– adds states to the next chart entry
Completer
– applies when there is no NonTerminal (and hence no Terminal) to the right of • in a state (i.e. • is at the end)
– adds states to the current chart entry

25 Predictor
Suppose the rule to which the Predictor applies is:
X → α • NT β  [x,y]
The Predictor adds, to the current chart entry, a new state for each possible expansion of NT. For each expansion EX of NT, the state added is
NT → • EX  [y,y]

26 Scanner
Suppose the rule to which the Scanner applies is:
X → α • POS β  [x,y]
The Scanner adds, to the next chart entry, a new state when the current input word is a possible expansion of POS. The new state added is
X → α POS • β  [x,y+1]

27 Completer
Suppose the rule to which the Completer applies is:
X → γ •  [x,y]
The Completer adds, to the current chart entry, a new state for each possible reduction using the (now completed) state. For each state (from any earlier chart entry) of the form
Y → α • X β  [w,x]
a new state of the following form is added:
Y → α X • β  [w,y]

28 Completer (modification)
In order to recover parse-tree information from the chart once parsing is complete, we need to modify the Completer slightly. Each state in the chart must be given a unique identifier (SN for state N). Each time the Completer adds a state, it also adds the unique identifier of the completed state to the list of previous states for that new state (which is a copy of an already existing state, waiting for the category which the current state just completed).

29 Initial state of chart
chart[0]: S0: γ → • S  [0,0]
chart[1], chart[2], chart[3]: (empty)
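
Putting the three operators and the dummy start state together gives a compact recognizer. This is a sketch in our own encoding (states are (lhs, rhs, dot, start) tuples, with the end position implicit in the chart index), not the book's pseudocode, and it omits the backpointers of slide 28:

```python
# Earley recognizer built from the Predictor / Scanner / Completer
# definitions above. Grammar and lexicon follow Figure 10.2.
GRAMMAR = {
    "S":       [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP":      [("Det", "Nominal"), ("ProperNoun",)],
    "VP":      [("Verb",), ("Verb", "NP")],
    "PP":      [("Prep", "NP")],
    "Nominal": [("Nominal", "PP"), ("Noun",), ("Noun", "Nominal")],
}
LEXICON = {
    "book": {"Noun", "Verb"}, "that": {"Det"}, "this": {"Det"},
    "flight": {"Noun"}, "meal": {"Noun"}, "include": {"Verb"},
    "does": {"Aux"}, "Houston": {"ProperNoun"},
}
POS = {"Det", "Noun", "Verb", "Aux", "Prep", "ProperNoun"}

def earley(words):
    N = len(words)
    chart = [[] for _ in range(N + 1)]

    def add(entry, state):
        if state not in chart[entry]:
            chart[entry].append(state)

    add(0, ("gamma", ("S",), 0, 0))      # dummy state: gamma -> . S [0,0]
    for i in range(N + 1):
        j = 0
        while j < len(chart[i]):         # entries grow while being scanned
            lhs, rhs, dot, start = chart[i][j]
            nxt = rhs[dot] if dot < len(rhs) else None
            if nxt is None:                              # Completer
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, s2))
            elif nxt in POS:                             # Scanner
                if i < N and nxt in LEXICON.get(words[i], set()):
                    add(i + 1, (lhs, rhs, dot + 1, start))
            else:                                        # Predictor
                for expansion in GRAMMAR[nxt]:
                    add(i, (nxt, expansion, 0, i))
            j += 1
    # Accept iff the dummy rule is complete over the whole input.
    return ("gamma", ("S",), 1, 0) in chart[N]
```

Because every state is added to its chart entry at most once, the duplicated subtrees that plague backtracking parsers are never rebuilt.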

30 Example (from text) (work through on board)

