May 2006 CLINT-LN Parsing
Computational Linguistics Introduction
Approaches to Parsing
Grammar versus Parsing
A grammar is a description of a language: it abstractly associates structures with all and only the strings of the language.
A parser is an implementation of an algorithm that actually discovers the structures a grammar assigns to a sentence.
Typically, several different parsing algorithms can achieve this, for example:
– a top-down strategy
– a bottom-up strategy
Parse Tree
A valid parse tree for a grammar G is a tree
– whose root is the start symbol of G,
– whose interior nodes are nonterminals of G,
– in which the children of each node T (from left to right) correspond to the symbols on the right-hand side of some production for T in G,
– whose leaf nodes are terminal symbols of G.
Every sentence generated by a grammar has a corresponding parse tree, and every valid parse tree exactly covers a sentence generated by the grammar.
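The four conditions above can be checked mechanically. A minimal Python sketch, assuming a hypothetical toy grammar for "book that flight" (the rules, the encoding of trees as `(label, children)` pairs, and the names `GRAMMAR`, `is_valid` and `leaves` are illustrative, not from the slides):

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V",), ("V", "NP")],
    "Det": [("that",)],            # pre-terminal (word-class) rules
    "N": [("book",), ("flight",)],
    "V": [("book",)],
}

# A tree is either a terminal word (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]

def label(t: Tree) -> str:
    return t if isinstance(t, str) else t[0]

def is_valid(tree: Tree, grammar=GRAMMAR, start: str = "S") -> bool:
    """Root is the start symbol; interior nodes are nonterminals whose
    children (left to right) match one of their productions; leaves are terminals."""
    def check(t: Tree) -> bool:
        if isinstance(t, str):
            return t not in grammar          # leaf must be a terminal
        lab, children = t
        if lab not in grammar or not children:
            return False                     # interior node must be a nonterminal
        rhs = tuple(label(c) for c in children)
        return rhs in grammar[lab] and all(check(c) for c in children)
    return label(tree) == start and check(tree)

def leaves(t: Tree) -> List[str]:
    """The terminal string a tree covers, read left to right."""
    return [t] if isinstance(t, str) else [w for c in t[1] for w in leaves(c)]

TREE = ("S", [("VP", [("V", ["book"]),
                      ("NP", [("Det", ["that"]), ("Nom", [("N", ["flight"])])])])])
print(is_valid(TREE), leaves(TREE))  # True ['book', 'that', 'flight']
```

`leaves` makes the "exactly covers" condition concrete: a valid tree covers precisely the word sequence it returns.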
Parsing Problem
Given a grammar G and a sentence A, find all valid parse trees for G that exactly cover A.
[Figure: parse tree for "book that flight": [S [VP [V book] [NP [Det that] [Nom [N flight]]]]]]
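Stated as code, the parsing problem is: return every tree rooted in S whose leaves are exactly the input. A minimal exhaustive sketch in Python, assuming a hypothetical toy grammar with no empty rules and no unary cycles (the grammar and the names `all_parses`, `parse` and `split` are illustrative). It tries every way of dividing a span among a rule's RHS symbols and memoizes sub-results:

```python
from functools import lru_cache
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V",), ("V", "NP")],
    "Det": [("that",)],
    "N": [("book",), ("flight",)],
    "V": [("book",)],
}

def all_parses(words: List[str], grammar=GRAMMAR, start: str = "S") -> list:
    """Every tree rooted in `start` that exactly covers `words`.
    Assumes no empty (epsilon) rules and no unary cycles such as A -> B -> A."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def parse(sym: str, i: int, j: int) -> tuple:
        # All trees rooted in `sym` that cover exactly words[i:j].
        if sym not in grammar:  # terminal: must match exactly one input word
            return (sym,) if j == i + 1 and words[i] == sym else ()
        return tuple((sym, list(kids))
                     for rhs in grammar[sym] for kids in split(rhs, i, j))

    def split(rhs: Tuple[str, ...], i: int, j: int) -> Iterator[tuple]:
        # Divide words[i:j] among the RHS symbols, each taking at least one word.
        if not rhs:
            if i == j:
                yield ()
            return
        first, rest = rhs[0], rhs[1:]
        for k in range(i + 1, j - len(rest) + 1):
            for t in parse(first, i, k):
                for ts in split(rest, k, j):
                    yield (t,) + ts

    return list(parse(start, 0, len(words)))

print(len(all_parses("book that flight".split())))  # 1
```

Exhaustive enumeration like this is exponential in the worst case without memoization; the cache keeps each (symbol, span) pair from being re-parsed.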
Top Down
A top-down parser builds the tree from the root node S down to the leaves, replacing each node that has a nonterminal label with the right-hand side (RHS) of a corresponding grammar rule. Nodes with pre-terminal (word-class) labels are compared with the input words.
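A minimal sketch of this strategy in Python. The slides do not fix a search order; this version assumes depth-first, left-to-right expansion with backtracking, a hypothetical toy grammar without left-recursive rules (which would make it loop), and illustrative names (`RULES`, `LEXICON`, `top_down`). It yields each successful leftmost derivation as a list of steps:

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical phrase-structure rules and lexicon (word -> word classes).
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V",), ("V", "NP")],
}
LEXICON: Dict[str, Set[str]] = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}
PRETERMINALS: Set[str] = {c for cats in LEXICON.values() for c in cats}

def top_down(words: List[str], start: str = "S") -> Iterator[list]:
    """Yield one leftmost derivation (list of steps) per successful parse."""
    def expand(agenda: Tuple[str, ...], pos: int, derivation: list) -> Iterator[list]:
        if not agenda:
            if pos == len(words):      # success: nothing left to expand, input used up
                yield derivation
            return
        sym, rest = agenda[0], agenda[1:]
        if sym in PRETERMINALS:
            # Pre-terminal (word class): compare with the next input word.
            if pos < len(words) and sym in LEXICON.get(words[pos], set()):
                yield from expand(rest, pos + 1, derivation + [(sym, words[pos])])
            return                     # mismatch: backtrack
        for rhs in RULES.get(sym, []):
            # Nonterminal: replace it by the RHS of each of its rules in turn.
            yield from expand(rhs + rest, pos, derivation + [(sym, rhs)])
    yield from expand((start,), 0, [])

print(len(list(top_down("book that flight".split()))))  # 1
```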
Top Down Search Space
[Figure: top-down search space, expanding from the start node down to the goal node]
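The top-down search space can be enumerated ply by ply. A rough Python sketch, assuming a hypothetical toy grammar and illustrative names (`RULES`, `search_space`); each state is the sequence of category labels along the frontier of a partial tree, and each ply expands the leftmost symbol that still has grammar rules:

```python
from typing import Dict, List, Tuple

# Hypothetical phrase-structure rules; Det, N and V are pre-terminals.
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V",), ("V", "NP")],
}

def search_space(max_ply: int, start: str = "S") -> List[List[Tuple[str, ...]]]:
    """plies[k] holds the frontiers of all partial trees reachable in k expansions."""
    plies = [[(start,)]]
    for _ in range(max_ply):
        next_ply = []
        for state in plies[-1]:
            # Leftmost symbol that can still be rewritten; pre-terminals
            # have no rules here and wait to be compared with input words.
            idx = next((i for i, s in enumerate(state) if s in RULES), None)
            if idx is None:
                continue  # nothing left to expand in this state
            for rhs in RULES[state[idx]]:
                next_ply.append(state[:idx] + rhs + state[idx + 1:])
        plies.append(next_ply)
    return plies

for k, ply in enumerate(search_space(3)):
    print(k, ply)
```

The state `('Det', 'Nom', 'VP')` in ply 2 is derivable from S but already inconsistent with "book that flight", since "book" is not a determiner; the grammar alone cannot rule it out.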
Bottom Up
Each state is a forest of trees. The start node is a forest of nodes labelled with pre-terminal categories (word classes taken from the lexicon). Transformations look for places where the RHS of a rule fits; each such place is replaced with a node labelled with the rule's left-hand side (LHS).
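A minimal Python sketch of this bottom-up search, assuming a hypothetical toy grammar and lexicon in which every input word is listed, no unary cycles, and illustrative names (`RULES`, `LEXICON`, `bottom_up`). A state is a forest, i.e. a tuple of `(label, children)` trees; there is one start forest per choice of word classes, and each transformation replaces a run of trees matching some rule's RHS with a new node labelled by its LHS:

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical phrase-structure rules and lexicon (word -> word classes).
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V",), ("V", "NP")],
}
LEXICON: Dict[str, Set[str]] = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}

def bottom_up(words: List[str], start: str = "S") -> list:
    """Exhaustive search over forests; returns every `start` tree covering the input."""
    # Start states: one forest of pre-terminal nodes per choice of word classes.
    agenda = [tuple((cat, [w]) for cat, w in zip(cats, words))
              for cats in product(*(sorted(LEXICON[w]) for w in words))]
    seen, parses = set(), []
    while agenda:
        forest = agenda.pop()
        key = repr(forest)
        if key in seen:          # same forest reached by another reduction order
            continue
        seen.add(key)
        if len(forest) == 1 and forest[0][0] == start:
            parses.append(forest[0])   # a single S tree covering all words
        labels = [t[0] for t in forest]
        # Transformations: wherever a rule's RHS fits, replace it with the LHS.
        for lhs, alternatives in RULES.items():
            for rhs in alternatives:
                n = len(rhs)
                for i in range(len(forest) - n + 1):
                    if tuple(labels[i:i + n]) == rhs:
                        node = (lhs, list(forest[i:i + n]))
                        agenda.append(forest[:i] + (node,) + forest[i + n:])
    return parses

print(len(bottom_up("book that flight".split())))  # 1
```

Forests built from the noun reading of "book" (for example, one with a `Nom` node over "book") are explored but never reduce to a single S: these are subtrees that can never lead to an S node.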
Bottom Up Search Space
[Figure: bottom-up search space]
Top Down vs Bottom Up
In general:
Top down
– For: never wastes time exploring trees that cannot be derived from S.
– Against: can generate trees that are not consistent with the input.
Bottom up
– For: never wastes time building trees that are not grounded in the input words.
– Against: can generate subtrees that can never lead to an S node.
Development of a Concrete Strategy
Try to combine the best features of the top-down and bottom-up strategies:
– Use the grammar to control the parsing process (top-down, grammar-directed control).
– Wherever possible, use the input text to eliminate impossible hypotheses (bottom-up filtering).
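The slides leave the combination abstract. One common way to realise it, swapped in here as an assumption, is a left-corner table: precompute which word classes can begin each category, and let a top-down parser skip any rule whose first symbol cannot start with the next input word. A Python sketch with a hypothetical toy grammar and illustrative names (`left_corners`, `parse`, `use_filter`):

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical phrase-structure rules and lexicon (word -> word classes).
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("N",)],
    "VP": [("V",), ("V", "NP")],
}
LEXICON: Dict[str, Set[str]] = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}
PRETERMINALS: Set[str] = {c for cats in LEXICON.values() for c in cats}

def left_corners() -> Dict[str, Set[str]]:
    """lc[X] = word classes that can begin a phrase of category X.
    With no empty rules, only the first RHS symbol supplies a left corner."""
    lc: Dict[str, Set[str]] = {p: {p} for p in PRETERMINALS}
    lc.update({a: set() for a in RULES})
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for a, alternatives in RULES.items():
            for rhs in alternatives:
                new = lc[rhs[0]] - lc[a]
                if new:
                    lc[a] |= new
                    changed = True
    return lc

def parse(words: List[str], use_filter: bool = True) -> Tuple[list, int]:
    """Top-down, depth-first parse with optional bottom-up filtering.
    Returns (derivations, number of rule expansions tried)."""
    lc = left_corners()
    expansions = 0

    def expand(agenda: Tuple[str, ...], pos: int, derivation: list) -> Iterator[list]:
        nonlocal expansions
        if not agenda:
            if pos == len(words):
                yield derivation
            return
        sym, rest = agenda[0], agenda[1:]
        next_cats = LEXICON.get(words[pos], set()) if pos < len(words) else set()
        if sym in PRETERMINALS:
            if sym in next_cats:  # compare the pre-terminal with the input word
                yield from expand(rest, pos + 1, derivation + [(sym, words[pos])])
            return
        for rhs in RULES.get(sym, []):
            # Bottom-up filter: skip a rule whose first symbol cannot
            # begin with any word class of the next input word.
            if use_filter and not (lc[rhs[0]] & next_cats):
                continue
            expansions += 1
            yield from expand(rhs + rest, pos, derivation + [(sym, rhs)])

    derivations = list(expand(("S",), 0, []))
    return derivations, expansions

words = "book that flight".split()
print(parse(words, use_filter=False)[1], parse(words)[1])  # 7 5
```

Both settings find the same single derivation, but the filtered run tries fewer rule expansions: for example, S → NP VP is skipped at once because "book" cannot begin an NP in this grammar.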