1/13 Parsing III Probabilistic Parsing and Conclusions
2/13 Probabilistic CFGs also known as Stochastic Grammars Date back to Booth (1969) Have grown in popularity with the growth of Corpus Linguistics
3/13 Probabilistic CFGs Essentially the same as ordinary CFGs, except that each rule has a probability associated with it:
S → NP VP .80
S → aux NP VP .15
S → VP .05
NP → det n .20
NP → det adj n .35
NP → n .20
NP → adj n .15
NP → pro .10
Notice that the probabilities for each set of rules (those sharing a left-hand side) sum to 1
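The rule set on this slide can be written down as a small table and checked mechanically. The following is a minimal sketch, not a standard library API: the dictionary layout and the names `PCFG`, `lhs_totals` and `is_proper` are assumptions made for illustration.

```python
from collections import defaultdict
import math

# Rules from the slide: (left-hand side, right-hand side) -> probability.
# The dictionary layout is an illustrative choice, not a standard format.
PCFG = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("det", "n")): 0.20,
    ("NP", ("det", "adj", "n")): 0.35,
    ("NP", ("n",)): 0.20,
    ("NP", ("adj", "n")): 0.15,
    ("NP", ("pro",)): 0.10,
}

def lhs_totals(grammar):
    """Sum the rule probabilities separately for each left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in grammar.items():
        totals[lhs] += p
    return dict(totals)

def is_proper(grammar, tol=1e-9):
    """True if the rules for every left-hand side sum to 1 (within tol)."""
    return all(math.isclose(t, 1.0, abs_tol=tol)
               for t in lhs_totals(grammar).values())

print(lhs_totals(PCFG))
print(is_proper(PCFG))
```

A check like this is useful whenever probabilities are edited by hand, since a set of rules that no longer sums to 1 silently distorts every derivation score.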
4/13 Probabilistic CFGs Probabilities are used to calculate the probability of a given derivation –Defined as the product of the Ps of the rules used in the derivation Can be used to choose between competing derivations –As the parse progresses (so, can determine which rules to try first) as an efficiency measure –Or at the end, as a way of disambiguating, or expressing confidence in the results
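The product rule can be sketched in a few lines. The rule probabilities below come from the earlier slide, but the two rule sequences are only illustrative partial derivations: a real comparison would be between complete derivations of the same string, including VP and lexical rules, for which the slides give no probabilities. The names `RULE_P`, `derivation_probability` and `best_derivation` are hypothetical.

```python
import math

# A subset of the rule probabilities from the earlier slide.
RULE_P = {
    "S -> NP VP": 0.80,
    "S -> aux NP VP": 0.15,
    "NP -> det n": 0.20,
    "NP -> det adj n": 0.35,
}

def derivation_probability(rules_used, rule_p=RULE_P):
    """Probability of a derivation: the product of its rule probabilities."""
    return math.prod(rule_p[r] for r in rules_used)

def best_derivation(candidates, rule_p=RULE_P):
    """Pick the most probable of several candidate derivations."""
    return max(candidates, key=lambda d: derivation_probability(d, rule_p))

# Illustrative partial derivations (assumed pairings, not from the slides).
derivation_a = ["S -> NP VP", "NP -> det n"]
derivation_b = ["S -> aux NP VP", "NP -> det adj n"]

print(derivation_probability(derivation_a))
print(derivation_probability(derivation_b))
print(best_derivation([derivation_a, derivation_b]))
```

In practice the products become very small for long sentences, so implementations commonly sum log probabilities instead; the ranking of derivations is unchanged.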
5/13 Where do the probabilities come from? 1) Use a corpus of already parsed sentences: a “treebank” –Best known example is the Penn Treebank (Marcus et al.) Available from the Linguistic Data Consortium Based on the Brown corpus + 1m words of Wall Street Journal + the Switchboard corpus –Count all occurrences of each rule expansion (e.g. each way of rewriting NP) and divide by the total number of occurrences of rules with the same left-hand side (here, all NP rules) –Very laborious by hand, so of course it is done automatically
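The counting procedure can be sketched as follows. This assumes a toy two-sentence treebank written as nested tuples; the tree encoding, the example trees and the function names `rules` and `estimate` are all made up for illustration and do not reflect the Penn Treebank file format.

```python
from collections import Counter

# Toy treebank (hypothetical): each tree is (label, child, child, ...),
# and a pre-terminal is (tag, word).
TREEBANK = [
    ("S",
     ("NP", ("det", "the"), ("n", "man")),
     ("VP", ("v", "shot"), ("NP", ("det", "an"), ("n", "elephant")))),
    ("S",
     ("NP", ("pro", "he")),
     ("VP", ("v", "slept"))),
]

def rules(tree):
    """Yield every (lhs, rhs) rule used in a tree, skipping tag -> word."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return  # pre-terminal over a word: not counted here
    yield (label, tuple(c[0] for c in children))
    for c in children:
        yield from rules(c)

def estimate(treebank):
    """Relative frequency: count(lhs -> rhs) / count(all rules with that lhs)."""
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

probs = estimate(TREEBANK)
for rule, p in sorted(probs.items()):
    print(rule, round(p, 3))
```

In this toy treebank NP is expanded three times, twice as det n and once as pro, so the estimates are 2/3 and 1/3 respectively.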
6/13 Where do the probabilities come from? 2)Create your own treebank –Easy if all sentences are unambiguous: just count the (successful) rule applications –When there are ambiguities, rules which contribute to the ambiguity have to be counted separately and weighted
7/13 Where do the probabilities come from? 3)Learn them as you go along –Again, assumes some way of identifying the correct parse in case of ambiguity –Each time a rule is successfully used, its probability is adjusted –You have to start with some estimated probabilities, e.g. all equal –Does need human intervention, otherwise rules become self-fulfilling prophecies
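One way to picture "learning as you go" is to keep a running count per rule, start every rule with the same pseudo-count (so all probabilities are equal at first), and add to a rule's count each time it appears in a parse that has been confirmed as correct. This is only a sketch under those assumptions; the class `OnlineRuleEstimator` and its methods are hypothetical, and the slides do not specify a particular update scheme.

```python
class OnlineRuleEstimator:
    """Running rule probabilities, starting from equal estimates."""

    def __init__(self, rules, prior=1.0):
        # Every rule starts with the same pseudo-count, so rules sharing
        # a left-hand side start out equally probable.
        self.counts = {rule: prior for rule in rules}

    def observe(self, rule):
        """Record one use of a rule in a parse confirmed as correct."""
        self.counts[rule] += 1

    def probability(self, rule):
        """Current estimate: this rule's count over all counts for its lhs."""
        lhs = rule[0]
        total = sum(n for r, n in self.counts.items() if r[0] == lhs)
        return self.counts[rule] / total

NP_RULES = [
    ("NP", ("det", "n")),
    ("NP", ("det", "adj", "n")),
    ("NP", ("n",)),
    ("NP", ("adj", "n")),
    ("NP", ("pro",)),
]

est = OnlineRuleEstimator(NP_RULES)
print(est.probability(("NP", ("det", "n"))))   # equal start: 1 of 5
for _ in range(5):
    est.observe(("NP", ("det", "n")))          # five confirmed uses
print(est.probability(("NP", ("det", "n"))))
print(est.probability(("NP", ("pro",))))
```

Calling `observe` only on confirmed parses is where the human intervention comes in: if the parser's own top-ranked output were fed back unchecked, frequently chosen rules would simply reinforce themselves.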
8/13 Problems with PCFGs PCFGs assume that all rules are essentially independent –But, e.g. in English “NP → pro” is more likely in subject position Difficult to incorporate lexical information –Pre-terminal rules can inherit important information from words which helps to make choices higher up the parse, e.g. lexical choice can help determine PP attachment
9/13 Probabilistic Lexicalised CFGs One solution is to identify in each rule that one of the elements on the RHS (daughter) is more important: the “head” –This is quite intuitive, e.g. the n in an NP rule, though often controversial (from linguistic point of view) Head must be a lexical item Head value is percolated up the parse tree Added advantage is that PS tree has the feel of a dependency tree
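10/13 [Figure: three trees for “the man shot an elephant”]
Phrase-structure tree: [S [NP [det the] [n man]] [VP [v shot] [NP [det an] [n elephant]]]]
Lexicalised tree, with heads percolated upwards: [S(shot) [NP(man) [det the] [n man]] [VP(shot) [v shot] [NP(elephant) [det an] [n elephant]]]]
Dependency tree: shot → man, shot → elephant; man → the; elephant → an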
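Head percolation for this example sentence can be sketched as a short recursive function. The head table `HEAD_CHILD` (n heads NP, v heads VP, VP heads S) is an assumption chosen to reproduce the labels NP(man), VP(shot) and S(shot); the tuple encoding of trees and the name `lexicalise` are likewise hypothetical.

```python
# Assumed head table: which daughter category is the head of each phrase.
HEAD_CHILD = {"S": "VP", "VP": "v", "NP": "n"}

# "the man shot an elephant" as (label, child, ...); pre-terminal = (tag, word).
TREE = ("S",
        ("NP", ("det", "the"), ("n", "man")),
        ("VP", ("v", "shot"),
               ("NP", ("det", "an"), ("n", "elephant"))))

def lexicalise(tree):
    """Return (label, head_word, children) with heads percolated upwards."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], [])  # pre-terminal: the word is its head
    kids = [lexicalise(c) for c in children]
    # The phrase takes the head word of its designated head daughter.
    head = next(k[1] for k in kids if k[0] == HEAD_CHILD[label])
    return (label, head, kids)

def show(node, depth=0):
    """Print the lexicalised tree, one node per line."""
    label, head, kids = node
    print("  " * depth + f"{label}({head})")
    for k in kids:
        show(k, depth + 1)

lexical_tree = lexicalise(TREE)
show(lexical_tree)
```

A fuller treatment would need a head rule for every phrase type (for example NP → pro) and a policy for phrases containing two daughters of the head category; both are left out of this sketch.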
11/13 Dependency Parsing Not much different from PSG parsing Grammar rules still need to be stated as A → B c –except that one daughter is identified as the head, e.g. A → x h y –As structure is built, the trees are headed by “h” rather than “A” Can be probabilistic or not
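To illustrate how a headed phrase-structure tree yields a dependency structure, the sketch below reads off head → dependent arcs: within each phrase, every non-head daughter's head word becomes a dependent of the head daughter's head word. The head table, the tuple encoding and the name `head_and_arcs` are assumptions for illustration, and arcs are given between word forms rather than word positions, which would not be adequate for sentences with repeated words.

```python
# Assumed head table: which daughter category is the head of each phrase.
HEAD_CHILD = {"S": "VP", "VP": "v", "NP": "n"}

# "the man shot an elephant" as (label, child, ...); pre-terminal = (tag, word).
TREE = ("S",
        ("NP", ("det", "the"), ("n", "man")),
        ("VP", ("v", "shot"),
               ("NP", ("det", "an"), ("n", "elephant"))))

def head_and_arcs(tree):
    """Return (head_word, arcs), where arcs are (head, dependent) pairs."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0], []  # pre-terminal: the word heads itself
    results = [(c[0], *head_and_arcs(c)) for c in children]
    head = next(h for lab, h, _ in results if lab == HEAD_CHILD[label])
    arcs = []
    for lab, child_head, child_arcs in results:
        arcs.extend(child_arcs)
        if lab != HEAD_CHILD[label]:
            # A non-head daughter depends on the phrase's head word.
            arcs.append((head, child_head))
    return head, arcs

root, arcs = head_and_arcs(TREE)
print("root:", root)
for head, dependent in arcs:
    print(f"{head} -> {dependent}")
```

The phrase labels (A in the rule A → x h y) play no part in the output; only the head words remain, which is the sense in which the trees are headed by "h" rather than "A".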
12/13 Conclusion 1 Basic parsing approaches (without constraints) not practical in real applications Whatever approach taken, bear in mind that the lexicon is the real bottleneck There’s a real trade-off between coverage and efficiency, so it’s a good idea to sacrifice broad coverage (e.g. domain-specific parsers, controlled language), or use a scheme that minimizes the disadvantages (e.g. probabilistic parsing)
13/13 Conclusion 2 From a computational perspective, a parser provides –a formalism for writing linguistic rules –an implementation which can apply the rules to an input text Also, as necessary –An interface to allow grammar development and testing (e.g. tracing rules, showing trees) –An interface with the application of which it is a part (may be hidden from the end-user) All of the above tailored to meet the needs