Adaptor Grammars
Ehsan Khoddammohammadi
Recent Advances in Parsing Technology, WS 2012/13, Saarland University
Outline
– Definition and motivation behind unsupervised grammar learning
– Non-parametric Bayesian statistics
– Adaptor grammars vs. PCFGs
– A short introduction to the Chinese Restaurant Process
– Applications of adaptor grammars
Unsupervised Learning
How many categories of objects? How many features does an object have? How many words and rules are in a language?
Grammar Induction
Goal: study how a grammar and parses can be learnt from terminal strings alone
Motivation:
– help us understand human language acquisition
– induce parsers for low-resource languages
Nonparametric Bayesian statistics
– Learning the things people learn requires rich, unbounded hypothesis spaces.
– Language learning is non-parametric inference: there is no (obvious) bound on the number of words, grammatical rules, or morphemes.
– Stochastic processes are used to define priors on these infinite hypothesis spaces.
Is the PCFG good enough for our purpose?
PCFGs can be learnt in a Bayesian framework, but …
– the set of rules is fixed in standard PCFG estimation
– PCFG rules are "too small" to be effective units of generalization
How can we solve this problem?
Two non-parametric Bayesian extensions to PCFGs
1. Let the set of non-terminals grow unboundedly:
– start with a short, un-lexicalized grammar
– split and join non-terminals
2. Let the set of rules grow unboundedly:
– generate new rules whenever you need them
– learn sub-trees and their probabilities (bigger units of generalization)
Adaptor Grammars
CFG rules are used to generate trees, as in a CFG, but there are two types of non-terminals:
– Un-adapted (normal) non-terminals: pick a rule and recursively expand its children, as in a PCFG.
– Adapted non-terminals: either pick a rule and recursively expand its children, or regenerate a previously generated tree, with probability proportional to the number of times that tree has already been generated.
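A minimal sketch of this generative story in Python, assuming a toy morphology-style grammar; the rule set, the names sample_tree/expand/cache, and alpha = 1.0 are illustrative choices, not part of any actual adaptor-grammar implementation:

import random
from collections import defaultdict

rules = {   # toy grammar: non-terminal -> list of (right-hand side, probability)
    "Word":   [(("Stem", "Suffix"), 1.0)],
    "Stem":   [(("c", "a", "t"), 0.5), (("d", "o", "g"), 0.5)],
    "Suffix": [(("s",), 0.5), ((), 0.5)],
}
adapted = {"Word", "Stem", "Suffix"}    # adapted non-terminals cache whole sub-trees
alpha = 1.0                             # concentration parameter of the underlying CRP
cache = defaultdict(list)               # adapted non-terminal -> previously generated sub-trees

def expand(symbol):
    # the plain PCFG step: pick a rule and recursively expand its children
    options = rules[symbol]
    rhs = random.choices([r for r, p in options], weights=[p for r, p in options])[0]
    return (symbol,) + tuple(sample_tree(child) for child in rhs)

def sample_tree(symbol):
    if symbol not in rules:             # terminal symbol
        return symbol
    if symbol in adapted:
        n = len(cache[symbol])
        # with probability n / (n + alpha) reuse a previously generated tree,
        # chosen in proportion to how often it has been generated before
        if n > 0 and random.random() < n / (n + alpha):
            tree = random.choice(cache[symbol])
        else:
            tree = expand(symbol)       # otherwise fall back to the PCFG expansion
        cache[symbol].append(tree)      # every draw is remembered: "rich get richer"
        return tree
    return expand(symbol)

print([sample_tree("Word") for _ in range(5)])

Because duplicates are kept in the cache, picking a cached tree uniformly at random is the same as picking each distinct tree in proportion to its count, which is exactly the reuse behaviour described above.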
The Story of Adaptor Grammars
Properties of Adaptor Grammars
In adaptor grammars:
– The probabilities of adapted sub-trees are learnt separately; they are not just the product of their rules' probabilities.
– "Rich get richer" dynamics produce a Zipfian distribution.
– Useful compound structures become more probable than their parts.
– There is no recursion amongst adapted non-terminals (an adapted non-terminal never expands to itself).
The Chinese Restaurant Process
n customers walk into a restaurant and choose tables z_i: customer i picks an already-occupied table k with probability n_k / (i − 1 + α), where n_k is the number of customers already at table k, and a new table with probability α / (i − 1 + α), where α is the concentration parameter.
This defines an exchangeable distribution over seating arrangements (including the counts on tables).
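A small self-contained sketch of this seating process in Python; the function name crp_seating and the default alpha are arbitrary illustrative choices:

import random

def crp_seating(n_customers, alpha=1.0):
    tables = []          # tables[k] = number of customers currently at table k
    assignments = []
    for _ in range(n_customers):
        # an occupied table k gets weight n_k, a new table gets weight alpha;
        # random.choices normalises these to n_k/(i-1+alpha) and alpha/(i-1+alpha)
        weights = tables + [alpha]
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)         # open a new table
        else:
            tables[k] += 1           # join an existing table: "rich get richer"
        assignments.append(k)
    return assignments, tables

print(crp_seating(20))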
CRP (step-by-step illustration of the seating process over several slides; figures only)
Applications of Adaptor Grammars
Not for parsing itself, because grammar induction is hard. Instead:
1. Word segmentation
2. Learning concatenative morphology
3. Learning the structure of named-entity NPs
4. Topic modeling
Unsupervised Word Segmentation
Input: phoneme sequences with sentence boundaries
Task: identify words
Word segmentation with PCFG
Unigram word segmentation
Collocation word segmentation
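The segmentation grammars themselves were shown as figures on the slides; below is a sketch of their general shape in the same rule-table style as the earlier Python sketch, following the kind of grammars Johnson uses for word segmentation. The exact rules, probabilities, and phoneme inventory here are illustrative, not the ones on the original slides:

# Unigram model: a Sentence is a sequence of Words, a Word is a sequence of phonemes.
# Word is adapted, so whole words are cached and reused as units.
unigram_rules = {
    "Sentence": [(("Word", "Sentence"), 0.5), (("Word",), 0.5)],
    "Word":     [(("Phons",), 1.0)],
    "Phons":    [(("Phon", "Phons"), 0.5), (("Phon",), 0.5)],
    "Phon":     [((p,), 1.0 / 26) for p in "abcdefghijklmnopqrstuvwxyz"],
}
unigram_adapted = {"Word"}

# Collocation model: a Sentence is a sequence of Collocs, a Colloc is a sequence of Words.
# Both Colloc and Word are adapted, so frequently co-occurring word groups are cached too.
colloc_rules = dict(unigram_rules,
    Sentence=[(("Colloc", "Sentence"), 0.5), (("Colloc",), 0.5)],
    Colloc=[(("Words",), 1.0)],
    Words=[(("Word", "Words"), 0.5), (("Word",), 0.5)],
)
colloc_adapted = {"Colloc", "Word"}

# These plug directly into the sample_tree sketch above, e.g.:
#   rules, adapted = unigram_rules, unigram_adapted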
Performance (word segmentation accuracy, evaluated on the Brent corpus)
– Unigram: 56%
– + collocations: 76%
– + syllable structure: 87%
Morphology
Input: raw text
Task: identify stems and morphemes, i.e. decompose a word into its morphological components
Adaptor grammars can only be applied to simple concatenative morphology.
CFG for morphological analysis
Adaptor grammar for morphological analysis
Generated words: 1. cats 2. dogs 3. cats
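A back-of-the-envelope illustration of why a repeated word such as "cats" becomes likely to be regenerated from the cache, using the CRP reuse probabilities from the sketches above (alpha = 1.0 is an arbitrary choice):

alpha = 1.0
cache_counts = {"cats": 1, "dogs": 1}   # Word cache after generating "cats" and "dogs"
n = sum(cache_counts.values())

p_reuse_cats = cache_counts["cats"] / (n + alpha)   # 1/3: reuse the cached analysis of "cats"
p_fresh      = alpha / (n + alpha)                  # 1/3: generate a brand-new word from the rules
print(p_reuse_cats, p_fresh)

Every reuse increases the count, so frequent words and their internal stem/suffix analyses rapidly become cheap to regenerate as single units.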
Performance
For a more sophisticated model:
– 116,129 tokens: 70% correctly segmented
– 7,170 verb types: 66% correctly segmented
Inference
The distribution over adapted trees is exchangeable, so Gibbs sampling can be used.
A variational inference method has also been proposed for learning adaptor grammars.
Covering this part is beyond the scope of this talk.
Conclusion
We are interested in inducing grammars without supervision for two reasons:
– language acquisition
– low-resource languages
PCFG rules are too small to be useful units of generalization.
Learning the things people learn requires rich, unbounded hypothesis spaces.
Adaptor grammars use the CRP to learn larger reusable units (sub-trees) from these unbounded hypothesis spaces.
References
– Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models, M. Johnson et al., Advances in Neural Information Processing Systems, 2007.
– Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure, Mark Johnson, ACL-08: HLT, 2008.
– Inferring Structure from Data, Tom Griffiths, Machine Learning Summer School, Sardinia, 2010.
Thank you for your attention!