A Survey of Unsupervised Grammar Induction
Baskaran Sankaran
Senior Supervisor: Dr. Anoop Sarkar
School of Computing Science, Simon Fraser University
Motivation
Languages have hidden regularities (examples in transliterated Tamil):
◦ karuppu naay puunaiyai thurathiyathu ('the black dog chased the cat')
◦ iruttil karuppu uruvam marainthathu ('a black figure disappeared in the dark')
◦ naay thurathiya puunai vekamaaka ootiyathu ('the cat that the dog chased ran fast')
4
FORMAL STRUCTURES 4
5
Phrase-Structure Sometimes the bribed became partners in the company 5
Phrase-Structure
◦ Binarize the trees (Chomsky Normal Form)
◦ Sparsity issue with words: use POS tags instead
[Figure: the parse tree and its binarized CNF rules, e.g. S → ADVP @S, @S → NP VP, VP → VBD @VP, @VP → NP PP, NP → DT VBN, NP → DT NN, NP → NNS, PP → IN NP, ADVP → RB]
Evaluation Metric-1
Unsupervised induction produces a binarized output tree, possibly unlabelled
Evaluation against the gold treebank parse:
◦ Recall: % of true constituents found
◦ Also precision and F-score
Wall Street Journal (WSJ) dataset
[Figure: unlabelled binarized tree over the POS tags of the example sentence]
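The recall/precision/F-score metric above can be computed directly on bracket spans. A minimal sketch (the function name and the example bracketings are our own, not from the survey):

```python
# Unlabelled constituent evaluation: each bracket is a (start, end) span
# over token positions; recall is the fraction of gold brackets recovered.
def bracket_prf(gold_brackets, pred_brackets):
    gold, pred = set(gold_brackets), set(pred_brackets)
    tp = len(gold & pred)                       # correctly found constituents
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold and induced bracketings for an 8-token sentence
gold = {(0, 8), (1, 3), (3, 8), (4, 8), (5, 8), (6, 8)}
pred = {(0, 8), (1, 3), (2, 8), (4, 8), (6, 8)}
p, r, f = bracket_prf(gold, pred)
print(p, r, f)
```

Four of the five predicted brackets match the gold set here, giving precision 0.8 and recall 4/6.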
Dependency Structure
[Figure: head-outward dependency derivation over the POS tags of 'Sometimes the bribed became partners in the company', with starred states (VBD*, VBN*, NNS*, IN*, NN*) marking head-outward generation]
Dependency Structure
[Figure: final dependency arcs over 'Sometimes the bribed became partners in the company' (POS tags RB DT VBN VBD NNS IN DT NN)]
Evaluation Metric-2
Unsupervised induction generates directed dependency arcs
Compute (directed) attachment accuracy:
◦ Against gold dependencies
◦ WSJ10 dataset (WSJ sentences of length ≤ 10)
[Figure: induced dependency arcs over the example sentence]
Unsupervised Grammar Induction
Goal: learn the hidden structure of a language
◦ POS tag sequences as input
◦ Generates phrase structure / dependencies
◦ No attempt to find the meaning
Overview of the survey:
◦ Phrase-structure and dependency grammars
◦ Mostly on English (a few on Chinese, German, etc.)
◦ Learning restricted to shorter sentences
◦ Significantly lags behind supervised methods
PHRASE-STRUCTURE INDUCTION
Toy Example
Corpus:
◦ the dog bites a man
◦ dog sleeps
◦ a dog bites a bone
◦ the man sleeps
Grammar:
◦ S → NP VP
◦ NP → Det N | N
◦ VP → V NP | V
◦ Det → the | a
◦ N → dog | man | bone
◦ V → bites | sleeps
EM for PCFG (Baker '79; Lari and Young '90)
Inside-Outside:
◦ EM instance for probabilistic CFGs; a generalization of Forward-Backward for HMMs
◦ Non-terminals are fixed
◦ Estimates maximum-likelihood rule probabilities
[Table: all candidate rules before and after estimation, e.g. S → NP VP 1.0, NP → Det N 0.875, NP → N 0.125, VP → V 0.5, VP → V NP 0.5, Det → the 0.428571, Det → a 0.571429, N → man 0.375, N → bone 0.125, N → dog 0.5, V → bites 0.5, V → sleeps 0.5]
Inside-Outside
[Figure: decomposing the probability of 'Sometimes the bribed became partners in the company', e.g. P(S → Sometimes @S) · P(@S → NP VP) · P(NP ⇒ the bribed) · P(VP ⇒ became … company)]
Constraining Search (Pereira and Schabes '92; Schabes et al. '93)
[Figure: bracketing constraints over 'Sometimes the bribed became partners in the company']
Constraining Search (Pereira and Schabes '92; Schabes et al. '93; Hwa '99)
Treebank bracketings:
◦ Bracketing boundaries constrain induction
What happens with limited supervision? More bracketed data exposed iteratively:
◦ 0% bracketed data — Recall: 50.0
◦ 100% bracketed data — Recall: 78.0
◦ Right-branching baseline — Recall: 76.0
Distributional Clustering (Adriaans et al. '00; Clark '00; van Zaanen '00)
Cluster the word sequences:
◦ Context: adjacent words or boundaries
◦ Relative frequency distribution of contexts
Example sentences: the black dog bites the man / the man eats an apple
Identifies constituents
◦ Evaluation on ATIS corpus — Recall: 35.6
Constituent-Context Model (Klein and Manning '02)
Valid constituents in a tree should not cross
[Figure: two candidate bracketings of the example sentence; crossing brackets cannot both be constituents]
Constituent-Context Model
[Figure: constituent spans and their contexts for the example sentence, e.g. the span DT VBN ('the bribed') in the context RB _ VBD]
Recall — Right-branching: 70.0; CCM: 81.6
DEPENDENCY INDUCTION
Dependency Model with Valence (Klein and Manning '04)
Simple generative model:
◦ Choose head — P(Root)
◦ Argument — P(a | h, dir), with attachment direction dir (right, left)
◦ End — P(End | h, dir, v), with valence v (head-outward)
Dir Accuracy — CCM: 23.8; DMV: 43.2; Joint (CCM + DMV): 47.5
[Figure: DMV generative derivation over 'Sometimes the bribed became partners in the company']
DMV Extensions (Headden et al. '09; Blunsom and Cohn '10)
Extended Valence Grammar (EVG):
◦ Valence frames for the head allow different distributions over arguments — Dir Acc: 65.0
Lexicalization (L-EVG) — Dir Acc: 68.8
Tree Substitution Grammar:
◦ Tree fragments instead of CFG rules — Dir Acc: 67.7
MULTILINGUAL SETTING
Bilingual Alignment & Parsing (Wu '97)
Inversion Transduction Grammar (ITG):
◦ Allows reordering between the two languages
[Figure: ITG derivation aligning source words e1–e4 with target words f1–f4, where e1~f3, e2~f4, e3~f1, e4~f2]
Bilingual Parsing (Snyder et al. '09)
PP attachment ambiguity in English:
◦ I saw (the student (from MIT) 1 ) 2
Not ambiguous in Urdu:
◦ میں (ایم آئی ٹی سے) 1 (طالب علم) 2 کو دیکھا
◦ Gloss: I ((MIT of) student) saw
Summary & Overview
Parametric search methods: EM for PCFG; Constrain with bracketing; Contrastive Estimation
Structural search methods: Distributional Clustering; CCM; DMV; EVG & L-EVG; TSG + DMV; Data-oriented Parsing; Prototype
State of the art:
◦ Phrase-structure (CCM + DMV) — Recall: 88.0
◦ Dependency (Lexicalized EVG) — Dir Acc: 68.8
QUESTIONS? Thanks!
Motivation
Languages have hidden regularities:
◦ The guy in China
◦ … new leader in China
◦ That's what I am asking you …
◦ I am telling you …
Issues with EM (Carroll and Charniak '92; Pereira and Schabes '92; de Marcken '05; Liang and Klein '08; Spitkovsky et al. '10)
Phrase-structure:
◦ Finds local maxima instead of the global maximum
◦ Multiple ordered adjunctions
Both phrase-structure & dependency:
◦ Disconnect between likelihood and the optimal grammar
Constituent-Context Model (Klein and Manning '02)
CCM:
◦ Models only constituent identity
◦ Valid constituents in a tree should not cross
Bootstrap Phrases (Haghighi and Klein '06)
Bootstrap with seed examples for constituent types:
◦ Chosen from the most frequent treebank phrases
◦ Induces labels for constituents — Recall: 59.6
Integrate with CCM:
◦ CCM generates brackets (constituents); Proto labels them — Recall: 68.4
Dependency Model with Valence (Klein and Manning '04)
Simple generative model:
◦ Choose head; attachment direction (right, left)
◦ Valence (head-outward)
End of generation modelled separately
Dir Acc: 43.2
[Figure: dependency arcs over the example sentence]
Learn From How Not to Speak
Contrastive Estimation (Smith and Eisner '05):
◦ Log-linear model of dependency
◦ Features f(q, T): P(Root); P(a | h, dir); P(End | h, dir, v)
◦ Conditional likelihood
Learn From How Not to Speak (Smith and Eisner '05)
Contrastive Estimation — e.g. the brown cat vs. cat brown the
◦ Neighborhoods: transpose (Trans), delete & transpose (DelOrTrans)
Dir Acc: 48.8
DMV Extensions-1 (Cohen and Smith '08, '09)
Tying parameters:
◦ Correlated Topic Model (CTM): correlation between different word types
◦ Two types of tying priors: Logistic Normal (LN), Shared LN
Dir Acc: 61.3
DMV Extensions-2 (Blunsom and Cohn '10)
[Figure: TSG tree fragments extracted from the dependency derivation of the example sentence, e.g. a lexicalized fragment VBD → became with arguments NNS and IN → in NN]
DMV Extensions-2 (Blunsom and Cohn '10)
Tree Substitution Grammar (TSG):
◦ Lexicalized trees
◦ Hierarchical prior with different levels of backoff
Dir Acc: 67.7
[Figure: lexicalized TSG fragments]