
1 A Survey of Unsupervised Grammar Induction
Baskaran Sankaran
Senior Supervisor: Dr. Anoop Sarkar
School of Computing Science, Simon Fraser University

2 Motivation
Languages have hidden regularities. Example sentences (romanized Tamil; glosses approximate):
◦ karuppu naay puunaiyai thurathiyathu ("the black dog chased the cat")
◦ iruttil karuppu uruvam marainthathu ("in the dark, a black figure disappeared")
◦ naay thurathiya puunai vekamaaka ootiyathu ("the cat the dog chased ran away fast")


4 FORMAL STRUCTURES

5 Phrase-Structure
Sometimes the bribed became partners in the company

6 Phrase-Structure
◦ Binarize, CNF
◦ Sparsity issue with words
◦ Use POS tags
Binarized grammar for the example sentence (the @-labels are the intermediate symbols introduced by binarization):
S → ADVP @S
@S → NP VP
VP → VBD @VP
@VP → NP PP
NP → DT VBN
NP → DT NN
NP → NNS
PP → IN NP
ADVP → RB
(Figure: the corresponding binarized tree over the POS sequence RB DT VBN VBD NNS IN DT NN.)
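To make the binarization step concrete, here is a minimal sketch (my own illustration, not code from the survey) that left-factors an n-ary tree into at-most-binary nodes, introducing intermediate @-labels such as @S and @VP as on this slide. The nested-tuple tree encoding, with POS tags as string leaves, is an assumption of the sketch.

    # Left-factor an n-ary tree into binary form, introducing @X labels.
    # Trees are nested tuples: (label, child1, child2, ...); POS tags are string leaves.
    def binarize(tree):
        if isinstance(tree, str):                 # a POS-tag leaf
            return tree
        label, *children = tree
        children = [binarize(c) for c in children]
        if len(children) <= 2:
            return (label,) + tuple(children)
        # Keep the first child and fold the rest under an intermediate @label node.
        rest = ('@' + label.lstrip('@'),) + tuple(children[1:])
        return (label, children[0], binarize(rest))

    tree = ('S',
            ('ADVP', 'RB'),
            ('NP', 'DT', 'VBN'),
            ('VP', 'VBD', ('NP', 'NNS'), ('PP', 'IN', ('NP', 'DT', 'NN'))))
    print(binarize(tree))
    # ('S', ('ADVP', 'RB'), ('@S', ('NP', 'DT', 'VBN'),
    #   ('VP', 'VBD', ('@VP', ('NP', 'NNS'), ('PP', 'IN', ('NP', 'DT', 'NN'))))))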

7 Evaluation Metric-1
Unsupervised induction
◦ Binarized output tree, possibly unlabelled
Evaluation
◦ Against the gold treebank parse
◦ Recall: % of true constituents found
◦ Also precision and F-score
Wall Street Journal (WSJ) dataset
(Figure: an induced, unlabelled binary tree with X-labelled nodes over the POS sequence of the example sentence.)
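A rough sketch of this unlabelled bracketing metric (my own illustration; real evaluation scripts such as EVALB additionally ignore trivial spans and can compare labels): collect the spans of the induced and gold trees and compare the two sets.

    # Compare the spans of an induced binary tree against gold treebank spans.
    # Trees are nested tuples as in the sketch above; leaves are POS tags.
    def spans(tree, start=0):
        """Return (next_position, list_of_spans) for a nested-tuple tree."""
        if isinstance(tree, str):
            return start + 1, []
        pos, result = start, []
        for child in tree[1:]:
            pos, child_spans = spans(child, pos)
            result.extend(child_spans)
        result.append((start, pos))
        return pos, result

    def unlabelled_prf(induced_tree, gold_tree):
        _, induced = spans(induced_tree)
        _, gold = spans(gold_tree)
        matched = len(set(induced) & set(gold))
        if matched == 0:
            return 0.0, 0.0, 0.0
        precision = matched / len(set(induced))
        recall = matched / len(set(gold))
        return precision, recall, 2 * precision * recall / (precision + recall)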

8 Dependency Structure
(Figure: the dependency structure of the example sentence, drawn with starred head copies such as VBD*, NN* and IN* so that left and right arguments attach separately.)

9 Dependency Structure
(Figure: dependency arcs over the POS-tagged example sentence: Sometimes/RB the/DT bribed/VBN became/VBD partners/NNS in/IN the/DT company/NN.)

10 Evaluation Metric-2
Unsupervised induction
◦ Generates directed dependency arcs
Compute (directed) attachment accuracy
◦ Against gold dependencies
◦ WSJ10 dataset
(Figure: the gold dependency arcs over the POS-tagged example sentence.)
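The directed attachment metric is simpler; a hedged sketch follows (the gold head indices below are only illustrative, not taken from the treebank):

    # Fraction of words whose predicted head matches the gold head (0 = root).
    def attachment_accuracy(predicted_heads, gold_heads):
        assert len(predicted_heads) == len(gold_heads)
        correct = sum(p == g for p, g in zip(predicted_heads, gold_heads))
        return correct / len(gold_heads)

    # "Sometimes the bribed became partners in the company"; head indices are 1-based.
    gold = [4, 3, 4, 0, 4, 5, 8, 6]         # illustrative gold analysis
    pred = [4, 3, 4, 0, 4, 4, 8, 6]         # one wrong attachment for "in"
    print(attachment_accuracy(pred, gold))  # 0.875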

11 Unsupervised Grammar Induction
To learn the hidden structure of a language
◦ POS tag sequences as input
◦ Generates phrase structure / dependencies
◦ No attempt to find the meaning
Overview
◦ Phrase-structure and dependency grammars
◦ Mostly on English (a few on Chinese, German, etc.)
◦ Learning restricted to shorter sentences
◦ Still lags significantly behind supervised methods

12 PHRASE-STRUCTURE INDUCTION

13 Toy Example
Corpus:
◦ the dog bites a man
◦ dog sleeps
◦ a dog bites a bone
◦ the man sleeps
Grammar:
S → NP VP
NP → Det N
NP → N
VP → V NP
VP → V
Det → the | a
N → dog | man | bone
V → bites | sleeps
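For later reference, the toy corpus and grammar can be written down as plain Python data (my formatting, not the survey's). Rule probabilities start uniform per non-terminal, a simplification; slide 14 instead starts from the full candidate rule set.

    corpus = [
        "the dog bites a man".split(),
        "dog sleeps".split(),
        "a dog bites a bone".split(),
        "the man sleeps".split(),
    ]

    # grammar[lhs] is the list of right-hand sides for that non-terminal.
    grammar = {
        "S":   [("NP", "VP")],
        "NP":  [("Det", "N"), ("N",)],
        "VP":  [("V", "NP"), ("V",)],
        "Det": [("the",), ("a",)],
        "N":   [("dog",), ("man",), ("bone",)],
        "V":   [("bites",), ("sleeps",)],
    }
    probs = {lhs: {rhs: 1.0 / len(rhss) for rhs in rhss}
             for lhs, rhss in grammar.items()}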

14 EM for PCFG (Baker ’79; Lari and Young ’90)
Inside-Outside
◦ EM instance for probabilistic CFG; generalization of Forward-Backward for HMMs
◦ Non-terminals are fixed
◦ Estimate maximum-likelihood rule probabilities
Starting from the full candidate rule set (every word of the toy corpus paired with each of Det, N and V, plus S → NP VP, NP → Det N, NP → N, VP → V, VP → V NP, VP → NP V), EM drives most rule probabilities to zero. The non-zero probabilities shown:
S → NP VP 1.0
NP → Det N 0.875, NP → N 0.125
VP → V 0.5, VP → V NP 0.5
Det → the 0.428571, Det → a 0.571429
N → dog 0.5, N → man 0.375, N → bone 0.125
V → bites 0.5, V → sleeps 0.5

15 Inside-Outside
For the example sentence, the probability mass Inside-Outside assigns to using @S → NP VP over the span "the bribed became ... company" combines an outside probability with the rule probability and the inside probabilities of the children:
P(S ⇒ Sometimes @S) × P(@S → NP VP) × P(NP ⇒ the bribed) × P(VP ⇒ became … company)
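The inside probabilities used in this decomposition can be computed bottom-up over spans, CKY-style. Below is a compact sketch of just the inside pass (my own illustration, assuming a CNF grammar over POS tags as on slide 6); the outside pass and the expected-count bookkeeping of the full E-step are omitted.

    from collections import defaultdict

    def inside_chart(tags, binary, lexical):
        """binary[A] = {(B, C): P(A -> B C)}; lexical[A] = {tag: P(A -> tag)}."""
        n = len(tags)
        inside = defaultdict(float)              # (i, j, A) -> P(A =>* tags[i:j])
        for i, tag in enumerate(tags):
            for a, dist in lexical.items():
                if tag in dist:
                    inside[i, i + 1, a] = dist[tag]
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for a, rules in binary.items():
                    total = 0.0
                    for (b, c), p in rules.items():
                        for k in range(i + 1, j):
                            total += p * inside[i, k, b] * inside[k, j, c]
                    if total:
                        inside[i, j, a] = total
        return inside                            # sentence probability: inside[0, n, 'S']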

16 Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93)
Sometimes the bribed became partners in the company

17 Constraining Search (Pereira and Schabes ’92; Schabes et al. ’93; Hwa ’99)
Treebank bracketings
◦ Bracketing boundaries constrain induction
What happens with limited supervision?
◦ More bracketed data exposed iteratively
◦ 0% bracketed data: Recall 50.0
◦ 100% bracketed data: Recall 78.0
Right-branching baseline: Recall 76.0
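The bracketing constraint is easy to state as code: a candidate span may be used during the inside pass only if it does not cross any given bracket. A small sketch of that check (my paraphrase of the idea, not the authors' implementation), using 0-based, end-exclusive spans:

    def crosses(span, bracket):
        """True if the two intervals overlap without one containing the other."""
        (i, j), (a, b) = span, bracket
        return (i < a < j < b) or (a < i < b < j)

    def allowed(span, brackets):
        return not any(crosses(span, br) for br in brackets)

    # Partial bracketing [the bribed] [became partners [in the company]]
    # over the 8-word example sentence:
    brackets = [(1, 3), (3, 8), (5, 8)]
    print(allowed((1, 3), brackets))   # True:  coincides with a bracket
    print(allowed((2, 4), brackets))   # False: crosses [the bribed]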

18 Distributional Clustering (Adriaans et al. ’00; Clark ’00; van Zaanen ’00)
Cluster the word sequences
◦ Context: adjacent words or boundaries
◦ Relative frequency distribution of contexts
Example sentences:
◦ the black dog bites the man
◦ the man eats an apple
Identifies constituents
◦ Evaluation on ATIS corpus: Recall 35.6
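A toy sketch of the statistic being clustered (mine, not from any of the cited systems): every short word sequence is represented by the frequencies of its (left neighbour, right neighbour) contexts, with '#' standing for a sentence boundary.

    from collections import Counter, defaultdict

    def context_counts(sentences, max_len=3):
        contexts = defaultdict(Counter)
        for sent in sentences:
            padded = ["#"] + sent + ["#"]
            for i in range(1, len(padded) - 1):
                for j in range(i + 1, min(i + max_len, len(padded) - 1) + 1):
                    seq = tuple(padded[i:j])
                    contexts[seq][(padded[i - 1], padded[j])] += 1
        return contexts    # sequences with similar context distributions cluster together

    counts = context_counts([
        "the black dog bites the man".split(),
        "the man eats an apple".split(),
    ])
    print(counts[("the", "man")])   # Counter({('bites', '#'): 1, ('#', 'eats'): 1})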

19 Constituent-Context Model (Klein and Manning ’02)
Valid constituents in a tree should not cross
(Figure: two candidate unlabelled binary trees over the POS sequence RB DT VBN VBD NNS IN DT NN.)

20 Constituent-Context Model
Sometimes the bribed became partners in the company
◦ Example span: yield DT VBN ("the bribed") in the context RB _ VBD
Recall
◦ Right-branching: 70.0
◦ CCM: 81.6
(Figure: an unlabelled binary tree over the POS sequence of the example sentence.)
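Concretely, the CCM scores every span with two signals: its yield (the POS sequence inside) and its context (the tags immediately to its left and right). A small illustration of extracting those pairs (mine, not Klein and Manning's code):

    def yields_and_contexts(tags):
        padded = ["#"] + tags + ["#"]
        for i in range(len(tags)):
            for j in range(i + 1, len(tags) + 1):
                span_yield = tuple(tags[i:j])
                context = (padded[i], padded[j + 1])
                yield (i, j), span_yield, context

    tags = "RB DT VBN VBD NNS IN DT NN".split()
    for span, y, c in yields_and_contexts(tags):
        if span == (1, 3):                  # the span covering "the bribed"
            print(y, c)                     # ('DT', 'VBN') ('RB', 'VBD')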

21 DEPENDENCY INDUCTION

22 Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head: P(Root)
◦ End: P(End | h, dir, v)
- Attachment direction dir (right, left)
- Valence v (head outward)
◦ Argument: P(a | h, dir)
Directed accuracy
◦ CCM: 23.8
◦ DMV: 43.2
◦ Joint: 47.5
(Figure: the dependency tree of the example sentence, annotated with the Head, Argument and End decisions that generate it.)
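A hedged sketch of how this model scores one dependency tree (my reading of the DMV; the parameter tables p_root, p_stop and p_attach, and their indexing, are assumptions of the sketch): the root tag is drawn first, then each head generates its left and right argument sequences head-outward, deciding to stop or continue with a valence-conditioned probability.

    def dmv_tree_prob(tags, heads, p_root, p_stop, p_attach):
        """heads[i] is the index of word i's head, or None for the root word."""
        root = heads.index(None)
        prob = p_root[tags[root]]
        for h, h_tag in enumerate(tags):
            left  = [d for d in range(h - 1, -1, -1) if heads[d] == h]   # nearest first
            right = [d for d in range(h + 1, len(tags)) if heads[d] == h]
            for direction, deps in (("left", left), ("right", right)):
                has_child = False                    # valence: no argument taken yet
                for d in deps:                       # generate arguments head-outward
                    prob *= 1 - p_stop[h_tag, direction, has_child]   # decide to continue
                    prob *= p_attach[tags[d], h_tag, direction]       # choose the argument
                    has_child = True
                prob *= p_stop[h_tag, direction, has_child]           # finally stop
        return prob

In training, the same probabilities are re-estimated with EM, summing over all candidate trees with a dynamic program rather than scoring a single given tree.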

23 DMV Extensions (Headden et al. ’09; Blunsom and Cohn ’10)
Extended Valence Grammar (EVG)
◦ Valence frames for the head: allows different distributions over arguments
◦ Directed accuracy: 65.0
Lexicalization (L-EVG)
◦ Directed accuracy: 68.8
Tree Substitution Grammar
◦ Tree fragments instead of CFG rules
◦ Directed accuracy: 67.7

24 MULTILINGUAL SETTING

25 Bilingual Alignment & Parsing (Wu ’97)
Inversion Transduction Grammar (ITG)
◦ Allows reordering
(Figure: an ITG parse aligning source words e1 e2 e3 e4 with target words f1 f2 f3 f4; an inverted node swaps the order of its children, pairing e1-f3, e2-f4, e3-f1, e4-f2.)
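To see what "allows reordering" means, here is a toy enumeration (not from the survey) of the target-side orders a binary ITG can derive: adjacent source spans combine either in order (straight) or swapped (inverted), which for four words yields 22 of the 24 permutations and excludes the two "inside-out" cases.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def itg_orders(i, j):
        """All orderings of source positions i..j-1 derivable by a binary ITG."""
        if j - i == 1:
            return {(i,)}
        results = set()
        for k in range(i + 1, j):
            for left in itg_orders(i, k):
                for right in itg_orders(k, j):
                    results.add(left + right)   # straight combination
                    results.add(right + left)   # inverted combination
        return results

    perms = itg_orders(0, 4)
    print(len(perms))                 # 22
    print((1, 3, 0, 2) in perms)      # False: the 2-4-1-3 pattern is not ITG-derivable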

26 Bilingual Parsing (Snyder et al. ’09)
Bilingual parsing
◦ PP-attachment ambiguity: I saw (the student (from MIT))
◦ Not ambiguous in Urdu:
میں ( یمآئٹی سے ) ( طالب علم ) کو دیکھا
Gloss: I ((MIT of) student) saw

27 Summary & Overview
Methods covered, grouped into parametric-search and structural-search approaches:
◦ EM for PCFG
◦ Constraining with bracketing
◦ Contrastive Estimation
◦ Distributional Clustering
◦ CCM
◦ DMV
◦ EVG & L-EVG
◦ TSG + DMV
◦ Data-oriented Parsing
◦ Prototype-driven induction
State of the art
◦ Phrase structure (CCM + DMV): Recall 88.0
◦ Dependency (Lexicalized EVG): Directed accuracy 68.8

28 QUESTIONS? Thanks!

29 Motivation
Languages have hidden regularities

30 Motivation
Languages have hidden regularities
◦ The guy in China
◦ … new leader in China
◦ That’s what I am asking you …
◦ I am telling you …

31 Issues with EM (Carroll and Charniak ’92; Pereira and Schabes ’92; de Marcken ’95; Liang and Klein ’08; Spitkovsky et al. ’10)
Phrase structure
◦ Finds local maxima instead of the global one
◦ Multiple ordered adjunctions
Both phrase structure & dependency
◦ Disconnect between likelihood and the optimal grammar

32 Constituent-Context Model (Klein and Manning ’02)
CCM
◦ Only constituent identity
◦ Valid constituents in a tree should not cross

33 Bootstrap phrases (Haghighi and Klein ’06)
Bootstrap with seed examples for constituent types
◦ Chosen from the most frequent treebank phrases
◦ Induces labels for constituents
◦ Recall: 59.6
Integrate with CCM
◦ CCM generates brackets (constituents)
◦ Proto labels them
◦ Recall: 68.4

34 Dependency Model w/ Valence (Klein and Manning ’04)
Simple generative model
◦ Choose head; attachment direction (right, left)
◦ Valence (head outward)
- End of generation modelled separately
Directed accuracy: 43.2
(Figure: the dependency tree over the POS-tagged example sentence.)

35 Learn from how not to speak
Contrastive Estimation (Smith and Eisner ’05)
◦ Log-linear model of dependency
- Features f(q, T): P(Root); P(a | h, dir); P(End | h, dir, v)
- Conditional likelihood

36 Learn from how not to speak (Smith and Eisner ’05)
Contrastive Estimation
◦ Ex. the brown cat vs. cat brown the
◦ Neighborhoods
- Transpose (Trans), delete & transpose (DelOrTrans)
Directed accuracy: 48.8
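A small sketch of two of these neighbourhood functions (my reading of Smith and Eisner ’05): Trans swaps one adjacent word pair, and DelOrTrans additionally allows deleting a single word; the model is trained to prefer the observed sentence over these implicit negative examples.

    def trans_neighborhood(words):
        """Sentences obtained by transposing one adjacent pair of words."""
        for i in range(len(words) - 1):
            yield words[:i] + [words[i + 1], words[i]] + words[i + 2:]

    def del_or_trans_neighborhood(words):
        """Trans plus all single-word deletions."""
        yield from trans_neighborhood(words)
        for i in range(len(words)):
            yield words[:i] + words[i + 1:]

    for n in trans_neighborhood("the brown cat".split()):
        print(" ".join(n))     # brown the cat / the cat brown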

37 DMV Extensions-1 (Cohen and Smith ’08, ’09)
Tying parameters
◦ Correlated Topic Model (CTM): correlation between different word types
◦ Two types of tying parameters
- Logistic Normal (LN)
- Shared LN
Directed accuracy: 61.3

38 DMV Extensions-2 (Blunsom and Cohn ’10)
(Figure: the dependency tree of the example sentence with lexicalized tree fragments, e.g. fragments anchored on "became" and "in", marked over it.)

39 DMV Extensions-2 (Blunsom and Cohn ’10)
Tree Substitution Grammar (TSG)
◦ Lexicalized trees
◦ Hierarchical prior
- Different levels of backoff
Directed accuracy: 67.7
(Figure: example lexicalized tree fragments anchored on "became" and "in".)

