A Survey of Unsupervised Grammar Induction Baskaran Sankaran Senior Supervisor: Dr Anoop Sarkar School of Computing Science Simon Fraser University.



Motivation
Languages have hidden regularities:
- karuppu naay puunaiyai thurathiyathu
- iruttil karuppu uruvam marainthathu
- naay thurathiya puunai vekamaaka ootiyathu

FORMAL STRUCTURES

Phrase-Structure
Sometimes the bribed became partners in the company

Phrase-Structure
- Binarize the grammar (Chomsky Normal Form)
- Sparsity issue with words: use POS tags instead
[Figure: phrase-structure tree over the POS sequence RB DT VBN VBD NNS IN DT NN]
Rules: S → NP VP; VP → NP PP; NP → DT VBN; NP → DT NN; NP → NNS; PP → IN NP; ADVP → RB
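The binarization step mentioned above can be sketched as a small routine that right-binarizes any rule with more than two right-hand-side symbols, introducing intermediate nonterminals. This is a minimal illustrative sketch (the naming scheme for intermediate symbols is an assumption, not the slides' method):

```python
def binarize(rules):
    """Right-binarize CFG rules with more than two RHS symbols,
    introducing intermediate nonterminals (illustrative naming)."""
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            # intermediate symbol covering the remaining RHS suffix
            new = f"{lhs}|{'_'.join(rhs[1:])}"
            out.append((lhs, (rhs[0], new)))
            lhs, rhs = new, rhs[1:]
        out.append((lhs, rhs))
    return out

# e.g. VP -> VBD NP PP becomes VP -> VBD VP|NP_PP and VP|NP_PP -> NP PP
binarized = binarize([("VP", ("VBD", "NP", "PP"))])
```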

Evaluation Metric-1
Unsupervised induction:
- Binarized output tree (possibly unlabelled)
Evaluation against a gold treebank parse:
- Recall: % of true constituents found
- Also precision and F-score
Wall Street Journal (WSJ) dataset
[Figure: induced binary tree with unlabelled (X) nodes over the example sentence]
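The bracketing metric described above can be sketched as set comparison over constituent spans. This is a simplified illustration of unlabelled PARSEVAL-style scoring, not the exact EVALB protocol; the example spans are hypothetical:

```python
def bracket_scores(gold, pred):
    """Unlabelled bracketing precision, recall, and F-score between
    two collections of (start, end) constituent spans."""
    gold, pred = set(gold), set(pred)
    hits = len(gold & pred)
    p = hits / len(pred) if pred else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# hypothetical gold spans vs. an induced binarized tree's spans
gold = [(0, 8), (1, 3), (4, 8), (5, 8), (6, 8)]
pred = [(0, 8), (1, 3), (3, 8), (5, 8), (6, 8)]
p, r, f = bracket_scores(gold, pred)   # 4 of 5 spans match
```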

Dependency Structure
[Figure: split-head representation (VBD*, NN*, NNS*, IN*, etc.) of the dependency tree for "Sometimes the bribed became partners in the company"]

Dependency Structure
[Figure: dependency arcs over the POS sequence RB DT VBN VBD NNS IN DT NN for "Sometimes the bribed became partners in the company"]

Evaluation Metric-2
Unsupervised induction:
- Generates directed dependency arcs
Compute (directed) attachment accuracy:
- Gold dependencies
- WSJ10 dataset
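Directed attachment accuracy is simply the fraction of words whose predicted head matches the gold head. A minimal sketch, with head indices for the example sentence reconstructed from the figure (so treat them as illustrative):

```python
def attachment_accuracy(gold_heads, pred_heads):
    """Directed dependency attachment accuracy: fraction of words whose
    predicted head index equals the gold head index (0 = artificial root)."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# 1-based head indices for "Sometimes the bribed became partners in the company"
# (reconstructed from the slide's figure; "became" is the root)
gold = [4, 3, 4, 0, 4, 5, 8, 6]
pred = [4, 3, 4, 0, 4, 4, 8, 6]   # one wrong attachment for "in"
acc = attachment_accuracy(gold, pred)   # 7/8 = 0.875
```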

Unsupervised Grammar Induction
To learn the hidden structure of a language:
- POS tag sequences as input
- Generates phrase structure / dependencies
- No attempt to find the meaning
Overview:
- Phrase-structure and dependency grammars
- Mostly on English (a few on Chinese, German, etc.)
- Learning restricted to shorter sentences
- Significantly lags behind supervised methods

PHRASE-STRUCTURE INDUCTION

Toy Example
Corpus:
- the dog bites a man
- dog sleeps
- a dog bites a bone
- the man sleeps
Grammar:
S → NP VP; NP → Det N; NP → N; VP → V NP; VP → V; Det → a; Det → the; N → man; N → bone; N → dog; V → sleeps; V → bites
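The toy grammar above is small enough to test directly. The sketch below encodes it and recognizes sentences with a naive top-down search (fine here since the grammar has no left recursion); this is illustration only, not an induction algorithm:

```python
# The toy grammar from the slide, as LHS -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["a"], ["the"]],
    "N": [["man"], ["bone"], ["dog"]],
    "V": [["sleeps"], ["bites"]],
}

def derives(symbol, words, i):
    """Yield end positions j such that `symbol` can derive words[i:j]."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        ends = [i]
        for sym in rhs:                            # chain the RHS symbols
            ends = [j for e in ends for j in derives(sym, words, e)]
        yield from ends

def recognize(sentence):
    words = sentence.split()
    return len(words) in set(derives("S", words, 0))
```

For example, `recognize("the dog bites a bone")` is true, while a scrambled string like "bites the dog" is rejected.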

EM for PCFG (Baker '79; Lari and Young '90)
Inside-Outside:
- EM instance for probabilistic CFGs; generalization of Forward-Backward for HMMs
- Non-terminals are fixed
- Estimates maximum-likelihood rule probabilities
[Table: all candidate rules over the toy corpus, before and after EM; estimated probabilities include S → NP VP 1.0; NP → Det N 0.875; NP → N 0.125; VP → V 0.5; VP → V NP 0.5; N → dog 0.5; N → man 0.375; N → bone 0.125; V → bites 0.5; V → sleeps 0.5]

Inside-Outside
[Figure: for "Sometimes the bribed became partners in the company", inside probabilities such as P(NP ⇒* the bribed) and P(VP ⇒* became … company), and the outside probability of the surrounding context for P(S ⇒* … NP VP …)]
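The inside quantities in the figure can be computed with the standard inside algorithm for a CNF PCFG. The sketch below computes only the inside pass (the outside pass and the EM re-estimation step are omitted), with a tiny hypothetical grammar as the usage example:

```python
from collections import defaultdict

def inside(words, binary, lexical, start="S"):
    """Inside algorithm for a PCFG in CNF.
    beta[(i, j, A)] = P(A =>* words[i:j]).
    `binary` maps A -> list of (B, C, prob); `lexical` maps A -> {word: prob}.
    Returns the sentence probability P(start =>* words)."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):                    # width-1 spans
        for a, dist in lexical.items():
            if w in dist:
                beta[(i, i + 1, a)] = dist[w]
    for span in range(2, n + 1):                     # wider spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for a, rules in binary.items():
                total = 0.0
                for b, c, p in rules:
                    for k in range(i + 1, j):        # split point
                        total += p * beta[(i, k, b)] * beta[(k, j, c)]
                beta[(i, j, a)] = total
    return beta[(0, n, start)]

# hypothetical CNF grammar; N -> dog has probability 0.5
binary = {"S": [("NP", "VP", 1.0)], "NP": [("Det", "N", 1.0)]}
lexical = {"Det": {"the": 1.0}, "N": {"dog": 0.5, "cat": 0.5},
           "VP": {"sleeps": 1.0}}
prob = inside("the dog sleeps".split(), binary, lexical)   # 0.5
```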

Constraining Search (Pereira and Schabes '92; Schabes et al. '93)
Sometimes the bribed became partners in the company

Constraining Search (Pereira and Schabes '92; Schabes et al. '93; Hwa '99)
Treebank bracketings:
- Bracketing boundaries constrain induction
What happens with limited supervision?
- More bracketed data exposed iteratively
- 0% bracketed data: Recall 50.0
- 100% bracketed data: Recall 78.0
Right-branching baseline: Recall 76.0
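The core of this constraint is a compatibility test: a candidate span may be used during Inside-Outside only if it does not cross any given treebank bracket. A minimal sketch of that test (the example brackets are illustrative):

```python
def crosses(span, brackets):
    """True if span (i, j) overlaps some bracket without being nested
    inside it or containing it -- i.e. the span is incompatible with
    the partial bracketing used to constrain induction."""
    i, j = span
    for a, b in brackets:
        if (a < i < b < j) or (i < a < j < b):
            return True
    return False

brackets = [(1, 3), (4, 8)]      # e.g. (the bribed), (partners in the company)
crosses((2, 5), brackets)        # True: straddles a bracket boundary
crosses((4, 8), brackets)        # False: identical to a bracket
crosses((5, 8), brackets)        # False: properly nested inside (4, 8)
```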

Distributional clustering (Adriaans et al. '00; Clark '00; van Zaanen '00)
Cluster the word sequences:
- Context: adjacent words or boundaries
- Relative frequency distribution of contexts
  the black dog bites the man
  the man eats an apple
Identifies constituents:
- Evaluation on ATIS corpus: Recall 35.6
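The context signature used for clustering can be sketched as the relative-frequency distribution of (left neighbor, right neighbor) pairs around a candidate sequence, with sentence boundaries as pseudo-words. A minimal sketch over the slide's two example sentences (real systems then cluster these distributions across many sequences):

```python
from collections import Counter

def context_distribution(corpus, target):
    """Relative-frequency distribution of (left, right) contexts of the
    token sequence `target` in a corpus of token lists; <s> and </s>
    mark sentence boundaries."""
    counts = Counter()
    k = len(target)
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(toks) - k):
            if toks[i:i + k] == target:
                counts[(toks[i - 1], toks[i + k])] += 1
    total = sum(counts.values())
    return {ctx: c / total for ctx, c in counts.items()}

corpus = [["the", "black", "dog", "bites", "the", "man"],
          ["the", "man", "eats", "an", "apple"]]
dist = context_distribution(corpus, ["the", "man"])
# {("bites", "</s>"): 0.5, ("<s>", "eats"): 0.5}
```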

Constituent-Context Model (Klein and Manning '02)
Valid constituents in a tree should not cross
[Figure: two candidate binary bracketings of the example sentence; the spans of one cross the spans of the other]

Constituent-Context Model
[Figure: constituent spans and their contexts (e.g. the span DT VBN between RB and VBD) for "Sometimes the bribed became partners in the company"]
Recall: right-branching 70.0; CCM 81.6

DEPENDENCY INDUCTION

Dependency Model w/ Valence (Klein and Manning '04)
Simple generative model:
- Choose head: P(Root)
- End: P(End | h, dir, v), with attachment direction dir (right, left) and valence v (head-outward)
- Argument: P(a | h, dir)
Directed accuracy: CCM 23.8; DMV 43.2; Joint 47.5
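The three distributions above fully factor the probability of a dependency tree. The sketch below scores a given tree under a simplified DMV (binary valence: has the head attached an argument in this direction yet?); the example tags, heads, and probabilities are hypothetical, and in the real model the parameters are estimated with EM rather than supplied:

```python
def dmv_tree_prob(heads, tags, p_root, p_arg, p_stop):
    """Probability of a dependency tree under a simplified DMV.
    heads[i] is the 1-based head index of word i (0 = root).
    p_root[t], p_arg[(arg, head, dir)], p_stop[(head, dir, v)] are dicts;
    v is True iff the head already has an argument in that direction."""
    n = len(tags)
    prob = p_root[tags[heads.index(0)]]               # choose the root
    for h in range(n):
        for d in ("left", "right"):
            deps = [i for i in range(n)
                    if heads[i] == h + 1 and ((i < h) if d == "left" else (i > h))]
            v = False
            for a in deps:
                # continue (1 - P(End)) and attach argument a
                prob *= (1 - p_stop[(tags[h], d, v)]) * p_arg[(tags[a], tags[h], d)]
                v = True
            prob *= p_stop[(tags[h], d, v)]           # stop in this direction
    return prob

# hypothetical two-word example: "the dog" with NN as root, DT its left arg
tags, heads = ["DT", "NN"], [2, 0]
p_root = {"NN": 0.6, "DT": 0.4}
p_arg = {("DT", "NN", "left"): 1.0}
p_stop = {("NN", "left", False): 0.5, ("NN", "left", True): 1.0,
          ("NN", "right", False): 1.0,
          ("DT", "left", False): 1.0, ("DT", "right", False): 1.0}
prob = dmv_tree_prob(heads, tags, p_root, p_arg, p_stop)   # 0.6 * 0.5 = 0.3
```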

DMV Extensions (Headden et al. '09; Blunsom and Cohn '10)
Extended Valence Grammar (EVG):
- Valence frames for the head; allows different distributions over arguments
- Dir Acc: 65.0
Lexicalization (L-EVG): Dir Acc 68.8
Tree Substitution Grammar:
- Tree fragments instead of CFG rules
- Dir Acc: 67.7

MULTILINGUAL SETTING

Bilingual Alignment & Parsing (Wu '97)
Inversion Transduction Grammar (ITG):
- Allows reordering
[Figure: ITG tree pairing source words e1–e4 with target words f1–f4, with inverted spans capturing the reordering]

Bilingual Parsing (Snyder et al. '09)
- PP attachment ambiguity: I saw (the student (from MIT)1)2
- Not ambiguous in Urdu: میں ( یمآئٹی سے )1 ( طالب علم )2 کو دیکھا
  Gloss: I ((MIT of) student) saw

Summary & Overview
Methods surveyed, grouped into parametric search methods and structural search methods:
- EM for PCFG; constrain with bracketing; contrastive estimation
- Distributional clustering; CCM; DMV; EVG & L-EVG
- TSG + DMV; data-oriented parsing; prototype-driven induction
State of the art:
- Phrase-structure (CCM + DMV): Recall 88.0
- Dependency (Lexicalized EVG): Dir Acc 68.8

QUESTIONS? Thanks!

Motivation
Languages have hidden regularities:
- The guy in China
- … new leader in China
- That's what I am asking you …
- I am telling you …

Issues with EM (Carroll and Charniak '92; Pereira and Schabes '92; de Marcken '05; Liang and Klein '08; Spitkovsky et al. '10)
Phrase-structure:
- Finds local maxima instead of the global maximum
- Multiple ordered adjunctions
Both phrase-structure & dependency:
- Disconnect between likelihood and the optimal grammar

Constituent-Context Model (Klein and Manning '02)
CCM:
- Only constituent identity
- Valid constituents in a tree should not cross

Bootstrap phrases (Haghighi and Klein '06)
Bootstrap with seed examples for constituent types:
- Chosen from the most frequent treebank phrases
- Induces labels for constituents (Recall: 59.6)
Integrate with CCM:
- CCM generates brackets (constituents); Proto labels them (Recall: 68.4)

Dependency Model w/ Valence (Klein and Manning '04)
Simple generative model:
- Choose head; attachment dir (right, left)
- Valence (head outward)
- End of generation modelled separately
Dir Acc: 43.2

Learn from how not to speak
Contrastive Estimation (Smith and Eisner '05):
- Log-linear model of dependency
- Features f(q, T): P(Root); P(a | h, dir); P(End | h, dir, v)
- Conditional likelihood

Learn from how not to speak (Smith and Eisner '05)
Contrastive Estimation:
- Ex. "the brown cat" vs. "cat brown the"
- Neighborhoods: transpose (TRANS), delete & transpose (DELORTRANS)
Dir Acc: 48.8
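The TRANS neighborhood above is easy to make concrete: every sentence obtained by swapping one adjacent pair of words serves as implicit negative evidence. A minimal sketch (DELORTRANS would additionally include single-word deletions):

```python
def trans_neighborhood(words):
    """TRANS neighborhood: all sentences obtained by transposing
    exactly one adjacent pair of words."""
    out = []
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        out.append(swapped)
    return out

trans_neighborhood(["the", "brown", "cat"])
# [["brown", "the", "cat"], ["the", "cat", "brown"]]
```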

DMV Extensions-1 (Cohen and Smith '08, '09)
Tying parameters:
- Correlated Topic Model (CTM): correlation between different word types
- Two types of tying: Logistic Normal (LN) and Shared LN
Dir Acc: 61.3

DMV Extensions-2 (Blunsom and Cohn '10)
[Figure: TSG elementary tree fragments (e.g. a VBD fragment lexicalized with "became", an IN fragment with "in") carved out of the split-head dependency tree for the example sentence]

DMV Extensions-2 (Blunsom and Cohn '10)
Tree Substitution Grammar (TSG):
- Lexicalized trees
- Hierarchical prior: different levels of backoff
Dir Acc: 67.7