Adaptor Grammars
Ehsan Khoddammohammadi
Recent Advances in Parsing Technology, WS 2012/13
Saarland University 1

Outline
– Definition and motivation behind unsupervised grammar learning
– Non-parametric Bayesian statistics
– Adaptor grammars vs. PCFGs
– A short introduction to the Chinese Restaurant Process
– Applications of adaptor grammars 2

Unsupervised Learning
– How many categories of objects are there?
– How many features does an object have?
– How many words and rules are in a language? 3

Grammar Induction
Goal:
– Study how a grammar and parses can be learnt from terminal strings alone
Motivation:
– Help us understand human language acquisition
– Induce parsers for low-resource languages 4

Nonparametric Bayesian statistics
– Learning the things people learn requires rich, unbounded hypothesis spaces
– Language learning is non-parametric inference: there is no (obvious) bound on the number of words, grammatical rules, or morphemes
– Stochastic processes are used to define priors on these infinite hypothesis spaces 5

Nonparametric Bayesian statistics 6

Is PCFG good enough for our purpose?
– PCFGs can be learnt through a Bayesian framework, but …
– The set of rules is fixed in standard PCFG estimation
– PCFG rules are "too small" to be effective units of generalization
How can we solve this problem? 7

Two Non-parametric Bayesian extensions to PCFGs
1. Let the set of non-terminals grow unboundedly:
– Start with an unlexicalized, short grammar
– Split and join non-terminals
2. Let the set of rules grow unboundedly:
– Generate new rules whenever you need them
– Learn sub-trees and their probabilities (bigger units of generalization) 8

Adaptor Grammars
– CFG rules are used to generate trees, as in a CFG
– We have two types of non-terminals:
– Un-adapted (normal) non-terminals: pick a rule and recursively expand its children, as in a PCFG
– Adapted non-terminals: either pick a rule and recursively expand its children, or regenerate a previously generated tree (with probability proportional to the number of times it has already been generated) 9
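This generative story can be made concrete with a minimal sketch (mine, not the author's implementation); the toy grammar, the ADAPTED set, and the concentration parameter ALPHA are illustrative assumptions:

```python
import random
from collections import defaultdict

# Toy grammar (illustrative): non-terminal -> [(right-hand side, probability), ...]
RULES = {
    "Word":   [(("Stem", "Suffix"), 1.0)],
    "Stem":   [(("c", "a", "t"), 0.5), (("d", "o", "g"), 0.5)],
    "Suffix": [(("s",), 0.7), ((), 0.3)],
}
ADAPTED = {"Word"}            # adapted non-terminals cache whole generated trees
CACHE = defaultdict(list)     # non-terminal -> previously generated trees, one entry per use
ALPHA = 1.0                   # concentration parameter (assumed value)

def expand_fresh(symbol):
    """Pick a rule for `symbol` and recursively expand its children, as in a PCFG."""
    options = [rhs for rhs, _ in RULES[symbol]]
    weights = [p for _, p in RULES[symbol]]
    rhs = random.choices(options, weights=weights, k=1)[0]
    return (symbol, tuple(expand(child) for child in rhs))

def expand(symbol):
    """Generate a tree for `symbol` following the adaptor-grammar story."""
    if symbol not in RULES:                      # terminal symbol
        return symbol
    if symbol in ADAPTED:
        n = len(CACHE[symbol])                   # previous uses of this non-terminal
        if n > 0 and random.random() < n / (n + ALPHA):
            tree = random.choice(CACHE[symbol])  # reuse: uniform over past uses = proportional to counts
        else:
            tree = expand_fresh(symbol)          # otherwise generate a fresh tree
        CACHE[symbol].append(tree)               # every use adds another "customer"
        return tree
    return expand_fresh(symbol)

print([expand("Word") for _ in range(5)])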

The Story of Adaptor Grammars 10

Properties of Adaptor grammars
In adaptor grammars:
– The probabilities of adapted sub-trees are learnt separately, not just as the product of rule probabilities.
– "Rich get richer" (Zipf distribution)
– Useful compound structures are more probable than their parts.
– There is no recursion amongst adapted non-terminals (an adapted non-terminal never expands to itself). 11

The Chinese Restaurant Process 12

n customers walk into a restaurant; customer i chooses table z_i with probability
P(z_i = k | z_1, …, z_{i-1}) = n_k / (i - 1 + α)   for an existing table k with n_k customers
P(z_i = new table | z_1, …, z_{i-1}) = α / (i - 1 + α)
This defines an exchangeable distribution over seating arrangements (including the counts on tables) 13
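A minimal sketch of this seating process, assuming a single concentration parameter alpha (the function name and default values are illustrative):

```python
import random

def crp_seating(n_customers, alpha=1.0):
    """Sample a CRP seating arrangement; returns the list of table counts."""
    counts = []                      # counts[k] = number of customers at table k
    for i in range(n_customers):
        total = i + alpha            # i customers are already seated
        r = random.uniform(0, total)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1       # join existing table k, probability c / (i + alpha)
                break
            r -= c
        else:
            counts.append(1)         # open a new table, probability alpha / (i + alpha)
    return counts

print(crp_seating(20, alpha=1.0))
```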

CRP 14

CRP 15

CRP 16

CRP 17

CRP 18

Applications of Adaptor grammars
Adaptor grammars are not used for parsing itself, because grammar induction is hard, but they have been applied to:
1. Word segmentation
2. Learning concatenative morphology
3. Learning the structure of named-entity NPs
4. Topic modeling 19

Unsupervised Word Segmentation
– Input: phoneme sequences with sentence boundaries
– Task: identify words 20

Word segmentation with PCFG 21

Unigram word segmentation 22
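The grammar on this slide is not preserved in the transcript; below is a sketch of what a unigram word-segmentation adaptor grammar typically looks like, written in the same toy rule format as the earlier generative sketch. The phoneme inventory and probabilities are placeholders:

```python
# Unigram word-segmentation grammar (illustrative): a Sentence is a sequence of Words,
# a Word is a sequence of Phonemes, and only Word is adapted, so whole words get cached.
PHONEMES = list("abcdefghijklmnopqrstuvwxyz")   # stand-in for the real phoneme inventory

SEG_RULES = {
    "Sentence": [(("Word", "Sentence"), 0.5), (("Word",), 0.5)],
    "Word":     [(("Phonemes",), 1.0)],
    "Phonemes": [(("Phoneme", "Phonemes"), 0.5), (("Phoneme",), 0.5)],
    "Phoneme":  [((p,), 1.0 / len(PHONEMES)) for p in PHONEMES],
}
SEG_ADAPTED = {"Word"}    # caching whole Word sub-trees is what drives segmentation
```

The collocation model of the next slide adds, in the same spirit, an adapted collocation level between Sentence and Word.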

Collocation word segmentation 23

Performance (evaluated on the Brent corpus)
Generalization         Accuracy
Unigram                56%
+ collocations         76%
+ syllable structure   87%
24

Morphology
– Input: raw text
– Task: identify stems and morphemes and decompose a word into its morphological components
– Adaptor grammars can only be applied to simple concatenative morphology. 25

CFG for morphological analysis 26

Adaptor grammar for morphological analysis Generated Words: 1. cats 2. dogs 3. cats 27
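The grammar behind this example is not shown in the transcript; here is a sketch of a simple concatenative-morphology grammar in the same toy format (rules and probabilities are illustrative). Because Word, Stem, and Suffix are adapted, once "cat"+"s" has been generated the whole word "cats" is cached, which is why it can reappear as word 3 without being re-derived rule by rule:

```python
CHARS = list("abcdefghijklmnopqrstuvwxyz")

MORPH_RULES = {
    "Word":   [(("Stem", "Suffix"), 0.5), (("Stem",), 0.5)],
    "Stem":   [(("Chars",), 1.0)],
    "Suffix": [(("Chars",), 1.0)],
    "Chars":  [(("Char", "Chars"), 0.5), (("Char",), 0.5)],
    "Char":   [((c,), 1.0 / len(CHARS)) for c in CHARS],
}
MORPH_ADAPTED = {"Word", "Stem", "Suffix"}   # cache whole words as well as stems and suffixes
```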

Performance
For a more sophisticated model:
– 116,129 tokens: 70% correctly segmented
– 7,170 verb types: 66% correctly segmented 28

Inference
– The distribution over adapted trees is exchangeable, so Gibbs sampling can be used.
– A variational inference method has also been proposed for learning adaptor grammars.
– Covering this part is beyond the objectives of this talk. 29
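As a structural illustration only (inference is not covered in the talk): exchangeability means each sentence's analysis can be treated as if it were the last one drawn, so a Gibbs sweep removes its cached trees, resamples it given everything else, and puts the new trees back. The names below (sample_analysis, the tree objects and their root attribute) are hypothetical placeholders, not the actual sampler of Johnson et al.:

```python
def gibbs_sweep(corpus, analyses, caches, sample_analysis):
    """One Gibbs sweep over the corpus.

    `analyses[i]` is the list of adapted sub-trees used by sentence i,
    `caches[A]` is the list of cached trees for adapted non-terminal A, and
    `sample_analysis(sentence, caches)` draws a new analysis from the
    conditional distribution given all other sentences (placeholder).
    """
    for i, sentence in enumerate(corpus):
        for tree in analyses[i]:
            caches[tree.root].remove(tree)               # pretend sentence i was generated last
        analyses[i] = sample_analysis(sentence, caches)  # resample given the rest of the corpus
        for tree in analyses[i]:
            caches[tree.root].append(tree)               # re-insert the new cached trees
    return analyses
```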

Conclusion
– We are interested in inducing grammars without supervision for two reasons: language acquisition and low-resource languages
– PCFG rules are too small to be effective units of generalization
– Learning the things people learn requires rich, unbounded hypothesis spaces
– Adaptor grammars use the CRP to learn rules from these unbounded hypothesis spaces 30

References
– Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models, M. Johnson et al., Advances in Neural Information Processing Systems, 2007
– Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure, Mark Johnson, ACL-08: HLT, 2008
– Inferring Structure from Data, Tom Griffiths, Machine Learning Summer School, Sardinia 31

Thank you for your attention! 32