1 Statistical NLP: Lecture 12 Probabilistic Context Free Grammars

2 Motivation
• N-gram models and HMM tagging only allow us to process sentences linearly, word by word.
• However, even simple sentences require a nonlinear model that reflects the hierarchical structure of sentences rather than the linear order of their words.
• Probabilistic Context Free Grammars (PCFGs) are the simplest and most natural probabilistic model for tree structures, and the algorithms for them are closely related to those for HMMs.
• Note, however, that there are other ways of building probabilistic models of syntactic structure (see Chapter 12).

3 Formal Definition of PCFGs
• A PCFG consists of:
– A set of terminals, {w^k}, k = 1, …, V
– A set of nonterminals, {N^i}, i = 1, …, n
– A designated start symbol N^1
– A set of rules, {N^i → ζ^j} (where ζ^j is a sequence of terminals and nonterminals)
– A corresponding set of probabilities on rules such that: ∀i  Σ_j P(N^i → ζ^j) = 1
• The probability of a sentence (according to grammar G) is given by: P(w_1m) = Σ_t P(w_1m, t), where t is a parse tree of the sentence, = Σ_{t: yield(t) = w_1m} P(t)
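As a concrete illustration of the rule-probability constraint, here is a minimal sketch in Python (the toy grammar and all names are illustrative, not from the lecture): a PCFG stored as a dictionary mapping each nonterminal to its rules, with a check that the rule probabilities for each left-hand side sum to 1.

```python
# A minimal sketch: a toy PCFG as a dict from nonterminal to a list of
# (right-hand side, probability) pairs.  Right-hand sides are tuples of symbols.
rules = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.7), (("NN",), 0.3)],
    "VP": [(("VB", "NP"), 0.6), (("VB",), 0.4)],
    "DT": [(("the",), 1.0)],
    "NN": [(("dog",), 0.5), (("cat",), 0.5)],
    "VB": [(("saw",), 1.0)],
}

def check_normalization(rules, tol=1e-9):
    """Verify the PCFG constraint: for every nonterminal N^i,
    the probabilities of its rules sum to 1."""
    for lhs, expansions in rules.items():
        total = sum(p for _, p in expansions)
        assert abs(total - 1.0) < tol, f"rules for {lhs} sum to {total}"

check_normalization(rules)
```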

4 Assumptions of the Model
• Place Invariance: The probability of a subtree does not depend on where in the string the words it dominates are.
• Context Free: The probability of a subtree does not depend on words not dominated by the subtree.
• Ancestor Free: The probability of a subtree does not depend on nodes in the derivation outside the subtree.

5 Some Features of PCFGs
• A PCFG gives some idea of the plausibility of different parses. However, the probabilities are based on structural factors, not lexical ones.
• PCFGs are good for grammar induction.
• PCFGs are robust.
• PCFGs give a probabilistic language model for English.
• The predictive power of a PCFG should, in principle, be greater than that of an HMM, though in practice it is often worse.
• PCFGs are not good models on their own, but they can be combined with a trigram model.
• PCFGs have certain biases which may not be appropriate.

6 Questions for PCFGs
• Just as for HMMs, there are three basic questions we wish to answer:
• What is the probability of a sentence w_1m according to a grammar G: P(w_1m | G)?
• What is the most likely parse for a sentence: argmax_t P(t | w_1m, G)?
• How can we choose rule probabilities for the grammar G that maximize the probability of a sentence: argmax_G P(w_1m | G)?

7 Restriction
• In this lecture, we only consider the case of Chomsky Normal Form grammars, which only have unary and binary rules of the form: N^i → N^j N^k and N^i → w^j
• The parameters of a PCFG in Chomsky Normal Form are: P(N^j → N^r N^s | G), an n³ matrix of parameters, and P(N^j → w^k | G), nV parameters (where n is the number of nonterminals and V is the number of terminals)
• For each j: Σ_{r,s} P(N^j → N^r N^s) + Σ_k P(N^j → w^k) = 1
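A sketch of how the Chomsky Normal Form parameterization might be stored, assuming NumPy arrays and a small made-up grammar (the symbol names and layout are assumptions, not the lecture's): an n × n × n array for the binary rules and an n × V array for the lexical rules, normalized jointly over each parent nonterminal.

```python
import numpy as np

nonterminals = ["S", "NP", "VP", "V"]   # N^1 = "S" is the start symbol
terminals = ["people", "fish"]
n, V = len(nonterminals), len(terminals)
nt = {x: i for i, x in enumerate(nonterminals)}
wd = {x: k for k, x in enumerate(terminals)}

binary = np.zeros((n, n, n))   # binary[j, r, s] = P(N^j -> N^r N^s)
unary = np.zeros((n, V))       # unary[j, k]     = P(N^j -> w^k)

binary[nt["S"], nt["NP"], nt["VP"]] = 1.0   # S  -> NP VP
binary[nt["VP"], nt["V"], nt["NP"]] = 1.0   # VP -> V NP
unary[nt["NP"], wd["people"]] = 0.6         # NP -> people
unary[nt["NP"], wd["fish"]] = 0.4           # NP -> fish
unary[nt["V"], wd["fish"]] = 1.0            # V  -> fish

# Check that sum_{r,s} P(N^j -> N^r N^s) + sum_k P(N^j -> w^k) = 1 for each j.
assert np.allclose(binary.sum(axis=(1, 2)) + unary.sum(axis=1), 1.0)
```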

8 From HMMs to Probabilistic Regular Grammars (PRGs)
• A PRG has start state N^1 and rules of the form:
– N^i → w^j N^k
– N^i → w^j
• This is similar to what we had for an HMM, except that in an HMM we have ∀n Σ_{w_1n} P(w_1n) = 1, whereas in a PCFG we have Σ_{w ∈ L} P(w) = 1, where L is the language generated by the grammar.
• PRGs are related to HMMs in that a PRG is an HMM to which we add a start state and a finish (or sink) state.

9 From PRGs to PCFGs
• In the HMM, we were able to efficiently do calculations in terms of forward and backward probabilities.
• In a parse tree, the forward probability corresponds to everything above and including a certain node, while the backward probability corresponds to the probability of everything below a certain node.
• We introduce Outside (α_j) and Inside (β_j) probabilities:
– α_j(p,q) = P(w_1(p-1), N^j_pq, w_(q+1)m | G)
– β_j(p,q) = P(w_pq | N^j_pq, G)

10 The Probability of a String I: Using Inside Probabilities
• We use the Inside Algorithm, a dynamic programming algorithm based on the inside probabilities: P(w_1m | G) = P(N^1 ⇒* w_1m | G) = P(w_1m | N^1_1m, G) = β_1(1,m)
• Base case: β_j(k,k) = P(w_k | N^j_kk, G) = P(N^j → w_k | G)
• Induction: β_j(p,q) = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) β_r(p,d) β_s(d+1,q)
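The recurrence above translates directly into a chart-style dynamic program. Below is a sketch (not the lecture's code; it assumes the binary/unary arrays and word-to-column mapping from the earlier Chomsky Normal Form sketch, with 0-based, inclusive spans) that fills the β table bottom-up, from single words to the whole sentence.

```python
import numpy as np

def inside_probabilities(words, binary, unary, word_index):
    """Inside algorithm for a CNF PCFG.  binary[j, r, s] = P(N^j -> N^r N^s),
    unary[j, k] = P(N^j -> w^k); word_index maps a terminal to its column k.
    Returns beta with beta[j, p, q] = P(w_{p..q} | N^j_{pq}, G)."""
    m = len(words)
    n = binary.shape[0]
    beta = np.zeros((n, m, m))
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for p, w in enumerate(words):
        beta[:, p, p] = unary[:, word_index[w]]
    # Induction: beta_j(p, q) = sum_{r,s} sum_{d=p}^{q-1}
    #            P(N^j -> N^r N^s) beta_r(p, d) beta_s(d+1, q)
    for span in range(2, m + 1):
        for p in range(0, m - span + 1):
            q = p + span - 1
            for d in range(p, q):
                left = beta[:, p, d]        # beta_r(p, d) for all r
                right = beta[:, d + 1, q]   # beta_s(d+1, q) for all s
                beta[:, p, q] += np.einsum("jrs,r,s->j", binary, left, right)
    return beta

# With the toy grammar above and words = ["people", "fish", "fish"],
# beta[0, 0, 2] = 0.24 = beta_1(1, m) = P(w_1m | G).
```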

11 The Probability of a String II: Using Outside Probabilities
• We use the Outside Algorithm, based on the outside probabilities: P(w_1m | G) = Σ_j α_j(k,k) P(N^j → w_k), for any k
• Base case: α_1(1,m) = 1; α_j(1,m) = 0 for j ≠ 1
• Inductive case: α_j(p,q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p,e) P(N^f → N^j N^g) β_g(q+1,e) + Σ_{f,g} Σ_{e=1}^{p-1} α_f(e,q) P(N^f → N^g N^j) β_g(e,p-1)
• Similarly to the HMM, we can combine the inside and the outside probabilities: P(w_1m, N_pq | G) = Σ_j α_j(p,q) β_j(p,q)
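A corresponding sketch of the outside pass, under the same assumed array layout: it consumes the β table from the inside pass and fills α from the widest spans down to the narrowest, adding the case where N^j is the left child of its parent and the case where it is the right child.

```python
import numpy as np

def outside_probabilities(words, binary, beta, start=0):
    """Outside algorithm for a CNF PCFG, given the beta table from
    inside_probabilities.  Returns alpha with
    alpha[j, p, q] = P(w_{1..p-1}, N^j_{pq}, w_{q+1..m} | G)."""
    m = len(words)
    n = binary.shape[0]
    alpha = np.zeros((n, m, m))
    alpha[start, 0, m - 1] = 1.0     # base case: alpha_1(1, m) = 1
    # Work from larger spans down to smaller ones.
    for span in range(m - 1, 0, -1):
        for p in range(0, m - span + 1):
            q = p + span - 1
            # Parent N^f spans (p, e) with N^j as left child, N^g as right sibling.
            for e in range(q + 1, m):
                alpha[:, p, q] += np.einsum(
                    "f,fjg,g->j", alpha[:, p, e], binary, beta[:, q + 1, e])
            # Parent N^f spans (e, q) with N^g as left sibling, N^j as right child.
            for e in range(0, p):
                alpha[:, p, q] += np.einsum(
                    "f,fgj,g->j", alpha[:, e, q], binary, beta[:, e, p - 1])
    return alpha

# Sanity check: for any span (p, q), sum_j alpha[j, p, q] * beta[j, p, q]
# equals P(w_1m, N_pq | G).
```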

12 Finding the Most Likely Parse for a Sentence
• The algorithm works by finding the highest-probability partial parse tree spanning a certain substring that is rooted with a certain nonterminal.
• δ_i(p,q) = the highest inside probability of a parse of a subtree N^i_pq
• Initialization: δ_i(p,p) = P(N^i → w_p)
• Induction: δ_i(p,q) = max_{1 ≤ j,k ≤ n, p ≤ r < q} P(N^i → N^j N^k) δ_j(p,r) δ_k(r+1,q)
• Store backtrace: ψ_i(p,q) = argmax_{(j,k,r)} P(N^i → N^j N^k) δ_j(p,r) δ_k(r+1,q)
• Termination: P(t̂) = δ_1(1,m)
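The same chart structure, with max and argmax replacing the sums of the Inside Algorithm, gives the Viterbi-style parser of this slide. The sketch below (same assumed array layout; backpointers stored in a dictionary) returns the probability of the best parse and the backtrace needed to reconstruct it.

```python
import numpy as np

def viterbi_parse(words, binary, unary, word_index, start=0):
    """CKY-style Viterbi parse for a CNF PCFG.  Returns the probability of
    the best parse and a backtrace dict; spans are 0-based and inclusive."""
    m = len(words)
    n = binary.shape[0]
    delta = np.zeros((n, m, m))
    back = {}
    # Initialization: delta_i(p, p) = P(N^i -> w_p)
    for p, w in enumerate(words):
        delta[:, p, p] = unary[:, word_index[w]]
    # Induction: delta_i(p, q) = max_{j,k,r} P(N^i -> N^j N^k)
    #            delta_j(p, r) delta_k(r+1, q)
    for span in range(2, m + 1):
        for p in range(0, m - span + 1):
            q = p + span - 1
            for i in range(n):
                best, arg = 0.0, None
                for r in range(p, q):
                    # scores[j, k] = P(N^i -> N^j N^k) delta_j(p,r) delta_k(r+1,q)
                    scores = binary[i] * np.outer(delta[:, p, r], delta[:, r + 1, q])
                    j, k = np.unravel_index(np.argmax(scores), scores.shape)
                    if scores[j, k] > best:
                        best, arg = scores[j, k], (j, k, r)
                delta[i, p, q] = best
                back[(i, p, q)] = arg      # psi_i(p, q)
    # Termination: probability of the best parse is delta_1(1, m)
    return delta[start, 0, m - 1], back
```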

13 Training a PCFG
• Restrictions: We assume that the set of rules is given in advance, and we try to find the optimal probabilities to assign to the different grammar rules.
• As for HMMs, we use an EM training algorithm, the Inside-Outside Algorithm, which allows us to train the parameters of a PCFG on unannotated sentences of the language.
• Basic assumption: a good grammar is one that makes the sentences in the training corpus likely to occur ==> we seek the grammar that maximizes the likelihood of the training data.
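To make the E-step of the Inside-Outside Algorithm concrete, here is a sketch (under the same assumed array layout, for a single training sentence) of the expected rule counts computed from the α and β tables above; the M-step would then renormalize these counts per parent nonterminal to obtain the new rule probabilities.

```python
import numpy as np

def expected_rule_counts(words, binary, unary, word_index, alpha, beta, start=0):
    """E-step of Inside-Outside for one sentence: expected usage counts of
    every binary rule N^j -> N^r N^s and unary rule N^j -> w^k."""
    m = len(words)
    sent_prob = beta[start, 0, m - 1]        # P(w_1m | G)
    bin_counts = np.zeros_like(binary)
    un_counts = np.zeros_like(unary)
    for p in range(m):
        # Unary rules: expected count of N^j -> w_p at position p.
        k = word_index[words[p]]
        un_counts[:, k] += alpha[:, p, p] * unary[:, k] / sent_prob
        for q in range(p + 1, m):
            for d in range(p, q):
                # outer[r, s] = beta_r(p, d) beta_s(d+1, q)
                outer = np.outer(beta[:, p, d], beta[:, d + 1, q])
                # contribution of each parent j using rule N^j -> N^r N^s
                bin_counts += (alpha[:, p, q][:, None, None]
                               * binary * outer[None, :, :]) / sent_prob
    return bin_counts, un_counts

# The M-step would set P(N^j -> N^r N^s) = bin_counts[j, r, s] divided by the
# total expected count of N^j (binary plus unary), summed over the corpus.
```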

14 Problems with the Inside-Outside Algorithm
• Extremely slow: for each sentence, each iteration of training is O(m³n³).
• Local maxima are much more of a problem than in HMMs.
• Satisfactory learning requires many more nonterminals than are theoretically needed to describe the language.
• There is no guarantee that the learned nonterminals will be linguistically motivated.