Statistical methods in NLP – Diana Trandabat
CKY Parsing
Cocke-Kasami-Younger parsing algorithm:
– (Relatively) efficient bottom-up parsing algorithm based on tabulating substring parses to avoid repeated work
– Approach:
  Use a Chomsky Normal Form grammar
  Build an (n+1) x (n+1) matrix to store subtrees – upper triangular portion
  Incrementally build a parse spanning the whole input string
Reminder
– A CNF grammar is a Context-Free Grammar in which:
  Every rule LHS is a non-terminal
  Every rule RHS consists of either a single terminal or two non-terminals.
Examples:
– A → B C
– NP → Nominal PP
– A → a
– Noun → man
But not:
– NP → the Nominal
– S → VP
Reminder Any CFG can be re-written in CNF, without any loss of expressiveness. – That is, for any CFG, there is a corresponding CNF grammar which accepts exactly the same set of strings as the original CFG.
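For instance (an illustration, not from the original slides): the non-CNF rule NP → the Nominal can be replaced by NP → DT Nominal together with DT → the, and a unit production such as S → VP can be eliminated by adding S → X Y for every rule VP → X Y.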
Dynamic Programming in CKY
Key idea:
– For a parse spanning substring [i,j], there exists some k such that there are parses spanning [i,k] and [k,j]
  We can construct parses for the whole sentence by building up from these stored partial parses
So,
– To have a rule A → B C in [i,j], we must have B in [i,k] and C in [k,j], for some i < k < j
– The CNF grammar forces this for all j > i+1
CKY
Given an input string S of length n,
– Build a table of (n+1) x (n+1) cells
– Indexes correspond to inter-word positions, e.g., 0 Book 1 That 2 Flight 3
Cells [i,j] contain the sets of non-terminals of ALL constituents spanning i,j
– [j-1,j] contains pre-terminals
– If [0,n] contains the Start symbol, the input is recognized
Recognising strings with CKY
Example input: The flight includes a meal.
The CKY algorithm proceeds by:
1. Splitting the input into words and indexing each position: (0) the (1) flight (2) includes (3) a (4) meal (5)
2. Setting up a table. For a sentence of length n, we need (n+1) rows and (n+1) columns.
3. Traversing the input sentence left-to-right.
4. Using the table to store constituents and their spans.
The table
the (0–1) flight (1–2) includes (2–3) a (3–4) meal (4–5)
[0,1] for "the" contains Det. Rule: Det → the
The table
[0,1] for "the" contains Det; [1,2] for "flight" contains N
Rule 1: Det → the; Rule 2: N → flight
The table
[0,1] for "the" contains Det; [1,2] for "flight" contains N; [0,2] for "the flight" contains NP
Rule 1: Det → the; Rule 2: N → flight; Rule 3: NP → Det N
A CNF CFG for CKY
S → NP VP
NP → Det N
VP → V NP
V → includes
Det → the
Det → a
N → meal
N → flight
CKY algorithm: two components
Lexical step:
for j from 1 to length(string) do:
  let w be the word in position j
  find all rules ending in w, of the form X → w
  put X in table[j-1, j]
Syntactic step:
for i = j-2 down to 0 do:
  for k = i+1 to j-1 do:
    for each rule of the form A → B C do:
      if B is in table[i,k] and C is in table[k,j] then add A to table[i,j]
CKY algorithm: two components
We actually interleave the lexical and syntactic steps:
for j from 1 to length(string) do:
  let w be the word in position j
  find all rules ending in w, of the form X → w
  put X in table[j-1, j]
  for i = j-2 down to 0 do:
    for k = i+1 to j-1 do:
      for each rule of the form A → B C do:
        if B is in table[i,k] and C is in table[k,j] then add A to table[i,j]
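A minimal Python sketch of this interleaved recogniser, using the toy grammar from the "A CNF CFG for CKY" slide above. The names lexical_rules, binary_rules and cky_recognise are illustrative, not part of any standard library.

    lexical_rules = {                     # X -> w
        "the": {"Det"}, "a": {"Det"},
        "flight": {"N"}, "meal": {"N"},
        "includes": {"V"},
    }
    binary_rules = [                      # A -> B C
        ("S", "NP", "VP"),
        ("NP", "Det", "N"),
        ("VP", "V", "NP"),
    ]

    def cky_recognise(words):
        n = len(words)
        # table[i][j] holds the set of non-terminals spanning positions i..j
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):
            # lexical step: pre-terminals for word j go into cell [j-1, j]
            table[j - 1][j] = set(lexical_rules.get(words[j - 1], set()))
            # syntactic step: combine smaller spans
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for (a, b, c) in binary_rules:
                        if b in table[i][k] and c in table[k][j]:
                            table[i][j].add(a)
        return table

    words = "the flight includes a meal".split()
    table = cky_recognise(words)
    print("S" in table[0][len(words)])    # True: the input is recognised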
CKY: lexical step (j = 1)
The flight includes a meal
Lexical lookup: matches Det → the; Det goes into [0,1]
CKY: lexical step (j = 2)
The flight includes a meal
Lexical lookup: matches N → flight; N goes into [1,2]
CKY: syntactic step (j = 2)
The flight includes a meal
Syntactic lookup: look backwards and see if there is any rule that will cover what we've done so far; NP → Det N puts NP into [0,2]
CKY: lexical step (j = 3)
The flight includes a meal
Lexical lookup: matches V → includes; V goes into [2,3]
CKY: syntactic step (j = 3)
The flight includes a meal
Syntactic lookup: there are no rules in our grammar that will cover Det, NP, V, so nothing is added
CKY: lexical step (j = 4)
The flight includes a meal
Lexical lookup: matches Det → a; Det goes into [3,4]
CKY: lexical step (j = 5)
The flight includes a meal
Lexical lookup: matches N → meal; N goes into [4,5]
CKY: syntactic step (j = 5)
The flight includes a meal
Syntactic lookup: we find that we have NP → Det N; NP goes into [3,5]
CKY: syntactic step (j = 5)
The flight includes a meal
Syntactic lookup: we find that we have VP → V NP; VP goes into [2,5]
CKY: syntactic step (j = 5)
The flight includes a meal
Syntactic lookup: we find that we have S → NP VP; S goes into [0,5], so the sentence is recognised
From recognition to parsing
The procedure so far will recognise a string as a legal sentence in English.
But we'd like to get a parse tree back!
Solution:
– We can work our way back through the table and collect all the partial solutions into one parse tree.
– Cells will need to be augmented with "backpointers", i.e. with a pointer to the cells that the current cell covers.
From recognition to parsing
Final table: Det [0,1], NP [0,2], S [0,5]; N [1,2]; V [2,3], VP [2,5]; Det [3,4], NP [3,5]; N [4,5]
From recognition to parsing
Final table: Det [0,1], NP [0,2], S [0,5]; N [1,2]; V [2,3], VP [2,5]; Det [3,4], NP [3,5]; N [4,5]
NB: This algorithm always fills the top "triangle" of the table!
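One possible way to store the backpointers (a sketch, reusing lexical_rules and binary_rules from the recogniser sketch above): each cell maps a non-terminal either to the word it covers or to the split point and the two children it was built from, and the tree is read off recursively from the S in [0,n]. With an ambiguous grammar each entry would have to hold a list of backpointers rather than a single one.

    def cky_parse(words):
        n = len(words)
        table = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):
            w = words[j - 1]
            for x in lexical_rules.get(w, set()):
                table[j - 1][j][x] = w                 # pre-terminal: points at the word
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for (a, b, c) in binary_rules:
                        if b in table[i][k] and c in table[k][j]:
                            table[i][j][a] = (k, b, c) # remember split point and children
        return table

    def build_tree(table, symbol, i, j):
        entry = table[i][j][symbol]
        if isinstance(entry, str):                     # a word: leaf of the tree
            return (symbol, entry)
        k, b, c = entry
        return (symbol, build_tree(table, b, i, k), build_tree(table, c, k, j))

    words = "the flight includes a meal".split()
    print(build_tree(cky_parse(words), "S", 0, len(words)))
    # ('S', ('NP', ('Det', 'the'), ('N', 'flight')),
    #       ('VP', ('V', 'includes'), ('NP', ('Det', 'a'), ('N', 'meal'))))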
What about ambiguity?
The algorithm does not assume that there is only one parse tree for a sentence.
– (Our simple grammar did not admit of any ambiguity, but this isn't realistic of course.)
There is nothing to stop it returning several parse trees.
If there are multiple local solutions, then more than one non-terminal will be stored in a cell of the table.
Exercise
Apply the CKY algorithm to the following sentence:
Astronomers saw stars with ears.
given the following grammar:
S → NP VP 1.0        NP → NP PP 0.4
PP → P NP 1.0        NP → astronomers 0.2
VP → V NP 0.7        NP → ears 0.18
VP → VP PP 0.3       NP → saw 0.04
P → with 1.0         NP → stars 0.18
V → saw 1.0
Exercise
Now run the CKY algorithm considering also the probabilities of the rules.
The probability of a non-terminal A in cell [i,j], built with rule A → B C and split point k, is P(A → B C) × P(B in cell [i,k]) × P(C in cell [k,j]).
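A hedged sketch of probabilistic (Viterbi) CKY for this exercise: each cell stores, for every non-terminal, the probability of its best analysis over that span, taking the maximum over rules and split points (summing instead of maximising would give the total probability of the sentence). The names lex, rules and viterbi_cky are illustrative.

    lex = {                              # X -> w : prob
        "astronomers": {"NP": 0.2}, "ears": {"NP": 0.18},
        "saw": {"NP": 0.04, "V": 1.0}, "stars": {"NP": 0.18}, "with": {"P": 1.0},
    }
    rules = [                            # (A, B, C, prob) for A -> B C
        ("S", "NP", "VP", 1.0), ("PP", "P", "NP", 1.0),
        ("VP", "V", "NP", 0.7), ("VP", "VP", "PP", 0.3), ("NP", "NP", "PP", 0.4),
    ]

    def viterbi_cky(words):
        n = len(words)
        best = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):
            best[j - 1][j] = dict(lex.get(words[j - 1], {}))
            for i in range(j - 2, -1, -1):
                for k in range(i + 1, j):
                    for (a, b, c, p) in rules:
                        if b in best[i][k] and c in best[k][j]:
                            # P(A over [i,j]) = P(A -> B C) * P(B over [i,k]) * P(C over [k,j])
                            cand = p * best[i][k][b] * best[k][j][c]
                            if cand > best[i][j].get(a, 0.0):
                                best[i][j][a] = cand
        return best

    words = "astronomers saw stars with ears".split()
    print(viterbi_cky(words)[0][len(words)].get("S"))
    # ~0.0018144 with these probabilities (VP -> V NP, with the PP attached to the NP)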
CKY Discussions
Running time: O(n³), where n is the length of the input string
– Inner loop grows as the square of the number of non-terminals
Expressiveness:
– As implemented, requires CNF
  Weakly equivalent to the original grammar
  Doesn't capture the full original structure
– Back-conversion?
  » Can do binarization, terminal conversion
  » Unit non-terminals require a change in CKY
Parsing Efficiently
With arbitrary grammars:
– Earley algorithm
  Top-down search
  Dynamic programming – tabulated partial solutions
  Some bottom-up constraints
Interesting Probabilities
The gunman sprayed the building with bullets
(Figure: the root N¹ and an NP spanning "the building".)
– What is the probability of having an NP at this position such that it will derive "the building"? – Inside probability
– What is the probability of starting from N¹ and deriving "The gunman sprayed", an NP, and "with bullets"? – Outside probability
Interesting Probabilities
Random variables to be considered:
– The non-terminal being expanded, e.g., NP
– The word-span covered by the non-terminal, e.g., (4,5) refers to the words "the building"
While calculating probabilities, consider:
– The rule to be used for expansion, e.g., NP → DT NN
– The probabilities associated with the RHS non-terminals, e.g., the DT subtree's and the NN subtree's inside/outside probabilities
Outside Probabilities
Outside probability α_j(p,q): the probability of beginning with N¹ and generating the non-terminal N^j_pq and all the words outside w_p…w_q:
α_j(p,q) = P(w_1 … w_(p-1), N^j_pq, w_(q+1) … w_m | G)
Inside Probabilities
Inside probability β_j(p,q): the probability of generating the words w_p…w_q starting from the non-terminal N^j_pq:
β_j(p,q) = P(w_p … w_q | N^j_pq, G)
Outside & Inside Probabilities
The gunman sprayed the building with bullets
(Figure: inside and outside probabilities of the NP spanning words 4–5, "the building".)
Inside probabilities β_j(p,q)
Base case: β_j(k,k) = P(N^j → w_k)
The base case is used for rules which derive the words (terminals) directly.
E.g., suppose N^j = NN is being considered and NN → building is one of the rules, with probability 0.5; then β_NN(5,5) = 0.5.
Induction Step
β_j(p,q) = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) · β_r(p,d) · β_s(d+1,q)
(Figure: N^j rewriting as N^r over w_p…w_d and N^s over w_(d+1)…w_q.)
– Consider different splits of the words, indicated by d (e.g., the possible splits of "the huge building")
– Consider different non-terminals to be used in the rule: NP → DT NN, NP → DT NNS are available options
– Sum over all of these.
The Bottom-Up Approach
The idea of induction: consider "the gunman"
Base cases: apply the unary (lexical) rules
  DT → the      Prob = 1.0
  NN → gunman   Prob = 0.5
Induction: Prob that an NP covers these 2 words
  = P(NP → DT NN) × P(DT deriving the word "the") × P(NN deriving the word "gunman")
  = 0.5 × 1.0 × 0.5 = 0.25
(Figure: [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]].)
Parse Triangle
A parse triangle is constructed for calculating β_j(p,q)
Probability of a sentence using β_j(p,q): P(w_1 … w_m | G) = β_1(1,m)
Example PCFG: Rules & Probabilities
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
P → with 1.0
Parse Triangle
The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7)
Fill the diagonal cells with β_j(k,k) = P(N^j → w_k)
Parse Triangle
The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7)
Calculate β_j(p,q) for the larger spans using the induction formula
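A sketch of how the triangle of inside probabilities could be filled in code for this PCFG (names are illustrative). Note that NP → NNS is a unit production over a non-terminal, so the grammar is not strictly CNF; here it is handled with one extra pass over the diagonal cells, which is enough because there are no unary chains.

    lexicon = {                          # N^j -> w : prob
        "the": {"DT": 1.0}, "gunman": {"NN": 0.5}, "building": {"NN": 0.5},
        "sprayed": {"VBD": 1.0}, "bullets": {"NNS": 1.0}, "with": {"P": 1.0},
    }
    unary = [("NP", "NNS", 0.3)]         # NP -> NNS
    binary = [                           # (A, B, C, prob) for A -> B C
        ("S", "NP", "VP", 1.0), ("NP", "DT", "NN", 0.5), ("NP", "NP", "PP", 0.2),
        ("PP", "P", "NP", 1.0), ("VP", "VP", "PP", 0.6), ("VP", "VBD", "NP", 0.4),
    ]

    def inside(words):
        n = len(words)
        # beta[p][q] maps a non-terminal to its inside probability over words p..q (1-based)
        beta = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        for k in range(1, n + 1):
            beta[k][k] = dict(lexicon.get(words[k - 1], {}))  # base case: beta_j(k,k) = P(N^j -> w_k)
            for (a, b, p) in unary:                           # one pass: no unary chains here
                if b in beta[k][k]:
                    beta[k][k][a] = beta[k][k].get(a, 0.0) + p * beta[k][k][b]
        for span in range(2, n + 1):
            for p in range(1, n - span + 2):
                q = p + span - 1
                for d in range(p, q):                         # split: left child covers p..d
                    for (a, b, c, prob) in binary:
                        if b in beta[p][d] and c in beta[d + 1][q]:
                            beta[p][q][a] = beta[p][q].get(a, 0.0) + \
                                prob * beta[p][d][b] * beta[d + 1][q][c]
        return beta

    words = [w.lower() for w in "The gunman sprayed the building with bullets".split()]
    beta = inside(words)
    print(beta[1][len(words)].get("S"))
    # P(sentence) = beta_S(1,7) ~ 0.006 with these rules (0.0045 + 0.0015 for the two parses shown on the next slides)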
Example Parse t₁
The gunman sprayed the building with bullets.
Rule used for the top VP here is VP → VP PP:
[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]] [VP 0.6 [VP 0.4 [VBD 1.0 sprayed] [NP 0.5 [DT 1.0 the] [NN 0.5 building]]] [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]
Another Parse t₂
The gunman sprayed the building with bullets.
Rule used for the VP here is VP → VBD NP:
[S 1.0 [NP 0.5 [DT 1.0 The] [NN 0.5 gunman]] [VP 0.4 [VBD 1.0 sprayed] [NP 0.2 [NP 0.5 [DT 1.0 the] [NN 0.5 building]] [PP 1.0 [P 1.0 with] [NP 0.3 [NNS 1.0 bullets]]]]]]
Parse Triangle
The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7)
Different Parses
Consider:
– Different splitting points, e.g., the 5th and the 3rd position
– Using different rules for VP expansion, e.g., VP → VP PP, VP → VBD NP
Different parses for the VP "sprayed the building with bullets" can be constructed this way.
Outside Probabilities α_j(p,q)
Base case: α_1(1,m) = 1; α_j(1,m) = 0 for j ≠ 1
Inductive step (summation over f, g and e):
α_j(p,q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p,e) · P(N^f → N^j N^g) · β_g(q+1,e)
         + Σ_{f,g} Σ_{e=1}^{p-1} α_f(e,q) · P(N^f → N^g N^j) · β_g(e,p-1)
(Figure: the first case, where N^j_pq is the left child of N^f_pe and N^g_(q+1)e covers w_(q+1)…w_e.)
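A sketch of the outside pass (illustrative; it reuses unary, binary and inside() from the inside-probability sketch above). Rather than implementing the summation formula directly, each parent's outside probability is pushed down to its two children, which is equivalent and easier to code; the unary rule NP → NNS gets its own pass over the diagonal at the end.

    def outside(words, beta):
        n = len(words)
        alpha = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        alpha[1][n]["S"] = 1.0                    # base case: alpha_1(1,m) = 1, all others 0
        for span in range(n, 1, -1):              # parents first, largest spans first
            for p in range(1, n - span + 2):
                q = p + span - 1
                for (a, b, c, prob) in binary:
                    if a not in alpha[p][q]:
                        continue
                    for d in range(p, q):
                        if b in beta[p][d] and c in beta[d + 1][q]:
                            # left child: parent's outside * rule prob * inside of the right sibling
                            alpha[p][d][b] = alpha[p][d].get(b, 0.0) + \
                                alpha[p][q][a] * prob * beta[d + 1][q][c]
                            # right child: parent's outside * rule prob * inside of the left sibling
                            alpha[d + 1][q][c] = alpha[d + 1][q].get(c, 0.0) + \
                                alpha[p][q][a] * prob * beta[p][d][b]
        for k in range(1, n + 1):                 # pass outside mass down through NP -> NNS
            for (a, b, p_) in unary:
                if a in alpha[k][k]:
                    alpha[k][k][b] = alpha[k][k].get(b, 0.0) + p_ * alpha[k][k][a]
        return alpha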
Probability of a Sentence
The joint probability of a sentence w_1…w_m and that there is a constituent spanning words w_p to w_q is given as:
P(w_1 … w_m, N_pq | G) = Σ_j α_j(p,q) · β_j(p,q)
(Figure: the gunman sentence with an NP constituent spanning "the building".)
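For example, a check one could run with the inside/outside sketches above: for the NP spanning words 4–5, "the building", β_NP(4,5) = P(NP → DT NN) · P(DT → the) · P(NN → building) = 0.5 · 1.0 · 0.5 = 0.25, and α_NP(4,5) · β_NP(4,5) comes out as 0.006, the full sentence probability, since under this grammar every parse of the sentence contains that NP.

    words = [w.lower() for w in "The gunman sprayed the building with bullets".split()]
    beta = inside(words)
    alpha = outside(words, beta)
    print(beta[4][5]["NP"])                       # 0.25
    print(alpha[4][5]["NP"] * beta[4][5]["NP"])   # ~0.006 = P(w_1..7, NP spanning words 4..5)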
Further readings
Michael Collins – The Inside-Outside Algorithm
Great! See you next time!