Lexicalized and Probabilistic Parsing Read J & M Chapter 12.

Using Probabilities Resolving ambiguities: I saw the Statue of Liberty flying over New York. Predicting for recognition: I have to go. vs. I half to go. vs. I half way thought I’d go.

It’s Mostly About Semantics He drew one card. I saw the Statue of Liberty flying over New York. I saw a plane flying over New York. Workers dumped sacks into a bin. Moscow sent more than 100,000 soldiers into Afghanistan. John hit the ball with the bat. John hit the ball with the autograph. Visiting relatives can be trying. Visiting museums can be trying.

How to Add Semantics to Parsing? The classic approach to this problem: Ask a semantics module to choose. Two ways to do that: Cascade the two systems. Build all the parses, then pass them to semantics to rate them. Combinatorially awful. Do semantics incrementally. Pass constituents, get ratings and filter. In either case, we need to reason about the world.

The “Modern” Approach The modern approach: Skip “meaning” and the corresponding need for a knowledge base and an inference engine. Notice that the facts about meaning manifest themselves in probabilities of observed sentences if there are enough sentences. Why is this approach in vogue? Building world models is a lot harder than early researchers realized. But, we do have huge text corpora from which we can draw statistics.

Probabilistic Context-Free Grammars A PCFG is a context-free grammar in which each rule has been augmented with a probability: A → β [p], where p is the probability that a given nonterminal symbol A will be rewritten as β via this rule. Another way to think of this is: P(A → β | A). So the sum of the probabilities of all rules with left-hand side A must be 1.

A Toy Example
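
In code, a PCFG of this kind can be represented as a map from each nonterminal to its expansions and their probabilities, with the probabilities for any one left-hand side summing to 1. The Python sketch below uses a small hypothetical grammar fragment, not the toy grammar shown on this slide:

```python
# A hypothetical toy PCFG (not the slide's grammar): each left-hand side maps
# to (right-hand side, probability) pairs whose probabilities sum to 1.
PCFG = {
    "S":  [(("NP", "VP"), 0.80), (("Aux", "NP", "VP"), 0.15), (("VP",), 0.05)],
    "NP": [(("Det", "Noun"), 0.60), (("Pronoun",), 0.40)],
    "VP": [(("Verb", "NP"), 0.70), (("Verb",), 0.30)],
}

def check_pcfg(grammar, tol=1e-9):
    """Verify that P(A -> beta | A) sums to 1 for every nonterminal A."""
    for lhs, expansions in grammar.items():
        total = sum(p for _, p in expansions)
        if abs(total - 1.0) > tol:
            raise ValueError(f"rules for {lhs} sum to {total}, not 1")

check_pcfg(PCFG)  # raises if some nonterminal's rule probabilities don't sum to 1
```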

How Can We Use These? In a top-down parser, we can follow the more likely path first. In a bottom-up parser, we can build all the constituents and then compare them.

The Probability of Some Parse T P(T) = ∏_{n ∈ T} p(r(n)), where p(r(n)) means the probability that rule r(n) will apply to expand the nonterminal n. Note the independence assumption. So what we want is the most probable parse: T̂(S) = argmax_{T ∈ τ(S)} P(T), where τ(S) is the set of possible parses for S.
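
A minimal sketch of this product, reusing the hypothetical toy PCFG from the sketch above; the parse tree is an illustrative nested-tuple structure, and lexical (word-level) probabilities are simply ignored here:

```python
def rule_probability(grammar, lhs, rhs):
    """Look up p(lhs -> rhs | lhs) in a PCFG shaped like the toy one above."""
    for expansion, p in grammar.get(lhs, []):
        if expansion == rhs:
            return p
    return 0.0  # rule not in the grammar

def parse_probability(grammar, node):
    """P(T) = product of p(r(n)) over the nodes n of T (the independence assumption)."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0  # preterminal over a word; lexical probabilities ignored in this sketch
    rhs = tuple(child[0] for child in children)
    prob = rule_probability(grammar, label, rhs)
    for child in children:
        prob *= parse_probability(grammar, child)
    return prob

# A hypothetical parse: S -> NP VP, NP -> Pronoun, VP -> Verb NP, NP -> Det Noun
tree = ("S",
        ("NP", ("Pronoun", "you")),
        ("VP", ("Verb", "book"), ("NP", ("Det", "the"), ("Noun", "flight"))))
print(parse_probability(PCFG, tree))  # 0.8 * 0.4 * 0.7 * 0.6 ≈ 0.1344
```

The most probable parse T̂(S) is then simply the parse with the highest such score among all parses of S.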

An Example Can you book TWA flights?

An Example – The Probabilities P(T₁) = 1.5 × 10⁻⁶ and P(T₂) = 1.7 × 10⁻⁶. Note how small the probabilities are, even with this tiny grammar.

Using Probabilities for Language Modeling Since there are fewer grammar rules than there are word sequences, it can be useful, in language modeling, to use grammar probabilities instead of flat n-gram frequencies. So the probability of some sentence S is the sum of the probabilities of its possible parses: P(S) = Σ_{T ∈ τ(S)} P(T). Contrast this with the flat n-gram estimate, P(S) ≈ ∏_i P(w_i | w_{i-N+1} … w_{i-1}).
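
A sketch of the two estimates side by side, reusing parse_probability from the sketch above; bigram_probs is a hypothetical table of conditional bigram probabilities:

```python
def pcfg_sentence_probability(grammar, parses):
    """PCFG language model: P(S) = sum of P(T) over all parses T of the sentence."""
    return sum(parse_probability(grammar, tree) for tree in parses)

def bigram_sentence_probability(words, bigram_probs, start="<s>", end="</s>"):
    """Flat n-gram (bigram) model for contrast: P(S) = product of P(w_i | w_{i-1})."""
    prob, prev = 1.0, start
    for w in list(words) + [end]:
        prob *= bigram_probs.get((prev, w), 0.0)
        prev = w
    return prob
```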

Adding Probabilities to a Parser Adding probabilities to a top-down parser, e.g., Earley: this is easy because, going top-down, we can choose which rule to prefer at each step. Adding probabilities to a bottom-up parser: at each step, build the pieces, then attach probabilities to them.
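
One standard way to make the bottom-up option concrete is probabilistic (Viterbi) CKY, which keeps, for each span and nonterminal, the probability of the best way to build that constituent. The sketch below assumes a grammar already converted to Chomsky Normal Form and is only an illustration, not the parser the slides develop:

```python
def probabilistic_cky(words, lexical, binary):
    """
    Viterbi CKY sketch for a PCFG in Chomsky Normal Form.
    lexical: {(A, word): prob}  for rules A -> word
    binary:  {(A, B, C): prob}  for rules A -> B C
    Returns best[(i, j)] = {A: probability of the best A spanning words[i:j]}.
    """
    n = len(words)
    best = {}
    # Width-1 spans: apply the lexical rules.
    for i, w in enumerate(words):
        best[(i, i + 1)] = {A: p for (A, word), p in lexical.items() if word == w}
    # Wider spans: combine the best smaller constituents bottom-up.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    score = p * best[(i, k)].get(B, 0.0) * best[(k, j)].get(C, 0.0)
                    if score > cell.get(A, 0.0):
                        cell[A] = score  # keep only the most probable derivation of A
            best[(i, j)] = cell
    return best
```

The probability of the best parse of the whole sentence is then best[(0, len(words))].get("S", 0.0); recovering the tree itself would additionally require backpointers.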

Limitations of Attaching Probabilities Just to Rules Sometimes it’s enough to know that one rule applies more often than another: Can you book TWA flights? But often it matters what the context is. Consider:
S → NP VP
NP → Pronoun [.8]
NP → LexNP [.2]
But when the NP is the subject, the true probability of a pronoun is .91; when the NP is the direct object, the true probability is .34.

Often the Probabilities Depend on Lexical Choices I saw the Statue of Liberty flying over New York. I saw a plane flying over New York. Workers dumped sacks into a bin. Workers dumped sacks of potatoes. John hit the ball with the bat. John hit the ball with the autograph. Visiting relatives can be trying. Visiting museums can be trying. There were dogs in houses and cats. There were dogs in houses and cages.

The Dogs in Houses Example The problem is that both parses use exactly the same rules, so they will be assigned the same probability.

The Fix – Use the Lexicon The lexicon is an approximation to a knowledge base. It will let us treat into and of differently with respect to dumping without any clue what dumping means or what into and of mean. Note the difference between this approach and subcategorization rules, e.g., dump [SUBCAT NP] [SUBCAT LOCATION] Subcategorization rules specify requirements, not preferences.

Lexicalized Trees Key idea: Each constituent has a HEAD word:
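
A sketch of how a head word can be percolated up a tree, using a small, hypothetical table of head rules (real head-finding tables, e.g., Collins-style rules, are considerably more detailed):

```python
# Hypothetical head rules: for each phrase label, which child categories
# may supply the head word, in order of preference.
HEAD_RULES = {
    "S":  ["VP"],
    "VP": ["VBD", "VB"],
    "NP": ["NN", "NNS", "NP"],
    "PP": ["IN"],
}

def lexicalize(node):
    """Return (label, head_word, children) with the head percolated up from below."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], [])   # preterminal: the word itself is the head
    lexicalized = [lexicalize(child) for child in children]
    head_word = lexicalized[0][1]         # fallback: head of the first child
    for preferred in HEAD_RULES.get(label, []):
        match = next((head for lab, head, _ in lexicalized if lab == preferred), None)
        if match is not None:
            head_word = match
            break
    return (label, head_word, lexicalized)

vp = ("VP", ("VBD", "dumped"),
            ("NP", ("NNS", "sacks")),
            ("PP", ("IN", "into"), ("NP", ("DT", "a"), ("NN", "bin"))))
print(lexicalize(vp)[1])  # "dumped" -- the VP is headed by its verb
```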

Adding Lexical Items to the Rules
VP(dumped) → VBD(dumped) NP(sacks) PP(into) [3 × 10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(cats) PP(into) [8 × 10⁻¹¹]
VP(dumped) → VBD(dumped) NP(hats) PP(into) [4 × 10⁻¹⁰]
VP(dumped) → VBD(dumped) NP(sacks) PP(above) [1 × 10⁻¹²]
We need fewer numbers than we would for N-gram frequencies: The workers dumped sacks of potatoes into a bin. The workers dumped sacks of onions into a bin. The workers dumped all the sacks of potatoes into a bin. But there are still too many, and most will be 0 in any given corpus.

Collapsing These Cases Instead of caring about specific rules like: VP(dumped) → VBD(dumped) NP(sacks) PP(into) [3 × 10⁻¹⁰], or about very general rules like: VP → VBD NP PP, we’ll do something partway in between: VP(dumped) → VBD NP PP, with probability p(r(n) | n, h(n)).

Computing Probabilities of Heads We’ll let the probability of some node n having head h depend on two factors: the syntactic category of the node, and the head of the node’s mother, h(m(n)). So we will compute: P(h(n) = wordᵢ | n, h(m(n))). For example, a PP whose mother is the VP headed by dumped has head into with probability p₁ and head of with probability p₂, while a PP whose mother is the NP headed by sacks has head of with probability p₃. So now we’ve got probabilistic subcat information.
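
Both conditional probabilities can be estimated by relative frequency from a treebank. A minimal sketch with hypothetical count tables and no smoothing:

```python
from collections import Counter

# Hypothetical counts collected from a lexicalized treebank.
rule_counts = Counter()   # (nonterminal, head, rhs)          -> count
lhs_counts  = Counter()   # (nonterminal, head)               -> count
head_counts = Counter()   # (nonterminal, mother_head, head)  -> count
node_counts = Counter()   # (nonterminal, mother_head)        -> count

def p_rule(rhs, nonterminal, head):
    """p(r(n) | n, h(n)): how a category headed by this word tends to expand."""
    denom = lhs_counts[(nonterminal, head)]
    return rule_counts[(nonterminal, head, rhs)] / denom if denom else 0.0

def p_head(head, nonterminal, mother_head):
    """P(h(n) = head | n, h(m(n))): which head this category takes under that mother's head."""
    denom = node_counts[(nonterminal, mother_head)]
    return head_counts[(nonterminal, mother_head, head)] / denom if denom else 0.0
```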

Revised Rule for Probability of a Parse Our initial rule: P(T) = ∏_{n ∈ T} p(r(n)), where p(r(n)) means the probability that rule r(n) will apply to expand the nonterminal n. Our new rule: P(T) = ∏_{n ∈ T} p(r(n) | n, h(n)) × P(h(n) | n, h(m(n))), i.e., the probability of choosing this rule given the nonterminal and its head, times the probability that this node has head h given the nonterminal and the head of its mother.
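
Putting the two factors together, a sketch of the revised P(T) over lexicalized (label, head, children) trees, reusing the estimators sketched above; how the root's missing mother head is handled (e.g., with a special START symbol) is glossed over here:

```python
def lexicalized_parse_probability(node, mother_head=None):
    """
    P(T) = product over nodes n of  p(r(n) | n, h(n)) * P(h(n) | n, h(m(n))).
    `node` is a (label, head, children) tuple as produced by lexicalize() above.
    """
    label, head, children = node
    if not children:                      # preterminal: nothing left to expand
        return 1.0
    rhs = tuple(child[0] for child in children)
    prob = p_rule(rhs, label, head) * p_head(head, label, mother_head)
    for child in children:
        prob *= lexicalized_parse_probability(child, mother_head=head)
    return prob
```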

So We Can Solve the Dumped Sacks Problem From the Brown corpus:
p(VP → VBD NP PP | VP, dumped) = .67
p(VP → VBD NP | VP, dumped) = 0
p(into | PP, dumped) = .22
p(into | PP, sacks) = 0
So the contribution of this part of the parse to the total scores for the two candidates is:
[dumped into]: .67 × .22 = .147
[sacks into]: 0 × 0 = 0

It’s Mostly About Semantics But It’s Also About Psychology What do people do? People have limited memory for processing language. So we should consider two aspects of language skill: competence (what could we in principle do?), and performance (what do we actually do, including mistakes?)

Garden Path Sentences Are people deterministic parsers? Consider garden path sentences such as: The horse raced past the barn fell. The complex houses married and single students and their families. I told the boy the dog bit Sue would help him.

Embedding Limitations There are limits on people’s ability to process the recursion that grammar rules allow in principle: # The Republicans who the senator who she voted for chastised were trying to cut all benefits for veterans. # Tom figured that that Susan wanted to take the cat out bothered Betsy out. (Church) Harold heard [that John told the teacher that Bill said that Sam thought that Mike threw the first punch] yesterday. (Church)

Building Deterministic Parsers What if we impose performance constraints on our parsers? Will they work? Require that the parser be deterministic. At any point, it must simply choose the best parse given what has come so far and, perhaps, some limited number of lookahead constituents (Marcus allowed 3). Limit the amount of memory that the parser may use. This effectively makes the parser an FSM, in fact a deterministic FSM.