Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27

Presentation transcript:

Intelligent Systems (AI-2), Computer Science cpsc422, Lecture 27. Nov 15, 2017.

Lecture Overview
- Recap: Probabilistic Context Free Grammars (PCFG)
- CKY parsing for PCFG (only key steps)
- PCFG in practice: modeling structural and lexical dependencies

Sample PCFG

PCFGs are used to:
- Estimate the probability of a parse tree. The probability of a derivation (tree) is just the product of the probabilities of the rules used in the derivation. It is a product because rule applications are independent (a consequence of the context-free assumption).
- Estimate the probability of a sentence (e.g., to integrate PCFGs with n-gram models). The probability of a word sequence (sentence) is the probability of its tree in the unambiguous case, and the sum of the probabilities of its trees in the ambiguous case.
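As a concrete illustration, here is a minimal sketch in plain Python with an invented toy grammar (not the grammar from the slides): a derivation is scored as the product of its rule probabilities, and a sentence as the sum over its candidate trees.

```python
from functools import reduce

# Toy PCFG: rule probabilities are invented for illustration only.
pcfg = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("Pronoun",)): 0.3,
    ("NP", ("Det", "Noun")): 0.4,
    ("VP", ("Verb", "NP")): 0.5,
}

def tree_prob(rules_used):
    """Probability of a derivation = product of the probabilities
    of the rules it uses (rule applications are independent)."""
    return reduce(lambda p, r: p * pcfg[r], rules_used, 1.0)

def sentence_prob(candidate_trees):
    """Probability of a sentence = sum of the probabilities of all its
    parse trees (a single term in the unambiguous case)."""
    return sum(tree_prob(t) for t in candidate_trees)

# One hypothetical (partial) derivation using three of the rules above:
deriv = [("S", ("NP", "VP")), ("NP", ("Pronoun",)), ("VP", ("Verb", "NP"))]
print(tree_prob(deriv))        # 0.8 * 0.3 * 0.5 = 0.12
print(sentence_prob([deriv]))  # same value: only one candidate tree
```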

Head of a Phrase
For each phrase type, a simple set of hand-written rules finds the head of the phrase. These rules are often called head-percolation rules.
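A sketch of what such head-percolation rules can look like in code; the rule table below is a small hypothetical fragment for illustration, not the full Collins/Magerman head table.

```python
# Simplified head-percolation table: for each phrase type, the search
# direction and the child categories to look for, in priority order.
HEAD_RULES = {
    "NP": ("right-to-left", ["NN", "NNS", "NNP", "NP"]),
    "VP": ("left-to-right", ["VBD", "VBZ", "VB", "VP"]),
    "PP": ("left-to-right", ["IN", "TO"]),
}

def find_head(phrase_type, children):
    """Return the index of the head child of a constituent, given the
    categories of its children, using the hand-written rules above."""
    direction, priorities = HEAD_RULES[phrase_type]
    indices = range(len(children))
    order = list(indices) if direction == "left-to-right" else list(reversed(indices))
    for cat in priorities:            # try categories in priority order
        for i in order:
            if children[i] == cat:
                return i
    return order[0]                    # fallback: first child in search direction

print(find_head("VP", ["VBD", "NP", "PP"]))  # 0 -> the verb heads the VP
print(find_head("NP", ["DT", "JJ", "NN"]))   # 2 -> the noun heads the NP
```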

Acquiring Grammars and Probabilities
We can create a PCFG automatically from a manually parsed text corpus (e.g., the Penn Treebank).
- Grammar: read the rules off the parse trees. Ex: if an NP contains an ART, ADJ, and NOUN, we create the rule NP -> ART ADJ NOUN.
- Probabilities: assign them by counting how often each rule occurs in the treebank. Ex: if the rule NP -> ART ADJ NOUN is used 50 times and all NP rules are used 5000 times, the rule's probability is 50/5000 = .01.
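A minimal sketch of reading the grammar and its probabilities off a treebank, assuming NLTK and its bundled sample of the Penn Treebank are available; the estimate for each rule is its count divided by the count of its left-hand side, exactly as in the NP -> ART ADJ NOUN example above.

```python
from collections import Counter
from nltk.corpus import treebank   # assumes nltk and its treebank sample are installed

rule_counts = Counter()
lhs_counts = Counter()

for tree in treebank.parsed_sents():      # manually parsed sentences
    for prod in tree.productions():       # read the rules off each parse tree
        rule_counts[prod] += 1
        lhs_counts[prod.lhs()] += 1

def rule_prob(prod):
    """Maximum-likelihood estimate: count(A -> beta) / count(A)."""
    return rule_counts[prod] / lhs_counts[prod.lhs()]

# Example: probability of the most frequent NP expansion in the sample
np_rules = [p for p in rule_counts if str(p.lhs()) == "NP"]
best = max(np_rules, key=rule_counts.get)
print(best, rule_prob(best))
```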

Lecture Overview
- Recap: Probabilistic Context Free Grammars (PCFG)
- CKY parsing for PCFG (only key steps)
- PCFG in practice: modeling structural and lexical dependencies

Probabilistic Parsing (Restricted)
The task is to find the maximum-probability tree for an input sentence. We will look at a solution to this restricted version of the general problem of finding all possible parses of a sentence and their corresponding probabilities.

Probabilistic CKY Algorithm
CKY (Cocke-Kasami-Younger) is a bottom-up parser using dynamic programming. It assumes the PCFG is in Chomsky normal form (CNF). First described by Ney (1991); the version we see here is adapted from Collins (1999).
Definitions:
- w1 … wn: an input string composed of n words
- wij: the string of words from word i to word j
- µ[i, j, A]: a table entry holding the maximum probability for a constituent with non-terminal A spanning words wi … wj

CKY: Base Case
Fill out the table entries by induction. Base case: consider input strings of length one (i.e., each individual word wi). Since the grammar is in CNF, A =>* wi iff A -> wi, so µ[i, i, A] = P(A -> wi).
Example: "Can1 you2 book3 TWA4 flights5 ?"  e.g., µ[1, 1, Aux] = .4, µ[5, 5, Noun] = .5, …

CKY: Recursive Case
For strings of words of length >= 2, A =>* wij iff there is at least one rule A -> B C where B derives the first k words (between i and i+k-1) and C derives the remaining ones (between i+k and j).
µ[i, j, A] = µ[i, i+k-1, B] * µ[i+k, j, C] * P(A -> B C)
Compute the probability by multiplying together the probabilities of the two pieces (they have already been calculated in the recursion), and choose the max over all rules A -> B C and all split points k (1 <= k <= j - i) for each non-terminal A.
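Putting the base and recursive cases together, here is a minimal probabilistic CKY sketch in Python (toy CNF grammar with invented probabilities; back-pointers are stored so the best tree, and not just its probability, can be recovered).

```python
from collections import defaultdict

# Toy CNF PCFG (invented probabilities): binary rules A -> B C and lexical rules A -> w.
binary = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 1.0}
lexical = {("NP", "she"): 0.4, ("NP", "fish"): 0.2, ("V", "eats"): 1.0}

def pcky(words):
    n = len(words)
    mu = defaultdict(float)   # mu[(i, j, A)] = best probability of A spanning words i..j
    back = {}                 # back-pointers (end of B's span, B, C) for the best derivation
    for i, w in enumerate(words, start=1):            # base case: mu[i, i, A] = P(A -> w_i)
        for (A, word), p in lexical.items():
            if word == w:
                mu[(i, i, A)] = p
    for span in range(2, n + 1):                       # recursive case, shorter spans first
        for i in range(1, n - span + 2):
            j = i + span - 1
            for (A, (B, C)), p in binary.items():
                for k in range(1, span):               # B covers i..i+k-1, C covers i+k..j
                    cand = mu[(i, i + k - 1, B)] * mu[(i + k, j, C)] * p
                    if cand > mu[(i, j, A)]:
                        mu[(i, j, A)] = cand
                        back[(i, j, A)] = (i + k - 1, B, C)
    return mu[(1, n, "S")], back                       # termination: best S over the whole input

prob, back = pcky(["she", "eats", "fish"])
print(prob)   # 0.4 * 1.0 * 0.2 * 1.0 * 1.0 = 0.08
```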

Probabilistic CKY Algorithm (visual)

CKY: Termination
The max-probability parse for "Can1 you2 book3 TWA4 flight5 ?" will be µ[1, 5, S] = 1.7x10^-6.
Any other filler for this matrix? Spans 1,3 and 1,4 and 3,5!

CKY: Termination
Any other entry in this matrix for S? A. Yes  B. No  C. Cannot Tell
"Can1 you2 book3 TWA4 flight5 ?"  µ[1, 5, S] = 1.7x10^-6
Any other filler for this matrix? Spans 1,3 and 1,4 and 3,5 …!

CKY: anything missing?
The max-probability parse for "Can1 you2 book3 TWA4 flight5 ?" will be µ[1, 5, S] = 1.7x10^-6.

Lecture Overview
- Recap: Probabilistic Context Free Grammars (PCFG)
- CKY parsing for PCFG (only key steps)
- PCFG in practice: modeling structural and lexical dependencies

Problems with PCFGs
Most current PCFG models are not vanilla PCFGs; they are usually augmented in some way. Vanilla PCFGs assume independence of non-terminal expansions (e.g., the probabilities for NP expansions do not depend on context), but statistical analysis shows this is not a valid assumption. Two kinds of missing dependencies: structural and lexical.

Structural Dependencies: Problem
E.g., the syntactic subject of a sentence tends to be a pronoun, because the subject tends to realize "old information": the subject usually realizes the topic of the sentence, the topic is usually old information, and pronouns are usually used to refer to old information. Example: "Mary bought a new book for her trip. She didn't like the first chapter. So she decided to watch a movie."
In the Switchboard corpus: 91% of subjects in declarative sentences are pronouns, while 66% of direct objects are lexical (non-pronominal), i.e., only 34% are pronouns.

How would you address this problem?

Structural Dependencies: Solution
Split non-terminals to encode context, e.g., NP-subject vs. NP-object. Parent annotation: annotate each non-terminal with its parent category. Hand-write rules for more complex structural dependencies.
Splitting problems? Splitting increases the size of the grammar, which reduces the amount of training data for each rule and leads to overfitting.
Automatic/optimal splitting: the Split-and-Merge algorithm [Petrov et al., 2006, COLING/ACL].
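A minimal sketch of the parent-annotation split on NLTK trees (the helper below is hypothetical and splits every non-terminal by its parent; approaches such as Petrov et al.'s split-and-merge learn the splits automatically instead of hand-writing them).

```python
from nltk import Tree

def parent_annotate(tree, parent=None):
    """Return a copy of the tree whose non-terminals are split by their
    parent category, e.g. NP under S becomes NP^S, NP under VP becomes NP^VP."""
    if isinstance(tree, str):                # leaf (a word): leave unchanged
        return tree
    label = tree.label() if parent is None else f"{tree.label()}^{parent}"
    return Tree(label, [parent_annotate(child, tree.label()) for child in tree])

t = Tree.fromstring("(S (NP (PRP she)) (VP (VBD bought) (NP (DT a) (NN book))))")
print(parent_annotate(t))
# (S (NP^S (PRP^NP she)) (VP^S (VBD^VP bought) (NP^VP (DT^NP a) (NN^NP book))))
```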

Lexical Dependencies: Problem
The verb send subcategorises for a destination, which could be a PP headed by "into".

Lexical Dependencies: Problem
Two parse trees for the sentence "Moscow sent troops into Afghanistan": one with VP-attachment of the PP and one with NP-attachment. Typically NP-attachment is more frequent than VP-attachment overall, but since send subcategorises for a destination (a PP headed by "into"), VP-attachment is the correct analysis here.

Lexical Dependencies: Solution
Add lexical dependencies to the scheme: infiltrate the influence of particular words into the probabilities of the rules. All the words? (A. Good Idea  B. Bad Idea  C. Cannot Tell)
(a) P(VP -> V NP PP | VP = "sent troops into Afg.")
(b) P(VP -> V NP | VP = "sent troops into Afg.")
Conditioning on all the words is not likely to have significant counts in any treebank! No, use only the key ones.

Use only the Heads
To do that we make use of the notion of the head of a phrase:
- The head of an NP is its noun
- The head of a VP is its verb
- The head of a PP is its preposition
For other phrases the head can be more complicated and somewhat controversial (e.g., should the complementizer "to" or the verb be the head of an infinitival VP?). Most linguistic theories of syntax include a component that defines heads.

More specific rules
We used to have rule r: VP -> V NP PP, with probability P(r | VP), i.e., the count of this rule divided by the number of VPs in a treebank.
Now we have rule r: VP(h(VP)) -> V(h(VP)) NP PP, with probability P(r | VP, h(VP)); for example, VP(sent) -> V(sent) NP PP with P(r | VP, sent).
What is the estimate for P(r | VP, sent)? The number of times this rule was used with head "sent", divided by the total number of VPs headed by "sent".
We can also associate the head tag, e.g., NP(sacks, NNS), where NNS is noun, plural.
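A minimal sketch of this estimate; the head-annotated rule observations below are invented counts for illustration, not real treebank data.

```python
from collections import Counter

# Hypothetical head-annotated rule uses extracted from a treebank, each
# recorded as (parent category, head word, tuple of children categories).
observations = [
    ("VP", "sent", ("V", "NP", "PP")),
    ("VP", "sent", ("V", "NP", "PP")),
    ("VP", "sent", ("V", "NP")),
    ("VP", "gave", ("V", "NP", "NP")),
]

rule_counts = Counter(observations)
lhs_head_counts = Counter((parent, head) for parent, head, _ in observations)

def lexicalized_prob(parent, head, children):
    """P(rule | parent, head): times the rule was used with this head word,
    divided by the number of 'parent' constituents headed by that word."""
    return rule_counts[(parent, head, children)] / lhs_head_counts[(parent, head)]

print(lexicalized_prob("VP", "sent", ("V", "NP", "PP")))  # 2/3
print(lexicalized_prob("VP", "sent", ("V", "NP")))         # 1/3
```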

NLP Practical Goal for FOL (and Prob. Parsing): the ultimate Web question-answering system?
Map NL queries into FOPC so that answers can be effectively computed, e.g., "What African countries are not on the Mediterranean Sea?", "Was 2007 the first El Nino year after 2001?"
We didn't assume much about the meaning of words when we talked about sentence meanings: verbs provided a template-like predicate-argument structure, and nouns were practically meaningless constants. There has to be more to it than that: this view assumes that words by themselves do not refer to the world and cannot be judged to be true or false…

PCFG Parsing: state of the art (~2010)
Most effective: combining lexicalized and unlexicalized models. (From C. Manning, Stanford NLP)

Grammar as a Foreign Language. Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton (Google). Computation and Language [cs.CL], published 24 Dec 2014, updated 9 Jun 2015. (+ an attention mechanism)
Fast and Accurate Shift-Reduce Constituent Parsing, by Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang and Jingbo Zhu (ACL 2013).
Announcing SyntaxNet: The World's Most Accurate Parser Goes Open Source, 2016; posted by Slav Petrov, Senior Staff Research Scientist (a different parsing framework).

Beyond syntax: discourse parsing and dialog (CKY / probabilistic parsing). Paper in the readings: Conversation Trees: A Grammar Model for Topic Structure in Forums, Annie Louis and Shay B. Cohen, EMNLP 2015 [corpus]. The paper uses Probabilistic Context-Free Grammars (PCFG) and Probabilistic Linear Context-Free Rewriting Systems (PLCFRS): in the PCFG model, a non-terminal spans a contiguous sequence of posts; in the PLCFRS model, non-terminals are allowed to span discontinuous segments of posts.
Beyond NLP: planning. Li, N., Cushing, W., Kambhampati, S., & Yoon, S. (2012). Learning probabilistic hierarchical task networks as probabilistic context-free grammars to capture user preferences. ACM Transactions on Intelligent Systems and Technology. (CMU + Arizona State)

Discovering Discourse Structure: Computational Tasks
Example: "The bank was hamstrung in its efforts to face the challenges of a changing market by its links to the government, analysts say."
Discourse analysis in RST involves two subtasks: (1) breaking the text into EDUs, known as discourse segmentation, and (2) linking the EDUs into a labeled hierarchical tree structure, known as discourse parsing.

422 big picture: Where are we?
(Course-map figure listing the representation and reasoning techniques covered in 422: query and planning; deterministic and stochastic methods; logics (full resolution, SAT), first-order logics and ontologies; belief nets (approximate inference: Gibbs); Markov chains and HMMs (forward, Viterbi; approximate: particle filtering); Markov decision processes and partially observable MDPs (value iteration); undirected graphical models (Markov networks, conditional random fields); reinforcement learning; probabilistic CFGs; probabilistic relational models; Markov logics; StarAI (statistical relational AI); hybrid deterministic + stochastic models; applications of AI.)
First Order Logics (will build a little on Prolog and 312); temporal reasoning from the NLP 503 lectures; Reinforcement Learning (not done in 340, at least in the latest offering). A Markov decision process has a set of states S, a set of actions A, transition probabilities to next states P(s' | s, a), and reward functions R(s, s', a). RL is based on MDPs, but the transition model and the reward model are not known: while for MDPs we can compute an optimal policy, RL learns an optimal policy.

Learning Goals for today's class. You can:
- Describe the key steps of CKY probabilistic parsing
- Motivate the introduction of structural and lexical dependencies
- Describe how to deal with these dependencies within the PCFG framework

Next class on Fri: paper discussion. Portions of our Computational Linguistics journal paper, CODRA: A Novel Discriminative Framework for Rhetorical Analysis (only sections 1, 3 and 4 are mandatory). Assignment-3 is due on Mon; Assignment-4 will be out on the same day.