NLP. Introduction to NLP

Presentation transcript:

NLP

Introduction to NLP

Limitations of plain PCFGs:
–The probabilities don't depend on the specific words, e.g., give someone something (2 arguments) vs. see something (1 argument)
–It is not possible to disambiguate sentences based on semantic information, e.g., eat pizza with pepperoni vs. eat pizza with a fork
Lexicalized grammars: the idea
–Use the head word of a phrase as an additional source of information
–E.g., VP[ate] -> V[ate]

[Figure: lexicalized parse tree for a sentence like "the child ate the cake with a fork", with head words (child, ate, cake, fork) propagated up to the phrasal nodes]
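To make the head-annotation idea concrete, here is a minimal, self-contained sketch (the head rules and the toy tree are hypothetical illustrations, not any parser's actual head table) that percolates head words up a constituency tree:

```python
# Minimal sketch of head-word percolation (illustrative only; the head rules
# below are a tiny hypothetical subset, not a full head-finding table).

HEAD_RULES = {          # which child category supplies the head word
    "S": ["VP"],
    "VP": ["V"],
    "NP": ["N"],
    "PP": ["P"],
}

def annotate_heads(tree):
    """tree is (label, children) for phrases or (label, word) for leaves.
    Returns (label, head_word, annotated_children)."""
    label, rest = tree[0], tree[1]
    if isinstance(rest, str):                 # preterminal: its word is the head
        return (label, rest, rest)
    children = [annotate_heads(c) for c in rest]
    head_word = children[0][1]                # default: leftmost child's head
    for wanted in HEAD_RULES.get(label, []):
        for child_label, child_head, _ in children:
            if child_label == wanted:
                head_word = child_head
                break
    return (label, head_word, children)

# "the child ate the cake with the fork" (simplified, determiners dropped)
tree = ("S", [("NP", [("N", "child")]),
              ("VP", [("V", "ate"),
                      ("NP", [("N", "cake")]),
                      ("PP", [("P", "with"), ("NP", [("N", "fork")])])])])

print(annotate_heads(tree)[:2])   # ('S', 'ate') -- the head word of S is "ate"
```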

Generative, lexicalized model
Types of rules:
–LHS → Ln Ln-1 … L1 H R1 … Rm-1 Rm
–H (the head) gets generated first
–The left modifiers L1 … Ln get generated next
–The right modifiers R1 … Rm get generated last

Maximum likelihood estimates:
P_ML(PP_of-IN | VP_think-VB) = Count(PP_of-IN to the right of the head VP_think-VB) / Count(symbols to the right of the head VP_think-VB)
Smoothing (linear interpolation of back-off levels):
P_smoothed(PP_of-IN | VP_think-VB) = λ1 · P(PP_of-IN | VP_think-VB) + λ2 · P(PP_of-IN | VP_-VB) + (1 − λ1 − λ2) · P(PP_of-IN | VP)
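A toy sketch of this interpolation (the counts and λ values below are invented purely for illustration; real systems estimate the λs, e.g. on held-out data):

```python
# Back-off interpolation for a lexicalized rule probability, as in the slide:
# P_smoothed = l1*P(.. | VP,think,VB) + l2*P(.. | VP,VB) + (1-l1-l2)*P(.. | VP)
# The counts below are made up purely for illustration.

def interpolate(p_full, p_tag_only, p_unlex, l1=0.6, l2=0.3):
    return l1 * p_full + l2 * p_tag_only + (1 - l1 - l2) * p_unlex

# Maximum likelihood estimates at three levels of back-off (hypothetical counts)
p_full     = 3 / 10     # Count(PP_of right of head VP_think-VB) / Count(symbols right of that head)
p_tag_only = 20 / 90    # same, conditioning only on the head tag VB
p_unlex    = 150 / 1000 # same, conditioning only on the parent VP

print(interpolate(p_full, p_tag_only, p_unlex))   # ~0.26
```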

Sparseness of training data
–Many probabilities are difficult to estimate from the Penn Treebank
Combinatorial explosion
–Need for parameterization

A parser may return many parses of a sentence, with small differences in probabilities
The top-ranked parse may not necessarily be the best, because the PCFG may be deficient
Other considerations may need to be taken into account:
–parse tree depth
–left attachment vs. right attachment
–discourse structure
Can you think of other features that may affect the reranking?

Considerations that may affect the reranking:
–parse tree depth
–left attachment vs. right attachment
–discourse structure
Can you think of others?
–consistency across sentences
–consistency with other stages of the NLU pipeline

n-best list
–Get the parser to produce a list of the n best parses (where n can be in the thousands)
Reranking
–Train a discriminative classifier to rerank these parses based on external information, such as a bigram probability score or the amount of right branching in the tree
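A toy sketch of the reranking step (the features, weights, and candidate list are all hypothetical; a real reranker is trained discriminatively on treebank data):

```python
# Rerank an n-best list with a linear model over external features.
# Everything below (features, weights, candidates) is made up for illustration.

def rerank(nbest, weights):
    """nbest: list of (parser_log_prob, feature_dict); returns index of best parse."""
    def score(item):
        log_prob, feats = item
        return log_prob + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return max(range(len(nbest)), key=lambda i: score(nbest[i]))

weights = {"right_branching": 0.8, "tree_depth": -0.1, "bigram_lm": 0.5}

nbest = [
    (-42.1, {"right_branching": 3, "tree_depth": 9, "bigram_lm": -20.0}),
    (-42.4, {"right_branching": 6, "tree_depth": 7, "bigram_lm": -18.5}),
]

print(rerank(nbest, weights))   # picks the second parse despite its lower parser score
```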

F1 (sentences <= 40 words):
–Charniak (2000): 90.1%
–Charniak and Johnson (2005): 92% (discriminative reranking)

NLP

Introduction to NLP

Example: "blue house"
–blue: modifier, dependent, child, subordinate
–house: head, governor, parent, regent
[Figure: dependency arc from the head "house" to the modifier "blue"]

Unionized workers are usually better paid than their non-union counterparts


[Figure: dependency parse of "Unionized workers are usually better paid than their non-union counterparts", with POS tags (VBN, NNS, VBP, RB, RBR, VBN, IN, PRP$, JJ, NNS) and the dependency relations amod, poss, nsubjpass, auxpass, advmod, and prep]

[Figure: constituent parse of the same sentence, with POS tags and phrase labels NP, PP, ADVP, VP, S]

Characteristics
–Lexical/syntactic dependencies between words
–The top-level predicate of a sentence is the root
–Simpler to parse than context-free grammars
–Particularly useful for free word order languages

H = head, M = modifier
–H determines the syntactic category of the construct
–H determines the semantic category of the construct
–H is required; M may be skipped
–Fixed linear position of M with respect to H

Dynamic programming
–CKY: similar to parsing with a lexicalized PCFG; cubic complexity (Eisner 1996)
[Figure: dependency tree for "Mary likes cats": likes → Mary (nsubj), likes → cats (dobj)]

Constraint-based methods
–Maruyama 1990, Karlsson 1990
–Example: word(pos(x)) = DET ⇒ (label(x) = NMOD, word(mod(x)) = NN, pos(x) < mod(x))
  A determiner (DET) modifies a noun (NN) on the right with the label NMOD
–NP-complete problem; heuristics needed
Constraint graph
–Form an initial constraint graph using a core grammar: nodes, domains, constraints
–Find an assignment that doesn't contradict any constraints; if more than one assignment exists, add more constraints

Deterministic parsing
–Covington 2001
–MaltParser by Nivre: shift and reduce actions as in a shift/reduce parser; reduce creates dependencies with the head on either the left or the right
Graph-based methods
–Maximum spanning trees (MST), e.g., MSTParser by McDonald et al.

Output of the (non-projective) MSTParser for "Where did you come from last year?" in CoNLL format (ID, form, POS, head, deprel):
1  Where  WRB  2  SBJ
2  did    VBD  0  ROOT
3  you    PRP  4  SBJ
4  come   VBP  2  VC
5  from   IN   4  DIR
6  last   JJ   7  NMOD
7  year   NN   5  PMOD
8  ?      P    -  -
[Figure: the corresponding dependency tree, rooted at "did"]
Output of the Stanford parser:
advmod(come-4, Where-1)
aux(come-4, did-2)
nsubj(come-4, you-3)
root(ROOT-0, come-4)
prep(come-4, from-5)
amod(year-7, last-6)
pobj(from-5, year-7)

Background
–McDonald et al. (2005)
Projectivity
–English dependency trees are mostly projective (can be drawn without crossing dependencies)
–Trees in other languages often are not
Idea
–Dependency parsing is equivalent to searching for a maximum spanning tree in a directed graph
–Chu and Liu (1965) and Edmonds (1967) give an efficient algorithm for finding the MST of a directed graph

Consider the sentence "John saw Mary". The Chu-Liu-Edmonds algorithm gives the maximum spanning tree shown on the right; in general this is a non-projective tree.
[Figure: fully connected directed graph over {root, John, saw, Mary} with arc scores (left), and the resulting spanning tree root → saw → {John, Mary} (right)]
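A sketch of this search using networkx (assuming its maximum_spanning_arborescence function, which implements the Edmonds/Chu-Liu-Edmonds algorithm; the arc scores below are invented for illustration, since a real parser would compute them from features):

```python
# Maximum spanning arborescence over "root John saw Mary".
import networkx as nx

scores = {                       # (head, dependent): score, invented for illustration
    ("root", "saw"): 10, ("root", "John"): 9, ("root", "Mary"): 9,
    ("saw", "John"): 30, ("saw", "Mary"): 30, ("saw", "root"): 0,
    ("John", "saw"): 20, ("John", "Mary"): 3,
    ("Mary", "saw"): 0,  ("Mary", "John"): 11,
}

G = nx.DiGraph()
for (head, dep), s in scores.items():
    G.add_edge(head, dep, weight=s)

tree = nx.maximum_spanning_arborescence(G)     # Chu-Liu-Edmonds under the hood
print(sorted(tree.edges()))                    # [('root', 'saw'), ('saw', 'John'), ('saw', 'Mary')]
```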

Transition-based parsing is very similar to shift-reduce parsing. It includes the following components:
–A stack
–A buffer
–A set of dependencies (arcs)
The reduce operations combine an element from the stack and one from the buffer
Arc-eager parser
–The actions are shift, reduce, left-arc, right-arc

[Example from Nivre and Kübler]

Example: "People want to be free"
–Initial configuration: stack [ROOT], buffer [People, want, to, be, free], arcs Ø
–Shift: [ROOT, People] [want, to, be, free]
–Left-Arc (nsubj): [ROOT] [want, to, be, free], A1 = {nsubj(want, People)}
–Right-Arc (root): [ROOT, want] [to, be, free], A2 = A1 ∪ {root(ROOT, want)}
The next action is chosen using a classifier; there is no search
The final list of arcs is returned as the dependency tree
This makes the method very fast
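A compressed sketch of such a transition system in plain Python, replaying the first few actions from the trace above as a hard-coded sequence instead of calling a classifier:

```python
# Arc-eager style transition loop, replaying the first steps of the example above.
# The action sequence is hard-coded here; a real parser predicts each action
# with a classifier over features of the current configuration.

def parse(words, actions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for act in actions:
        if act == "SH":                              # Shift
            stack.append(buffer.pop(0))
        elif act.startswith("LA"):                   # Left-Arc(label)
            label = act.split(":")[1]
            arcs.append((buffer[0], label, stack.pop()))   # (head, label, dependent)
        elif act.startswith("RA"):                   # Right-Arc(label)
            label = act.split(":")[1]
            arcs.append((stack[-1], label, buffer[0]))
            stack.append(buffer.pop(0))
        elif act == "RE":                            # Reduce
            stack.pop()
    return stack, buffer, arcs

words = ["People", "want", "to", "be", "free"]
print(parse(words, ["SH", "LA:nsubj", "RA:root"]))
# (['ROOT', 'want'], ['to', 'be', 'free'],
#  [('want', 'nsubj', 'People'), ('ROOT', 'root', 'want')])
```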

Evaluation: labeled dependency accuracy = # correct dependencies / # dependencies
Example in CoNLL format (ID, form, POS, head, deprel):
1   Unionized     VBN    2   NMOD
2   workers       NNS    3   SBJ
3   are           VBP    0   ROOT
4   usually       RB     3   TMP
5   better        RBR    4   ADV
6   paid          VBN    5   AMOD
7   than          IN     5   AMOD
8   their         PRP$   10  NMOD
9   non-union     JJ     10  NMOD
10  counterparts  NNS    7   PMOD
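A small sketch of computing labeled (and unlabeled) attachment accuracy from gold and predicted (head, label) pairs; the example arrays below are hypothetical:

```python
# Labeled and unlabeled attachment scores: fraction of tokens whose predicted
# head (and, for the labeled score, label) matches the gold standard.

def attachment_scores(gold, pred):
    """gold, pred: lists of (head_index, label), one entry per token."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return las, uas

# Hypothetical 5-token example: one wrong head, one wrong label
gold = [(2, "NMOD"), (3, "SBJ"), (0, "ROOT"), (3, "TMP"), (4, "ADV")]
pred = [(2, "NMOD"), (3, "SBJ"), (0, "ROOT"), (5, "TMP"), (4, "AMOD")]
print(attachment_scores(gold, pred))   # (0.6, 0.8)
```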

Complexity of dependency parsing algorithms:
–Projective (CKY): O(n⁵)
–Projective (Eisner): O(n³)
–Non-projective (MST, Chu-Liu-Edmonds): O(n²)
–Projective (Malt): O(n)

[Erkan et al. 2007]

[Bunescu and Mooney 2005]

–CoNLL-X shared task
–Prague Dependency Treebank
–Joakim Nivre's MaltParser
–Dekang Lin's Minipar
–Daniel Sleator and Davy Temperley's Link Parser

NLP