LING 581: Advanced Computational Linguistics Lecture Notes March 2nd.

Report on Homework Task

Part 1:
– Run the examples you showed on your slides from Homework Task 1 through the Bikel Collins parser.
– Evaluate how close the parses are to the “gold standard”.

Part 2:
– WSJ corpus: sections 00 through 24
– Evaluation: on section 23
– Training: normally 20 sections
– How does Bikel Collins accuracy vary if you randomly pick 1, 2, 3, …, 20 sections to train on? Plot the evalb scores as a graph.
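Part 2's subset selection can be sketched in Python. This is only a sketch: the 02–21 training split is the usual WSJ convention (the slide just says "normally 20 sections"), and the section names, seeding, and dict layout are my own assumptions.

```python
import random

# assumption: the standard WSJ training split, sections 02-21 (20 sections)
sections = [f"{i:02d}" for i in range(2, 22)]

random.seed(0)  # fix the seed so runs are repeatable

# for each k, pick k sections at random; each subset would be concatenated
# into a training file for the Bikel Collins parser, then scored on
# section 23 with evalb to produce one point on the graph
subsets = {k: random.sample(sections, k) for k in range(1, 21)}
```

Each `subsets[k]` is one training configuration; plotting evalb F-score against k shows where performance levels off.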

Results

Part 2: the parser doesn't seem to require all 20 sections to achieve its (limit) performance.
But on the other hand, modifying even one training example can change a parse…

Last Time: Sensitivity to Perturbation

It is often assumed that statistical models:
– are less brittle than symbolic models (they still produce parses for ungrammatical data)
– but are they sensitive to noise or small perturbations?

Last Time: Sensitivity to Perturbation

PP attachment (frequency 1) in the WSJ: just one sentence out of 39,832 training examples can affect attachment.

Recorded event in wsj observed.gz:
(mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0)

Last Time: An Experiment with Passives

Comparison:
– Wow! as object plus passive morphology
– Wow! inserted as NP object trace
– Baseline (passive morphology)
– Wow! as 4th word

Verb Alternations

Verb alternations:
– the range of VP frames for a given verb (sense)
– There are VPs in the PTB
– Q: what kinds of frames are attested in the PTB?

Example: the spray/load or locative alternation (Levin 1993):
– (1) a. Sharon sprayed water on the plants
–     b. Sharon sprayed the plants with water
– (2) a. The farmer loaded apples into the cart
–     b. The farmer loaded the cart with apples
– cf. fill and cover, dump and pour

Verb Alternations

Reference Book: EVCA

Contains listings of verbs classified by:
– alternations (Part 1)
– semantic classes (Part 2)

Book contains an index of verbs:
– references sections of the book
– 3104 verbs listed
– (thumb drive) evca93.index

abandon 51.2
abash 1.2.5, , 31.1
abate , 45.4
abduct 2.2, 2.3.2, 10.5
abhor 2.10, , , , 31.2
abound 2.3.4,
absent 8.2
absolve 2.3.2, 10.6
abstract 2.3.2, 10.1
abuse 2.10, , , , 33
abut 47.8
accelerate , 45.4
accept 2.2, 2.14, , 29.2
acclaim 2.10, , , , 33
accompany 51.7
accord 2.1
accumulate 2.2, 2.3.4, 6.1, 6.2, ,
acetify , 45.4
ache 31.3, 32.2,
acidify , 45.4
acknowledge 2.1, 2.14, 29.1

Bikel Collins Raw Output

Example:
PROB
TOP S INTJ UH 0 No NP NPB PRP 0 it VP VBD 0 was RB 0 n't NP NPB NNP 0 Black NNP 0 Monday
(TOP~was~1~1 (S~was~3~3 (INTJ~No~1~1 No/UH,/PUNC, ) (NPB~it~1~1 it/PRP ) (VP~was~3~1 was/VBD n't/RB (NPB~Monday~2~2 Black/NNP Monday/NNP./PUNC. ) ) ) )
TIME 1

The “raw” output format is as follows:
– First line is "PROB num_edges_in_chart log_prob 0", e.g. PROB
– Next few lines are the parse tree printed, one word per line, with log probs on each constituent
– Next line is the full parse output
– Final line is "TIME time", e.g. "TIME 10", meaning the parse took 10 seconds
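Splitting one parse's raw output into those four parts is straightforward string handling; a sketch follows. The function name and return layout are my own, and the sample string is a simplified stand-in (the log probs in the slide's example were lost in transcription).

```python
def split_raw_output(text):
    """Split one parse's raw output into (prob_fields, tree_lines, parse, time)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # first line: PROB num_edges_in_chart log_prob 0; last line: TIME seconds
    assert lines[0].startswith('PROB') and lines[-1].startswith('TIME')
    prob_fields = lines[0].split()
    time_secs = int(lines[-1].split()[1])
    parse = lines[-2]            # the full parse output
    tree_lines = lines[1:-2]     # one word per line, with per-constituent log probs
    return prob_fields, tree_lines, parse, time_secs

# hypothetical minimal example in the same shape as the raw format
raw = """PROB 100 -45.2 0
TOP S NPB PRP -45.2 it
(TOP~is~1~1 (S~is~1~1 (NPB~it~1~1 it/PRP ) ) )
TIME 1"""
prob, tree, parse, secs = split_raw_output(raw)
```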

Homework Task

– Pick verbs that exist in EVCA and also in the PTB
– Produce a report that compares EVCA with what is present in the corpus

Case Study: join

Example

The first (non-light) verb is “join”, from sentence #1:
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

Search pattern /^join[esi ]/ → 161 matches

Example

Sentence #1: [VP join NP PP-CLR NP-TMP]
– the NP-TMP is a general temporal adjunct, not part of the join verb frame

The verb “join”

Look for VP nodes that:
– immediately dominate a VB*, and
– where that VB* is the 1st child
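That search can be hand-rolled over bracketed Penn Treebank strings; here is a sketch. The mini parser, the helper names, and the abbreviated tree string are illustrative, not part of the lecture's tooling.

```python
import re

def parse_tree(s):
    """Parse a PTB-style bracketed string into (label, children) tuples;
    leaves are plain word strings."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0
    def walk():
        nonlocal pos
        assert tokens[pos] == '('
        pos += 1
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(walk())
            else:
                children.append(tokens[pos]); pos += 1
        pos += 1  # consume ')'
        return (label, children)
    return walk()

def vp_frames(tree, verb_prefix='join'):
    """Yield the sister-label frame of every VP whose 1st child is a VB*
    dominating a word starting with verb_prefix."""
    label, children = tree
    first = children[0] if children else None
    if (label == 'VP' and isinstance(first, tuple)
            and first[0].startswith('VB')
            and first[1] and isinstance(first[1][0], str)
            and first[1][0].startswith(verb_prefix)):
        yield [c[0] for c in children[1:] if isinstance(c, tuple)]
    for c in children:
        if isinstance(c, tuple):
            yield from vp_frames(c, verb_prefix)

# abbreviated version of sentence #1's tree
tree = parse_tree("(S (NP (NNP Pierre) (NNP Vinken)) (VP (MD will)"
                  " (VP (VB join) (NP (DT the) (NN board))"
                  " (PP-CLR (IN as) (NP (DT a) (NN director)))"
                  " (NP-TMP (NNP Nov.) (CD 29)))))")
print(list(vp_frames(tree)))  # → [['NP', 'PP-CLR', 'NP-TMP']]
```

Note that the outer VP (headed by the modal will) is correctly skipped: its first child is MD, not VB*.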

Matches (143)

1 join [NP,PP-CLR,NP-TMP] 76 joined [NP,PP] 486 join [NP] 488 join [NP] 952 join [NP] 993 joined [NP] 1219 joins [NP,PP-TMP] 1877 joined [NP] 1940 joining [NP,PP] 2189 joined [NP,PP-TMP] 2370 joins [NP,PP-CLR] 2401 joining [NP,PP-LOC] 2417 joining [NP,PP-CLR] 3983 joining [PP-CLR,PP-LOC] 4131 join [NP] 5027 join [NP] 5421 joined [NP,PP-TMP] 5708 joining [NP] 5710 joins [NP-TMP,,,S-ADV] 5824 joins [PRT,PP-CLR] 6044 joined [NP,PP-TMP] 6849 joined [NP] 7274 join [NP] 7673 joined [NP] 8500 joined [PP-LOC,S-PRP] 8850 joined [NP,ADVP-TMP] 8965 joined [NP] 9198 join [NP] 9213 join [NP] 9926 joins [NP,PP] joining [NP] join [NP] join [NP] joining [PP-TMP,S-PRP] joining [NP,ADVP-MNR,NP-TMP] join [SBAR-TMP] joining [NP,PP-TMP] joined [NP] join [] joined [PP] joining [NP,PP-TMP,PP] joined [PP-CLR,PP] joining [NP] joined [NP] joined [NP,PP] joined [NP,PP,PP-TMP,,,PP-TMP] join [NP] join [NP,PP] join [NP] joined [NP,PP-TMP,PP] joining [NP] joined [NP] joined [PP-TMP,PP-CLR] join [NP] joined [NP,PP-LOC] joined [NP,PP-LOC] joined [NP] joined [NP,PP-CLR] joined [NP,PP-CLR,PP-TMP,ADVP-TMP] joins [NP] joins [NP,PP] joining [NP,PP-TMP,S-PRP] join [NP] join [NP,,,PP] joined [NP,PP-LOC] join [NP] join [PP-CLR,S-PRP] joining [NP,PP] join [NP,PP,PP-TMP] joined [NP,PP-CLR] joining [NP] join [] joined [NP] joined [NP-CLR,PP-CLR] joining [NP] joined [NP,PP] joined [ADVP-CLR,S-PRP] joining [NP,PP-TMP] join [NP] joining [NP,PP-TMP] joins [NP] joined [NP,PP-CLR,ADVP-TMP] join [NP] joined [NP,SBAR-TMP] joined [NP,PP-LOC] join [NP] join [NP] join [NP,PP] joining [SBAR-NOM] joined [NP,PP-LOC] joined [NP,PP] joined [NP,PP] join [NP,PP] join [NP] joined [NP] joined [NP] join [NP] join [PRT] joined [NP,PP-TMP] join [PP-CLR] joining [NP] join [NP] joined [NP] join [NP] join [NP-TMP] joining [NP,ADVP-TMP] joined [NP,ADVP-TMP] joined [PP-CLR,PP] joins [PP-CLR] joining [NP,PP] join [PP-CLR] joining [PRT,PP-CLR] join [NP] joined [NP,ADVP-TMP] joining [NP,PP-TMP] joined [NP] join [NP,ADVP-TMP,ADVP-PRP] joined [NP,PP,PP-TMP] join [NP] join [] joined [PP-CLR,S-CLR] joined [NP,PP-TMP] joining [NP,PP-TMP] join [NP,PP,NP-TMP] join [NP,PP-LOC] joined [NP,PP] joins [NP,,,PP] join [NP] join [NP] joined [NP,PP-CLR,PP-LOC] join [NP,ADVP] joining [NP] joined [NP,PP] join [NP] joined [NP] joining [NP] joining [NP] joined [NP] join [NP,PP] joined [NP,,,S-ADV] join [NP] join [NP,PP-LOC] joined [NP]

Some Caveats

– Some verbs have multiple senses...
– Not all instances of a category label hold the same “semantic role”
– To be precise, we'd have to view each tree and label each node with a semantic role very carefully
– To get a rough idea, let's just conflate category labels

Patterns (39)

1 [NP,PP-CLR,NP-TMP]
1 [PP-CLR,PP-LOC]
1 [NP-TMP,,,S-ADV]
1 [PP-LOC,S-PRP]
1 [PP-TMP,S-PRP]
1 [NP,ADVP-MNR,NP-TMP]
1 [SBAR-TMP]
1 [NP-CLR,PP-CLR]
1 [ADVP-CLR,S-PRP]
1 [NP,PP-CLR,ADVP-TMP]
1 [NP,SBAR-TMP]
1 [SBAR-NOM]
1 [PRT]
1 [NP-TMP]
3 [PP-CLR]
2 [PRT,PP-CLR]
4 [NP,ADVP-TMP]
1 [NP,ADVP-TMP,ADVP-PRP]
1 [PP-CLR,S-CLR]
1 [NP,PP,NP-TMP]
1 [NP,PP-CLR,PP-LOC]
1 [NP,ADVP]
1 [NP,,,S-ADV]
11 [NP,PP-TMP]
3 []
1 [PP]
2 [PP-CLR,PP]
1 [NP,PP,PP-TMP,,,PP-TMP]
2 [NP,PP-TMP,PP]
1 [PP-TMP,PP-CLR]
1 [NP,PP-CLR,PP-TMP,ADVP-TMP]
1 [NP,PP-TMP,S-PRP]
2 [NP,,,PP]
8 [NP,PP-LOC]
1 [PP-CLR,S-PRP]
16 [NP,PP]
2 [NP,PP,PP-TMP]
4 [NP,PP-CLR]
58 [NP]
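Counts like the ones above come from tallying identical frame lists. With frames collected as lists of sister labels, the tally is a `collections.Counter`; the frames below are a made-up subset, just to show the shape of the computation.

```python
from collections import Counter

# hypothetical sample of frames extracted for "join"
frames = [
    ['NP'], ['NP', 'PP'], ['NP'], ['NP', 'PP-CLR', 'NP-TMP'], ['NP'],
]

# render each frame as the slide's bracketed notation and count duplicates
patterns = Counter('[' + ','.join(f) + ']' for f in frames)
for pat, n in patterns.most_common():
    print(n, pat)
```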

Patterns (27)

1 [PP-CLR,PP-LOC]
1 [,,S-ADV]
1 [PP-LOC,S-PRP]
1 [S-PRP]
1 [NP,ADVP-MNR]
1 [NP-CLR,PP-CLR]
1 [ADVP-CLR,S-PRP]
1 [SBAR-NOM]
1 [PRT]
2 [PRT,PP-CLR]
1 [NP,ADVP-PRP]
1 [PP-CLR,S-CLR]
1 [NP,PP-CLR,PP-LOC]
1 [NP,ADVP]
1 [NP,,,S-ADV]
5 []
1 [PP]
2 [PP-CLR,PP]
1 [NP,PP,,]
4 [PP-CLR]
1 [NP,S-PRP]
2 [NP,,,PP]
8 [NP,PP-LOC]
1 [PP-CLR,S-PRP]
21 [NP,PP]
7 [NP,PP-CLR]
74 [NP]

1 [NP,PP,,] 1 [NP,PP,PP-TMP,,,PP-TMP]

Case

Mr. Craven joined Morgan Grenfell as group chief executive in May 1987, a few months after the resignations of former Chief Executive Christopher Reeves and other top officials because of the merchant bank 's role in Guinness PLC 's controversial takeover of Distiller 's Co. in

[VP joined NP PP PP-TMP , PP-TMP]

Delete Comma Nodes

Patterns (25)

1 [PP-CLR,PP-LOC]
1 [S-ADV]
1 [PP-LOC,S-PRP]
1 [S-PRP]
1 [NP,ADVP-MNR]
1 [NP-CLR,PP-CLR]
1 [ADVP-CLR,S-PRP]
1 [SBAR-NOM]
1 [PRT]
2 [PRT,PP-CLR]
1 [NP,ADVP-PRP]
1 [PP-CLR,S-CLR]
1 [NP,PP-CLR,PP-LOC]
1 [NP,ADVP]
1 [NP,S-ADV]
5 []
1 [PP]
2 [PP-CLR,PP]
4 [PP-CLR]
1 [NP,S-PRP]
8 [NP,PP-LOC]
1 [PP-CLR,S-PRP]
24 [NP,PP]
7 [NP,PP-CLR]
74 [NP]
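Deleting comma nodes amounts to filtering ',' out of each frame before re-tallying; a sketch with made-up frames:

```python
from collections import Counter

# hypothetical frames, some containing comma nodes
frames = [['NP', 'PP', ',', ','], ['NP', ',', 'S-ADV'], ['NP']]

# drop the comma nodes, then re-count the patterns
cleaned = [[cat for cat in f if cat != ','] for f in frames]
patterns = Counter('[' + ','.join(f) + ']' for f in cleaned)
```

This is how e.g. [NP,PP,,] and [NP,PP] collapse into one pattern, shrinking the 27 patterns to 25.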

PP-LOC Cases 2401, 3983 and 43738

ADVP and -MNR Cases (ADVP-MNR) and (ADVP)

-ADV adverbial Cases 5710 (S-ADV) and (S-ADV)

-PRP purpose Case (S-PRP)

Patterns

Delete ADV(P), -LOC, -MNR, -PRP:
– 1 [NP-CLR,PP-CLR]
– 1 [SBAR-NOM]
– 1 [PRT]
– 2 [PRT,PP-CLR]
– 1 [PP-CLR,S-CLR]
– 9 []
– 1 [PP]
– 2 [PP-CLR,PP]
– 6 [PP-CLR]
– 24 [NP,PP]
– 8 [NP,PP-CLR]
– 87 [NP]

Delete ADV(P) but not anything with -CLR, -LOC, -MNR, -PRP:
– 1 [NP-CLR,PP-CLR]
– 1 [ADVP-CLR]
– 1 [SBAR-NOM]
– 1 [PRT]
– 2 [PRT,PP-CLR]
– 1 [PP-CLR,S-CLR]
– 8 []
– 1 [PP]
– 2 [PP-CLR,PP]
– 6 [PP-CLR]
– 24 [NP,PP]
– 8 [NP,PP-CLR]
– 87 [NP]

Note: we can't simply delete ADV everywhere.
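The second conflation (drop a bare ADV(P) but keep anything carrying a -CLR, -LOC, -MNR, or -PRP tag) can be sketched as a filter over each frame; the helper name is mine.

```python
# function tags that protect an adverbial category from deletion
KEEP_TAGS = ('-CLR', '-LOC', '-MNR', '-PRP')

def prune_adv(frame):
    """Drop ADV(P) categories unless they carry a -CLR/-LOC/-MNR/-PRP tag."""
    return [cat for cat in frame
            if not (cat.startswith('ADV') and not cat.endswith(KEEP_TAGS))]

print(prune_adv(['NP', 'ADVP']))   # plain ADVP is dropped
print(prune_adv(['ADVP-CLR']))     # ADVP-CLR survives
```

This is why the two columns differ only in [ADVP-CLR] and the empty-frame count.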

SBAR-NOM headless relative Case 16409

-CLR closely related (“middle ground between arguments and adjuncts”) Cases and (PP-CLR PP)

PRT particle

Cases 18521, and 5824

Do we treat join up/in as different from join?

EVCA

join belongs to section 22.1, Mix verbs. Syntactic frames:
– NP PP-with
– NP-and (together)
– PP-with
– []
– ADJ PP-with
– ADJ (together)

WSJ PTB vs. EVCA

WSJ PTB:
– 1 [NP-CLR,PP-CLR]
– 1 [ADVP-CLR]
– 1 [PP-CLR,S-CLR]
– 8 []
– 1 [PP]
– 2 [PP-CLR,PP]
– 6 [PP-CLR]
– 24 [NP,PP]
– 8 [NP,PP-CLR]
– 87 [NP]

EVCA:
– NP PP-with
– PP-with
– []
– NP-and (together)
– ADJ PP-with
– ADJ (together)

Note: ADJ is JJ in the PTB tagset.

Further work on WSJ PTB

Is the PP-CLR for join always headed by with?
– in 4
– with 11
– as 5

What heads the plain PP for join?
– for 1
– upon 1
– by 3
– on 1
– from 4
– in 8
– as 8

So with always appears as a PP-CLR for join, never as a plain PP.