LING 581: Advanced Computational Linguistics Lecture Notes March 2nd
Report on Homework Task
Part 1
– Run the examples you showed on your slides from Homework Task 1 using the Bikel Collins parser.
– Evaluate how close the parses are to the “gold standard”
Part 2
– WSJ corpus: sections 00 through 24
– Evaluation: on section 23
– Training: normally (20 sections)
– How does the Bikel Collins parser vary in accuracy if you randomly pick 1, 2, 3, … 20 sections to do the training with? Plot a graph with evalb.
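A minimal sketch of the sampling step for Part 2. It assumes the 20 training sections are the usual Collins split (02–21); the function name is our own. Retraining the Bikel Collins parser and scoring with evalb happen outside this snippet.

```python
import random

# Assumption: the homework's "20 sections" are WSJ sections 02-21,
# the standard Collins training split; section 23 is held out for evalb.
TRAIN_POOL = [f"{s:02d}" for s in range(2, 22)]

def sample_training_sections(k, seed=None):
    """Randomly pick k of the 20 training sections, sorted for readability."""
    rng = random.Random(seed)
    return sorted(rng.sample(TRAIN_POOL, k))

# For each k = 1..20 you would retrain the parser on the sampled sections,
# parse section 23, score with evalb, and plot F1 against k.
```

Sampling with a fixed seed makes each point on the accuracy-vs-k plot reproducible.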
Results
Part 2: the parser doesn’t seem to require all 20 sections to achieve its (limit) performance
But on the other hand, modifying even one training example can change a parse …
Last Time: Sensitivity to perturbation
Often assumed that statistical models
– are less brittle than symbolic models
– can produce parses for ungrammatical data
Are they sensitive to noise or small perturbations?
Last Time: Sensitivity to perturbation
PP attachment (frequency 1) in the WSJ
Just one sentence out of 39,832 training examples can affect attachment:
(mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0)
Recorded event in wsj observed.gz
Last Time: An experiment with passives
Comparison:
– Wow! as object plus passive morphology
– Wow! inserted as NP object trace
– Baseline (passive morphology)
– Wow! 4th word
Verb Alternations
Verb alternations
– range of VP frames for a given verb (sense)
– There are VPs in the PTB
– Q: what kinds of frames are attested in the PTB?
Example:
– spray/load or locative alternation (Levin 1993):
– (1) a. Sharon sprayed water on the plants
–     b. Sharon sprayed the plants with water
– (2) a. The farmer loaded apples into the cart
–     b. The farmer loaded the cart with apples
– cf. fill and cover, dump and pour
Verb Alternations
Reference Book: EVCA
Contains listings of verbs classified by:
– alternations (Part 1)
– semantic classes (Part 2)
Book contains an index of verbs:
– references sections of the book
– 3104 verbs listed
– (thumb drive) evca93.index
abandon 51.2
abash 1.2.5, , 31.1
abate , 45.4
abduct 2.2, 2.3.2, 10.5
abhor 2.10, , , , 31.2
abound 2.3.4,
absent 8.2
absolve 2.3.2, 10.6
abstract 2.3.2, 10.1
abuse 2.10, , , , 33
abut 47.8
accelerate , 45.4
accept 2.2, 2.14, , 29.2
acclaim 2.10, , , , 33
accompany 51.7
accord 2.1
accumulate 2.2, 2.3.4, 6.1, 6.2, ,
acetify , 45.4
ache 31.3, 32.2,
acidify , 45.4
acknowledge 2.1, 2.14, 29.1
Bikel Collins Raw Output Example
PROB
TOP S INTJ UH 0 No
NP NPB PRP 0 it
VP VBD 0 was
RB 0 n't
NP NPB NNP 0 Black
NNP 0 Monday
(TOP~was~1~1 (S~was~3~3 (INTJ~No~1~1 No/UH ,/PUNC, ) (NPB~it~1~1 it/PRP ) (VP~was~3~1 was/VBD n't/RB (NPB~Monday~2~2 Black/NNP Monday/NNP ./PUNC. ) ) ) )
TIME 1
The "raw" output format is as follows:
– First line is "PROB num_edges_in_chart log_prob 0", e.g. PROB …
– Next few lines are the parse tree printed, one word per line, with log probs on each constituent
– Next line is the full parse output
– Final line is "TIME time", e.g. "TIME 10", meaning the parse took 10 seconds
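The four-part format described above is easy to split mechanically. A sketch (the function and field names are our own, not part of the Bikel distribution):

```python
def parse_raw_output(text):
    """Split one Bikel Collins raw-output record into its four parts.

    Assumes the documented layout: a PROB header line, per-word tree
    lines, one full bracketed parse line, and a TIME line.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    assert lines[0].startswith("PROB") and lines[-1].startswith("TIME")
    return {
        "prob_header": lines[0],     # "PROB num_edges_in_chart log_prob 0"
        "per_word": lines[1:-2],     # tree printed one word per line
        "parse": lines[-2],          # full bracketed parse
        "seconds": int(lines[-1].split()[1]),  # "TIME 10" -> 10
    }
```

Splitting from both ends (header first, TIME and parse last) keeps the per-word section flexible in length.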
Homework Task
– Pick verbs that exist in EVCA and also in the PTB
– Produce a report that compares EVCA with what is present in the corpus
Case Study: join
Example
First (non-light) verb is “join”, from sentence #1:
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
/^join[esi ]/  161 matches
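The slide's search pattern can be reproduced with Python's `re`; the word list below is illustrative, not corpus data. Note the character class: a second character in `[esi ]` picks up the inflected forms while excluding e.g. "joint".

```python
import re

# /^join[esi ]/ matches "join " (followed by a space) plus joins,
# joined, joining -- via the character after the stem.
pat = re.compile(r"^join[esi ]")

forms = ["join the board", "joins", "joined", "joining", "joint venture"]
hits = [f for f in forms if pat.match(f)]
# "joint venture" is not matched: 't' is not in [esi ]
```

A bare "join" with nothing after it would also fail to match, which is why the trailing space is in the class.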
Example
Sentence #1
– [VP join NP PP-CLR NP-TMP]
– NP-TMP is a general temporal adjunct, not part of the join verb frame
The verb “join”
Look for VP nodes that:
– immediately dominate a VB*, and
– have that VB* as the 1st child
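The search above can be sketched without any treebank tooling by representing trees as nested lists (`[label, child, ...]`, leaves as `[pos, word]`). This is a minimal illustration; the helper names are our own, and real PTB work would use a proper reader.

```python
def is_leaf(node):
    """Leaves are [pos_tag, word] pairs with a string word."""
    return len(node) == 2 and isinstance(node[1], str)

def find_frames(tree, out=None):
    """Collect (verb, frame) for every VP whose 1st child is a VB* leaf.

    The frame is the list of sibling labels following the verb.
    """
    if out is None:
        out = []
    label, *children = tree
    if label == "VP" and children and is_leaf(children[0]) \
            and children[0][0].startswith("VB"):
        out.append((children[0][1], [c[0] for c in children[1:]]))
    for c in children:
        if not is_leaf(c):
            find_frames(c, out)
    return out
```

Requiring the VB* to be the first child skips auxiliaries like "will", whose VP child carries the main verb.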
Matches (143) 1 join [NP,PP-CLR,NP-TMP] 76 joined [NP,PP] 486 join [NP] 488 join [NP] 952 join [NP] 993 joined [NP] 1219 joins [NP,PP-TMP] 1877 joined [NP] 1940 joining [NP,PP] 2189 joined [NP,PP-TMP] 2370 joins [NP,PP-CLR] 2401 joining [NP,PP-LOC] 2417 joining [NP,PP-CLR] 3983 joining [PP-CLR,PP-LOC] 4131 join [NP] 5027 join [NP] 5421 joined [NP,PP-TMP] 5708 joining [NP] 5710 joins [NP-TMP,,,S-ADV] 5824 joins [PRT,PP-CLR] 6044 joined [NP,PP-TMP] 6849 joined [NP] 7274 join [NP] 7673 joined [NP] 8500 joined [PP-LOC,S-PRP] 8850 joined [NP,ADVP-TMP] 8965 joined [NP] 9198 join [NP] 9213 join [NP] 9926 joins [NP,PP] joining [NP] join [NP] join [NP] joining [PP-TMP,S-PRP] joining [NP,ADVP-MNR,NP-TMP] join [SBAR-TMP] joining [NP,PP-TMP] joined [NP] join [] joined [PP] joining [NP,PP-TMP,PP] joined [PP-CLR,PP] joining [NP] joined [NP] joined [NP,PP] joined [NP,PP,PP-TMP,,,PP-TMP] join [NP] join [NP,PP] join [NP] joined [NP,PP-TMP,PP] joining [NP] joined [NP] joined [PP-TMP,PP-CLR] join [NP] joined [NP,PP-LOC] joined [NP,PP-LOC] joined [NP] joined [NP,PP-CLR] joined [NP,PP-CLR,PP-TMP,ADVP-TMP] joins [NP] joins [NP,PP] joining [NP,PP-TMP,S-PRP] join [NP] join [NP,,,PP] joined [NP,PP-LOC] join [NP] join [PP-CLR,S-PRP] joining [NP,PP] join [NP,PP,PP-TMP] joined [NP,PP-CLR] joining [NP] join [] joined [NP] joined [NP-CLR,PP-CLR] joining [NP] joined [NP,PP] joined [ADVP-CLR,S-PRP] joining [NP,PP-TMP] join [NP] joining [NP,PP-TMP] joins [NP] joined [NP,PP-CLR,ADVP-TMP] join [NP] joined [NP,SBAR-TMP] joined [NP,PP-LOC] join [NP] join [NP] join [NP,PP] joining [SBAR-NOM] joined [NP,PP-LOC] joined [NP,PP] joined [NP,PP] join [NP,PP] join [NP] joined [NP] joined [NP] join [NP] join [PRT] joined [NP,PP-TMP] join [PP-CLR] joining [NP] join [NP] joined [NP] join [NP] join [NP-TMP] joining [NP,ADVP-TMP] joined [NP,ADVP-TMP] joined [PP-CLR,PP] joins [PP-CLR] joining [NP,PP] join [PP-CLR] joining [PRT,PP-CLR] join [NP] joined [NP,ADVP-TMP] joining [NP,PP-TMP] joined [NP] join 
[NP,ADVP-TMP,ADVP-PRP] joined [NP,PP,PP-TMP] join [NP] join [] joined [PP-CLR,S-CLR] joined [NP,PP-TMP] joining [NP,PP-TMP] join [NP,PP,NP-TMP] join [NP,PP-LOC] joined [NP,PP] joins [NP,,,PP] join [NP] join [NP] joined [NP,PP-CLR,PP-LOC] join [NP,ADVP] joining [NP] joined [NP,PP] join [NP] joined [NP] joining [NP] joining [NP] joined [NP] join [NP,PP] joined [NP,,,S-ADV] join [NP] join [NP,PP-LOC] joined [NP]
Some Caveats
– Some verbs have multiple senses...
– Not all instances of a category label hold the same “semantic role”
– To be precise, we’d have to view each tree and label each node with a semantic role very carefully
– To get a rough idea, let’s just conflate category labels
Patterns (39) 1 [NP,PP-CLR,NP-TMP] 1 [PP-CLR,PP-LOC] 1 [NP-TMP,,,S-ADV] 1 [PP-LOC,S-PRP] 1 [PP-TMP,S-PRP] 1 [NP,ADVP-MNR,NP-TMP] 1 [SBAR-TMP] 1 [NP-CLR,PP-CLR] 1 [ADVP-CLR,S-PRP] 1 [NP,PP-CLR,ADVP-TMP] 1 [NP,SBAR-TMP] 1 [SBAR-NOM] 1 [PRT] 1 [NP-TMP] 3 [PP-CLR] 2 [PRT,PP-CLR] 4 [NP,ADVP-TMP] 1 [NP,ADVP-TMP,ADVP-PRP] 1 [PP-CLR,S-CLR] 1 [NP,PP,NP-TMP] 1 [NP,PP-CLR,PP-LOC] 1 [NP,ADVP] 1 [NP,,,S-ADV] 11 [NP,PP-TMP] 3 [] 1 [PP] 2 [PP-CLR,PP] 1 [NP,PP,PP-TMP,,,PP-TMP] 2 [NP,PP-TMP,PP] 1 [PP-TMP,PP-CLR] 1 [NP,PP-CLR,PP-TMP,ADVP-TMP] 1 [NP,PP-TMP,S-PRP] 2 [NP,,,PP] 8 [NP,PP-LOC] 1 [PP-CLR,S-PRP] 16 [NP,PP] 2 [NP,PP,PP-TMP] 4 [NP,PP-CLR] 58 [NP]
Patterns (27) 1 [PP-CLR,PP-LOC] 1 [,,S-ADV] 1 [PP-LOC,S-PRP] 1 [S-PRP] 1 [NP,ADVP-MNR] 1 [NP-CLR,PP-CLR] 1 [ADVP-CLR,S-PRP] 1 [SBAR-NOM] 1 [PRT] 2 [PRT,PP-CLR] 1 [NP,ADVP-PRP] 1 [PP-CLR,S-CLR] 1 [NP,PP-CLR,PP-LOC] 1 [NP,ADVP] 1 [NP,,,S-ADV] 5 [] 1 [PP] 2 [PP-CLR,PP] 1 [NP,PP,,] 4 [PP-CLR] 1 [NP,S-PRP] 2 [NP,,,PP] 8 [NP,PP-LOC] 1 [PP-CLR,S-PRP] 21 [NP,PP] 7 [NP,PP-CLR] 74 [NP]
Patterns with comma nodes:
– 1 [NP,PP,,]
– 1 [NP,PP,PP-TMP,,,PP-TMP]
Case
Mr. Craven joined Morgan Grenfell as group chief executive in May 1987, a few months after the resignations of former Chief Executive Christopher Reeves and other top officials because of the merchant bank 's role in Guinness PLC 's controversial takeover of Distiller 's Co. in
[VP joined NP PP PP-TMP , PP-TMP]
Delete Comma Nodes
Patterns (25) 1 [PP-CLR,PP-LOC] 1 [S-ADV] 1 [PP-LOC,S-PRP] 1 [S-PRP] 1 [NP,ADVP-MNR] 1 [NP-CLR,PP-CLR] 1 [ADVP-CLR,S-PRP] 1 [SBAR-NOM] 1 [PRT] 2 [PRT,PP-CLR] 1 [NP,ADVP-PRP] 1 [PP-CLR,S-CLR] 1 [NP,PP-CLR,PP-LOC] 1 [NP,ADVP] 1 [NP,S-ADV] 5 [] 1 [PP] 2 [PP-CLR,PP] 4 [PP-CLR] 1 [NP,S-PRP] 8 [NP,PP-LOC] 1 [PP-CLR,S-PRP] 24 [NP,PP] 7 [NP,PP-CLR] 74 [NP]
PP-LOC Cases 2401, 3983 and 43738
ADVP and -MNR Cases (ADVP-MNR) and (ADVP)
-ADV adverbial Cases 5710 (S-ADV) and (S-ADV)
-PRP purpose Case (S-PRP)
Patterns
Delete ADV(P), -LOC, -MNR, -PRP:
– 1 [NP-CLR,PP-CLR]
– 1 [SBAR-NOM]
– 1 [PRT]
– 2 [PRT,PP-CLR]
– 1 [PP-CLR,S-CLR]
– 9 []
– 1 [PP]
– 2 [PP-CLR,PP]
– 6 [PP-CLR]
– 24 [NP,PP]
– 8 [NP,PP-CLR]
– 87 [NP]
Delete ADV(P) but not anything with -CLR, -LOC, -MNR, -PRP:
– 1 [NP-CLR,PP-CLR]
– 1 [ADVP-CLR]
– 1 [SBAR-NOM]
– 1 [PRT]
– 2 [PRT,PP-CLR]
– 1 [PP-CLR,S-CLR]
– 8 []
– 1 [PP]
– 2 [PP-CLR,PP]
– 6 [PP-CLR]
– 24 [NP,PP]
– 8 [NP,PP-CLR]
– 87 [NP]
Can’t simply delete ADV everywhere
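The successive normalizations on the preceding slides (strip -TMP, delete comma nodes, drop ADVP and -LOC/-MNR/-PRP/-ADV adjuncts while keeping -CLR) can be sketched as one filter; the function name and exact rule set are our own reading of the slides, not a published recipe.

```python
from collections import Counter

DROP_SUFFIXES = ("-TMP",)   # conflate temporal adjunct tags first

def normalize(frame):
    """Normalize one complement frame (a list of category labels)."""
    out = []
    for lab in frame:
        if lab == ",":                      # delete comma nodes
            continue
        for suf in DROP_SUFFIXES:           # e.g. NP-TMP -> NP
            if lab.endswith(suf):
                lab = lab[: -len(suf)]
        if "-CLR" in lab:                   # keep closely-related nodes
            out.append(lab)
            continue
        if lab.startswith("ADVP") or any(
                t in lab for t in ("-LOC", "-MNR", "-PRP", "-ADV")):
            continue                        # drop adjunct-like labels
        out.append(lab)
    return out

# Tally normalized frames (toy data, not the corpus counts above):
frame_counts = Counter(
    tuple(normalize(f))
    for f in [["NP", "PP-CLR", "NP-TMP"], ["NP", "PP-TMP"], ["NP"]]
)
```

Checking -CLR before the ADVP rule is what keeps ADVP-CLR alive, matching the slide's point that you can't simply delete ADV everywhere.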
SBAR-NOM headless relative Case 16409
-CLR closely related (“middle ground between arguments and adjuncts”) Cases and (PP-CLR PP)
PRT particle
Cases 18521, and 5824
Do we treat join up/in as different from join?
EVCA
Join belongs to section 22.1 Mix verbs
Syntactic frames:
– NP PP-with
– NP-and (together)
– PP-with
– []
– ADJ PP-with
– ADJ (together)
WSJ PTB vs. EVCA
WSJ PTB:
– 1 [NP-CLR,PP-CLR]
– 1 [ADVP-CLR]
– 1 [PP-CLR,S-CLR]
– 8 []
– 1 [PP]
– 2 [PP-CLR,PP]
– 6 [PP-CLR]
– 24 [NP,PP]
– 8 [NP,PP-CLR]
– 87 [NP]
EVCA:
– NP PP-with
– PP-with
– []
– NP-and (together)
– ADJ PP-with
– ADJ (together)
Note: ADJ is JJ in the PTB tagset
Further work on WSJ PTB
PP-CLR for join: always headed by with?
– in 4
– with 11
– as 5
PP for join headed by?
– for 1
– upon 1
– by 3
– on 1
– from 4
– in 8
– as 8
with is always a PP-CLR for join
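The head-preposition check above amounts to bucketing (PP label, head) pairs and asking whether a given head ever occurs outside PP-CLR. A sketch on illustrative pairs (not the full corpus counts):

```python
from collections import Counter

# Illustrative (pp_label, head_preposition) pairs for join VPs;
# in real use these would be extracted from the matched trees.
pps = [("PP-CLR", "with"), ("PP-CLR", "as"), ("PP-CLR", "in"),
       ("PP", "in"), ("PP", "as"), ("PP", "by")]

heads_by_label = {}
for label, head in pps:
    heads_by_label.setdefault(label, Counter())[head] += 1

# The claim to test: does "with" ever head a plain PP for join?
with_only_clr = all(label == "PP-CLR" for label, head in pps if head == "with")
```

Note that the converse does not hold: "in" and "as" appear under both labels, so only "with" is diagnostic of PP-CLR here.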