Statistical NLP Winter 2009
Lecture 13: word-sense disambiguation & semantic roles
Roger Levy
Thanks to Jason Eisner and Dan Klein for slides

A Concordance for “party” from www.webcorp.org.uk

A Concordance for "party" from www.webcorp.org.uk
thing. She was talking at a party thrown at Daphne's restaurant in
have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
in the 1983 general election for a party which, when it could not bear to
to attack the Scottish National Party, who look set to seize Perth and
that had been passed to a second party who made a financial decision
the by-pass there will be a street party. "Then," he says, "we are going
number-crunchers within the Labour party, there now seems little doubt
political tradition and the same party. They are both relatively Anglophilic
he told Tony Blair's modernised party they must not retreat into "warm
"Oh no, I'm just here for the party," they said. "I think it's terrible
A future obliges each party to the contract to fulfil it by
be signed by or on behalf of each party to the contract." Mr David N

What Good are Word Senses?
thing. She was talking at a party thrown at Daphne's restaurant in
have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
in the 1983 general election for a party which, when it could not bear to
to attack the Scottish National Party, who look set to seize Perth and
that had been passed to a second party who made a financial decision
the by-pass there will be a street party. "Then," he says, "we are going
number-crunchers within the Labour party, there now seems little doubt
political tradition and the same party. They are both relatively Anglophilic
he told Tony Blair's modernised party they must not retreat into "warm
"Oh no, I'm just here for the party," they said. "I think it's terrible
A future obliges each party to the contract to fulfil it by
be signed by or on behalf of each party to the contract." Mr David N

What Good are Word Senses?
thing. She was talking at a party thrown at Daphne's restaurant in
have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
the by-pass there will be a street party. "Then," he says, "we are going
"Oh no, I'm just here for the party," they said. "I think it's terrible
in the 1983 general election for a party which, when it could not bear to
to attack the Scottish National Party, who look set to seize Perth and
number-crunchers within the Labour party, there now seems little doubt
political tradition and the same party. They are both relatively Anglophilic
he told Tony Blair's modernised party they must not retreat into "warm
that had been passed to a second party who made a financial decision
A future obliges each party to the contract to fulfil it by
be signed by or on behalf of each party to the contract." Mr David N

What Good are Word Senses?
Replace word w with sense s:
- Splits w into senses: distinguishes this token of w from tokens with sense t
- Groups w with other words: groups this token of w with tokens of x that also have sense s

Word senses create groupings
number-crunchers within the Labour party, there now seems little doubt
political tradition and the same party. They are both relatively Anglophilic
he told Tony Blair's modernised party they must not retreat into "warm
thing. She was talking at a party thrown at Daphne's restaurant in
have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
the by-pass there will be a street party. "Then," he says, "we are going
"Oh no, I'm just here for the party," they said. "I think it's terrible
an appearance at the annual awards bash, but feels in no fit state to
-known families at a fundraising bash on Thursday night for Learning
Who was paying for the bash? The only clue was the name Asprey,
Mail, always hosted the annual bash for the Scottish Labour front-
popular. Their method is to bash sense into criminals with a short,
just cut off people's heads and bash their brains out over the floor,

Word senses create groupings
number-crunchers within the Labour party, there now seems little doubt
political tradition and the same party. They are both relatively Anglophilic
he told Tony Blair's modernised party they must not retreat into "warm
thing. She was talking at a party thrown at Daphne's restaurant in
have turned it into the hot dinner-party topic. The comedy is the
selection for the World Cup party, which will be announced on May 1
the by-pass there will be a street party. "Then," he says, "we are going
"Oh no, I'm just here for the party," they said. "I think it's terrible
an appearance at the annual awards bash, but feels in no fit state to
-known families at a fundraising bash on Thursday night for Learning
Who was paying for the bash? The only clue was the name Asprey,
Mail, always hosted the annual bash for the Scottish Labour front-
popular. Their method is to bash sense into criminals with a short,
just cut off people's heads and bash their brains out over the floor,

What Good are Word Senses?
Semantics / Text understanding
- Many systems
- Axioms about TRANSFER apply to (some tokens of) throw
- Axioms about BUILDING apply to (some tokens of) bank
Machine translation
- bass has Spanish translation of lubina or bajo depending on sense
Info retrieval / Question answering / Text categ.
- Query or pattern might not match document exactly
Backoff for just about anything
- what word comes next? (speech recognition, language ID, …)
- trigrams are sparse but tri-meanings might not be
- lexicalized probabilistic grammars (coming up later in the course)
Speaker's real intention is senses; words are a noisy channel

Cues to Word Sense
- Adjacent words (or their senses)
- Grammatically related words (subject, object, …)
- Other nearby words
- Topic of document
- Sense of other tokens of the word in the same document

More details about word senses
Words have multiple distinct meanings, or senses:
- Plant: living plant, manufacturing plant, …
- Title: name of a work, ownership document, form of address, material at the start of a film, …
Many levels of sense distinctions
- Homonymy: totally unrelated meanings (river bank, money bank)
- Polysemy: related meanings (star in sky, star on TV)
- Systematic polysemy: productive meaning extensions (organizations to their buildings) or metaphor
Sense distinctions can be extremely subtle (or not)
Granularity of senses needed depends a lot on the task

Word Sense Disambiguation
Example: living plant vs. manufacturing plant
How do we tell these senses apart? "context"
Maybe it's just text categorization
- Each word sense represents a topic
- Run the naive-Bayes classifier from last class?
Bag-of-words classification works OK for noun senses
- 90% on classic, shockingly easy examples (line, interest, star)
- 80% on SensEval-1 nouns
- 70% on SensEval-1 verbs
Example context: "The manufacturing plant which had previously sustained the town's economy shut down after an extended labor strike."
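To make the text-categorization view concrete, here is a minimal bag-of-words Naive Bayes sketch. The sense labels and the tiny training set are invented for illustration; a real system would train on sense-annotated data such as SemCor.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count sense priors and per-sense word counts.
    examples: list of (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def classify(words, sense_counts, word_counts, vocab):
    """Pick the sense maximizing log P(sense) + sum_w log P(w | sense),
    with add-one smoothing over the training vocabulary."""
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy, invented training data for the two senses of "plant"
train = [(["factory", "labor", "strike", "economy"], "manufacturing"),
         (["leaves", "soil", "water", "grow"], "living")]
model = train_nb(train)
print(classify(["labor", "strike", "shut", "down"], *model))  # -> manufacturing
```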

More on WSD
A key component of machine translation
- English bank could be Russian банк or берег
An "AI-hard" problem:
- Sam's money and credit cards slipped out of his pocket as he fell off the boat. He struggled mightily and finally managed to get to the bank.
- Is bank a river bank or a money bank? Fortunately, simple features of local context are usually more discriminating than this!

Verb WSD
Why are verbs harder?
- Verbal senses less topical
- More sensitive to structure, argument choice
Verb example: "serve"
- [function] The tree stump serves as a table
- [enable] The scandal served to increase his popularity
- [dish] We serve meals for the homeless
- [enlist] He served his country
- [jail] He served six years for embezzlement
- [tennis] It was Agassi's turn to serve
- [legal] He was served by the sheriff

Various Approaches to WSD
Unsupervised learning
- Bootstrapping (Yarowsky 95)
- Clustering
Indirect supervision
- From thesauri
- From WordNet
- From parallel corpora
Supervised learning
- Most systems do some kind of supervised learning
- Many competing classification technologies perform about the same (it's all about the knowledge sources you tap)
- Problem: training data available for only a few words

Resources
WordNet
- Hand-built (but large) hierarchy of word senses
- Basically a hierarchical thesaurus
SensEval
- A WSD competition, of which there have been 3 iterations
- Training / test sets for a wide range of words, difficulties, and parts of speech
- Bake-off where lots of labs tried lots of competing approaches
SemCor
- A big chunk of the Brown corpus annotated with WordNet senses
Other resources
- The Open Mind Word Expert
- Parallel texts
- Flat thesauri

Knowledge Sources
So what do we need to model to handle "serve"?
There are distant topical cues:
- …. point … court ………………… serve ……… game …
We could just use a Naïve Bayes classifier (same as text categorization!)
[Figure: Naive Bayes graphical model, sense c generating context words w1, w2, …, wn]

Weighted Windows with NB
Distance conditioning
- Some words are important only when they are nearby
- …. as …. point … court ………………… serve ……… game …
- …. ………………………………………… serve as……………..
Distance weighting
- Nearby words should get a larger vote
- … court …… serve as……… game …… point
[Figure: weighting curve, boost as a function of relative position i]
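A sketch of how distance weighting might be implemented: each context word within the window votes with an exponentially decaying boost. The window size, decay constant, and feature names here are illustrative assumptions, not the lecture's exact scheme.

```python
def weighted_window(tokens, target_idx, window=5, decay=0.7):
    """Collect context-word features around a target token, letting nearby
    words cast a larger (exponentially decayed) vote."""
    feats = {}
    for i, tok in enumerate(tokens):
        dist = abs(i - target_idx)
        if i == target_idx or dist > window:
            continue
        boost = decay ** (dist - 1)          # boost = 1.0 for adjacent words
        key = "context:" + tok.lower()
        feats[key] = feats.get(key, 0.0) + boost
    return feats

tokens = "It was Agassi 's turn to serve in the final game".split()
print(weighted_window(tokens, tokens.index("serve")))
```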

Better Features
There are smarter features:
- Argument selectional preference: serve NP[meals] vs. serve NP[papers] vs. serve NP[country]
- Subcategorization:
  - [function] serve PP[as]
  - [enable] serve VP[to]
  - [tennis] serve <intransitive>
  - [food] serve NP {PP[to]}
- Can capture poorly (but robustly) with local windows … but we can also use a parser and get these features explicitly
Other constraints (Yarowsky 95)
- One-sense-per-discourse (only true for broad topical distinctions)
- One-sense-per-collocation (pretty reliable when it kicks in: manufacturing plant, flowering plant)

Complex Features with NB?
Example: "Washington County jail served 11,166 meals last month - a figure that translates to feeding some 120 people three times daily for 31 days."
So we have a decision to make based on a set of cues:
- context:jail, context:county, context:feeding, …
- local-context:jail, local-context:meals
- subcat:NP, direct-object-head:meals
Not clear how to build a generative derivation for these because the features cannot be independent:
- Choose topic, then decide on having a transitive usage, then pick "meals" to be the object's head, then generate other words?
- How about the words that appear in multiple features?
Hard to make this work (though maybe possible)
No real reason to try

A Discriminative Approach
View WSD as a discrimination task (regression, really)
Have to estimate multinomial (over senses) where there are a huge number of things to condition on
- History is too complex to think about this as a smoothing / back-off problem
Many feature-based classification techniques out there
- We tend to need ones that output distributions over classes
P(sense | context:jail, context:county, context:feeding, …, local-context:jail, local-context:meals, subcat:NP, direct-object-head:meals, …)

Feature Representations
Features are indicator functions fi which count the occurrences of certain patterns in the input
We map each input to a vector of feature predicate counts
"Washington County jail served 11,166 meals last month - a figure that translates to feeding some 120 people three times daily for 31 days."
- context:jail = 1
- context:county = 1
- context:feeding = 1
- context:game = 0
- …
- local-context:jail = 1
- local-context:meals = 1
- subcat:NP = 1
- subcat:PP = 0
- object-head:meals = 1
- object-head:ball = 0
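A small sketch of mapping one input to a vector of feature predicate counts, using the feature names from this slide. The extractor itself is hypothetical; in practice the subcategorization frame and object head would come from a parser.

```python
from collections import Counter

def extract_features(context_words, local_words, subcat, object_head):
    """Map one token of "serve(d)" to indicator-feature counts."""
    feats = Counter()
    for w in context_words:
        feats["context:" + w.lower()] += 1
    for w in local_words:
        feats["local-context:" + w.lower()] += 1
    feats["subcat:" + subcat] += 1
    feats["object-head:" + object_head.lower()] += 1
    return feats

feats = extract_features(
    context_words=["jail", "county", "feeding", "daily"],
    local_words=["jail", "meals"],
    subcat="NP",
    object_head="meals")
print(dict(feats))  # {'context:jail': 1, ..., 'subcat:NP': 1, 'object-head:meals': 1}
```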

Linear Classifiers
For a pair (c,d), we take a weighted vote for each class:
There are many ways to set these weights
- Perceptron: find a currently misclassified example, and nudge weights in the direction of a correct classification
- Other discriminative methods usually work in the same way: try out various weights until you maximize some objective
- We'll look at an elegant probabilistic framework for weight-setting: maximum entropy
Feature | Food | Jail | Tennis
context:jail | -0.5 * 1 | +1.2 * 1 | -0.8 * 1
subcat:NP | +1.0 * 1 | -0.3 * 1
object-head:meals | +2.0 * 1 | -1.5 * 1
object-head:years = 0 | -1.8 * 0 | +2.1 * 0 | -1.1 * 0
TOTAL | +3.5 | +0.7 | -2.6
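A minimal sketch of the weighted vote plus a perceptron-style update. The weights below are made up to echo the table above; they are not trained values, and the class and feature names are illustrative.

```python
from collections import defaultdict

def score(weights, feats, cls):
    """Weighted vote for one class: sum of weight(class, feature) * count."""
    return sum(weights[(cls, f)] * v for f, v in feats.items())

def perceptron_update(weights, feats, gold, classes, lr=1.0):
    """If the current best-scoring class is wrong, nudge weights toward the
    gold class and away from the predicted one."""
    pred = max(classes, key=lambda c: score(weights, feats, c))
    if pred != gold:
        for f, v in feats.items():
            weights[(gold, f)] += lr * v
            weights[(pred, f)] -= lr * v
    return pred

# Made-up weights in the spirit of the table above
weights = defaultdict(float)
weights[("jail", "context:jail")] = 1.2
weights[("food", "object-head:meals")] = 2.0
feats = {"context:jail": 1, "object-head:meals": 1, "subcat:NP": 1}
classes = ["food", "jail", "tennis"]
print(max(classes, key=lambda c: score(weights, feats, c)))  # -> food
```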

Interpreting language
We've seen how to recover hierarchical structure from sentences via parsing
- Why on earth would we want that kind of structure?
- Insight: that structure is a major cue to sentence meaning
- We'll look in the next few classes into how to recover aspects of those meanings

Interpreting language: today
Today we'll look at two interrelated problems
1: Semantic Roles
- Different verbs/nouns assign different semantic properties to the phrases that stand in structural relations with them
- We want to figure out how those semantic properties are assigned, and recover them for input text
2: Discontinuous/nonlocal Dependencies
- In some cases, a given role is assigned to a semantic unit that is discontinuous in "surface" phrase structure
- Role assignment & interpretation can be improved by "reuniting" the fragments in these discontinuities

Semantic Role Labeling (SRL)
Characterize clauses as relations with roles:
We want to know more than which NP is the subject:
- Relations like subject are syntactic; relations like agent or message are semantic
Typical pipeline:
- Parse, then label roles
- Almost all errors locked in by parser
- Really, SRL is (or should be) quite a lot easier than parsing

SRL: applications
You don't have to look very far to see where SRL can be useful
Example: Information Extraction: who bought what?
- Home Depot sells lumber
- Home Depot was sold by its parent company
- Home Depot sold its subsidiary company for $10 million
- Home Depot sold for $100 billion
- Home Depot's lumber sold well last year.

SRL Example Gildea & Jurafsky 2002

Roles: PropBank / FrameNet
FrameNet: roles shared between verbs
PropBank: each verb has its own roles
PropBank more used, because it's layered over the treebank (and so has greater coverage, plus parses)
Note: some linguistic theories postulate even fewer roles than FrameNet (e.g. 5-20 total: agent, patient, instrument, etc.)

PropBank Example

PropBank Example http://www.cs.rochester.edu/~gildea/PropBank/Sort/

FrameNet example http://framenet.icsi.berkeley.edu/index.php?option=com_wrapper&Itemid=118&frame=Motion_directional&

Shared Arguments

Path Features

Results
Features:
- Gold vs parsed source trees
- Path from target to filler
- Filler's syntactic type, headword, case
- Target's identity
- Sentence voice, etc.
- Lots of other second-order features
Gold vs parsed source trees
- SRL is fairly easy on gold trees
- Harder on automatic parses

Motivation: Computing sentence meaning
How are the meanings of these sentences computed?
- A man arrived who I knew.
- Who was believed to know the answer?
- The story was about the children that the witch wanted to eat.
- Cries of pain could be heard from below.
- We discussed plans yesterday to redecorate the house.

Discontinuous string fragments, unified semantic meaning
These sentences have a common property:
- They all involve a unified semantic proposition broken up into two pieces across the sentence
- The intervening material is part of an unrelated proposition
- A man arrived who I knew.
- Who was believed to know the answer?
- The story was about the children that the witch wanted to eat.
- Cries of pain could be heard from below.
- We discussed plans yesterday to redecorate the house.

What is the state of the art in robust computation of linguistic meaning?
- Probabilistic context-free grammars trained on syntactically annotated corpora (treebanks) yield robust, high-quality syntactic parse trees
- Nodes of these parse trees are often reliable indicators of phrases corresponding to semantic units (Gildea & Palmer 2002)
[Figure: PCFG parse tree for "a man arrived yesterday", with rule probabilities 0.7, 0.35, 0.15, 0.3, 0.03, 0.02, 0.4, 0.07 and role labels Agent / Time / TARGET]
Total tree probability: 0.7 × 0.35 × 0.15 × 0.3 × 0.03 × 0.02 × 0.4 × 0.07 ≈ 1.85 × 10⁻⁷
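The total above is just the product of the rule probabilities, which can be checked directly. The eight probabilities are taken from the slide; which grammar rules they attach to is shown only in the figure.

```python
import math

# Rule probabilities shown on the slide for the parse of "a man arrived yesterday"
rule_probs = [0.7, 0.35, 0.15, 0.3, 0.03, 0.02, 0.4, 0.07]

tree_prob = math.prod(rule_probs)
print(f"P(tree) = {tree_prob:.3g}")   # ~1.85e-07, the total shown on the slide
```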

Dependency trees
Alternatively, syntactic parse trees can directly induce dependency trees
- Can be interpreted pseudo-propositionally; high utility for Question Answering (Pasca and Harabagiu 2001)

Parse trees to dependency trees A man who I knew arrived. I ate what you cooked.

Parse trees to dependency trees A man who I knew arrived. A man arrived who I knew.
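A toy illustration of inducing a dependency tree from a constituency tree by head percolation: each non-head child's head word depends on the parent's head word. The head-child table and the bracketed tree (for "a man arrived yesterday", a sentence used elsewhere in these slides) are simplified assumptions, not actual Treebank head rules; the output drops the edge labels.

```python
# Toy stand-in for a head-finding table: preferred head child per category
HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NN", "SBAR": "S"}

def head_word(tree):
    """tree = (label, [children]) for internal nodes, (tag, word) for leaves."""
    label, kids = tree
    if isinstance(kids, str):
        return kids
    preferred = HEAD_CHILD.get(label)
    for kid in kids:
        if kid[0] == preferred:
            return head_word(kid)
    return head_word(kids[-1])          # default: rightmost child

def dependencies(tree, deps=None):
    """Each non-head child's head word depends on the parent's head word."""
    if deps is None:
        deps = []
    label, kids = tree
    if isinstance(kids, str):
        return deps
    h = head_word(tree)
    for kid in kids:
        kid_head = head_word(kid)
        if kid_head != h:
            deps.append((h, kid_head))
        dependencies(kid, deps)
    return deps

tree = ("S", [("NP", [("DT", "a"), ("NN", "man")]),
              ("VP", [("VBD", "arrived"),
                      ("ADVP", [("RB", "yesterday")])])])
print(dependencies(tree))  # [('arrived', 'man'), ('man', 'a'), ('arrived', 'yesterday')]
```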

Limits of context-free parse trees
[Figure: parse trees in which the Agent role cannot be read off the local context-free structure]

Recovering correct dependency trees
Context-free trees can't transparently encode such non-local semantic dependency relations
Treebanks typically do encode non-local dependencies in some form
But nobody has used them, probably because:
- it's not so clear how to do probability models for non-local dependencies
- Linguists think non-local dependencies are relatively rare in English (unquantified, though!)
- Nobody's really tried to do much deep semantic interpretation of parse trees yet, anyway

Today's agenda
How to quantify the frequency and difficulty of non-local dependencies?
- Are non-local dependencies really a big problem for English? For other languages?
How can we reliably and robustly recover non-local dependencies so that:
- The assumptions and representation structure are as theory-neutral as possible?
- The recovery algorithm is independent of future semantic analysis we might want to apply?

Quantifying non-local dependencies
Kruijff (2002) compared English, German, Dutch, and Czech treebanks for frequency of nodes-with-holes.
- Result: English < {Dutch, German} < Czech
- This metric gives no similarity standard of comparison of two trees
- Also, this metric can be sensitive to the details of syntactic tree construction
Typed dependency evaluation provides an alternative measure; initial results suggest good correlation

Other work on nonlocal dependencies
- Johnson 2002: pattern matching on CF tree parses
- Dienes & Dubey 2003: gap threading in a lexicalized PCFG
- Campbell 2004: recovery based on linguistic principles (non-statistical)
- Schmid 2006: gap threading in an unlexicalized PCFG

Johnson 2002: labeled-bracketing style
- Previously proposed evaluation metric (Johnson 2002) requires correctly identified antecedent and correct string position of relocation
- This metric underdetermines dependency relations

Proposed new metric
Given a sentence S and a context-free parse tree T for S, compare the dependency tree D induced from T with the correct dependency tree for S
Dependency trees can be scored by the edges connecting different words:
- S(arrived,man) NP(man,a) VP(arrived,yesterday)

Proposed new metric
Dependencies induced from a CF tree will be counted as wrong when the true dependency is non-local
For "A man arrived who I knew":
- Induced from the CF tree: S(arrived,man) NP(man,a) VP(arrived,knew) RC(knew,who) S(knew,I)
- Correct dependencies: S(arrived,man) NP(man,a) NP(man,knew) RC(knew,who) S(knew,I)
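The metric itself is easy to state in code: score the CF-induced edge set against the gold edge set. The two edge sets below are the ones from this slide; with one edge different, labeled precision and recall are both 0.8. (Whether the published metric uses exactly this edge representation is an assumption here.)

```python
def dep_score(induced_edges, gold_edges):
    """Precision / recall / F1 of CF-induced labeled dependency edges
    against the gold dependency tree."""
    induced, gold = set(induced_edges), set(gold_edges)
    correct = len(induced & gold)
    p = correct / len(induced)
    r = correct / len(gold)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# The two edge sets from this slide, for "A man arrived who I knew"
induced = [("S", "arrived", "man"), ("NP", "man", "a"),
           ("VP", "arrived", "knew"), ("RC", "knew", "who"), ("S", "knew", "I")]
gold = [("S", "arrived", "man"), ("NP", "man", "a"),
        ("NP", "man", "knew"), ("RC", "knew", "who"), ("S", "knew", "I")]
print(dep_score(induced, gold))   # (0.8, 0.8, 0.8): one non-local edge is missed
```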

Two types of non-local dependencies
Treebanks give us gold-standard dependency trees:
- Most dependencies can be read off the context-free tree structure of the annotation
- Non-local dependencies can be obtained from special treebank-specific annotations
Two types of non-local dependency:
- Dislocated dependency: "A man arrived who I knew" (who I knew is related to man, NOT to arrived)
- Shared dependency: "I promised to eat it" (I is related to BOTH promised AND eat)

Nonlocal dependency annotation in treebanks
Penn treebanks annotate:
- null complementizers (which can mediate relativization)
- dislocated dependencies
- sharing dependencies

Nonlocal dependencies in treebanks The NEGRA treebank (German) permits crossing dependencies during annotation, then algorithmically maps to a context-free representation. Null complementizers and shared dependencies are unannotated.

Nonlocal dependency, quantified

Nonlocal dependency, quantified (parsing done with state-of-the-art Charniak 2000 parser)

Cross-linguistic comparison (parsing done with vanilla PCFG; sharing dependencies and relativizations excluded from non-local dependency)

Underlying dependency evaluation: conclusions
- Non-local dependency errors increase in prominence as parser quality improves
- In German, non-local dependency a much more serious problem than for English
- Context-free assumption less accurate on gold-standard trees for categories involving combinations of phrases (SBAR, S, VP) than for lower-level phrasal categories (NP, ADVP, ADJP)

Recovery of non-local dependencies
Context-free trees can directly recover only local (non-crossing) dependencies. To recover non-local dependencies, we have three options:
- Treat the parsing task as initially context-free, then correct the parse tree post-hoc (Johnson 2002; present work)
- Incorporate non-local dependency into the category structure of parse trees (Collins 1999; Dienes & Dubey 2003)
- Incorporate non-local dependency into the edge structure of parse trees (Plaehn 2000; elsewhere in my thesis)

Linguistically motivated tree reshaping
1. Null complementizers (mediate relativization) (inactive for NEGRA)
   A. Identify sites for null node insertion
   B. Find best daughter position and insert
2. Dislocated dependencies (active for NEGRA)
   A. Identify dislocated nodes
   B. Find original/"deep" mother node
   C. Find best daughter position in mother and insert
3. Shared dependencies (inactive for NEGRA)
   A. Identify sites of nonlocal shared dependency
   B. Identify best daughter position and insert
   C. Find controller for each control locus

Full example

A context-free parse

1a: null insertion sites

1b: location of null insertions

2a: identify dislocations

2b: identify origin sites

2c: insert dislocations in origins

2c: insert dislocations in origins

3a: identify sites of non-local shared dependency

3b: insert non-local shared dependency sites

3c: find controllers of shared dependencies

End.

Application of maximum entropy models
- Discriminative classification model with well-founded probabilistic interpretation, close relationship to log-linear models and logistic regression
- Node-by-node discriminative classification of the form P(c | node) ∝ exp(Σ_i λ_i f_i(c, node))
- Quadratic regularization and thresholding by feature token count to prevent overfitting
- In node relocation and controller identification, candidate nodes are ranked by binary (yes/no) classification scores and the highest-ranked node is selected
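A didactic sketch of a binary (yes/no) maximum-entropy / logistic-regression classifier trained by gradient ascent with quadratic (L2) regularization, as described above. The feature names and training pairs are invented; this is not the implementation evaluated in the talk.

```python
import math
from collections import defaultdict

def train_maxent_binary(examples, epochs=100, lr=0.1, l2=0.01):
    """Gradient ascent on the L2-regularized conditional log-likelihood."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in examples:                     # y is 0 or 1
            z = sum(w[f] * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))            # P(yes | feats)
            for f, v in feats.items():
                w[f] += lr * ((y - p) * v - l2 * w[f])
    return w

def prob_yes(w, feats):
    z = sum(w[f] * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

# Invented "is this node the origin of a dislocated constituent?" examples
data = [({"path:S>VP": 1, "cat:SBAR": 1}, 1),
        ({"path:NP>NN": 1, "cat:NP": 1}, 0)]
w = train_maxent_binary(data)
print(round(prob_yes(w, {"path:S>VP": 1, "cat:SBAR": 1}), 3))
```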

Larger feature space
- Words in strings have only one primitive measure of proximity: linear order (w-2, w-1, w+1, w+2)
- Tree nodes have several: precedence, sisterhood, domination (parent, grandparent, right-sister, precedes)
- Results in much richer feature space to explore
[Figure: string window vs. tree relations for "A man arrived who I knew"]

Feature types
- Syntactic categories, mothers, grandmothers (infinitive VP)
- Head words (wanted vs. to vs. eat)
- Syntactic path: <SBAR,S,VP,S,VP>
- Presence of daughters (NP under S)
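A sketch of computing the syntactic-path feature between two tree nodes, given each node's category chain from the root down to the node. The chains in the example are hypothetical but reproduce the path <SBAR,S,VP,S,VP> cited above.

```python
def syntactic_path(chain_a, chain_b):
    """Syntactic-path feature between two tree nodes; both chains run from
    the (shared) root down to the node itself."""
    i = 0
    while i < min(len(chain_a), len(chain_b)) and chain_a[i] == chain_b[i]:
        i += 1
    lca = i - 1                            # index of the lowest common ancestor
    up = list(reversed(chain_a[lca:]))     # from node A up to (and incl.) the LCA
    down = chain_b[lca + 1:]               # from just below the LCA down to node B
    return "<" + ",".join(up + down) + ">"

# Hypothetical chains: the SBAR of a relative clause and an embedded VP below it
print(syntactic_path(["S", "NP", "SBAR"],
                     ["S", "NP", "SBAR", "S", "VP", "S", "VP"]))
# -> <SBAR,S,VP,S,VP>
```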

Evaluation on new metric: gold-standard input trees

Evaluation on new metric: parsed input trees

English-language evaluation
- Deep dependency error reduction of 76% over baseline gold-standard context-free dependencies, and 42% over Johnson 2002 (95.5%/98.1%/91.9%)
- Performance particularly good on individual subtasks: over 50% error reduction (95.2%) over Johnson 2002 on identification of non-local shared dependency sites
- Relative error reduction degraded across the board on parsed trees: performance on parsed trees doesn't match Johnson 2002 (could be due to overfitting on gold-standard dataset)

Cross-linguistic comparison: dislocated dependencies only
- Compare performance of classifier on NEGRA and similar-sized subset of WSJ (~350K words)
- NEGRA has no annotations of null complementizers or sharing dependencies, so evaluate only on dislocations and exclude relativization
- WSJ context-free parsing is far more accurate than NEGRA parsing, so to even the field, use unlexicalized, untuned PCFGs for parsing

Algorithm performance: cross-linguistic comparison

Major error types: German
- Major ambiguity between local and non-local dependency for clause-final VP and S nodes
- "The RMV will not begin to be formed until much later."

Major error types: German
- Scrambling: clause-initial NPs don't always belong in S
- "The researcher Otto Schwabe documented the history of the excavation."