

Slide 1: NomBank 1.0
Released 12/2007
Unified Linguistic Annotation Workshop (ULA08)
Adam Meyers, New York University
March 18, 2008

Slide 2: Outline
NomBank 1.0 Statistics
How I Finished NomBank
Quality Control
ULA-style Summary

Slide 3: NomBank 1.0
114,500 NomBank propositions
113,000 instances of common nouns not markable
  – 35,000 instances of nouns that are never markable, e.g., cheese
Arguments of verbal/adjectival nominalizations and 16 other classes of nouns
Corpus: Wall Street Journal portion of the Penn Treebank II

Slide 4: New for NomBank 1.0
Updated Specifications
Annotation and Frames
  – Less Frequent Classes
  – Less Frequent Words
Cleaning Up Special Classes
  – Adjective Nominalizations: distance, accident, wisdom, truculence
  – Attribute and Environment Nouns: accuracy, acronym, race (2nd roleset), radioactivity
Cleaning Up ARGMs
  – Standardization using the ADJUNCT dictionaries ADJADV & NOMADV

Slide 5: Difficulties with NomBank
Long Training
  – 2 months (or more) for a graduate student in linguistics
  – Qualification: 85% F-score on gold test corpora
Slow to Annotate
  – 25 instances/hour is difficult to maintain
    Twice as slow as PropBank
New Task
  – Many new phenomena
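The 85% qualification bar can be made concrete with a small scoring function. The sketch below assumes that an argument counts as correct only when its proposition, role label and token span all match the gold standard exactly; the data layout and function names are hypothetical, and NomBank's actual scoring procedure may differ.

```python
# Hypothetical sketch: score a trainee against a gold test corpus and apply the
# 85% F-score qualification bar from the slide.
# Assumption: an argument is correct only if proposition id, role label and
# token span all match exactly; NomBank's real scorer may use other criteria.
from typing import Set, Tuple

Arg = Tuple[str, str, Tuple[int, int]]  # (proposition id, role label, token span)

QUALIFICATION_THRESHOLD = 0.85

def f_score(predicted: Set[Arg], gold: Set[Arg]) -> float:
    """Balanced F-score (F1) over exactly matching arguments."""
    correct = len(predicted & gold)
    if correct == 0:  # also covers empty predicted or gold sets
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

def qualifies(predicted: Set[Arg], gold: Set[Arg]) -> bool:
    return f_score(predicted, gold) >= QUALIFICATION_THRESHOLD

gold = {
    ("p1", "ARG0", (0, 1)), ("p1", "ARG1", (3, 5)),
    ("p2", "ARG1", (7, 8)), ("p2", "ARGM-TMP", (9, 10)),
}
trainee = {
    ("p1", "ARG0", (0, 1)), ("p1", "ARG1", (3, 5)),
    ("p2", "ARG1", (7, 8)), ("p2", "ARG2", (9, 10)),  # one wrong label
}
print(f_score(trainee, gold))    # 0.75
print(qualifies(trainee, gold))  # False
```

With three of four arguments right, precision and recall are both 0.75, so this trainee would fall short of the 85% bar.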

Slide 6: Increasing Annotation Speed
More Preprocessing Available for the Final 9%
  – Automatically produced NomBank via GLARF
  – Saves time even when incorrect
Correct Constituent, Incorrect Argument Number
  – Marked more quickly than annotating from scratch
  – ARGMs are often consistent across predicates
Dictionary Lookup (also affects quality control)
Speedup of at least 50%
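Why imperfect preprocessing still saves time can be illustrated by tallying the edits needed to turn automatic output into the final annotation. In this hypothetical sketch, arguments are stored as token-span-to-label dictionaries (an assumption, not NomBank's file format). Only "added" arguments must be located from scratch; "relabeled" ones correspond to the correct-constituent, wrong-argument-number case on the slide.

```python
# Hypothetical sketch: count the edits separating automatic (GLARF-derived)
# pre-annotation from an annotator's final arguments for one proposition.
# Assumption: arguments are {token_span: role_label} dicts; the category
# names are illustrative, not NomBank terminology.
from collections import Counter
from typing import Dict, Tuple

Span = Tuple[int, int]

def classify_edits(auto: Dict[Span, str], final: Dict[Span, str]) -> Counter:
    counts: Counter = Counter()
    for span, label in auto.items():
        if span not in final:
            counts["deleted"] += 1    # spurious automatic argument
        elif final[span] == label:
            counts["accepted"] += 1   # correct as produced
        else:
            counts["relabeled"] += 1  # right constituent, wrong argument number
    for span in final:
        if span not in auto:
            counts["added"] += 1      # had to be marked from scratch
    return counts

auto = {(0, 2): "ARG0", (4, 7): "ARG2", (8, 9): "ARGM-TMP"}
final = {(0, 2): "ARG0", (4, 7): "ARG1", (10, 12): "ARG2"}
print(dict(classify_edits(auto, final)))
# {'accepted': 1, 'relabeled': 1, 'deleted': 1, 'added': 1}
```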

Slide 7: Increasing Annotation Speed (2)
Frequent nouns annotated first
Nouns with few instances
  – More time needed per instance
Grouping nouns together
  – 500 or more instances per group
  – Based on usage/meaning
  – Similar frame entries
  – Whole groups annotated together
  – Result: no slowdown for low-frequency nouns
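The grouping strategy can be sketched as a batching routine. The slide says groups were formed by usage/meaning and similar frame entries. Here a noun's "frame signature" (the tuple of role labels in its roleset) stands in for that judgment, an assumption made only for illustration. Nouns sharing a signature are batched until a group reaches 500 instances, and any remainder is folded into the last full group.

```python
# Hypothetical sketch: batch low-frequency nouns into annotation groups of at
# least 500 instances, keeping nouns with similar frame entries together.
# Assumption: a "frame signature" (tuple of role labels) approximates the
# usage/meaning-based grouping, which in NomBank was a human decision.
from collections import defaultdict
from typing import Dict, List, Tuple

MIN_GROUP_SIZE = 500  # instances per group, per the slide

Signature = Tuple[str, ...]

def group_nouns(nouns: Dict[str, Tuple[int, Signature]]) -> List[List[str]]:
    by_signature: Dict[Signature, List[str]] = defaultdict(list)
    for noun, (_count, signature) in nouns.items():
        by_signature[signature].append(noun)

    groups: List[List[str]] = []
    for signature in sorted(by_signature):
        sig_groups: List[List[str]] = []
        current: List[str] = []
        total = 0
        for noun in sorted(by_signature[signature]):
            current.append(noun)
            total += nouns[noun][0]
            if total >= MIN_GROUP_SIZE:
                sig_groups.append(current)
                current, total = [], 0
        if current:
            if sig_groups:
                sig_groups[-1].extend(current)  # fold remainder into last full group
            else:
                sig_groups.append(current)      # too few instances: one undersized group
        groups.extend(sig_groups)
    return groups

nouns = {
    "accuracy": (300, ("ARG1", "ARG2")),
    "precision": (250, ("ARG1", "ARG2")),
    "purity": (40, ("ARG1", "ARG2")),
    "wisdom": (120, ("ARG0",)),
}
print(group_nouns(nouns))  # [['wisdom'], ['accuracy', 'precision', 'purity']]
```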

Slide 8: Quality Control: Automatic Error Detection
25% of NomBank reviewed by an expert annotator
Detect instances with likely errors
  – Treebank-based constraints (mediated by GLARF)
  – Compatibility with dictionaries
    Also detects dictionary errors
Rationale
  – Multiple annotation plus adjudication = too expensive
  – Detecting compatibility with other annotation = feasible
Hard to quantify improvements
  – ML system scores? Standard vs. degenerate version of NomBank
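The error-detection workflow amounts to running a set of checks over every proposition and sending only the flagged ones to the expert annotator. The sketch below implements just one check, compatibility with a frame dictionary. The `Proposition` class, the toy `FRAMES` entries and the message strings are hypothetical. Because a mismatch can mean the frame is wrong rather than the annotation, the same check can surface dictionary errors too.

```python
# Hypothetical sketch of automatic error detection: run constraint checks over
# each proposition and queue only flagged ones for expert review.
# Assumptions: Proposition, FRAMES and the messages are illustrative; NomBank's
# real constraints operate on GLARF-mediated treebank structure.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

@dataclass
class Proposition:
    pid: str
    noun: str
    args: Dict[str, str] = field(default_factory=dict)  # role label -> argument text

# Toy frame dictionary: noun -> numbered roles its roleset defines.
FRAMES: Dict[str, Set[str]] = {"decision": {"ARG0", "ARG1"}}

def check_frame_compatibility(p: Proposition) -> Optional[str]:
    allowed = FRAMES.get(p.noun)
    if allowed is None:
        return f"no frame entry for '{p.noun}' (possible dictionary error)"
    # ARGMs are not roleset-specific, so only numbered roles are checked.
    bad = sorted(r for r in p.args if not r.startswith("ARGM") and r not in allowed)
    if bad:
        # Either the annotation or the frame entry is wrong; a human decides which.
        return f"roles {bad} not defined for '{p.noun}'"
    return None

# Further checks (e.g. treebank-based constraints) would be appended here.
CHECKS: List[Callable[[Proposition], Optional[str]]] = [check_frame_compatibility]

def flag_for_review(props: List[Proposition]) -> Dict[str, List[str]]:
    """Map proposition id -> problems, for propositions failing any check."""
    flagged: Dict[str, List[str]] = {}
    for p in props:
        results = [check(p) for check in CHECKS]
        problems = [msg for msg in results if msg is not None]
        if problems:
            flagged[p.pid] = problems
    return flagged

props = [
    Proposition("p1", "decision", {"ARG0": "the court", "ARGM-TMP": "yesterday"}),
    Proposition("p2", "decision", {"ARG3": "changes"}),
    Proposition("p3", "truculence", {"ARG1": "their"}),
]
print(flag_for_review(props))
```

Here `p1` passes, `p2` is flagged for using a role its toy frame lacks, and `p3` is flagged because the toy dictionary has no entry for truculence.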

Slide 9: Constraints on NP-internal Arguments
Relative clauses are unlikely arguments
  – the banner that proclaims the renewal of socialism: the relative clause is NOT an ARG1 of banner
  – Relative clauses are detectable by GLARF
    Presence of empty categories, POS of "that", etc.
    Does the noun take a that-complement in COMLEX Syntax?
Det + premodifier = unlikely constituent
  – their/ARG3 financial/ARG1 viability/ARG2-REF
  – no. 1/ARG1 victory (the annotator was correct)
Discontinuous constituent = unlikely
  – conversion/ARG1 rights on the stock/ARG3
    NB: Secondary theme (ARG3) – recursive ARG1
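The discontinuity constraint is easy to check mechanically once arguments are represented as sets of token positions (an assumption of this sketch; the function names are hypothetical). In the slide's phrase conversion(0) rights(1) on(2) the(3) stock(4), a single argument covering both "conversion" and "on the stock" would skip "rights" and be flagged. The analysis on the slide instead splits it into a contiguous ARG1 and a contiguous ARG3.

```python
# Hypothetical sketch of the "discontinuous constituent = unlikely" check.
# Assumption: an argument is the set of token indices it covers in the sentence;
# a gap between indices marks a discontinuous argument.
from typing import Iterable, List, Tuple

def token_runs(indices: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse token indices into maximal contiguous (start, end) runs."""
    runs: List[Tuple[int, int]] = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)  # extend the current run
        else:
            runs.append((i, i))          # start a new run
    return runs

def is_discontinuous(indices: Iterable[int]) -> bool:
    return len(token_runs(indices)) > 1

# conversion(0) rights(1) on(2) the(3) stock(4)
print(token_runs({0, 2, 3, 4}))        # [(0, 0), (2, 4)]
print(is_discontinuous({0, 2, 3, 4}))  # True: "rights" is skipped
print(is_discontinuous({2, 3, 4}))     # False: "on the stock" alone
```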

Slide 10: Empty Categories
Empty categories usually have antecedents
  – Mr. Bush/ARG0 is drawn t to the idea of t trying/SUPPORT out a line-item veto
    The original annotator failed to link the passive empty category to Mr. Bush
The PTB provides many, but not all, links to ECs
  – Illinois Supreme Court’s/ARG0 decision t to institute/SUPPORT changes.
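A check for unlinked empty categories can use the co-indices the Penn Treebank attaches to many ECs (e.g. `*-1`, `*T*-1`). The sketch assumes each argument is stored as the token that fills it, plus an optional antecedent link chosen by the annotator. The function names and data layout are hypothetical, and null elements written `0` are not handled. It distinguishes ECs whose PTB index points to an antecedent from unindexed ones, where, as the slide notes, the PTB provides no link.

```python
# Hypothetical sketch: flag arguments filled by an empty category (EC) that the
# annotator never linked to an antecedent.
# Assumptions: ECs use PTB-style tokens such as "*", "*-1", "*T*-1", "*EXP*-2";
# args maps role -> filling token, links maps role -> chosen antecedent text.
import re
from typing import Dict, List, Optional

EC_PATTERN = re.compile(r"\*[A-Z?]*\*?(?:-(\d+))?")

def ec_index(token: str) -> Optional[str]:
    """PTB co-index of an EC token, '' if unindexed, None if not an EC."""
    m = EC_PATTERN.fullmatch(token)
    if m is None:
        return None
    return m.group(1) or ""

def unlinked_ec_args(args: Dict[str, str], links: Dict[str, str]) -> List[str]:
    problems: List[str] = []
    for role, token in args.items():
        idx = ec_index(token)
        if idx is None or role in links:
            continue  # not an EC, or already linked to an antecedent
        if idx:
            problems.append(f"{role}: EC {token} is co-indexed in the PTB but not linked")
        else:
            problems.append(f"{role}: EC {token} has no PTB index; find the antecedent manually")
    return problems

print(unlinked_ec_args({"ARG0": "*-1"}, {}))
print(unlinked_ec_args({"ARG0": "*-1"}, {"ARG0": "Mr. Bush"}))  # []
```

For the slide's first example, if ARG0 of veto were filled by a hypothetical trace token `*-1` with no recorded link, the first call reports it; recording Mr. Bush as the antecedent clears the warning.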

Slide 11: External Arguments of Nouns
External argument = argument outside the NP headed by the NomBank predicate
Licensed cases:
  – Predication
  – PP Constructions
  – Support
The system has constraints to identify licensed cases
Unlicensed external argument = likely error
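The licensing rule can be expressed as a filter: an argument whose span lies entirely outside the predicate's NP is acceptable only if some licensing configuration was detected for it. In this sketch, detecting predication, PP constructions or support is assumed to happen upstream (in NomBank's case, via GLARF) and is passed in as a role-to-license mapping. The class, function and license names are hypothetical.

```python
# Hypothetical sketch: report external arguments of a noun that no licensing
# configuration (predication, PP construction, support) accounts for.
# Assumption: license detection happens upstream; spans are inclusive offsets.
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # inclusive token offsets (start, end)

# Licensing configurations named on the slide.
LICENSES = {"predication", "pp-construction", "support"}

@dataclass
class Argument:
    role: str
    span: Span

def is_external(arg: Argument, np_span: Span) -> bool:
    """True if the argument lies entirely outside the NP headed by the predicate."""
    start, end = arg.span
    return end < np_span[0] or start > np_span[1]

def unlicensed_external_args(
    args: List[Argument], np_span: Span, licenses: Dict[str, Set[str]]
) -> List[str]:
    """Roles of external arguments with no detected licensing configuration."""
    return [
        a.role
        for a in args
        if is_external(a, np_span) and not (licenses.get(a.role, set()) & LICENSES)
    ]

# "They/ARG0 exercise for enjoyment": They(0) exercise(1) for(2) enjoyment(3)
args = [Argument("ARG0", (0, 0))]
enjoyment_np = (3, 3)
print(unlicensed_external_args(args, enjoyment_np, {"ARG0": {"pp-construction"}}))  # []
print(unlicensed_external_args(args, enjoyment_np, {}))  # ['ARG0']
```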

Slide 12: Arguments via Predication
predicate + copula + (non-NP) argument
  – The real/ARGM-ADV battle is over spam/ARG1.
argument + copula + predicate
  – adjnoms, attribute nouns; limited distribution
  – Trying to time the economy/ARG1 is a mistake.
  – The package/ARG1 is three pounds/ARG2 in weight.
    SUPPORT = pounds + in

Slide 13: PP Constructs with External Args
Well-formedness conditions implemented, based on:
  – The relationship in the tree between the PP and the external arguments
  – The lexical class of the noun (COMLEX Syntax or NOMLEX-PLUS)
Parenthetical PPs
  – George/ARG1, at the behest of Fred/ARG0, bought a Ford/ARG1.
Subject-Oriented PPs
  – They/ARG0 exercise for enjoyment.
Adverbial/Discourse PPs
  – In important particulars, they are different/ARG1.
Noun-Modifying PPs
  – Participants/ARG0 in the meeting.
Extraposition Construction
  – She/ARG1 was under consideration to be the next president/ARG2.

Slide 14: Support Chain

Slide 15: Support Chain
Consists of lexical items (no phrases)
Forms of be, modals, and auxiliary verbs are omitted
  – Verb group analysis extended to copula constructions
Must contain a noun, adjective, verb, or determiner (no preposition-only chains)
The 1st item shares an argument with the noun
The last item takes the noun as an argument
The Nth item takes the (N+1)th item as an argument
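The support-chain conditions translate fairly directly into a validator. The sketch assumes the chain is given as single words with coarse POS tags, that auxiliaries and modals are tagged `AUX`/`MD` upstream, and that argument relations are supplied as (head, argument) word pairs; all of these encodings are hypothetical. The example encodes the chain pounds + in for "The package is three pounds in weight" from the predication slide.

```python
# Hypothetical sketch: validate a support chain against the slide's conditions.
# Assumptions: items carry coarse POS tags ("N", "ADJ", "V", "DET", "P", "AUX",
# "MD"); relations are (head word, argument word) pairs from some upstream analysis.
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Item:
    word: str
    pos: str

BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
SKIPPED_POS = {"AUX", "MD"}             # auxiliaries and modals are omitted
CONTENT_POS = {"N", "ADJ", "V", "DET"}  # at least one of these is required

def validate_support_chain(
    chain: List[Item],
    noun: str,
    takes_arg: Set[Tuple[str, str]],  # (head word, argument word) pairs
    shares_arg_with_noun: Set[str],   # words sharing an argument with the noun
) -> List[str]:
    # Drop forms of "be", modals and auxiliaries before checking the rest.
    items = [it for it in chain if it.pos not in SKIPPED_POS and it.word.lower() not in BE_FORMS]
    if not items:
        return ["empty support chain"]
    errors: List[str] = []
    if any(" " in it.word for it in items):
        errors.append("chain items must be single lexical items, not phrases")
    if not any(it.pos in CONTENT_POS for it in items):
        errors.append("no noun, adjective, verb or determiner (preposition-only chain)")
    if items[0].word not in shares_arg_with_noun:
        errors.append(f"first item '{items[0].word}' shares no argument with '{noun}'")
    if (items[-1].word, noun) not in takes_arg:
        errors.append(f"last item '{items[-1].word}' does not take '{noun}' as an argument")
    for cur, nxt in zip(items, items[1:]):
        if (cur.word, nxt.word) not in takes_arg:
            errors.append(f"'{cur.word}' does not take '{nxt.word}' as an argument")
    return errors

chain = [Item("is", "V"), Item("pounds", "N"), Item("in", "P")]
relations = {("pounds", "in"), ("in", "weight")}
print(validate_support_chain(chain, "weight", relations, {"pounds"}))  # []
```

Keying relations by word string keeps the example short but would confuse repeated words; a fuller version would identify items by token position.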

Slide 16: Summary
Preprocessing and Error Detection
  – Made possible by other annotation
  – Improved both speed and quality
NomBank 1.0
  – Finished
  – Part of the CoNLL 2008 shared task

