Ruth Reeves CBA 2010 Barcelona, Spain Beyond Verb Centrism in Capturing Events.

Slides:



Advertisements
Similar presentations
Ontology Assessment – Proposed Framework and Methodology.
Advertisements

SWG Strategy (C) Copyright IBM Corp. 2006, All Rights Reserved. P4 Task 2 Fact Extraction using a CNL Current Status David Mott, Dave Braines, ETS,
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.
KR-2002 Panel/Debate Are Upper-Level Ontologies worth the effort? Chris Welty, IBM Research.
Advances in the PARCC Mathematics Assessment August
Jing-Shin Chang National Chi Nan University, IJCNLP-2013, Nagoya 2013/10/15 ACLCLP – Activities ( ) & Text Corpora.
Layering Semantics (Putting meaning into trees) Treebank Workshop Martha Palmer April 26, 2007.
Overview of the Hindi-Urdu Treebank Fei Xia University of Washington 7/23/2011.
Language Data Resources Treebanks. A treebank is a … database of syntactic trees corpus annotated with morphological and syntactic information segmented,
Statistical NLP: Lecture 3
April 26th, 2007 Workshop on Treebanking, HLT/NAACL, Rochester 1 Layering of Annotations in the Penn Discourse TreeBank (PDTB) Rashmi Prasad Institute.
Towards Parsing Unrestricted Text into PropBank Predicate- Argument Structures ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey.
LING NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006.
Steven Schoonover.  What is VerbNet?  Levin Classification  In-depth look at VerbNet  Evolution of VerbNet  What is FrameNet?  Applications.
Semantic Annotation Meeting April 14, 2005 NomBank & the Down-to-Earth Parts of Pie-in-the-Sky Adam Meyers New York University April 14, 2004.
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.
1 Adaptive Management Portal April
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
NomBank 1.0: ULA08 Workshop March 18, 2007 NomBank 1.0 Released 12/2007 Unified Linguistic Annotation Workshop Adam Meyers New York University March 18,
PSY 369: Psycholinguistics Some basic linguistic theory part3.
Learning Subjective Adjectives from Corpora Janyce M. Wiebe Presenter: Gabriel Nicolae.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Huimin Ye.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
Albert Gatt LIN 3098 Corpus Linguistics. In this lecture Some more on corpora and grammar Construction Grammar as a theoretical framework Collostructional.
Framework for K-12 Science Education
Educator’s Guide Using Instructables With Your Students.
1 DEVELOPING ASSESSMENT TOOLS FOR ESL Liz Davidson & Nadia Casarotto CMM General Studies and Further Education.
Ontology Development Kenneth Baclawski Northeastern University Harvard Medical School.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
PropBank, VerbNet & SemLink Edward Loper. PropBank 1M words of WSJ annotated with predicate- argument structures for verbs. –The location & type of each.
THOMSON SCIENTIFIC Web of Science Using the specialized search and analyze features Jackie Stapleton, librarian Fall 2006.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
THE BIG PICTURE Basic Assumptions Linguistics is the empirical science that studies language (or linguistic behavior) Linguistics proposes theories (models)
The Current State of FrameNet CLFNG June 26, 2006 Fillmore.
Unsupervised learning of Natural languages Eitan Volsky Yasmine Meroz.
AQUAINT 18-Month Workshop 1 Light Semantic Processing for QA Language Technologies Institute, Carnegie Mellon B. Van Durme, Y. Huang, A. Kupsc and E. Nyberg.
AQUAINT Workshop – June 2003 Improved Semantic Role Parsing Kadri Hacioglu, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward,
BAA - Big Mechanism using SIRA Technology Chuck Rehberg CTO at Trigent Software and Chief Scientist at Semantic Insights™
인공지능 연구실 황명진 FSNLP Introduction. 2 The beginning Linguistic science 의 4 부분 –Cognitive side of how human acquire, produce, and understand.
Comparing Political Systems. Why Compare To develop perspective on the mix of constants and variability which characterize the world’s governments and.
Ideas for 100K Word Data Set for Human and Machine Learning Lori Levin Alon Lavie Jaime Carbonell Language Technologies Institute Carnegie Mellon University.
Proposed NWI KIF/CG --> Common Logic Standard A working group was recently formed from the KIF working group. John Sowa is the only CG representative so.
Linguistic Essentials
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
Combining Lexical Resources: Mapping Between PropBank and VerbNet Edward Loper,Szu-ting Yi, Martha Palmer September 2006.
Rules, Movement, Ambiguity
Introduction Chapter 1 Foundations of statistical natural language processing.
Communicative and Academic English for the EFL Professional.
Supertagging CMSC Natural Language Processing January 31, 2006.
The Model-Driven DDI Approach Arofan Gregory, Jon Johnson, Flavio Rizzolo, Marcel Hebing.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
CS621 : Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 12 RDF, OWL, Minimax.
Commonsense Reasoning in and over Natural Language Hugo Liu, Push Singh Media Laboratory of MIT The 8 th International Conference on Knowledge- Based Intelligent.
FILTERED RANKING FOR BOOTSTRAPPING IN EVENT EXTRACTION Shasha Liao Ralph York University.
Annotation Framework & ImageCLEF 2014 JAN BOTOREK, PETRA BUDÍKOVÁ
Knowledge Structure Vijay Meena ( ) Gaurav Meena ( )
Overview of Statistical NLP IR Group Meeting March 7, 2006.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. SOA-RM Overview and relation with SEE Adrian Mocan
Lec. 10.  In this section we explain which constituents of a sentence are minimally required, and why. We first provide an informal discussion and then.
Understanding Depth of Knowledge. Depth of Knowledge (DOK) Adapted from the model used by Norm Webb, University of Wisconsin, to align standards with.
Informatics for Scientific Data Bio-informatics and Medical Informatics Week 9 Lecture notes INF 380E: Perspectives on Information.
Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.
PRESENTED BY: PEAR A BHUIYAN
IB Assessments CRITERION!!!.
Statistical NLP: Lecture 3
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
CS224N Section 3: Corpora, etc.
Progress report on Semantic Role Labeling
Information Retrieval
Presentation transcript:

Ruth Reeves CBA 2010 Barcelona, Spain Beyond Verb Centrism in Capturing Events

Ruth Reeves CBA 2010 Barcelona, Spain Presentation Summary Resources and Scientific Community NomBank Construction and Content – Predicate-Argument lexicon – Support Chain discovery Temporal Information in Medical Records – Current work on sequence of medical events

Ruth Reeves CBA 2010 Barcelona, Spain Value of Shared Resources Community annotations to common corpora – Decides issues of interoperability Formal representation Feature relationship (e.g. syntactic to semantic mapping) Clarifies which information is ‘orphaned’ – Training data for NLP systems to test against Strata of annotation allows divergent development

Ruth Reeves CBA 2010 Barcelona, Spain Scientific Community and Information Where data and resources can be shared Inter-organizational co-operation is fostered Extra-organizational communities form – Example: VA health data is scrupulously secured, allowing scalable solutions to national data standards – Without a mechanism for transparency across community, silos (intentional & un-) form Intelligence agencies, health insurance, credit marketing, etc. Copyright, privacy & security issues often prevent access to all development resources – Scientists offer knowledge-bases, tools and evaluation methods rather than corpus level detail

Ruth Reeves CBA 2010 Barcelona, Spain Access issues Pre-annotation access to corpora – Publically available language data Some published material, newspaper, transcribed broadcasts, etc. Lots of copyright issues – Security protected language data Financial transactions and reports, intelligence and military reports – Privacy protected text Patient health records Annotated corpora – Open-source – Fee-based license – Authorization limits Dependency with pre-annotation protection status

Ruth Reeves CBA 2010 Barcelona, Spain Annotated Corpora versus Systemized Results (Knowledge Bases) Free, open source corpora and annotation – Examples: NomBank, TimeBank, British National Corpus, … Fee for license to corpora and/or annotation – Examples: Linguistic Data Consortium, … Free open source access to ontology, lexicon, terminology, annotation data systemization – NomBank, TimeBank, WordNet, National Center for Biomedical Ontology, Unified Medical Language System, LARGE LIST Fee for license to ontologies (etc.) – MedLee, Oxford English Dictionary, HUGE LIST

Ruth Reeves CBA 2010 Barcelona, Spain Linguistic Resource Construction Traditional Dichotomy: Corpus-driven vs. Theory-driven Resource Labor Point of View – Labeling goals determines taxonomy type – Coverage goals determine taxonomy content – Resource re-use goals determine description logic – Resource target determines locus of complexity – Time, money and curiosity determine depth of detail

Ruth Reeves CBA 2010 Barcelona, Spain Linguistic Labor: Corpora & Discovery NomBank Noun Classes NomBank Support Chains

Ruth Reeves CBA 2010 Barcelona, Spain NomBank Resource References Principal Investigator: Adam Meyers, Courant Institute of Mathematics, New York University Co-Investigator: Ruth Reeves, VA Medical Informatics, Health Services Research & Development Supported under grants from National Science Foundation, and Space & Naval Warfare Systems – The reports and papers of the NomBank project do not necessarily reflect the position or the policy of the U.S. Government (nor vice-versa) Open Source Distribution:

Ruth Reeves CBA 2010 Barcelona, Spain NomBank Tasks Recognize predicate-argument noun complexes in Penn TreeBank II corpus Semantic categories for NomBank nouns Semantic role labels for NomBank nouns Notation for arguments shared between more than one predicate (Support chains) Mapping from NomBank nouns to semantically similar predicates of PropBank

Ruth Reeves CBA 2010 Barcelona, Spain Definition of a Bank Pred-Arg Bank = a set of propositions A proposition = a set of feature/value pairs  {f 1 = v 1, f 2 = v 2, f 3 = v 3...} For each pair f = v,  f є {REL, SUPPORT, ARG0, ARG1, ARG2,... ARGM}  Optional function tags: ARGM-MNR, ARGM-LOC, etc.  v = pointer to one or more nodes in the Penn Treebank II  Interpretation = The value of REL is the predicate and all other feature values are arguments (or adjuncts, etc.)

Ruth Reeves CBA 2010 Barcelona, Spain Legacy Resources = Constraints or Framework Wall Street Journal Penn TreeBank II Corpus – Syntactic terminal labels, syntactic structure & labeling PropBank labeled offset of PTB II Corpus – Semantic Roles for verb predicate-argument structure Nomlex Lexicon – Noun–verb mappings, Subcategorization patterns Comlex Lexicon – Syntactic subcategorization patterns

Ruth Reeves CBA 2010 Barcelona, Spain NomBank Results: Lexicons & Annotation Layers NomBank Annotated Corpus – 114,576 propositions derived from – 202,965 argument-bearing noun instances – 4704 distinct nouns – Support chain standardization Lexicons produced via PTB annotation data plus cross-breeding with other resources – NomBank Dictionary 10,128 nouns with NomBank classes map to related verb, adjective or adverb predicate 10,279 PropBank style rolesets, examples & Comlex-style syntactic subcategorization

Ruth Reeves CBA 2010 Barcelona, Spain NomBank Cross-breeding Results 3 More Banks Created or Enriched With Comlex-style syntactic subcategorization features PropBank-style pred-arg features Predicate-level Comlex & NomBank style semantic features Specific mapping relations: –Nomlex-Plus: 8,084 noun to verb, adjective, or adverb mappings + NomBank classes – NomAdv: 392 adverb-noun mappings –AdjAdv: 6,268 adjective-adverb mappings

Ruth Reeves CBA 2010 Barcelona, Spain Freedoms and Constraints of Goals PTB Syntactic labels and structure – Quote from the Penn TreeBank FAQ: "The grammar of noun phrases is fabulously complex, so extracting the head of a noun phrase can be a very difficult thing." – Inherited PTB flat internal structure for nouns For many nouns its is possible to: (1) use or extend current verb frames to number the arguments (2) model a new set of frames based on existing ones Elsewhere, impose a semantic classification into which a predicate must fall

Ruth Reeves CBA 2010 Barcelona, Spain Argument Labeling Constraints – Framework for Noun Category Principle: similar arguments = same role numbers Use similar verb (or adjective) as model & change to fit Trends akin to Relational Grammar Universal Alignment Hypothesis – ARG0 = agent, causer, actor – ARG1 = theme, patient – ARG2 = recipient, beneficiary, others Motivation: regularity within Pred-Arg Banks A holdover of verb-centrism, but that’s OK

Ruth Reeves CBA 2010 Barcelona, Spain Formation of NomBank Noun Classes PropBankRelational GrammarTheta-Roles ARG 0 Subject or 1 Causer, Agent, Actor ARG 1 Object or 2 Theme, Patient, Criss-Cross ARG 2 Indirect Object or 3 Recipient, Beneficiary VARIESOBLIQUEInstrument VARIESOBLIQUESource VARIESOBLIQUEGoal NomBank Predicates –Markable nouns of the PTB corpus: nouns co- occurring with argument structure –Class determination: argument structure s that conform to PropBank RoleSets

Ruth Reeves CBA 2010 Barcelona, Spain NomBank Noun Classes Verbal Nominalizations Adjective nominalizations Nouns w/ nominalization-like arguments 16 Argument-taking noun classes Relational, Ability, Job, Work-of-Art, Hallmark, Version, Partitive, Type, Share, Attribute, Group, Issue, Environment, Field, Criss-Cross, Event

Ruth Reeves CBA 2010 Barcelona, Spain Simple Verb/Nominalization Examples He promised to give a talk – REL = promise, ARG0 = he, ARG2 = to give a talk His promise to give a talk – REL = promise, ARG0 = his, ARG2 = to give a talk They behaved recklessly – REL = behaved, ARG0 = They, ARG1 = recklessly Their reckless behavior – REL = behavior, ARG0 = Their, ARG1 = reckless

Ruth Reeves CBA 2010 Barcelona, Spain Adjective Nominalizations Bessie’s singing ability – related adjective = able – REL = ability, ARG0 = Bessie, ARG1 = singing The absence of patent lawyers on the court – related adjective = absent – REL = absence, ARG1 = patent lawyers, ARGM-LOC = on the court

Ruth Reeves CBA 2010 Barcelona, Spain Nouns not related to Verbs A different sense from the related verb/adjective – SCO’s copyright violation complaint against IBM Argument structure like sue, not complain REL = complaint, ARG0 = SCO, ARG1 = copyright violation, ARG3 = against IBM NB: ARG2 = recipient of complaint, e.g., Federal court No related verb/adjective (in common use) – The 200 th anniversary of the Bill of Rights REL = anniversary, ARG1 = the Bill of Rights, ARG2 = 200 th NB: The ARG1 is sort of like the ARG1 of commemorate

Ruth Reeves CBA 2010 Barcelona, Spain 4 of the Noun Classes A period of industrial consolidation [Environment] – REL = period, ARG1 = of industrial consolidation Everyone’s right to talk [Ability] – REL = right, ARG0 = Everyone, ARG1 = to talk Congress’s idea of reform [WORK-OF-ART] – REL = idea, ARG0 = Congress, ARG1 = of reform The topic of discussion [Criss-Cross] – REL = topic, ARG1 = of discussion

Ruth Reeves CBA 2010 Barcelona, Spain 2 Other Classes (with Subclasses) Her husband [Relational-defrel] – REL = husband, ARG0 = husband, ARG1 = her His math professor [Relational-actrel] – REL = professor, ARG0 = professor, ARG1 = math, ARG2 = His A set of tasks [Partitive] – REL = set, ARG1 = of tasks The back of your hand [Partitive-piece] – REL = back, ARG1 = your hand

Ruth Reeves CBA 2010 Barcelona, Spain Sample NomBank Frame-File

Ruth Reeves CBA 2010 Barcelona, Spain Orphaned Information Textually distant relationships between predicates and their modifiers & arguments – highly researched: anaphora resolution, scrambling, movement, agreement, etc. Predicate-Argument annotation forced any ‘orphaned’ argument or modifier to be accounted for by some predicate – Raising, Equi, Support Verbs did not cover all orphans

Ruth Reeves CBA 2010 Barcelona, Spain Support Chains Support Verbs The judge made demands on his staff – REL = demands, SUPPORT = made, ARG0 = the judge, ARG2 = on his staff A savings institution needs your help – REL = help, ARG0 = your, SUPPORT = needs, ARG2 = a savings institution

Ruth Reeves CBA 2010 Barcelona, Spain Support Chains Support-like Nouns His share of liability –REL = liability, SUPPORT = share, ARG0 = his Their responsibility for hard decisions –REL = decisions, ARG0 = Their, ARGM-MNR = hard His first batch of questions –REL = questions, ARG0 = His Its first stage of development –REL = development, ARG1 = its

Ruth Reeves CBA 2010 Barcelona, Spain Semantic Content in Support Negation/modality (as with equi/raising) – Mary refused the nomination – Oracle attempted a hostile takeover Degrees of agentivity – Mary planned an attack – Mary participated in the attack Other meaning changes – Mary wrought destruction on San Francisco

Ruth Reeves CBA 2010 Barcelona, Spain Support Chain Examples She made most of the decisions – REL = decisions, Support = made + most + of, ARG0 = she They are considering a variety of actions – REL = actions, Support = (are) + considering + variety + of, ARG0 = they I take advantage of this opportunity to make a plea to the readers – REL = plea, Support = take + advantage + opportunity (+ to) + make, ARG0 = I, ARG2 = to the readers Saab is looking for a partner for financial cooperation – REL = cooperation, Support = (is) + looking + for + partner + for, ARG0 = Saab, ARG2 = partner, ARGM- MNR = financial

Ruth Reeves CBA 2010 Barcelona, Spain Syntax of Support Chains Support chains begin with a verb (so far) Can include: support verbs, support-like nouns, transparent nouns, criss-cross nouns, quantifiers in partitive constructions Similar to chains of raising/equi predicates – He seemed to want to be likely to be chosen List of heads representation conforms to Penn Treebank

Ruth Reeves CBA 2010 Barcelona, Spain Arguments across copulas, etc. Predication can link noun to a non-NP Argument The real battle is over who will win – REL = battle, ARGM-ADV = real, ARG2 = over who will win His claim was that I should never eat herring – REL = claim, ARG0 = His, ARG1 = that I should never eat herring Why not NP arguments? Argument Nominalizations – John is his math teacher REL = teacher, ARG0 = teacher, ARG1 = math, ARG2 = his John and his math teacher linked by equative predication

Ruth Reeves CBA 2010 Barcelona, Spain Support Chains and Event Modifiers – Some partitives may add a locative or temporal modifier to their ARG1 7 years of bitter debate REL = debate, ARGM-MNR = bitter, ARGM-TMP = 7 years – 7 years partitive function of year, temporal modifier of debate – NomBank class [EVENT] Last year's drought in the Midwest REL = drought, ARGM-TMP = Last year, ARGM-LOC = in the Midwest

Ruth Reeves CBA 2010 Barcelona, Spain Temporal Data in Medical Contexts Temporal expressions: degrees of expressivity – Absolute – time-stamp – Relative – dependency for reference – Granularity requirements – topic specific Absolute times and dates are sparse in some sections of medical records Sequence of events is often good enough – Case 1: Mr. Smith fell then had chest pain. – Case 2: Mr. Smith had chest pain then fell. Event order may be implied or asserted, or an admixture

Ruth Reeves CBA 2010 Barcelona, Spain Electronic Health Record Corpus Veterans Affairs HSR&D, NLP research efforts: – Annotation to Common Ontology match-up: Unified effort to collect textual data in electronic health records –Systematized Nomenclature of Medicine-Clinical Terms SnoMed Ontology maintained by National Library of Medicine – Textual data matched to medical ontology for selected set of event types Event-Sequence training data

Ruth Reeves CBA 2010 Barcelona, Spain Textual Event Capture Current Resource Pooling Medically relevant events: capture underway TimeBank tool, TARSQI tool kit Med_TTK: optimized for medical record use – Extract temporal expressions, – Linkage to medical events PropBank and NomBank – Define scope of temporal adverbs and adjectives Current work with EHR corpora and next generation of SnoMed-enhanced Med_TTK – Recognize sub-event taxonomy