807 - TEXT ANALYTICS Massimo Poesio Lecture 8: Relation extraction
OTHER ASPECTS OF SEMANTIC INTERPRETATION
Identification of RELATIONS between the entities mentioned
– A focus of interest in modern CL since 1993 or so
Identification of TEMPORAL RELATIONS
– From about 2003 on
QUALIFICATION of such relations (modality, epistemicity)
– From about 2010 on
TYPES OF RELATIONS
Predicate-argument structure (verbs and nouns)
– John kicked the ball
Nominal relations
– The red ball
Relations between events / temporal relations
– John kicked the ball and scored a goal
Domain-dependent relations (MUC/ACE)
– John works for IBM
PREDICATE/ARGUMENT STRUCTURE
Powell met Zhu Rongji
Proposition: meet(Powell, Zhu Rongji)
The same proposition underlies several surface forms:
– Powell met with Zhu Rongji
– Powell and Zhu Rongji met
– Powell and Zhu Rongji had a meeting ...
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))
Verbs such as debate, consult, join, wrestle, battle follow the same two-argument pattern: meet(Somebody1, Somebody2)
PREDICATE-ARGUMENT STRUCTURE
Linguistic theories and the lexical resources they inspired:
– Case Frames (Fillmore) – FrameNet
– Lexical Conceptual Structure (Jackendoff) – LCS
– Proto-Roles (Dowty) – PropBank
– English verb classes / diathesis alternations (Levin; Talmy, Levin and Rappaport) – VerbNet
Fillmore’s Case Theory
Sentences have a DEEP STRUCTURE with CASE RELATIONS
A sentence is a verb + one or more NPs
– Each NP has a deep-structure case: A(gentive), I(nstrumental), D(ative), F(actitive), L(ocative), O(bjective)
– The subject is no more important than the object: subject and object are surface-structure notions
THEMATIC ROLES
Following on from Fillmore’s original work, many theories of predicate-argument structure / thematic roles were proposed; perhaps the best known are
– Jackendoff’s LEXICAL CONCEPTUAL SEMANTICS
– Dowty’s PROTO-ROLES theory
Dowty’s PROTO-ROLES
Event-dependent prototypes based on shared entailments
Grammatical relations such as subject are related to an observed (empirical) classification of participants
Typology of grammatical relations: Proto-Agent vs. Proto-Patient
Proto-Agent Properties – Volitional involvement in event or state – Sentience (and/or perception) – Causing an event or change of state in another participant – Movement (relative to position of another participant) – (exists independently of event named) *may be discourse pragmatic
Proto-Patient Properties: – Undergoes change of state – Incremental theme – Causally affected by another participant – Stationary relative to movement of another participant – (does not exist independently of the event, or at all) *may be discourse pragmatic
Semantic role labels:
Jan broke the LCD projector.
break(agent(Jan), patient(LCD-projector))
cause(agent(Jan), change-of-state(LCD-projector)) (broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), ...
(Fillmore 68; Jackendoff 72; Dowty 91)
VERBNET AND PROPBANK Dowty’s theory of proto-roles was the basis for the development of PROPBANK, the first corpus annotated with information about predicate-argument structure
PROPBANK REPRESENTATION
a GM-Jaguar pact that would give the US car maker an eventual 30% stake in the British company
– Arg0: a GM-Jaguar pact (linked to the relative clause via the trace *T*-1)
– Arg1: an eventual 30% stake in the British company
– Arg2: the US car maker
give(GM-J pact, US car maker, 30% stake)
ARGUMENTS IN PROPBANK
Arg0 = agent
Arg1 = direct object / theme / patient
Arg2 = indirect object / benefactive / instrument / attribute / end state
Arg3 = start point / benefactive / instrument / attribute
Arg4 = end point
Per-word vs. frame-level labels – which is more general?
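As a concrete illustration (not from the original slides), the give example above can be encoded as a simple data structure; the field names and the roleset id are assumptions:

```python
# Hypothetical, minimal encoding of a PropBank-style annotation for:
# "a GM-Jaguar pact that would give the US car maker
#  an eventual 30% stake in the British company"
propbank_instance = {
    "predicate": "give",
    "roleset": "give.01",  # assumed roleset id (PropBank numbers verb senses this way)
    "args": {
        "Arg0": "a GM-Jaguar pact",                              # giver / agent
        "Arg1": "an eventual 30% stake in the British company",  # thing given
        "Arg2": "the US car maker",                              # recipient
    },
}
```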
FROM PREDICATES TO FRAMES In one of its senses, the verb observe evokes a frame called Compliance: this frame concerns people’s responses to norms, rules or practices. The following sentences illustrate the use of the verb in the intended sense: – Our family observes the Jewish dietary laws. – You have to observe the rules or you’ll be penalized. – How do you observe Easter? – Please observe the illuminated signs.
FrameNet
FrameNet records information about English words in the general vocabulary in terms of
1. the frames (e.g. Compliance) that they evoke,
2. the frame elements (semantic roles) that make up the components of the frames (in Compliance, Norm is one such frame element), and
3. each word’s valence possibilities, the ways in which information about the frames is provided in the linguistic structures connected to them (with observe, Norm is typically the direct object).
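To make the frame / frame-element / valence distinction concrete, here is a hedged sketch of the Compliance entry for observe; the field names are illustrative, not FrameNet's actual schema, and the Protagonist element is an assumption:

```python
# Hypothetical, simplified FrameNet-style lexical entry (not FrameNet's real format)
compliance_entry = {
    "frame": "Compliance",
    "definition": "People's responses to norms, rules or practices",
    "frame_elements": ["Protagonist", "Norm"],  # Norm is from the slide; Protagonist assumed
    "lexical_unit": "observe.v",
    "valence": {"Norm": "direct object"},  # with observe, Norm is typically the object
    "example": "Our family observes the Jewish dietary laws.",
}
```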
NOMINAL RELATIONS
HISTORY
CLASSIFICATION SCHEMES FOR NOMINAL RELATIONS
ONE EXAMPLE (Barker et al. 1998, Nastase & Szpakowicz 2003)
THE TWO-LEVEL TAXONOMY OF RELATIONS, 2
THE SEMEVAL-2007 CLASSIFICATION OF RELATIONS
Cause-Effect: laugh wrinkles
Instrument-Agency: laser printer
Product-Producer: honey bee
Origin-Entity: message from outer space
Theme-Tool: news conference
Part-Whole: car door
Content-Container: the air in the jar
CAUSAL RELATIONS
TEMPORAL RELATIONS
THE MUC AND ACE TASKS
Modern research in relation extraction was likewise kicked off by the Message Understanding Conference (MUC) campaigns and continued through the Automatic Content Extraction (ACE) and Machine Reading follow-ups
MUC: NE, coreference, TEMPLATE FILLING
ACE: NE, coreference, relations
TEMPLATE-FILLING
EXAMPLE MUC: JOB POSTING
THE ASSOCIATED TEMPLATE
AUTOMATIC CONTENT EXTRACTION (ACE)
ACE: THE DATA
ACE: THE TASKS
RELATION DETECTION AND RECOGNITION
ACE: RELATION TYPES
OTHER PRACTICAL VERSIONS OF RELATION EXTRACTION Biomedical domain (BIONLP, BioCreative) Chemistry Cultural Heritage
THE TASK OF SEMANTIC RELATION EXTRACTION
SEMANTIC RELATION EXTRACTION: THE CHALLENGES
HISTORY OF RELATION EXTRACTION
Before 1993: symbolic methods (using knowledge bases)
Since then: statistical / heuristic-based methods
– From 1995 to around 2005: mostly SUPERVISED
– More recently: also quite a lot of UNSUPERVISED / SEMI-SUPERVISED techniques
SUPERVISED RE: RE AS A CLASSIFICATION TASK
Binary relations
Entities already manually/automatically recognized
Examples are generated for all sentences with at least 2 entities
Number of examples generated per sentence is C(N, 2) = N(N-1)/2
– the number of combinations of N distinct entities selected 2 at a time (see the sketch below)
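A minimal sketch (not from the original slides) of this candidate-generation step; entity mentions are assumed to be already recognized:

```python
from itertools import combinations

def generate_candidates(entities):
    """All C(N, 2) unordered entity pairs in a sentence;
    each pair becomes one example for the relation classifier."""
    return list(combinations(entities, 2))

# 3 entities yield C(3, 2) = 3 candidate pairs:
print(generate_candidates(["Powell", "Zhu Rongji", "the spy plane"]))
# [('Powell', 'Zhu Rongji'), ('Powell', 'the spy plane'),
#  ('Zhu Rongji', 'the spy plane')]
```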
GENERATING CANDIDATES TO CLASSIFY
RE AS A BINARY CLASSIFICATION TASK
NUMBER OF CANDIDATES TO CLASSIFY – SIMPLE MINDED VERSION
THE SUPERVISED APPROACH TO RE
Most current approaches to RE are kernel-based
Different information is used
– Sequences of words, e.g., through the GLOBAL CONTEXT / LOCAL CONTEXT kernels of Bunescu and Mooney / Giuliano, Lavelli & Romano
– Syntactic information through the TREE KERNELS of Zelenko et al. / Moschitti et al.
– Semantic information in recent work
KERNEL METHODS: A REMINDER
Embedding the input data in a feature space
Using a linear algorithm for discovering non-linear patterns
Coordinates of the embedded points are not needed, only their pairwise inner products
Pairwise inner products can be computed efficiently directly from X using a kernel function K : X × X → R
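A minimal sketch of this idea, assuming scikit-learn is available: an SVM is trained from a precomputed Gram matrix of pairwise inner products, without ever materializing the feature-space coordinates. The kernel here (character n-gram counts) and the toy labels are illustrative choices, not from the lecture.

```python
import numpy as np
from sklearn.svm import SVC

def ngram_kernel(a, b, n=3):
    """Inner product in the (implicit) space of character n-gram counts."""
    def counts(s):
        grams = [s[i:i + n] for i in range(len(s) - n + 1)]
        return {g: grams.count(g) for g in set(grams)}
    ca, cb = counts(a), counts(b)
    return sum(c * cb.get(g, 0) for g, c in ca.items())

texts = ["Powell met Zhu Rongji", "GM signed a pact with Jaguar",
         "Powell and Zhu had a meeting", "IBM acquired the company"]
labels = [1, 0, 1, 0]  # toy labels: 1 = Meet relation, 0 = other

# Gram matrix: all pairwise inner products, computed directly from the strings
K = np.array([[ngram_kernel(x, y) for y in texts] for x in texts], dtype=float)
clf = SVC(kernel="precomputed").fit(K, labels)
```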
MODULARITY OF KERNEL METHODS
THE WORD-SEQUENCE APPROACH
Shallow linguistic information:
– tokenization
– lemmatization
– sentence splitting
– PoS tagging
Claudio Giuliano, Alberto Lavelli, and Lorenza Romano (2007), FBK-IRST: Kernel methods for relation extraction, Proc. of SEMEVAL-2007
LINGUISTIC REALIZATION OF RELATIONS Bunescu & Mooney, NIPS 2005
WORD-SEQUENCE KERNELS Two families of “basic” kernels – Global Context – Local Context Linear combination of kernels Explicit computation – Extremely sparse input representation
THE GLOBAL CONTEXT KERNEL
THE LOCAL CONTEXT KERNEL
LOCAL CONTEXT KERNEL (2)
KERNEL COMBINATION
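The formulas from these slides are not preserved in this text; as a hedged sketch, the general recipe in Giuliano et al. (2007) combines the global-context and local-context kernels by summing their normalized Gram matrices, which is itself a valid kernel:

```python
import numpy as np

def normalize(K):
    """Normalize a Gram matrix so that K'(x, x) = 1 (assumes K(x, x) > 0)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine(K_global, K_local):
    """Linear combination (here: unweighted sum) of normalized kernels."""
    return normalize(K_global) + normalize(K_local)
```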
EXPERIMENTAL RESULTS Biomedical data sets – AIMed – LLL Newspaper articles – Roth and Yih SEMEVAL 2007
EVALUATION METHODOLOGIES
EVALUATION (2)
EVALUATION (3)
EVALUATION (4)
RESULTS ON AIMED
NON-SUPERVISED METHODS FOR RELATION EXTRACTION Unsupervised relation extraction: – Hearst – Other work on extracting hyponymy relations – Extracting other relations: Almuhareb and Poesio, Cimiano and Wenderoth Semi-supervised methods – KNOW-IT-ALL
HEARST 1992, 1998: USING PATTERNS TO EXTRACT ISA LINKS Intuition: certain constructions typically used to express certain types of semantic relations E.g., for ISA: – The seabass IS A fish – Swimming, running AND OTHER activities – Vehicles such as cars, trucks and bikes
TEXT PATTERNS FOR HYPONYMY EXTRACTION
HEARST 1998: NP {, NP}* {,} or other NP
… bruises, broken bones, and other INJURIES
-> HYPONYM(bruise, injury)
EVALUATION: 55.46% precision wrt WordNet
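A minimal regex-based sketch of this pattern (an illustration, not Hearst's implementation): NPs are approximated as one- or two-word spans, and a real system would match over chunked NPs and lemmatize the results (bruises -> bruise):

```python
import re

# '(and|or) other' variant of the Hearst pattern; NPs approximated as 1-2 words
PATTERN = re.compile(
    r"(\w+(?: \w+)?(?:\s*,\s*\w+(?: \w+)?)*)\s*,?\s+(?:and|or) other (\w+)")

def hearst_hyponyms(text):
    """Extract (hyponym, hypernym) pairs from '... X, Y(,) and other Z'."""
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(2)
        for hyponym in re.split(r"\s*,\s*", m.group(1)):
            pairs.append((hyponym, hypernym))
    return pairs

print(hearst_hyponyms("bruises, broken bones and other injuries"))
# [('bruises', 'injuries'), ('broken bones', 'injuries')]
```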
THE PRECISION / RECALL TRADEOFF X and other Y: high precision, low recall X isa Y: low precision, high recall
HEARST’S REQUIREMENTS ON PATTERNS
OTHER WORK ON EXTRACTING HYPONYMY Caraballo ACL 1999 Widdows & Dorow 2002 Pantel & Ravichandran ACL 2004
OTHER APPROACHES TO RE Using syntactic information Using lexical features
Syntactic information for RE Pros: – more structured information useful when dealing with long-distance relations Cons: – not always robust – (and not available for all languages)
Semi-supervised methods Hearst 1992: find new patterns by using initial examples as SEEDS This approach has been pursued in a number of ways – Espresso (Pantel and Pennacchiotti 2006) – OPEN INFORMATION EXTRACTION (Etzioni and colleagues)
THE GENERIC SEMI-SUPERVISED ALGORITHM
1. Start with SEED INSTANCES (depending on the algorithm, seeds may be hand-generated or automatically obtained)
2. For each seed instance, extract patterns from the corpus (the choice of patterns depends on the algorithm)
3. Output the best patterns according to some metric
4. (Possibly) iterate steps 2-3
(A toy implementation of this loop is sketched below.)
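A toy, self-contained sketch of the loop above, under strong simplifying assumptions: instances are single words, a "pattern" is just the pair of words immediately left and right of an instance, and the scoring metric is raw frequency. The example also illustrates the well-known semantic-drift problem (Paris is harvested from a city context):

```python
import re
from collections import Counter

def extract_patterns(corpus, instance):
    """Generalize each occurrence of an instance to its (left, right) context words."""
    pats = []
    for m in re.finditer(re.escape(instance), corpus):
        left = corpus[:m.start()].split()[-1:] or [""]
        right = corpus[m.end():].split()[:1] or [""]
        pats.append((left[0], right[0]))
    return pats

def match(pattern, corpus):
    """Harvest new instances occurring between a pattern's context words."""
    left, right = map(re.escape, pattern)
    return re.findall(rf"{left}\s+(\w+)\s+{right}", corpus)

def bootstrap(corpus, seeds, iterations=2, top_k=3):
    instances, patterns = set(seeds), set()
    for _ in range(iterations):
        counts = Counter(p for i in instances for p in extract_patterns(corpus, i))
        best = [p for p, _ in counts.most_common(top_k)]  # metric: raw frequency
        patterns.update(best)
        for p in best:
            instances.update(match(p, corpus))
    return patterns, instances

corpus = "countries such as France and Italy, and cities such as Paris and Rome"
print(bootstrap(corpus, {"France"}))
# ({('as', 'and')}, {'France', 'Paris'})  -- note the drift towards cities
```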
THE ESPRESSO SEMI-SUPERVISED ALGORITHM
1. Start with SEED INSTANCES (hand-chosen)
2. For each seed instance, extract patterns from the corpus (generalization of the whole sentence)
3. Output the best patterns according to a metric based on PMI
4. Iterate steps 2-3
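The slides do not preserve the metric itself; as a hedged reconstruction from Pantel & Pennacchiotti (2006), Espresso scores the reliability of a pattern p against the current instance set I as

r_π(p) = (1/|I|) · Σ_{i∈I} ( pmi(i, p) / max_pmi ) · r_ι(i)

where pmi(i, p) is the pointwise mutual information between instance i and pattern p, max_pmi is a normalizing maximum over all instance-pattern pairs, and r_ι(i) is the reliability of instance i, defined symmetrically, so pattern and instance reliabilities are recomputed in alternation across iterations.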
KNOW-IT-ALL A system for ontology population developed by Oren Etzioni and collaborators at the University of Washington
KNOW-IT-ALL: ARCHITECTURE
INPUT
BOOTSTRAPPING This first step takes the input domain predicates and the generic extraction patterns and produces domain-specific extraction patterns
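A toy sketch of this instantiation step (the pattern strings are simplified stand-ins for KnowItAll's actual rule templates):

```python
# Simplified generic patterns; X marks the slot to be extracted
GENERIC_PATTERNS = ["{plural} such as X", "X is a {singular}"]

def instantiate(singular, plural):
    """Plug a domain predicate's surface forms into the generic patterns,
    producing domain-specific extraction patterns."""
    return [p.format(singular=singular, plural=plural) for p in GENERIC_PATTERNS]

print(instantiate("city", "cities"))
# ['cities such as X', 'X is a city']
```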
EXTRACTION PATTERNS
EXTRACTOR
Uses domain-specific extraction patterns + syntactic constraints
– In “Garth Brooks is a country singer”, country is NOT extracted as an instance of the pattern “X is a NP”, because it is not the head of the NP
Produces EXTRACTIONS (= instances of the patterns that satisfy the syntactic constraints)
ASSESSOR
Estimates the likelihood of an extraction using POINTWISE MUTUAL INFORMATION between the extracted INSTANCE and DISCRIMINATOR phrases
E.g., INSTANCE: Liege; DISCRIMINATOR PHRASE: “is a city”
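A hedged sketch of this PMI computation from hit counts; `hits` stands in for KnowItAll's web-search queries, and the exact normalization in the original system may differ:

```python
def pmi_score(instance, discriminator, hits):
    """PMI-style score between an instance and a discriminator phrase.
    hits(query) -> number of matching documents (stand-in for a search API).
    One common variant: joint count divided by the instance count alone."""
    joint = hits(f'"{instance} {discriminator}"')  # e.g. "Liege is a city"
    return joint / max(hits(f'"{instance}"'), 1)
```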
ESTIMATING THE LIKELIHOOD OF A FACT
P(f | positive) and P(f | negative) estimated using a set of positive and negative instances
TERMINATION CONDITION
KNOW-IT-ALL could continue searching for instances indefinitely
– But some classes are small: COUNTRY, for instance, has only around 300 instances
Stop based on the Signal-to-Noise ratio
– Number of high-probability facts / number of low-probability ones
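A one-line sketch of such a stopping test; the probability threshold and the minimum ratio are assumed parameters:

```python
def should_stop(fact_probs, threshold=0.8, min_ratio=1.0):
    """Stop when high-probability facts no longer outnumber low-probability ones."""
    high = sum(p >= threshold for p in fact_probs)
    low = sum(p < threshold for p in fact_probs)
    return low > 0 and high / low < min_ratio
```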
OVERALL ALGORITHM
EVALUATION 5 classes: CITY, US STATE, COUNTRY, ACTOR, FILM
IE IN PRACTICE: THE GOOGLE KNOWLEDGE GRAPH
“A huge knowledge graph of interconnected entities and their attributes” – Amit Singhal, Senior Vice President at Google
“A knowledge base used by Google to enhance its search engine’s results with semantic-search information gathered from a wide variety of sources”
INFORMATION IN THE GKG
Based on information derived from many sources including Freebase, the CIA World Factbook, Wikipedia
Contains 570 million objects and more than 18 billion facts about, and relationships between, these different objects
INFORMATION IN THE GKG
Search for a person, place, or thing
Facts about entities are displayed in a knowledge box on the right side
What it looks like
Web results have not changed
What it looks like
This is what’s new: a knowledge panel with a map, general info, upcoming events, and points of interest
*The type of information that appears in this panel depends on what you are searching for
Handling vague searches / homophones
Prompts the user to indicate more precisely what they are looking for
Displays results relating only to that meaning
Eliminates other results in both the panel and the web results
Example of this: Very General Results
Example of this: possible results are shown, and the user picks what they were looking for
Let’s assume the user meant the TV show Kings
MORE COMPLEX SEMANTICS Modalities Temporal interpretation
ACKNOWLEDGMENTS Many slides borrowed from – Roxana Girju – Alberto Lavelli