807 - TEXT ANALYTICS Massimo Poesio Lecture 8: Relation extraction
OTHER ASPECTS OF SEMANTIC INTERPRETATION
Identification of RELATIONS between the entities mentioned
– A focus of interest in modern CL since 1993 or so
Identification of TEMPORAL RELATIONS
– From about 2003 on
QUALIFICATION of such relations (modality, epistemicity)
– From about 2010 on
TYPES OF RELATIONS
Predicate-argument structure (verbs and nouns)
– John kicked the ball
Nominal relations
– The red ball
Relations between events / temporal relations
– John kicked the ball and scored a goal
Domain-dependent relations (MUC/ACE)
– John works for IBM
PREDICATE/ARGUMENT STRUCTURE
Powell met Zhu Rongji
Proposition: meet(Powell, Zhu Rongji)
The same proposition underlies several surface forms:
– Powell met with Zhu Rongji
– Powell and Zhu Rongji met
– Powell and Zhu Rongji had a meeting ...
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))
Verbs such as debate, consult, join, wrestle, battle follow the same two-argument pattern: meet(Somebody1, Somebody2)
PREDICATE-ARGUMENT STRUCTURE
Linguistic theories and the lexical resources they inspired:
– Case Frames (Fillmore) – FrameNet
– Lexical Conceptual Structure (Jackendoff) – LCS
– Proto-Roles (Dowty) – PropBank
– English verb classes / diathesis alternations (Levin; Talmy, Levin and Rappaport) – VerbNet
Fillmore’s Case Theory
Sentences have a DEEP STRUCTURE with CASE RELATIONS
A sentence is a verb + one or more NPs
– Each NP has a deep-structure case: A(gentive), I(nstrumental), D(ative), F(actitive), L(ocative), O(bjective)
– The subject is no more important than the object: subject and object are surface-structure notions
THEMATIC ROLES
Following on from Fillmore’s original work, many theories of predicate-argument structure / thematic roles were proposed; perhaps the best known are
– Jackendoff’s LEXICAL CONCEPTUAL SEMANTICS
– Dowty’s PROTO-ROLES theory
Dowty’s PROTO-ROLES
Event-dependent prototypes based on shared entailments
Grammatical relations such as subject are related to an observed (empirical) classification of participants
Typology of grammatical relations: Proto-Agent vs. Proto-Patient
Proto-Agent Properties – Volitional involvement in event or state – Sentience (and/or perception) – Causing an event or change of state in another participant – Movement (relative to position of another participant) – (exists independently of event named) *may be discourse pragmatic
Proto-Patient Properties: – Undergoes change of state – Incremental theme – Causally affected by another participant – Stationary relative to movement of another participant – (does not exist independently of the event, or at all) *may be discourse pragmatic
Semantic role labels:
Jan broke the LCD projector.
break(agent(Jan), patient(LCD-projector))
cause(agent(Jan), change-of-state(LCD-projector)) (broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), ...
(Fillmore 68; Jackendoff 72; Dowty 91)
VERBNET AND PROPBANK Dowty’s theory of proto-roles was the basis for the development of PROPBANK, the first corpus annotated with information about predicate-argument structure
PROPBANK REPRESENTATION
a GM-Jaguar pact that would give the US car maker an eventual 30% stake in the British company
– Arg0: a GM-Jaguar pact (linked to the relative clause via the trace *T*-1)
– Arg1: an eventual 30% stake in the British company
– Arg2: the US car maker
give(GM-J pact, US car maker, 30% stake)
ARGUMENTS IN PROPBANK
Arg0 = agent
Arg1 = direct object / theme / patient
Arg2 = indirect object / benefactive / instrument / attribute / end state
Arg3 = start point / benefactive / instrument / attribute
Arg4 = end point
Per-word vs. frame-level labels – which is more general?
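As a concrete illustration (not from the original slides), the give example above can be encoded as a simple data structure; the field names and the roleset id are assumptions:

```python
# Hypothetical, minimal encoding of a PropBank-style annotation for:
# "a GM-Jaguar pact that would give the US car maker
#  an eventual 30% stake in the British company"
propbank_instance = {
    "predicate": "give",
    "roleset": "give.01",  # assumed roleset id (PropBank numbers verb senses this way)
    "args": {
        "Arg0": "a GM-Jaguar pact",                              # giver / agent
        "Arg1": "an eventual 30% stake in the British company",  # thing given
        "Arg2": "the US car maker",                              # recipient
    },
}
```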
FROM PREDICATES TO FRAMES In one of its senses, the verb observe evokes a frame called Compliance: this frame concerns people’s responses to norms, rules or practices. The following sentences illustrate the use of the verb in the intended sense: – Our family observes the Jewish dietary laws. – You have to observe the rules or you’ll be penalized. – How do you observe Easter? – Please observe the illuminated signs.
FrameNet
FrameNet records information about English words in the general vocabulary in terms of
1. the frames (e.g. Compliance) that they evoke,
2. the frame elements (semantic roles) that make up the components of the frames (in Compliance, Norm is one such frame element), and
3. each word’s valence possibilities, the ways in which information about the frames is provided in the linguistic structures connected to them (with observe, Norm is typically the direct object).
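To make the frame / frame-element / valence distinction concrete, here is a hedged sketch of the Compliance entry for observe; the field names are illustrative, not FrameNet's actual schema, and the Protagonist element is an assumption:

```python
# Hypothetical, simplified FrameNet-style lexical entry (not FrameNet's real format)
compliance_entry = {
    "frame": "Compliance",
    "definition": "People's responses to norms, rules or practices",
    "frame_elements": ["Protagonist", "Norm"],  # Norm is from the slide; Protagonist assumed
    "lexical_unit": "observe.v",
    "valence": {"Norm": "direct object"},  # with observe, Norm is typically the object
    "example": "Our family observes the Jewish dietary laws.",
}
```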
NOMINAL RELATIONS
HISTORY
CLASSIFICATION SCHEMES FOR NOMINAL RELATIONS
ONE EXAMPLE (Barker et al. 1998, Nastase & Szpakowicz 2003)
THE TWO-LEVEL TAXONOMY OF RELATIONS, 2
THE SEMEVAL-2007 CLASSIFICATION OF RELATIONS
Cause-Effect: laugh wrinkles
Instrument-Agency: laser printer
Product-Producer: honey bee
Origin-Entity: message from outer space
Theme-Tool: news conference
Part-Whole: car door
Content-Container: the air in the jar
CAUSAL RELATIONS
TEMPORAL RELATIONS
THE MUC AND ACE TASKS
Modern research in relation extraction was likewise kicked off by the Message Understanding Conference (MUC) campaigns and continued through the Automatic Content Extraction (ACE) and Machine Reading follow-ups
MUC: NE, coreference, TEMPLATE FILLING
ACE: NE, coreference, relations
TEMPLATE-FILLING
EXAMPLE MUC: JOB POSTING
THE ASSOCIATED TEMPLATE
AUTOMATIC CONTENT EXTRACTION (ACE)
ACE: THE DATA
ACE: THE TASKS
RELATION DETECTION AND RECOGNITION
ACE: RELATION TYPES
OTHER PRACTICAL VERSIONS OF RELATION EXTRACTION Biomedical domain (BIONLP, BioCreative) Chemistry Cultural Heritage
THE TASK OF SEMANTIC RELATION EXTRACTION
SEMANTIC RELATION EXTRACTION: THE CHALLENGES
HISTORY OF RELATION EXTRACTION
Before 1993: symbolic methods (using knowledge bases)
Since then: statistical / heuristic-based methods
– From 1995 to around 2005: mostly SUPERVISED
– More recently: also quite a lot of UNSUPERVISED / SEMI-SUPERVISED techniques
SUPERVISED RE: RE AS A CLASSIFICATION TASK
Binary relations
Entities already manually/automatically recognized
Examples are generated for all sentences with at least 2 entities
Number of examples generated per sentence is C(N, 2) = N(N-1)/2
– the number of combinations of N distinct entities selected 2 at a time (see the sketch below)
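A minimal sketch (not from the original slides) of this candidate-generation step; entity mentions are assumed to be already recognized:

```python
from itertools import combinations

def generate_candidates(entities):
    """All C(N, 2) unordered entity pairs in a sentence;
    each pair becomes one example for the relation classifier."""
    return list(combinations(entities, 2))

# 3 entities yield C(3, 2) = 3 candidate pairs:
print(generate_candidates(["Powell", "Zhu Rongji", "the spy plane"]))
# [('Powell', 'Zhu Rongji'), ('Powell', 'the spy plane'),
#  ('Zhu Rongji', 'the spy plane')]
```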
GENERATING CANDIDATES TO CLASSIFY
RE AS A BINARY CLASSIFICATION TASK
NUMBER OF CANDIDATES TO CLASSIFY – SIMPLE MINDED VERSION
THE SUPERVISED APPROACH TO RE
Most current approaches to RE are kernel-based
Different information is used
– Sequences of words, e.g., through the GLOBAL CONTEXT / LOCAL CONTEXT kernels of Bunescu and Mooney / Giuliano, Lavelli & Romano
– Syntactic information through the TREE KERNELS of Zelenko et al. / Moschitti et al.
– Semantic information in recent work
KERNEL METHODS: A REMINDER
Embedding the input data in a feature space
Using a linear algorithm for discovering non-linear patterns
Coordinates of the embedded points are not needed, only their pairwise inner products
Pairwise inner products can be computed efficiently directly from X using a kernel function K : X × X → R
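A minimal sketch of this idea, assuming scikit-learn is available: an SVM is trained from a precomputed Gram matrix of pairwise inner products, without ever materializing the feature-space coordinates. The kernel here (character n-gram counts) and the toy labels are illustrative choices, not from the lecture.

```python
import numpy as np
from sklearn.svm import SVC

def ngram_kernel(a, b, n=3):
    """Inner product in the (implicit) space of character n-gram counts."""
    def counts(s):
        grams = [s[i:i + n] for i in range(len(s) - n + 1)]
        return {g: grams.count(g) for g in set(grams)}
    ca, cb = counts(a), counts(b)
    return sum(c * cb.get(g, 0) for g, c in ca.items())

texts = ["Powell met Zhu Rongji", "GM signed a pact with Jaguar",
         "Powell and Zhu had a meeting", "IBM acquired the company"]
labels = [1, 0, 1, 0]  # toy labels: 1 = Meet relation, 0 = other

# Gram matrix: all pairwise inner products, computed directly from the strings
K = np.array([[ngram_kernel(x, y) for y in texts] for x in texts], dtype=float)
clf = SVC(kernel="precomputed").fit(K, labels)
```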
MODULARITY OF KERNEL METHODS
THE WORD-SEQUENCE APPROACH
Shallow linguistic information:
– tokenization
– lemmatization
– sentence splitting
– PoS tagging
Claudio Giuliano, Alberto Lavelli, and Lorenza Romano (2007), FBK-IRST: Kernel methods for relation extraction, Proc. of SEMEVAL-2007
LINGUISTIC REALIZATION OF RELATIONS Bunescu & Mooney, NIPS 2005
WORD-SEQUENCE KERNELS Two families of “basic” kernels – Global Context – Local Context Linear combination of kernels Explicit computation – Extremely sparse input representation
THE GLOBAL CONTEXT KERNEL
THE LOCAL CONTEXT KERNEL
LOCAL CONTEXT KERNEL (2)
KERNEL COMBINATION
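The formulas from these slides are not preserved in this text; as a hedged sketch, the general recipe in Giuliano et al. (2007) combines the global-context and local-context kernels by summing their normalized Gram matrices, which is itself a valid kernel:

```python
import numpy as np

def normalize(K):
    """Normalize a Gram matrix so that K'(x, x) = 1 (assumes K(x, x) > 0)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine(K_global, K_local):
    """Linear combination (here: unweighted sum) of normalized kernels."""
    return normalize(K_global) + normalize(K_local)
```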
EXPERIMENTAL RESULTS Biomedical data sets – AIMed – LLL Newspaper articles – Roth and Yih SEMEVAL 2007
EVALUATION METHODOLOGIES
EVALUATION (2)
EVALUATION (3)
EVALUATION (4)
RESULTS ON AIMED
NON-SUPERVISED METHODS FOR RELATION EXTRACTION Unsupervised relation extraction: – Hearst – Other work on extracting hyponymy relations – Extracting other relations: Almuhareb and Poesio, Cimiano and Wenderoth Semi-supervised methods – KNOW-IT-ALL
HEARST 1992, 1998: USING PATTERNS TO EXTRACT ISA LINKS Intuition: certain constructions typically used to express certain types of semantic relations E.g., for ISA: – The seabass IS A fish – Swimming, running AND OTHER activities – Vehicles such as cars, trucks and bikes
TEXT PATTERNS FOR HYPONYMY EXTRACTION
HEARST 1998: NP {, NP}* {,} or other NP
… bruises, broken bones, and other INJURIES
-> HYPONYM(bruise, injury)
EVALUATION: 55.46% precision wrt WordNet
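A minimal regex-based sketch of this pattern (an illustration, not Hearst's implementation): NPs are approximated as one- or two-word spans, and a real system would match over chunked NPs and lemmatize the results (bruises -> bruise):

```python
import re

# '(and|or) other' variant of the Hearst pattern; NPs approximated as 1-2 words
PATTERN = re.compile(
    r"(\w+(?: \w+)?(?:\s*,\s*\w+(?: \w+)?)*)\s*,?\s+(?:and|or) other (\w+)")

def hearst_hyponyms(text):
    """Extract (hyponym, hypernym) pairs from '... X, Y(,) and other Z'."""
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(2)
        for hyponym in re.split(r"\s*,\s*", m.group(1)):
            pairs.append((hyponym, hypernym))
    return pairs

print(hearst_hyponyms("bruises, broken bones and other injuries"))
# [('bruises', 'injuries'), ('broken bones', 'injuries')]
```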
THE PRECISION / RECALL TRADEOFF X and other Y: high precision, low recall X isa Y: low precision, high recall
HEARST’S REQUIREMENTS ON PATTERNS
OTHER WORK ON EXTRACTING HYPONYMY Caraballo ACL 1999 Widdows & Dorow 2002 Pantel & Ravichandran ACL 2004
OTHER APPROACHES TO RE Using syntactic information Using lexical features
Syntactic information for RE Pros: – more structured information useful when dealing with long-distance relations Cons: – not always robust – (and not available for all languages)
Semi-supervised methods Hearst 1992: find new patterns by using initial examples as SEEDS This approach has been pursued in a number of ways – Espresso (Pantel and Pennacchiotti 2006) – OPEN INFORMATION EXTRACTION (Etzioni and colleagues)
THE GENERIC SEMI-SUPERVISED ALGORITHM
1. Start with SEED INSTANCES (depending on the algorithm, seeds may be hand-generated or automatically obtained)
2. For each seed instance, extract patterns from the corpus (the choice of patterns depends on the algorithm)
3. Output the best patterns according to some metric
4. (Possibly) iterate steps 2-3
(A toy implementation of this loop is sketched below.)
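A toy, self-contained sketch of the loop above, under strong simplifying assumptions: instances are single words, a "pattern" is just the pair of words immediately left and right of an instance, and the scoring metric is raw frequency. The example also illustrates the well-known semantic-drift problem (Paris is harvested from a city context):

```python
import re
from collections import Counter

def extract_patterns(corpus, instance):
    """Generalize each occurrence of an instance to its (left, right) context words."""
    pats = []
    for m in re.finditer(re.escape(instance), corpus):
        left = corpus[:m.start()].split()[-1:] or [""]
        right = corpus[m.end():].split()[:1] or [""]
        pats.append((left[0], right[0]))
    return pats

def match(pattern, corpus):
    """Harvest new instances occurring between a pattern's context words."""
    left, right = map(re.escape, pattern)
    return re.findall(rf"{left}\s+(\w+)\s+{right}", corpus)

def bootstrap(corpus, seeds, iterations=2, top_k=3):
    instances, patterns = set(seeds), set()
    for _ in range(iterations):
        counts = Counter(p for i in instances for p in extract_patterns(corpus, i))
        best = [p for p, _ in counts.most_common(top_k)]  # metric: raw frequency
        patterns.update(best)
        for p in best:
            instances.update(match(p, corpus))
    return patterns, instances

corpus = "countries such as France and Italy, and cities such as Paris and Rome"
print(bootstrap(corpus, {"France"}))
# ({('as', 'and')}, {'France', 'Paris'})  -- note the drift towards cities
```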
THE ESPRESSO SEMI-SUPERVISED ALGORITHM
1. Start with SEED INSTANCES (hand-chosen)
2. For each seed instance, extract patterns from the corpus (generalization of the whole sentence)
3. Output the best patterns according to a metric based on PMI
4. Iterate steps 2-3
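The slides do not preserve the metric itself; as a hedged reconstruction from Pantel & Pennacchiotti (2006), Espresso scores the reliability of a pattern p against the current instance set I as

r_π(p) = (1/|I|) · Σ_{i∈I} ( pmi(i, p) / max_pmi ) · r_ι(i)

where pmi(i, p) is the pointwise mutual information between instance i and pattern p, max_pmi is a normalizing maximum over all instance-pattern pairs, and r_ι(i) is the reliability of instance i, defined symmetrically, so pattern and instance reliabilities are recomputed in alternation across iterations.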
KNOW-IT-ALL A system for ontology population developed by Oren Etzioni and collaborators at the University of Washington
KNOW-IT-ALL: ARCHITECTURE
INPUT
BOOTSTRAPPING This first step takes the input domain predicates and the generic extraction patterns and produces domain-specific extraction patterns
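A toy sketch of this instantiation step (the pattern strings are simplified stand-ins for KnowItAll's actual rule templates):

```python
# Simplified generic patterns; X marks the slot to be extracted
GENERIC_PATTERNS = ["{plural} such as X", "X is a {singular}"]

def instantiate(singular, plural):
    """Plug a domain predicate's surface forms into the generic patterns,
    producing domain-specific extraction patterns."""
    return [p.format(singular=singular, plural=plural) for p in GENERIC_PATTERNS]

print(instantiate("city", "cities"))
# ['cities such as X', 'X is a city']
```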
EXTRACTION PATTERNS
EXTRACTOR
Uses domain-specific extraction patterns + syntactic constraints
– In “Garth Brooks is a country singer”, country is NOT extracted as an instance of the pattern “X is a NP”, because it is not the head of the NP
Produces EXTRACTIONS (= instances of the patterns that satisfy the syntactic constraints)
ASSESSOR
Estimates the likelihood of an extraction using POINTWISE MUTUAL INFORMATION between the extracted INSTANCE and DISCRIMINATOR phrases
E.g., INSTANCE: Liege; DISCRIMINATOR PHRASE: “is a city”
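A hedged sketch of this PMI computation from hit counts; `hits` stands in for KnowItAll's web-search queries, and the exact normalization in the original system may differ:

```python
def pmi_score(instance, discriminator, hits):
    """PMI-style score between an instance and a discriminator phrase.
    hits(query) -> number of matching documents (stand-in for a search API).
    One common variant: joint count divided by the instance count alone."""
    joint = hits(f'"{instance} {discriminator}"')  # e.g. "Liege is a city"
    return joint / max(hits(f'"{instance}"'), 1)
```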
ESTIMATING THE LIKELIHOOD OF A FACT
P(f | positive) and P(f | negative) estimated using a set of positive and negative instances
TERMINATION CONDITION
KNOW-IT-ALL could continue searching for instances indefinitely
– But some classes are small: COUNTRY, for instance, has only around 300 instances
Stop based on the Signal-to-Noise ratio
– Number of high-probability facts / number of low-probability ones
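A one-line sketch of such a stopping test; the probability threshold and the minimum ratio are assumed parameters:

```python
def should_stop(fact_probs, threshold=0.8, min_ratio=1.0):
    """Stop when high-probability facts no longer outnumber low-probability ones."""
    high = sum(p >= threshold for p in fact_probs)
    low = sum(p < threshold for p in fact_probs)
    return low > 0 and high / low < min_ratio
```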
OVERALL ALGORITHM
EVALUATION 5 classes: CITY, US STATE, COUNTRY, ACTOR, FILM
IE IN PRACTICE: THE GOOGLE KNOWLEDGE GRAPH
“A huge knowledge graph of interconnected entities and their attributes” – Amit Singhal, Senior Vice President at Google
“A knowledge base used by Google to enhance its search engine’s results with semantic-search information gathered from a wide variety of sources”
INFORMATION IN THE GKG
Based on information derived from many sources including Freebase, the CIA World Factbook, Wikipedia
Contains 570 million objects and more than 18 billion facts about, and relationships between, these different objects
INFORMATION IN THE GKG
Search for a person, place, or thing
Facts about entities are displayed in a knowledge box on the right side
What it looks like
Web results have not changed
What it looks like
This is what’s new: a knowledge panel with a map, general info, upcoming events, and points of interest
*The type of information that appears in this panel depends on what you are searching for
Handling vague searches / homophones
Prompts the user to indicate more precisely what they are looking for
Displays results relating only to that meaning
Eliminates other results in both the panel and the web results
Example of this: Very General Results
Example of this: possible results are shown, and the user picks what they were looking for
Let’s assume the user meant the TV show Kings
MORE COMPLEX SEMANTICS Modalities Temporal interpretation
ACKNOWLEDGMENTS Many slides borrowed from – Roxana Girju – Alberto Lavelli