Lecture 24: Relation Extraction
Kai-Wei Chang
CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
CS6501-NLP
Goal
  - Acquire structured knowledge from text
Information extraction
  - Entity recognition
    - Identify named entities: people, organizations, locations, times, dates, etc., or genes, proteins, diseases, etc.
  - Relation extraction
    - Located in, employed by, married to
Example
Why relation extraction?
  - Create structured knowledge bases
  - Augment structured knowledge bases
  - Support question answering
  - The first step for event extraction and storyline extraction
  - …
Relation types (closed domain)
  - 17 relations from Automated Content Extraction (ACE)
  - Credit: Dan Jurafsky
Relation types (closed domain)
  - UMLS: Unified Medical Language System
  - 134 entity types, 54 relations
Relation types (open domain)
  - Freebase: thousands of relations, millions of entities
Wikipedia Infobox
|undergrad = 15,669<ref name=facts/>
|postgrad = 6,316<ref name=facts/>
|city = [[Charlottesville, Virginia|Charlottesville]]
|state = [[Virginia]]
|country = U.S.
|campus = [[Charlottesville, Virginia metropolitan area|Small city]]<br />{{convert|1682|acre|km2}}<br />[[World Heritage Site]]
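Infobox fields like these can be read almost directly as (subject, relation, value) triples. The following is a minimal sketch, assuming one `|field = value` pair per line; the subject string "University of Virginia", the helper names `clean_value` and `infobox_to_triples`, and the cleanup rules are illustrative assumptions, not a full wikitext parser.

```python
import re

# A few fields from the infobox above, one per line (assumed layout).
INFOBOX = """\
|undergrad = 15,669<ref name=facts/>
|postgrad = 6,316<ref name=facts/>
|city = [[Charlottesville, Virginia|Charlottesville]]
|state = [[Virginia]]
|country = U.S.
"""

def clean_value(raw):
    """Strip simple wiki markup from a field value (hypothetical helper)."""
    raw = re.sub(r"<ref[^>]*/>", "", raw)  # drop self-closing <ref .../> tags
    # [[target|label]] -> label, [[target]] -> target
    raw = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", raw)
    return raw.strip()

def infobox_to_triples(subject, wikitext):
    """Turn '|field = value' lines into (subject, field, value) triples."""
    triples = []
    for line in wikitext.splitlines():
        m = re.match(r"\|\s*(\w+)\s*=\s*(.*)", line)
        if m:
            triples.append((subject, m.group(1), clean_value(m.group(2))))
    return triples

# The subject is an assumption; the infobox text itself does not name it.
triples = infobox_to_triples("University of Virginia", INFOBOX)
for t in triples:
    print(t)
```

Nested templates such as `{{convert|...}}` are deliberately left out of this sketch, since handling them needs a real wikitext parser.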
How to build relation extractors (closed domain)
  - Hand-written patterns
  - Supervised machine learning
    - Take each sentence as input
    - Identify named entities (mentions)
    - Perform multi-class classification
    - Add constraints or features to model correlations
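As a rough illustration of the hand-written pattern approach, the sketch below matches a few regular expressions against a sentence. The patterns, the relation labels, and the crude capitalized-word approximation of an entity mention (`NAME`) are all hypothetical choices for this example; a real system would use a named entity recognizer to find mentions first.

```python
import re

# Crude stand-in for an entity mention: one or more capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical hand-written patterns, one per relation label.
PATTERNS = {
    "employed_by": re.compile(rf"(?P<arg1>{NAME}) works (?:for|at) (?P<arg2>{NAME})"),
    "married_to": re.compile(rf"(?P<arg1>{NAME}) is married to (?P<arg2>{NAME})"),
    "located_in": re.compile(rf"(?P<arg1>{NAME}) is located in (?P<arg2>{NAME})"),
}

def extract_relations(sentence):
    """Return (arg1, relation, arg2) for every pattern match in the sentence."""
    found = []
    for relation, pattern in PATTERNS.items():
        for m in pattern.finditer(sentence):
            found.append((m.group("arg1"), relation, m.group("arg2")))
    return found

print(extract_relations("Alice Smith works for Acme Corporation."))
print(extract_relations("Charlottesville is located in Virginia."))
```

Patterns like these tend to be precise but miss most phrasings, which is one motivation for the supervised classifier listed on the slide.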
How to build relation extractors (open domain)
  - Bootstrap learning [Brin 98, …]
    - Use seed instances to extract a set of relational patterns
  - Unsupervised learning
    - Cluster sentences based on relational patterns
  - Distant supervision
    - Distant supervision for relation extraction without labeled data [Mintz 09+]
  - Combine the above approaches
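The core idea of distant supervision can be sketched in a few lines: if a knowledge base says two entities are related, treat every sentence that mentions both as a (noisy) training example for that relation. The toy knowledge base, the sentences, and the function name `distant_label` below are assumptions made for illustration; exact substring matching stands in for real entity linking.

```python
# Toy knowledge base: (entity1, entity2) -> relation. Contents are illustrative.
KB = {
    ("Charlottesville", "Virginia"): "located_in",
    ("Thomas Jefferson", "University of Virginia"): "founder_of",
}

SENTENCES = [
    "Charlottesville is a city in Virginia.",
    "Thomas Jefferson founded the University of Virginia in 1819.",
    "Virginia borders Maryland.",
]

def distant_label(kb, sentences):
    """Label a sentence with a KB relation whenever it mentions both entities."""
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            # Substring match is a simplification of entity mention detection.
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

examples = distant_label(KB, SENTENCES)
for ex in examples:
    print(ex)
```

The automatically labeled sentences can then be used to train an ordinary supervised classifier. Because a sentence may mention both entities without expressing the relation, the labels are noisy by construction.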
A follow-up approach: Relation Extraction with Matrix Factorization and Universal Schemas [Riedel 13+]