1
Zero-Shot Relation Extraction via Reading Comprehension
Omer Levy, Minjoon Seo, Eunsol Choi, Luke Zettlemoyer
University of Washington / Allen Institute for Artificial Intelligence

I’d like to talk to you about how we can leverage recent progress in reading comprehension to achieve a very interesting result in relation extraction.
2
Relation Extraction (Slot Filling)
Relation: educated_at(x, ?)
Entity: x = Turing
Sentence: “Alan Turing graduated from Princeton.”
Relation Extraction Model → Answer: Princeton

So in this talk, I’m going to refer to a specific type of relation extraction: slot filling. In this task, we’re given a relation, like “educated at”, an entity, like Alan Turing, and a sentence: “Alan Turing graduated from Princeton.” The goal is to fill the missing slot in the relation from the information in the sentence, if possible. In this case, the answer is “Princeton”.
3
Relation Extraction (Slot Filling)
Relation: educated_at(x, ?)
Entity: x = Turing
Sentence: “Turing was an English mathematician.”
Relation Extraction Model → Answer: <null>

Obviously, there are many cases where the sentence cannot complete the relation, in which case we expect the model to indicate that there is no answer.
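To make the task interface concrete, here is one way the slot-filling input/output could be typed in Python. This is a hypothetical sketch; the class and function names are invented for illustration and are not from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SlotFillingInstance:
    relation: str   # e.g. "educated_at"
    entity: str     # e.g. "Turing"
    sentence: str   # the text that may or may not fill the slot

def extract_slot(instance: SlotFillingInstance) -> Optional[str]:
    """Return the slot filler (e.g. "Princeton"), or None when the sentence
    does not express the relation for this entity."""
    raise NotImplementedError  # placeholder for the actual relation extraction model
```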
4
Reading Comprehension
Question: “Where did Turing study?”
Sentence: “Alan Turing graduated from Princeton.”
Reading Comprehension Model → Answer: Princeton

In the task of reading comprehension, we’re also given a text, but instead of a relation and an entity, we have a natural-language question: “Where did Turing study?” The goal is to answer the question by selecting a span (or a set of spans) from the text.
5
Relation Extraction via Reading Comprehension
Relation: educated_at(x, ?)
Entity: x = Turing
Sentence: “Alan Turing graduated from Princeton.”
Reading Comprehension Model → Answer: Princeton

Now, our main observation is that the task of relation extraction can be reduced to reading comprehension. What we basically need to do is translate the knowledge-base relation and entity into a natural-language question.
6
Relation Extraction via Reading Comprehension
Relation: educated_at(x, ?) → Querification → Question Template: “Where did x study?”
Entity: x = Turing
Sentence: “Alan Turing graduated from Princeton.”
Reading Comprehension Model → Answer: Princeton

We do this by first converting the relation into a question template, a process we call “querification”.
7
Relation Extraction via Reading Comprehension
Relation: educated_at(x, ?) → Querification → Question Template: “Where did x study?”
Entity: x = Turing → Instantiation → Question: “Where did Turing study?”
Sentence: “Alan Turing graduated from Princeton.”
Reading Comprehension Model → Answer: Princeton

And then simply instantiating the template with the entity, to produce a natural-language question.
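A minimal sketch of these two steps in Python, assuming a hand-built template table and naive placeholder substitution (the dictionary and function names are illustrative, not the authors’ code):

```python
# Querification: each KB relation maps to one or more natural-language question
# templates with an "x" placeholder for the entity (collected from annotators).
TEMPLATES = {
    "educated_at": ["Where did x study?", "Which university did x go to?"],
}

def instantiate(relation: str, entity: str) -> list[str]:
    """Instantiation: fill the placeholder with a concrete entity."""
    # Naive substitution; a real pipeline would use an explicit placeholder token.
    return [template.replace("x", entity) for template in TEMPLATES[relation]]

print(instantiate("educated_at", "Turing"))
# ['Where did Turing study?', 'Which university did Turing go to?']
```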
8
Advantages

So, what is it good for? What do we get from reducing relation extraction to reading comprehension?
9
Advantage: Generalize to Unseen Questions
Provides a natural-language API for defining and querying relations
educated_at(Turing, ?) ≈ “Where did Turing study?” ≈ “Which university did Turing go to?”

Well, first of all, a model based on reading comprehension can generalize to unseen questions, which allows us to use natural-language questions like “where did someone study”, instead of relation identifiers like “educated at”, to both define our schema and also query it at test time.
10
Advantage: Generalize to Unseen Relations
Enables zero-shot relation extraction
Train: educated_at, occupation, spouse, …
Test: country
Impossible for many relation-extraction systems

And perhaps an even more interesting advantage is that it can generalize to questions about completely new relations, which were not even part of the schema during training. In other words, reducing relation extraction to reading comprehension allows us to do zero-shot relation extraction, which is an impossible feat for many relation extraction models, including some of the Universal Schema approaches.
11
Challenges

Translating relations into question templates: schema querification; generated over 30,000,000 examples
Modeling reading comprehension: plenty of research on SQuAD (Rajpurkar et al., EMNLP 2016); model based on BiDAF (Seo et al., ICLR 2017)
Predicting negative instances: modified BiDAF can indicate “no answer”

Naturally, there are also some technical challenges. For instance, how do we translate relations into questions in a scalable way, so that we can collect enough data to train our reading comprehension model? We’ll talk about schema querification, which is a very efficient process for doing this translation. In fact, it’s so efficient that we were able to generate more than 30 million reading comprehension examples with a very modest budget. Once we have data, how do we actually model reading comprehension? Well, there’s been a huge amount of research on SQuAD over the past year, and it so happens that one of the more successful models was Minjoon’s bi-directional attention flow (or “BiDAF”) model, which we used. Now, SQuAD and relation extraction are not quite the same. In particular, SQuAD assumes that the raw text we’re given as input must contain the answer, but in relation extraction, most of the sentences in a given document are actually *not* going to fill the missing slot. To address this, we modified BiDAF to consider the “no answer” option in addition to any potential answer spans.
12
Challenges

So let’s dive into the challenges first.
13
Instance Querification
educated_at(Turing, Princeton) → “Where did Turing study?” / “Where did Turing graduate from?” / “Which university did Turing go to?”
Problem: scaling to millions of examples

And we’ll start with the problem of translating relations into questions. Some QA datasets were built by taking instances from a KB, like educated_at(Turing, Princeton), and then asking people to generate natural-language questions from them. Now, this works well, but it also scales linearly with your budget, because you’re annotating at the instance level. And if you want to annotate millions of examples, it’s going to cost you.

Large-Scale Simple Question Answering with Memory Networks (Bordes et al., 2015)
14
Schema Querification: The Challenge
educated_at(x, ?) → “Where did x study?” / “Where did x graduate from?” / “Which university did x go to?”
Problem: not enough information

So one way to avoid it is by annotating at the schema level. Instead of asking a question about a specific entity, we use a placeholder x. Now, this is actually a hard problem, because the relation name alone doesn’t give the annotator enough information. For example, can “educated at” also refer to the high school in which x studied? Or the country? Also, how do the annotators come up with the phrasing?
15
Schema Querification: Crowdsourcing Solution
Ask a single question about x whose answer is, for each sentence, the underlined spans.
The wine is produced in the x region of France.
x, the capital of Mexico, is the most populous city in North America.
x is an unincorporated and organized territory of the United States.
The x mountain range stretched across the United States and Canada.
“In which country is x located?”

So what we want to do is basically prime and constrain annotators with actual data. We give them four sentences with a placeholder x, and ask them to think of a single question about x whose answer is, for each sentence, the underlined spans. So in this example, a good question would be “In which country is x located?”, because it fits well with all four sentences.
16
Dataset

Annotated 120 relations from WikiReading (Hewlett et al., ACL 2016)
Collected 10 templates per relation with high agreement
Generated over 30,000,000 reading comprehension examples
Generated negative examples by mixing questions about the same entity

So overall, we crowdsourced questions for 120 relations from the WikiReading dataset, collecting an average of 10 question templates per relation with high agreement, which allowed us to generate over 30 million reading comprehension examples. We also generated some negative examples by taking two examples about the same entity and then mixing their questions.
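A simplified sketch of how such examples could be generated from KB instances and the collected templates. The data structures and field names here are assumptions for illustration; the released dataset has its own format.

```python
import random

def positive_examples(kb_instances, templates, sentences):
    """One reading-comprehension example per (KB instance, question template) pair.

    kb_instances: iterable of (relation, entity, answer) triples
    templates:    dict relation -> list of templates with an "x" placeholder
    sentences:    dict (entity, relation) -> sentence expressing that fact
    """
    for relation, entity, answer in kb_instances:
        sentence = sentences[(entity, relation)]
        for template in templates[relation]:
            yield {"question": template.replace("x", entity),
                   "sentence": sentence,
                   "answer": answer}

def negative_examples(examples_by_entity, seed=0):
    """Pair two examples about the same entity and swap their questions; the swapped
    question (usually) has no answer in the other example's sentence. A real pipeline
    would also filter out accidental matches."""
    rng = random.Random(seed)
    for entity, examples in examples_by_entity.items():
        if len(examples) < 2:
            continue
        a, b = rng.sample(examples, 2)
        yield {"question": a["question"], "sentence": b["sentence"], "answer": None}
        yield {"question": b["question"], "sentence": a["sentence"], "answer": None}
```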
17
Reading Comprehension Model: BiDAF
Pre-trained word embeddings
Character embeddings
Bi-directional LSTMs for contextualization
Special attention mechanism: attends on both question and sentence, computed independently for each token in the sentence

For our reading comprehension model, we used an adaptation of BiDAF. BiDAF follows a lot of standard practices, like word embeddings, character embeddings, and bi-directional LSTMs, but it also has a special attention mechanism that computes a weighted average of both the question and the sentence for each token.

Bi-Directional Attention Flow for Machine Comprehension (Seo et al., ICLR 2017)
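A rough numpy sketch of that attention layer, assuming already-contextualized sentence vectors H and question vectors U. It follows the spirit of BiDAF’s sentence-to-question and question-to-sentence attention, but uses a plain dot-product similarity instead of the paper’s trainable similarity function.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, J, d = 5, 4, 8                 # sentence length, question length, hidden size (toy values)
H = np.random.randn(T, d)         # contextualized sentence token vectors
U = np.random.randn(J, d)         # contextualized question token vectors

S = H @ U.T                       # similarity between every sentence and question token (T, J)

# Sentence-to-question: for each sentence token, a weighted average of question vectors.
A = softmax(S, axis=1) @ U        # (T, d)

# Question-to-sentence: attend over sentence tokens using each token's best question match,
# then broadcast the summary vector to every sentence position.
b = softmax(S.max(axis=1), axis=0)        # (T,)
Q2S = np.tile(b @ H, (T, 1))              # (T, d)

# Each sentence token is now paired with question-aware and sentence-aware summaries.
G = np.concatenate([H, A, H * A, H * Q2S], axis=1)   # (T, 4d)
```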
18
Reading Comprehension Model: BiDAF
Output Layer:
Tokens: Alan  Turing  graduated  from  [Princeton]
Begin:  0.1   0.3     0.1        0.1   0.4
End:    0.1   0.1     0.1        0.1   0.6

Now, the way BiDAF predicts the answer span is by computing two softmaxes over the sentence: one to mark the beginning of the answer, and another for the end.

Bi-Directional Attention Flow for Machine Comprehension (Seo et al., ICLR 2017)
19
Reading Comprehension Model: BiDAF
Output Layer:
Tokens: Alan  Turing  graduated  from  [Princeton]
Begin:  0.1   0.3     0.1        0.1   0.4
End:    0.1   0.1     0.1        0.1   0.6

From each softmax, BiDAF basically takes the index with the highest confidence, as long as the beginning appears before the end.

Bi-Directional Attention Flow for Machine Comprehension (Seo et al., ICLR 2017)
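One common way to decode such a span, sketched with the toy probabilities from the slide (a simple exhaustive search, not necessarily the authors’ exact decoding code):

```python
import numpy as np

def best_span(p_begin, p_end):
    """Return the (begin, end) pair maximizing p_begin[b] * p_end[e] with b <= e."""
    best, best_score = (0, 0), -1.0
    for b in range(len(p_begin)):
        for e in range(b, len(p_end)):
            score = p_begin[b] * p_end[e]
            if score > best_score:
                best, best_score = (b, e), score
    return best, best_score

tokens  = ["Alan", "Turing", "graduated", "from", "Princeton"]
p_begin = np.array([0.1, 0.3, 0.1, 0.1, 0.4])
p_end   = np.array([0.1, 0.1, 0.1, 0.1, 0.6])

(b, e), _ = best_span(p_begin, p_end)
print(" ".join(tokens[b:e + 1]))   # -> Princeton
```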
20
Predicting Negative Instances
Output Layer:
Tokens: Alan  Turing  graduated  from  [Princeton]  [<null>]
Begin:  0.01  0.03    0.01       0.01  0.04         0.9
End:    0.01  0.01    0.01       0.01  0.06         0.9
Add <null> token to the sentence

Now, to handle negative examples, which don’t have an answer, we add a <null> token at the end of the sentence.
21
Predicting Negative Instances
Output Layer:
Tokens: Alan  Turing  graduated  from  [Princeton]  [<null>]
Begin:  0.01  0.03    0.01       0.01  0.04         0.9
End:    0.01  0.01    0.01       0.01  0.06         0.9
If argmax = <null>, predict no answer

If the <null> token is selected by the model, we predict “no answer”.
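In code, the abstention check then amounts to watching for the appended position, as in this small continuation of the decoding sketch (again illustrative, not the paper’s implementation):

```python
import numpy as np

tokens  = ["Alan", "Turing", "graduated", "from", "Princeton", "<null>"]
p_begin = np.array([0.01, 0.03, 0.01, 0.01, 0.04, 0.9])
p_end   = np.array([0.01, 0.01, 0.01, 0.01, 0.06, 0.9])

null_idx = len(tokens) - 1
b, e = int(p_begin.argmax()), int(p_end.argmax())

# If either softmax selects the appended <null> position, abstain.
if b == null_idx or e == null_idx:
    print("<no answer>")
else:
    print(" ".join(tokens[b:e + 1]))
```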
22
Experiments

So now that we’ve addressed the challenges, let’s see how our approach works in practice.
23
Generalizing to Unseen Questions
Model is trained on several question templates per relation:
“Where did Alan Turing study?”
“Where did Claude Shannon graduate from?”
“Which university did Edsger Dijkstra go to?”
User asks about the relation using a different form:
“Which university awarded Noam Chomsky a PhD?”

So first, we want to check whether the model can generalize to questions that it hasn’t seen before. This basically tests what happens when the model is trained using several question templates that allude to the same relation, like “Where did Turing study?” or “Where did Shannon graduate from?”, and then, at test time, the user asks about the same relation but uses a completely different template to phrase the question, for example “Which university awarded Chomsky a PhD?”
24
Generalizing to Unseen Questions
Experiment: split the data by question templates
Performance on seen question templates: 86.6% F1
Performance on unseen question templates: 83.1% F1
Our method is robust to new descriptions of existing relations

So we took our dataset and split it according to question templates, which allowed us to test how well our model performs on instances with seen templates vs. unseen templates. As you can see, there is some difference, which is expected, but it’s relatively small. What this basically means is that our model is robust to new descriptions of relations that it saw during training.
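A toy sketch of that split, holding out whole question templates so that test questions are always phrased in ways the model never saw (the example dictionaries and the "template" field are assumptions for illustration):

```python
import random
from collections import defaultdict

def split_by_template(examples, test_fraction=0.1, seed=0):
    """Hold out entire question templates for testing."""
    by_template = defaultdict(list)
    for example in examples:                      # each example carries its source template
        by_template[example["template"]].append(example)

    templates = sorted(by_template)
    random.Random(seed).shuffle(templates)
    held_out = set(templates[:max(1, int(len(templates) * test_fraction))])

    train = [ex for t, exs in by_template.items() if t not in held_out for ex in exs]
    test  = [ex for t, exs in by_template.items() if t in held_out for ex in exs]
    return train, test
```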
25
Generalizing to Unseen Relations
Model is trained on several relations:
“Where did Alan Turing study?” (educated_at)
“What is Ivanka Trump’s job?” (occupation)
“Who is Justin Trudeau married to?” (spouse)
User asks about a new, unseen relation:
“In which country is Seattle located?” (country)

But what about relations that it didn’t see during training? How well can our model, or any other model for that matter, generalize to a completely new relation? In this scenario, we train our model on a set of questions that pertain to a certain set of relations, like educated_at, occupation, and spouse, and at test time ask it questions about a completely different relation, like country.
26
Generalizing to Unseen Relations
Experiment: split the data by relations
Results:
Random named-entity baseline: ~12% F1
Off-the-shelf RE system: impossible
BiDAF w/ relation name as query: 33.4% F1
BiDAF w/ querified relation as query: 39.6% F1
BiDAF w/ querified relation + multiple questions at test: 41.1% F1

So this time, we split the data according to relations, and tested how well our model, as well as some others, performs on unseen relations. As a simple unsupervised baseline, we just picked one of the named entities at random, and that gave us about 12 points F1. We also tried an off-the-shelf relation extraction model, but, as expected, it didn’t get anything correct. Now, it’s not that there’s anything wrong with the model; it just wasn’t designed for the zero-shot scenario. The same is true for many other models. Mathematically, the only way you can try to solve this problem with a supervised approach is by featurizing the relation. One way to do it is to use the relation’s name, the actual string, as a question, and that does much better than the random baseline. Now, when you use natural-language questions, which is basically what we’re proposing in this work, you get even better results. You can even improve those results a bit more if you allow the model to look at multiple questions at test time.
27
Why does a reading comprehension model enable zero-shot relation extraction?
It can learn answer types that are used across relations
Q: When was the Snow Hawk released?
S: The Snow Hawk is a 1925 film…
It can detect paraphrases of relations
Q: Who started the Furstenberg China Factory?
S: The Furstenberg China Factory was founded by Johann Georg…

So what is the reading comprehension model learning that allows it to generalize to new relations? From analyzing the results, we found two interesting properties. First, the model is able to learn answer types that are common to many relations: for example, “when” typically refers to a date, and “where” is often a country or a city. Second, it’s able to detect paraphrases of relations, like “started” and “was founded by”, and we suspect that it’s able to do this with the help of pre-trained word embeddings.
28
Conclusion

So, in conclusion…
29
Conclusion

Relation extraction can be reduced to reading comprehension
Provides a natural-language API for defining and querying relations
Enables zero-shot relation extraction
Challenging dataset: nlp.cs.washington.edu/zeroshot/

We showed that the task of relation extraction can be reduced to reading comprehension, providing a natural-language API for defining and querying relations that can even extract new relation types that were never observed during training. This task is far from solved, so we’ve made all our code and data publicly available, in the hope that we, as a community, can use this benchmark to advance research in reading comprehension. Thank you!