Unsupervised Relation Detection using Automatic Alignment of Query Patterns extracted from Knowledge Graphs and Query Click Logs
Panupong Pasupat (Stanford University) and Dilek Hakkani-Tür (Microsoft Research)
Spoken Language Understanding (SLU)
Input: Transcribed query (e.g., “Who played Jake Sully in Avatar”)
Output: Semantic information (e.g., dialog acts, slot values, relations)
[Pipeline: Speech Recognition → Spoken Language Understanding → Dialog Management → Natural Language Generation → Speech Synthesis]
Knowledge Graph Relations
A knowledge graph contains entities and relations.
[Figure: a KG fragment around Avatar — genre → Action, Sci-fi; directed by → James Cameron; starring.actor → Sam Worthington; starring.character → Jake Sully; initial release date edge]
Knowledge Graph Relations
A knowledge graph contains entities and relations. Determining the correct KG relations is an important step toward finding the correct response to a query.
[Figure: for the query “Who played Jake Sully in Avatar”, the answer is the unknown entity linked to Avatar via starring.actor and to Jake Sully via starring.character]
Task: Relation Detection
Inputs:
◦ Natural language query (e.g., “Who played Jake Sully in Avatar”)
◦ KG relations of interest
Output:
◦ List of all KG relations expressed in the query (e.g., acted by, movie character, character name, movie name)
Types of Relations
Explicit relations: the value of the relation appears in the query; very similar to semantic slots. Example: in “Who played [Jake Sully] in Avatar”, the span “Jake Sully” carries the relation character name.
Implicit relations: the value of the relation is not explicitly stated. Example: “Who played Jake Sully in Avatar” expresses the relation movie actor, whose value (the actor) does not appear in the query.
Approach 1.Mine queries related to the entities of interest 2.Infer explicit and implicit relations in the mined queries 3.Use the annotated queries to train a classifier
Mining Entities
Given a domain of interest (e.g., movie), we mine relevant entities from KGs. Start with entities of the central type (e.g., movie), such as Avatar.
Mining Entities
Given a domain of interest (e.g., movie), we mine relevant entities from KGs. Traverse edges in the KG to get related entities.
[Figure: from Avatar — genre → Action, Sci-fi; directed by → James Cameron; initial release date → 2009; starring.actor → Sam Worthington; starring.character → Jake Sully]
(All entities shown here, including Avatar itself via the identity relation, are valid entities.)
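The entity-mining step can be sketched as a one-hop traversal over a KG stored as an adjacency list. This is an illustrative sketch only; the graph data and relation names below are toy examples, not the actual KG.

```python
# Illustrative sketch (not the paper's implementation): mine related
# entities by taking the central entity plus everything one KG edge away.
# The toy graph maps each subject to a list of (relation, object) edges.

def related_entities(kg, central):
    """Return {entity: relation-from-central}, including central itself."""
    related = {central: "(identity)"}       # the central entity is valid too
    for relation, obj in kg.get(central, []):
        related[obj] = relation
    return related

toy_kg = {
    "Avatar": [
        ("genre", "Action"),
        ("genre", "Sci-fi"),
        ("directed by", "James Cameron"),
        ("initial release date", "2009"),
        ("starring.actor", "Sam Worthington"),
        ("starring.character", "Jake Sully"),
    ]
}

entities = related_entities(toy_kg, "Avatar")
```

Each mined entity keeps the relation that connected it to the central entity, which is exactly the information the later relation-inference steps need.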
Mining Queries
After we get an entity of interest (e.g., James Cameron, reached from Avatar via directed by), we mine queries that are related to that entity.
Query Click Log (QCL)
Our queries come from query click logs (QCLs). A query click log is a weighted graph between search queries and the URLs that search engine users click on.
[Figure: queries such as “james cameron movies”, “cameron 2009 movie”, and “avatar” linked to clicked URLs such as en.wikipedia.org/wiki/Avatar_(2009_film)]
Mining Queries
Method 1: Construct seed queries by applying templates to the entity (e.g., “james cameron films”), and then traverse the QCL twice to reach related queries such as “action movies by james cameron”.
Does not perform as well as expected due to lexical ambiguities (e.g., the comic character Flash vs. “flash movie”).
Mining Queries
Method 2: Get the entity's URLs (e.g., en.wikipedia.org/wiki/James_Cameron) from the KG, and then traverse the QCL once to reach queries such as “action movies by james cameron”.
Gives better queries in general, but cannot be applied to some entity types (e.g., dates like 2009). We use this method in the experiments.
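Method 2 can be sketched as a single lookup in a weighted bipartite click graph. The QCL structure, URLs, and click counts below are hypothetical toy data.

```python
# Illustrative sketch (toy data): mine queries for an entity by taking one
# step in the query click log (QCL) from the entity's KG-provided URLs to
# the queries whose clicks landed on those URLs.

def mine_queries(entity_urls, qcl, min_clicks=1):
    """qcl: {url: [(query, click_count), ...]}. Return the mined queries."""
    mined = set()
    for url in entity_urls:
        for query, clicks in qcl.get(url, []):
            if clicks >= min_clicks:        # optionally prune rare clicks
                mined.add(query)
    return mined

toy_qcl = {
    "en.wikipedia.org/wiki/James_Cameron": [
        ("james cameron films", 30),
        ("action movies by james cameron", 12),
    ],
    "en.wikipedia.org/wiki/Avatar_(2009_film)": [
        ("avatar 2009 movie", 25),
    ],
}

queries = mine_queries(["en.wikipedia.org/wiki/James_Cameron"], toy_qcl)
```

Because the lookup starts from URLs rather than from textually ambiguous seed queries, it avoids the lexical-ambiguity problem of Method 1, at the cost of requiring the entity to have URLs in the KG.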
Approach 1.Mine queries related to the entities of interest 2.Infer explicit and implicit relations in the mined queries 3.Use the annotated queries to train a classifier
Inferring Explicit Relations
Example: the query “Who played Jake Sully in Avatar” was mined from the QCL for the entity Avatar (via the identity relation).
Inferring Explicit Relations
Idea: If a query is mined from an entity e, it should explicitly contain either some other entities related to e, or e itself.
Example: for e = Avatar, the query “Who played Jake Sully in Avatar” contains “Jake Sully”, related to e via starring.character → explicit relation character name.
Inferring Explicit Relations
Idea: If a query is mined from an entity e, it should explicitly contain either some other entities related to e, or e itself.
Example: the query “Who played Jake Sully in Avatar” also contains e itself (Avatar, via the identity relation) → explicit relation movie name.
Inferring Explicit Relations
Bonus: By inferring all explicit relations, we get an automatic slot annotation: “Who played [Jake Sully](character name) in [Avatar](movie name)”.
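A minimal sketch of the explicit-relation idea: match the surface forms of e and its related entities inside the mined query. The entity set and relation names below are illustrative, not the paper's data.

```python
# Illustrative sketch (toy data): a query mined from entity e is annotated
# with an explicit relation wherever e itself, or an entity related to e,
# literally appears in the query text.

def infer_explicit_relations(query, related):
    """related: {surface form: relation name}. Return matched annotations."""
    q = query.lower()
    return [(surface, relation)
            for surface, relation in related.items()
            if surface.lower() in q]

# Entities related to e = "Avatar" (relation names are illustrative)
related = {
    "Avatar": "movie name",            # e itself, via the identity relation
    "Jake Sully": "character name",
    "James Cameron": "director name",
}

annotations = infer_explicit_relations("Who played Jake Sully in Avatar", related)
```

The matched (span, relation) pairs double as the automatic slot annotation mentioned on the slide.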
Inferring Implicit Relations
Sometimes the entity e is absent from the query. Example: e = James Cameron (related to Avatar via directed by) does not appear in the query “Who directed the movie Avatar”, which was mined from the QCL for e.
Inferring Implicit Relations
Idea: If the entity e is absent from the query, then we infer that e is the object of an implicit relation. Example: for e = James Cameron and the query “Who directed the movie Avatar”, we infer the implicit relation directed by.
Inferring Implicit Relations
Bonus: By collapsing entities related to e into placeholders, we get generic patterns for implicit relations. Example: “Who directed the movie Avatar” → “Who directed the movie [film]” for the relation directed by.
Inferring Implicit Relations
Bonus: By collapsing entities related to e into placeholders, we get generic patterns for implicit relations.
Example frequent patterns:
directed by: director of [film] · who directed [film] · [film] the movie · [film] director
acted by: [profession] in [film] · [character] from [film] · who played [character] · cast of [film]
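The implicit-relation step can be sketched as: if e is absent, label the query with the relation that linked e to the central entity, and collapse the entities that do appear into typed placeholders. Data and type names are toy examples.

```python
# Illustrative sketch (toy data): when the mined entity e does not appear
# in the query, infer that e is the object of an implicit relation, and
# replace the entities that DO appear with typed placeholders to obtain
# a generic pattern for that relation.

def implicit_pattern(query, e, relation, appearing_types):
    """appearing_types: {surface form: entity type}.
    Return (pattern, relation), or None if e is explicit in the query."""
    if e.lower() in query.lower():
        return None                      # e is explicit, handled elsewhere
    pattern = query
    for surface, etype in appearing_types.items():
        pattern = pattern.replace(surface, f"[{etype}]")
    return pattern, relation

result = implicit_pattern(
    "Who directed the movie Avatar",
    e="James Cameron",                   # mined from Avatar via "directed by"
    relation="directed by",
    appearing_types={"Avatar": "film"},
)
```

Aggregating these (pattern, relation) pairs over many mined queries yields frequent patterns like the ones in the table above.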
Approach
1. Mine queries related to the entities of interest
2. Infer explicit and implicit relations in the mined queries (produces 2 datasets: D_E for inferred explicit relations and D_I for inferred implicit relations)
3. Use the annotated queries to train a classifier
Approach
1. Mine queries related to the entities of interest
2. Infer explicit and implicit relations in the mined queries (produces 2 datasets: D_E for inferred explicit relations and D_I for inferred implicit relations)
3. Use the annotated queries to train a classifier
   i. Train an implicit relation classifier on D_I
   ii. Apply the implicit relation classifier to the queries in D_E and add the predicted implicit relations to D_E
   iii. Train a final classifier on the augmented D_E
(Classifiers are multiclass multilabel linear classifiers trained using AdaBoost on decision tree stumps.)
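The three-stage training can be sketched as follows. This is a toy illustration: scikit-learn's one-vs-rest logistic regression stands in for the paper's AdaBoost-on-stumps learner (which scikit-learn does not provide in multilabel form out of the box), and all queries and labels are invented.

```python
# Illustrative sketch (toy data; LogisticRegression stands in for the
# paper's AdaBoost-on-stumps learner): train an implicit-relation
# classifier on D_I, use its predictions to augment D_E, then train
# the final multilabel classifier on the augmented D_E.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

def train_multilabel(queries, label_sets):
    mlb = MultiLabelBinarizer()
    clf = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),            # n-gram features
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    clf.fit(queries, mlb.fit_transform(label_sets))
    return clf, mlb

def predict_labels(clf, mlb, queries):
    return [set(t) for t in mlb.inverse_transform(clf.predict(queries))]

# i. Implicit-relation classifier trained on D_I (toy patterns)
di_queries = ["who directed [film]", "[film] director",
              "cast of [film]", "who played [character] in [film]"]
di_labels = [{"directed by"}, {"directed by"}, {"acted by"}, {"acted by"}]
implicit_clf, implicit_mlb = train_multilabel(di_queries, di_labels)

# ii. Augment D_E with the predicted implicit relations
de_queries = ["who directed the movie avatar"]
de_labels = [{"movie name"}]                            # inferred explicit labels
predicted = predict_labels(implicit_clf, implicit_mlb, de_queries)
augmented = [exp | imp for exp, imp in zip(de_labels, predicted)]

# iii. Final classifier trained on the augmented D_E
final_clf, final_mlb = train_multilabel(de_queries, augmented)
```

The augmentation in step ii is a simple set union per query, so the final classifier sees both the inferred explicit labels and the predicted implicit ones.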
Experiments
Dataset:
◦ Movie domain relation dataset (Hakkani-Tür et al., 2014)
◦ 3338 training / 1084 test queries
◦ Features: n-grams + weighted gazetteers
Main Results
Classifier — Micro F1
Majority: 27.6
Chen et al., SLT 2014 (also unsupervised): 43.3
Mine queries with URLs from KG:
◦ trained on D_E only: 42.7
◦ trained on D_I only: 29.3
◦ final classifier: 55.5
Both datasets (D_E and D_I) help boost the performance of the final classifier.
Main Results
Classifier — Micro F1
Majority: 27.6
Chen et al., SLT 2014 (also unsupervised): 43.3
Mine queries with URLs from KG:
◦ trained on D_E only: 42.7
◦ trained on D_I only: 29.3
◦ final classifier: 55.5
supervised: 86.0
semi-supervised (self-training): 86.5
Self-training with the bootstrapped classifier also improves over the fully supervised model.
Conclusion
We have presented techniques for:
1. Mining queries related to the domain of interest.
2. Inferring explicit and implicit relations in the mined queries.
3. Training a classifier to detect both types of relations without any hand-labeled data.
As by-products, we also get automatic slot annotations and implicit relation patterns.
Thank you!