Natural Language Questions for the Web of Data
Mohamed Yahya, Klaus Berberich, Gerhard Weikum (Max Planck Institute for Informatics, Germany)
Shady Elbassuoni (Qatar Computing Research Institute)
Maya Ramanath (Dept. of CSE, IIT Delhi, India)
Volker Tresp (Siemens AG, Corporate Technology, Munich, Germany)
EMNLP 2012
Translation: Q_NL to Q_FL
Q_NL, the natural language question: "Which female actor played in Casablanca and is married to a writer who was born in Rome?"
Q_FL, the formal query (SPARQL 1.0):
?x isa actor . ?x hasGender female . ?x actedIn Casablanca_(film) . ?x marriedTo ?w . ?w isa writer . ?w bornIn Rome
Characteristics of SPARQL: such complex queries give good results, but they are difficult for users to write, which motivates automatic translation.
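As a concrete illustration (not from the paper), such a translated query could be issued from Python with SPARQLWrapper; the endpoint URL and the namespace are placeholders, since no public endpoint is specified in the slides.

```python
# A minimal sketch of running the translated query, assuming some SPARQL
# endpoint hosting YAGO2; the URL and namespace below are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX y: <http://example.org/yago/>
SELECT ?x WHERE {
  ?x y:isa y:actor .
  ?x y:hasGender y:female .
  ?x y:actedIn <http://example.org/yago/Casablanca_(film)> .
  ?x y:marriedTo ?w .
  ?w y:isa y:writer .
  ?w y:bornIn y:Rome .
}
"""

endpoint = SPARQLWrapper("https://example.org/sparql")  # placeholder endpoint
endpoint.setQuery(QUERY)
endpoint.setReturnFormat(JSON)
for binding in endpoint.query().convert()["results"]["bindings"]:
    print(binding["x"]["value"])  # prints the IRIs of matching entities
```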
YAGO2
YAGO2 is a huge semantic knowledge base derived from Wikipedia, WordNet and GeoNames. Its contents are organized into entities, classes, and relations.
Architecture of DEANNA.
Phrase detection
A detected phrase p is a pair ⟨Toks, l⟩, where Toks is a sequence of tokens from the question and l is a label, l ∈ {concept, relation}. Phrase detection maps the question Q_NL to a set of relation phrases P_r and a set of concept phrases P_c.
Phrase detection: concept phrases
Concept phrase detection searches for instances of the means relation in YAGO2, e.g. for "Which female actor played in Casablanca and is married to a writer who was born in Rome?", substrings of the question such as "Casablanca" are looked up as surface forms of the means relation.
Phrase detection: relation phrases
Relation phrase detection relies on a relation detector based on ReVerb (Fader et al., 2011), extended with additional POS tag patterns, e.g. for "Which female actor played in Casablanca and is married to a writer who was born in Rome?", it detects phrases such as "played in", "married to", and "born in".
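A minimal sketch of the two detectors, with a toy set of surface forms standing in for YAGO2's means relation and a toy pattern list standing in for the ReVerb-based detector; note that overlapping phrases may be detected at this stage, since overlaps are only resolved later, during joint disambiguation.

```python
# A minimal sketch of phrase detection; all dictionary entries are
# illustrative stand-ins for YAGO2's means relation and the ReVerb patterns.
from typing import List, Tuple

MEANS = {"female actor", "casablanca", "writer", "rome"}    # toy surface forms
RELATION_PATTERNS = {"played in", "married to", "born in"}  # toy patterns

def detect_phrases(tokens: List[str]) -> List[Tuple[Tuple[int, int], str]]:
    """Return (token-span, label) pairs for every matching sub-sequence."""
    detected = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            surface = " ".join(tokens[i:j]).lower()
            if surface in MEANS:
                detected.append(((i, j), "concept"))
            if surface in RELATION_PATTERNS:
                detected.append(((i, j), "relation"))
    return detected

tokens = "Which female actor played in Casablanca".split()
print(detect_phrases(tokens))
# [((1, 3), 'concept'), ((3, 5), 'relation'), ((5, 6), 'concept')]
```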
Phrase mapping
To map concept phrases, again search instances of the means relation in YAGO2. To map relation phrases, rely on a corpus of mappings from textual patterns to relations, e.g. for "Which female actor played in Casablanca and is married to a writer who was born in Rome?", the textual pattern "married to" maps to the relation marriedTo.
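A minimal sketch of phrase mapping with toy lookup tables; the candidate lists and relation names are illustrative, not YAGO2's actual contents.

```python
# A minimal sketch of phrase mapping: concept phrases go through a stand-in
# for the means relation, relation phrases through a stand-in for the
# textual-pattern-to-relation corpus.
CONCEPT_MAP = {
    "female actor": ["wordnet_actor", "wikicategory_Actresses"],
    "casablanca": ["Casablanca_(film)", "Casablanca"],   # film or city?
}
RELATION_MAP = {
    "played in": ["actedIn", "playsFor"],                # patterns are ambiguous
    "married to": ["marriedTo"],
    "born in": ["bornIn"],
}

def map_phrase(surface: str, label: str) -> list:
    """Return the candidate semantic items for a detected phrase."""
    table = CONCEPT_MAP if label == "concept" else RELATION_MAP
    return table.get(surface.lower(), [])

print(map_phrase("Casablanca", "concept"))  # ['Casablanca_(film)', 'Casablanca']
print(map_phrase("played in", "relation"))  # ['actedIn', 'playsFor']
```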
Q-unit generation
Dependency parsing identifies triples of tokens in the question; combining these with the detected phrases and their mappings yields q-units, the building blocks of the candidate graph. A q-unit is a triple of sets of phrases.
Q-unit generation: dependency parsing
Dependency parsing identifies triples of tokens ⟨t_rel, t_arg1, t_arg2⟩, where t_rel, t_arg1, t_arg2 ∈ Q_NL. E.g., for "who was born in Rome?", the parse
nsubjpass(born-3, who-1)
auxpass(born-3, was-2)
root(ROOT-0, born-3)
prep_in(born-3, Rome-5)
yields the triple ⟨born, who, Rome⟩, with t_rel = born, t_arg1 = who, and t_arg2 = Rome.
Q-unit generation: q-units
A q-unit is a triple of sets of phrases ⟨P_rel, P_arg1, P_arg2⟩, obtained by lifting each triple of tokens to the detected phrases that contain those tokens: t_rel ∈ p_rel, t_arg1 ∈ p_arg1, and t_arg2 ∈ p_arg2.
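A minimal sketch of extracting token triples, with spaCy standing in for the dependency parser; the extraction heuristic below is a simplification of the paper's rules and assumes the en_core_web_sm model is installed.

```python
# A minimal sketch of token-triple extraction from a dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")

def token_triples(sentence: str):
    """Extract <t_rel, t_arg1, t_arg2> triples: a verb plus two dependents."""
    doc = nlp(sentence)
    triples = []
    for tok in doc:
        if tok.pos_ == "VERB":
            deps = [c for c in tok.children
                    if c.dep_ in ("nsubj", "nsubjpass", "dobj", "prep")]
            args = []
            for d in deps:
                if d.dep_ == "prep":
                    # Follow the preposition down to its object (born -> in -> Rome).
                    args.extend(c for c in d.children if c.dep_ == "pobj")
                else:
                    args.append(d)
            if len(args) >= 2:
                triples.append((tok.text, args[0].text, args[1].text))
    return triples

print(token_triples("who was born in Rome?"))  # e.g. [('born', 'who', 'Rome')]
```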
Joint disambiguation
Rules:
1. Each phrase is assigned to at most one semantic item.
2. Phrase boundary ambiguity is resolved: only non-overlapping phrases are mapped.
Joint disambiguation: disambiguation graph
Joint disambiguation takes place over a disambiguation graph DG = (V, E), with
V = V_s ∪ V_p ∪ V_q
E = E_sim ∪ E_coh ∪ E_q
Nodes: V = V_s ∪ V_p ∪ V_q
V_s: the set of s-nodes (s-nodes are semantic items)
V_p: the set of p-nodes (p-nodes are phrases), with V_rp the set of relation phrases and V_rc the set of concept phrases
V_q: a set of placeholder nodes for q-units
Edges: E = E_sim ∪ E_coh ∪ E_q
E_sim ⊆ V_p × V_s: a set of weighted similarity edges
E_coh ⊆ V_s × V_s: a set of weighted coherence edges
E_q ⊆ V_q × V_p × d, d ∈ {rel, arg1, arg2}: the q-edges
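A minimal sketch of the disambiguation graph as plain data structures; the node names and weights are illustrative.

```python
# A minimal sketch of the disambiguation graph DG = (V, E).
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SimEdge:          # E_sim ⊆ V_p × V_s, weighted
    phrase: str
    item: str
    weight: float

@dataclass(frozen=True)
class CohEdge:          # E_coh ⊆ V_s × V_s, weighted
    item1: str
    item2: str
    weight: float

@dataclass
class DisambiguationGraph:
    s_nodes: set = field(default_factory=set)    # V_s: semantic items
    p_nodes: set = field(default_factory=set)    # V_p: phrases
    q_nodes: set = field(default_factory=set)    # V_q: q-unit placeholders
    sim_edges: list = field(default_factory=list)
    coh_edges: list = field(default_factory=list)
    q_edges: list = field(default_factory=list)  # (q_unit, phrase, role), role in {rel, arg1, arg2}

dg = DisambiguationGraph()
dg.p_nodes.add("married to")
dg.s_nodes.add("marriedTo")
dg.sim_edges.append(SimEdge("married to", "marriedTo", 0.9))
```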
Edge weights: semantic coherence
Coh_sem, the semantic coherence between two semantic items s1 and s2, is defined as the Jaccard coefficient of their sets of inlinks. Three kinds of inlinks are distinguished:
InLinks(e) for entities
InLinks(c) for classes
InLinks(r) for relations
InLinks(e)
InLinks(e) is the set of YAGO2 entities whose corresponding Wikipedia pages link to the entity's page. E.g., let e = Casablanca; then InLinks(Casablanca) = {Marwan_al-Shehhi, Ingrid_Bergman, …, Morocco, …}.
InLinks(c)
InLinks(c) = ∪_{e ∈ c} InLinks(e). E.g., let c = wikicategory_Metropolitan_areas_of_Morocco; then InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ InLinks(Fes) ∪ InLinks(Agadir) ∪ InLinks(Safi,_Morocco) ∪ InLinks(Oujda) ∪ InLinks(Tangier) ∪ InLinks(Rabat).
InLinks(r)
InLinks(r) = ∪_{(e1, e2) ∈ r} (InLinks(e1) ∩ InLinks(e2))
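A minimal sketch of the coherence weight, with toy inlink sets standing in for the Wikipedia link structure.

```python
# A minimal sketch of Coh_sem: the Jaccard coefficient of inlink sets.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

ENTITY_INLINKS = {  # toy stand-in for InLinks(e)
    "Casablanca_(film)": {"Ingrid_Bergman", "Humphrey_Bogart", "Film_noir"},
    "Ingrid_Bergman":    {"Casablanca_(film)", "Humphrey_Bogart", "Sweden"},
}

def inlinks_class(entities: set) -> set:
    """InLinks(c) = union of InLinks(e) over the class's entities."""
    out = set()
    for e in entities:
        out |= ENTITY_INLINKS.get(e, set())
    return out

def inlinks_relation(pairs: set) -> set:
    """InLinks(r) = union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    out = set()
    for e1, e2 in pairs:
        out |= ENTITY_INLINKS.get(e1, set()) & ENTITY_INLINKS.get(e2, set())
    return out

coh = jaccard(ENTITY_INLINKS["Casablanca_(film)"], ENTITY_INLINKS["Ingrid_Bergman"])
print(f"Coh_sem = {coh:.2f}")  # Jaccard of the two toy inlink sets: 0.20
```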
Similarity weights
For entities: how often a phrase refers to a certain entity in Wikipedia.
For classes: reflects the number of members in the class.
For relations: reflects the maximum n-gram similarity between the phrase and any of the relation's surface forms.
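A minimal sketch of the relation similarity weight; unigram Jaccard overlap is used here as a stand-in, since the slides do not spell out the exact n-gram measure, and the surface forms are illustrative.

```python
# A minimal sketch of relation-phrase similarity: the maximum word-overlap
# similarity between the phrase and any of the relation's surface forms.
def ngram_sim(a: str, b: str) -> float:
    ga, gb = set(a.lower().split()), set(b.lower().split())
    return len(ga & gb) / len(ga | gb)

SURFACE_FORMS = {"marriedTo": ["married to", "is married to", "spouse of"]}

def relation_similarity(phrase: str, relation: str) -> float:
    return max(ngram_sim(phrase, s) for s in SURFACE_FORMS[relation])

print(relation_similarity("is married to", "marriedTo"))  # 1.0
```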
Disambiguation graph processing
The result of disambiguation is a subgraph of the disambiguation graph that yields the most coherent mappings. An integer linear program (ILP) is employed to this end.
Definitions (part 1)
Definitions (part 2)
Objective function
The ILP objective combines the similarity, coherence, and q-edge terms, weighted by the hyperparameters α, β, and γ.
Constraints (1–3)
Constraints (4–7)
Constraints (8–9)
These constraints are not invoked for existential questions.
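A minimal sketch of the joint-disambiguation ILP in Gurobi, the solver named in the conclusions. It encodes only the two rules from the Joint Disambiguation slide and a similarity-only objective; the paper's actual objective (with coherence and q-edge terms weighted by α, β, γ) and constraints 1–9 are richer than this, and all weights here are made up.

```python
# A minimal sketch of the disambiguation ILP: pick phrase-to-item mappings
# maximizing similarity, with at most one item per phrase and no two
# overlapping phrases mapped.
import gurobipy as gp
from gurobipy import GRB

phrases = ["played in", "in casablanca"]  # these two overlap in the question
candidates = {"played in": ["actedIn", "playsFor"],
              "in casablanca": ["Casablanca_(film)"]}
sim = {("played in", "actedIn"): 0.8, ("played in", "playsFor"): 0.5,
       ("in casablanca", "Casablanca_(film)"): 0.9}
overlapping = [("played in", "in casablanca")]

m = gp.Model("deanna_ilp_sketch")
x = {(p, s): m.addVar(vtype=GRB.BINARY, name=f"x_{p}_{s}")
     for p in phrases for s in candidates[p]}

for p in phrases:  # rule 1: at most one semantic item per phrase
    m.addConstr(gp.quicksum(x[p, s] for s in candidates[p]) <= 1)
for p1, p2 in overlapping:  # rule 2: only non-overlapping phrases are mapped
    m.addConstr(gp.quicksum(x[p1, s] for s in candidates[p1]) +
                gp.quicksum(x[p2, s] for s in candidates[p2]) <= 1)

m.setObjective(gp.quicksum(w * x[p, s] for (p, s), w in sim.items()), GRB.MAXIMIZE)
m.optimize()
print([ps for ps, var in x.items() if var.X > 0.5])  # the selected mappings
```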
The resulting subgraph for the disambiguation graph of Figure 3.
Query generation
Triploids and q-units do not assign subject/object roles; these are determined during query generation. Example: "Which singer is married to a singer?" yields ?x type singer, ?x marriedTo ?y, and ?y type singer.
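A minimal sketch of assembling triple patterns from disambiguated triples: each class mention gets its own fresh variable, which is what keeps the two occurrences of "singer" distinct; the paper's role-assignment logic is simplified away here.

```python
# A minimal sketch of query generation from disambiguated triples.
import itertools

def generate_query(triples):
    """triples: (relation, arg1, arg2); args are ('entity', name) or
    ('class', name, mention_id) -- mention_id keeps repeated classes apart."""
    fresh = (f"?x{i}" for i in itertools.count())
    var_for, patterns = {}, []

    def term(arg):
        if arg[0] == "entity":
            return arg[1]                       # an entity is used as-is
        if arg not in var_for:                  # one variable per class mention
            var_for[arg] = next(fresh)
            patterns.append(f"{var_for[arg]} type {arg[1]}")
        return var_for[arg]

    for rel, a1, a2 in triples:
        patterns.append(f"{term(a1)} {rel} {term(a2)}")
    return " . ".join(patterns)

print(generate_query([("marriedTo", ("class", "singer", 1), ("class", "singer", 2))]))
# -> ?x0 type singer . ?x1 type singer . ?x0 marriedTo ?x1
```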
Evaluation
Datasets
Evaluation Metrics
Results & Discussion
Datasets
The authors' experiments are based on two collections of questions:
QALD-1: from the 1st Workshop on Question Answering over Linked Data (QALD-1)
NAGA: a collection created in the context of the NAGA project, based on linking data from the YAGO2 knowledge base
Training set: 23 QALD-1 questions and 43 NAGA questions, used to obtain the hyperparameters (α, β, γ) in the ILP objective function.
Test set: 27 QALD-1 questions and 44 NAGA questions.
19 QALD-1 questions in the test set.
Evaluation metrics
The authors evaluated the output of DEANNA at three stages:
1. after the disambiguation of phrases
2. after the generation of the SPARQL query
3. after obtaining answers from the underlying linked-data sources
Judgement: two human assessors judged whether each output item was good or not; if the two were in disagreement, a third person resolved the judgment.
Disambiguation stage
The judges looked at each q-node/s-node pair, in the context of the question and the underlying data schemas, determined whether the mapping was correct, and determined whether any expected mappings were missing.
Query-generation stage
The judges looked at each triple pattern, determined whether the pattern was meaningful for the question, and determined whether any expected triple pattern was missing.
Query-answering stage
The judges were asked to identify whether the result sets for the generated queries were satisfactory.
Micro-averaging aggregates over all assessed items regardless of the questions to which they belong. Macro-averaging first aggregates the items for the same question, and then averages the quality measure over all questions.
For a question q and item set s in one of the stages of evaluation:
correct(q, s): the number of correct items in s
ideal(q): the size of the ideal item set
retrieved(q, s): the number of retrieved items
Coverage and precision are defined as follows:
cov(q, s) = correct(q, s) / ideal(q)
prec(q, s) = correct(q, s) / retrieved(q, s)
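A minimal sketch of these measures with made-up per-question counts.

```python
# A minimal sketch of coverage with micro- and macro-averaging;
# precision is computed the same way with retrieved counts in place of ideal.
def cov(correct: int, ideal: int) -> float:
    return correct / ideal

# one record per question: (correct, ideal, retrieved) -- toy numbers
results = {"q1": (3, 4, 3), "q2": (1, 2, 2)}

micro_cov = (sum(c for c, i, r in results.values())
             / sum(i for c, i, r in results.values()))   # pool all items first
macro_cov = sum(cov(c, i) for c, i, r in results.values()) / len(results)
print(f"micro cov = {micro_cov:.2f}, macro cov = {macro_cov:.2f}")  # 0.67, 0.62
```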
Conclusions
The authors presented a method for translating natural language questions into structured queries. Although the model, in principle, leads to high combinatorial complexity, they observed that the Gurobi solver could handle their judiciously designed ILP very efficiently. Their experimental studies showed very high precision and good coverage of the query translation, and good results for the actual question answers.