YAGO-QA Answering Questions by Structured Knowledge Queries
Peter Adolphs, Martin Theobald, Ulrich Schäfer, Hans Uszkoreit, Gerhard Weikum. ICSC, Stanford University, September 19, 2011.
Jeopardy!
A big US city with two airports, one named after a World War II hero, and one named after a World War II battlefield?
Deep-QA in NL

Example Jeopardy! clues:
- William Wilkinson's "An Account of the Principalities of Wallachia and Moldavia" inspired this author's most famous novel
- This town is known as "Sin City" & its downtown is "Glitter Gulch"
- As of 2010, this is the only former Yugoslav republic in the EU
- 99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain

Key components: question classification & decomposition, backed by knowledge bases such as YAGO.

D. Ferrucci et al.: Building Watson: An Overview of the DeepQA Project. AI Magazine, 2010.
Structured Knowledge Queries
A big US city with two airports, one named after a World War II hero, and one named after a World War II battlefield?

SELECT DISTINCT ?c WHERE {
  ?c type City .
  ?c locatedIn USA .
  ?a1 type Airport . ?a1 locatedIn ?c . ?a1 namedAfter ?p . ?p type WarHero .
  ?a2 type Airport . ?a2 locatedIn ?c . ?a2 namedAfter ?b . ?b type BattleField .
}

In this work: focus on factoid and list questions.
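To make the join structure of such a query concrete, here is a minimal sketch that evaluates the pattern above over a hand-built toy fact set (the entities and relations here are illustrative, not drawn from the actual YAGO store):

```python
# Minimal sketch: evaluating the "two airports" query over a toy fact set.
# Entity names are illustrative only.
facts = {
    ("Chicago", "type", "City"),
    ("Chicago", "locatedIn", "USA"),
    ("O'Hare", "type", "Airport"),
    ("Midway", "type", "Airport"),
    ("O'Hare", "locatedIn", "Chicago"),
    ("Midway", "locatedIn", "Chicago"),
    ("O'Hare", "namedAfter", "Butch O'Hare"),
    ("Butch O'Hare", "type", "WarHero"),
    ("Midway", "namedAfter", "Battle of Midway"),
    ("Battle of Midway", "type", "BattleField"),
}

def objects(s, p):
    """All objects o with (s, p, o) in the fact set."""
    return {o for (s2, p2, o) in facts if s2 == s and p2 == p}

def instances(cls):
    """All subjects with a 'type' edge to cls."""
    return {s for (s, p, o) in facts if p == "type" and o == cls}

answers = set()
for c in instances("City"):
    if "USA" not in objects(c, "locatedIn"):
        continue
    airports = {a for a in instances("Airport") if c in objects(a, "locatedIn")}
    hero = any(objects(a, "namedAfter") & instances("WarHero") for a in airports)
    battle = any(objects(a, "namedAfter") & instances("BattleField") for a in airports)
    if hero and battle:
        answers.add(c)  # c satisfies all joins of the query
```

The nested loops mirror the joins of the SPARQL pattern: each triple pattern becomes a membership test against the fact set.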
Agenda
- YAGO Server & API
  - Wikipedia-based information extraction
  - Searching & ranking in large RDF graphs
- Names, Surface Patterns & Paraphrases
  - Named entity disambiguation
  - Mapping surface patterns onto semantic relations
  - Crowdsourcing for question paraphrases
- YAGO-QA Architecture
  - Template-based mapping of NL questions onto SPARQL
- Conclusions & Future Work
Information Extraction from Wikipedia
Extracted facts (subject, predicate, object):

  Stanford_University  type          Private_University
  Stanford_University  hasPresident  J.L._Hennessy
  Stanford_University  hasStudents   15,319
  Stanford_University  foundedBy     L._Stanford
  Stanford_University  foundedIn     1891
  …
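Infobox-style extraction of such triples can be sketched as follows; the attribute-to-relation mapping below is a hypothetical stand-in for YAGO's actual extraction rules:

```python
# Sketch: turning Wikipedia infobox attributes into (subject, predicate,
# object) triples. The attribute-to-relation mapping is illustrative only.
ATTRIBUTE_TO_RELATION = {
    "president": "hasPresident",
    "students": "hasStudents",
    "founder": "foundedBy",
    "established": "foundedIn",
}

def infobox_to_triples(entity, infobox):
    triples = []
    for attr, value in infobox.items():
        relation = ATTRIBUTE_TO_RELATION.get(attr)
        if relation:  # skip attributes with no mapped relation
            triples.append((entity, relation, value))
    return triples

triples = infobox_to_triples("Stanford_University", {
    "president": "J. L. Hennessy",
    "students": "15,319",
    "founder": "L. Stanford",
    "established": "1891",
})
```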
YAGO Knowledge Base
- Combines knowledge from WordNet & Wikipedia
- Additional gazetteers (geonames.org)
- Part of the Linked Data cloud
YAGO-2 Numbers (www.mpi-inf.mpg.de/yago-naga/)

                             Just Wikipedia    Incl. Gazetteer Data
  #Relations                 104               114
  #Classes                   364,740
  #Entities                  2,641,040         9,804,102
  #Facts                     120,056,073       461,893,127
    - types & classes        8,649,652         15,716,697
    - base relations         25,471,211        196,713,637
    - space, time & proven.  85,935,210        249,462,793
  Size (CSV format)          3.4 GB            8.7 GB

Estimated precision > 95% (for base relations, excl. space, time & provenance)
Searching & Ranking RDF Graphs in NAGA
Ranking based on confidence, compactness, and relevance.

Discovery queries (find bindings for the variables of a graph pattern), e.g.:
  $x type scientist . $x bornIn Kiel .
  $x hasWon Nobel_prize . $x diedOn $y . $x hasSon $b

Connectedness queries (how are two entities related?), e.g.:
  Thomas_Mann --*-- Goethe   (e.g., both of type German novelist)

Queries with regular expressions, e.g.:
  $x type scientist . $x (hasFirstName | hasLastName) Ling .
  $x worksFor $y . $y locatedIn* Zhejiang .
  $x (coAuthor | advisor)* Beng_Chin_Ooi
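A connectedness query of this kind amounts to finding a path between two entities in the knowledge graph. A minimal sketch, treating edges as undirected and using a toy two-edge graph (the actual NAGA search and ranking machinery is far richer):

```python
# Sketch: a connectedness query as breadth-first search over an undirected
# view of an RDF graph. The toy edges are illustrative.
from collections import deque

edges = [
    ("Thomas_Mann", "type", "German_novelist"),
    ("Goethe", "type", "German_novelist"),
]

def connect(a, b):
    """Return a shortest entity path between a and b, or None."""
    neighbors = {}
    for s, _, o in edges:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)
    queue, seen = deque([[a]]), {a}
    while queue:
        path = queue.popleft()
        if path[-1] == b:
            return path
        for nxt in neighbors.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = connect("Thomas_Mann", "Goethe")
```

Here both entities share the class German_novelist, so the shortest connection runs through that node.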
YAGO Server: UI & API

YAGO-UI:
- Interactive online demo
- RDF with time, space & provenance annotations
- SPARQL + keywords

YAGO-API:
- Two basic web services:
  - processQuery(String query)
  - getYagoEntitiesByNames(String[] names)
  - …
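A thin client for the two web services might look like the sketch below. The endpoint URL and the response format are assumptions for illustration; only the operation names (processQuery, getYagoEntitiesByNames) come from the talk.

```python
# Sketch of a client for the two YAGO web services named above.
# ENDPOINT is a placeholder, not the real service URL, and JSON responses
# are an assumption about the wire format.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/yago-api"  # placeholder, not the real URL

def build_url(operation, params):
    return ENDPOINT + "/" + operation + "?" + urllib.parse.urlencode(params)

def process_query(query):
    with urllib.request.urlopen(build_url("processQuery", {"query": query})) as r:
        return json.load(r)

def get_yago_entities_by_names(names):
    params = {"names": ",".join(names)}
    with urllib.request.urlopen(build_url("getYagoEntitiesByNames", params)) as r:
        return json.load(r)

url = build_url("processQuery", {"query": "SELECT ?x WHERE { ?x type Actor }"})
```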
Names, Surface Patterns & Paraphrases
Which chemist was born in London?

(I) Named entity disambiguation
  chemist → wordnet_chemist, wordnet_pharmacist
  born → Bertran_de_Born, Born_Identity_(Movie), Born_(Album)
  London → London_UK, London_Arkansas, Antonio_London

(II) Mapping surface patterns onto semantic relations
  <person> was_born_in <location> → bornIn(<person>, <location>)
  <person> was_born_in <date> → bornOn(<person>, <date>)

(III) Paraphrases of questions
  <person> [was] born in <location>
  <location>-born <person>   (NN VBD VBN IN / NNP-LOC)
  → bornIn(<person>, <location>)
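Step (II), where the argument type decides between bornIn and bornOn, can be sketched as follows; the type dictionary is a toy stand-in for YAGO's type system, and the entities are illustrative:

```python
# Sketch: mapping a surface pattern onto a semantic relation, with the
# object's type selecting between bornIn and bornOn (step II above).
# The type dictionary and entities are illustrative only.
ENTITY_TYPES = {
    "London_UK": "location",
    "1806": "date",
}

PATTERN_RULES = {
    ("was_born_in", "location"): "bornIn",
    ("was_born_in", "date"): "bornOn",
}

def map_pattern(subject, pattern, obj):
    """Return a (relation, subject, object) fact, or None if no rule fires."""
    relation = PATTERN_RULES.get((pattern, ENTITY_TYPES.get(obj)))
    if relation is None:
        return None
    return (relation, subject, obj)

fact = map_pattern("John_Stuart_Mill", "was_born_in", "London_UK")
```

The same surface pattern thus yields bornIn for a location argument and bornOn for a date argument.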
(I) Named Entity Disambiguation
#inlinks with anchor "Paris" (Wikipedia link structure):
  Paris 32,362 | Paris, France 570 | Paris Masters 134 | Paris (mythology) 118 |
  University of Paris 79 | Paris, Texas 56 | Paris, Ontario 45 | Paris (rapper) 29 |
  Open Gaz de France 26 | Paris, Kentucky 20 | Paris (2008 film) 19 | Gare Saint-Lazare 18 |
  Paris, Tennessee 17 | BNP Paribas Masters 16 | Paris, Maine 14 | Paris Hilton 12 |
  Paris, Arkansas 11 | Paris (Supertramp album) 10 | Gare du Nord 9 | Paris (1979 TV series) 8 |
  Count Paris 7 | Palais Omnisports de Paris-Bercy 6 | Paris, Virginia 5 |
  Paris 2012 Olympic bid 4 | Paris (2003 film) 3

Wikipedia link structure:
  65,872,435 intra-wiki links
  2,782,297 disambiguation pages & 328,372 redirects
  2,886,027 distinct link anchor texts

YAGO "means" relation:
  18,470,099 mappings of names to entities
  6.2 distinct names per entity (on avg.)

Individual name disambiguation vs. joint disambiguation.

AIDA tool for graph-based disambiguation in YAGO-2:
J. Hoffart et al.: Robust Disambiguation of Named Entities in Text. EMNLP, Edinburgh, Scotland, 2011.
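These anchor counts directly support a "most frequent sense" baseline for individual name disambiguation: the prior of a candidate entity is its share of all links carrying that anchor text. A minimal sketch using a few of the counts above:

```python
# Sketch: most-frequent-sense disambiguation from anchor-link counts.
# A subset of the "Paris" counts from the slide; the prior of a candidate
# is its fraction of all links with that anchor text.
ANCHOR_COUNTS = {
    "Paris": {
        "Paris": 32362,           # the French capital's article
        "Paris, France": 570,
        "Paris Masters": 134,
        "Paris (mythology)": 118,
        "Paris, Texas": 56,
    },
}

def disambiguate(anchor):
    """Return (entity, prior) for the most frequently linked candidate."""
    counts = ANCHOR_COUNTS.get(anchor, {})
    if not counts:
        return None
    total = sum(counts.values())
    entity = max(counts, key=counts.get)
    return entity, counts[entity] / total

best = disambiguate("Paris")
```

Joint disambiguation, as in the AIDA tool cited above, additionally exploits coherence between the candidate entities of all mentions in a text rather than scoring each mention in isolation.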
(II) From Patterns to Semantic Relations
PROSPERA: statistical pattern mining from free text
- Domain-oriented extraction of patterns for known relations (POS-enhanced n-grams)
    "X carried out his doctoral research in math under the supervision of Y"
    → X { carried out PRP doctoral research [IN NP] [DET] supervision [IN] } Y
- Confidence & support based on seeds & counter-seeds
- Pattern/fact duality & consistency reasoning
- 10s to 100s of typed patterns per relation

Consistency constraints:
- Pattern-fact duality:
    occurs(p,x,y) ∧ expresses(p,R) ⇒ R(x,y)
    occurs(p,x,y) ∧ R(x,y) ⇒ expresses(p,R)
- Functional dependencies: Spouse(x,y): x → y, y → x
- Type constraints: Spouse ⊆ Person × Person
- Inclusion dependencies: capitalOfCountry ⊆ cityOfCountry
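The seed-based scoring can be sketched as below. The scoring formula is a deliberate simplification, not PROSPERA's exact definition, and the seed facts and occurrences are toy data:

```python
# Sketch: scoring a candidate pattern by the seed and counter-seed facts it
# co-occurs with, in the spirit of confidence & support in PROSPERA.
# The formula and data are simplified assumptions.
SEEDS = {("Max_Planck", "Gustav_Kirchhoff")}          # known facts of the relation
COUNTER_SEEDS = {("Max_Planck", "Albert_Einstein")}   # known non-facts

# (pattern, x, y) occurrences observed in text
occurrences = [
    ("under the supervision of", "Max_Planck", "Gustav_Kirchhoff"),
    ("under the supervision of", "Niels_Bohr", "Ernest_Rutherford"),
    ("met", "Max_Planck", "Albert_Einstein"),
]

def pattern_stats(pattern):
    """Return (support, confidence) of a pattern w.r.t. seeds/counter-seeds."""
    pos = sum(1 for p, x, y in occurrences if p == pattern and (x, y) in SEEDS)
    neg = sum(1 for p, x, y in occurrences
              if p == pattern and (x, y) in COUNTER_SEEDS)
    support = pos
    confidence = pos / (pos + neg) if pos + neg else 0.0
    return support, confidence

stats = pattern_stats("under the supervision of")
```

Patterns that pass the confidence and support thresholds are then used to extract new facts, which in turn (pattern-fact duality) can promote further patterns.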
PROSPERA Architecture
Gathering:
- Enhanced Hearst patterns
- POS-enhanced n-grams
- Pattern-fact duality & constraints

Analysis:
- Refined pattern weights
- Carefully chosen seeds and counter-seeds
- Thresholds for pattern confidence & support

Reasoning:
- Scalable extraction & consistency reasoning
- MapReduce functions for pattern extraction & statistics gathering
- Distributed MaxSat solver (MAP inference)
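The statistics-gathering step phrases pattern counting as map and reduce functions. A minimal in-process sketch of that shape (the real system runs these as distributed MapReduce jobs; the data here is illustrative):

```python
# Sketch: pattern-statistics gathering as map and reduce functions,
# run in-process for illustration only.
from collections import defaultdict
from itertools import chain

def map_occurrence(occurrence):
    """Map step: emit (pattern, 1) for an observed (pattern, x, y) triple."""
    pattern, x, y = occurrence
    yield (pattern, 1)

def reduce_counts(pairs):
    """Reduce step: sum counts per pattern key."""
    totals = defaultdict(int)
    for pattern, count in pairs:
        totals[pattern] += count
    return dict(totals)

occurrences = [
    ("under the supervision of", "Max_Planck", "Gustav_Kirchhoff"),
    ("under the supervision of", "Niels_Bohr", "Ernest_Rutherford"),
    ("advisor of", "Gustav_Kirchhoff", "Max_Planck"),
]
counts = reduce_counts(chain.from_iterable(map_occurrence(o) for o in occurrences))
```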
(III) Crowdsourcing for Question Paraphrases
Pattern acquisition from the crowd:
- Annotators paraphrase natural-language seed questions
- Seed questions are associated with their semantic arguments and functions
- Gold resource for pattern acquisition and system evaluation

Preliminary results:
- 4,620 paraphrases for 254 seed questions with 7 annotators
- Total annotation time: ~49 hours, ~1 work-day per annotator
YAGO-QA Architecture

Input analysis:
- SProUT for tokenization, stemming & NER
- NE gazetteer extended by YAGO entities

Input interpretation:
- Named-entity disambiguation based on YAGO statistics
- Vague matching against the gathered question paraphrases
YAGO-QA Architecture (cont'd)

Input interpretation / answer retrieval:
  "An actor whose place of birth is Chicago."
  "Which actor was born in Chicago?"
  → Which <actor> was_born_in <Chicago> ?
  → ?x type ARG1 . ?x bornIn ARG2 .

Template-based answer generation:
  Who/what is/are <?x> ?
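The template-based mapping from a normalized question to a structured query can be sketched as follows; the one-entry template inventory and the regular expression are simplified assumptions:

```python
# Sketch: template-based mapping of a question onto a structured query,
# following the "Which actor was born in Chicago?" example above.
# The template inventory and regex are simplified assumptions.
import re

# surface template -> query template with ARG1 (class) and ARG2 (entity)
TEMPLATES = [
    (re.compile(r"Which (\w+) was born in (\w+)\s*\?"),
     "?x type {ARG1} . ?x bornIn {ARG2} ."),
]

def question_to_query(question):
    """Match the question against surface templates; fill the first hit."""
    for pattern, query_template in TEMPLATES:
        m = pattern.match(question)
        if m:
            return query_template.format(ARG1=m.group(1), ARG2=m.group(2))
    return None

query = question_to_query("Which actor was born in Chicago?")
```

In the real system, the matched arguments would first pass through named-entity disambiguation (mapping "actor" and "Chicago" to YAGO classes and entities) before the query is executed.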
YAGO-QA Example
- Multiple named-entity annotations: all names are annotated
- Interpretation picks suitable NE readings
- Vague matching against surface templates
Conclusions & Future Work
Conclusions:
- QA based on structured knowledge queries (beyond IR-style retrieval of matching sentences/paragraphs)
- Wikipedia as rich knowledge backend: entities, semantic classes & typed relations
- Large-scale statistics for entity disambiguation & surface patterns
- Crowdsourcing for question paraphrases
- Predefined question templates translated into join queries

Future work:
- "Open-QA" via open-domain information extraction
- Dynamic learning of template structures from grammars
- More modular template structures