QA Systems in QALD Hybrid Task Qingxia Liu 2016/02/29
An Overview of the Hybrid Task
QALD-4 Task 3: 25 training questions, 10 test questions, no participating systems

Which anti-apartheid activist was born in Mvezo?
  SELECT DISTINCT ?uri WHERE {
    ?uri rdf:type text:"anti-apartheid activist" .
    ?uri dbo:birthPlace res:Mvezo .
  }

Which writers had influenced the philosopher that refused a Nobel Prize?
  ?x rdf:type dbo:Philosopher .
  ?x text:"refuse" text:"Nobel Prize" .
  ?x dbo:influencedBy ?uri .
  ?uri rdf:type dbo:Writer .
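A hybrid query mixes structured triple patterns with textual ones marked by the `text:` prefix. The following minimal sketch separates the two kinds of pattern, so that each could be sent to a different backend. The `TriplePattern` type and the `split_hybrid` helper are hypothetical names introduced for illustration only; they are not part of QALD or of any participating system.

```python
from typing import List, NamedTuple, Tuple


class TriplePattern(NamedTuple):
    """One triple pattern of a hybrid query (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


def is_textual(pattern: TriplePattern) -> bool:
    """A pattern is textual if any of its terms uses the text: prefix."""
    return any(term.startswith("text:") for term in pattern)


def split_hybrid(
    patterns: List[TriplePattern],
) -> Tuple[List[TriplePattern], List[TriplePattern]]:
    """Split a hybrid query into (textual patterns, structured patterns)."""
    textual = [p for p in patterns if is_textual(p)]
    structured = [p for p in patterns if not is_textual(p)]
    return textual, structured


# The two example questions from the slide, written as pattern lists.
activist_query = [
    TriplePattern("?uri", "rdf:type", 'text:"anti-apartheid activist"'),
    TriplePattern("?uri", "dbo:birthPlace", "res:Mvezo"),
]
philosopher_query = [
    TriplePattern("?x", "rdf:type", "dbo:Philosopher"),
    TriplePattern("?x", 'text:"refuse"', 'text:"Nobel Prize"'),
    TriplePattern("?x", "dbo:influencedBy", "?uri"),
    TriplePattern("?uri", "rdf:type", "dbo:Writer"),
]

activist_textual, activist_structured = split_hybrid(activist_query)
philosopher_textual, philosopher_structured = split_hybrid(philosopher_query)
```

In both examples exactly one pattern must be answered from text; the remaining patterns can be evaluated against the knowledge base.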
An Overview of the Hybrid Task
QALD-5 Task 2: 40 training questions, 10 test questions, 5 systems
ISOFT
Authors: Seonyeong Park, Soonchoul Kwon, Byungsoo Kim and Gary Geunbae Lee
  Pohang University of Science and Technology, South Korea
Idea:
  semantic answer type (SAT) detection
  search on a multi-tagged text database
  decompose the question by contracting the right-most phrases
Notes:
  Seonyeong Park, Pohang University of Science and Technology
  SAT: train; 3-level type ontology: 71.42%; 2: 84.62%
SAT classification:
  features: keywords (kw), literal AT (NE not used)
  libSVM training
  note: questions are classified based on the question word
Tags:
  coreference and disambiguation
  NL tags (POS, dependency, SRL)
For each subquery:
  search on sentences
  all NEs as candidates
  if no good answer is found, generate SPARQL
    comparative -> property + compare
    predicate mapping by vector (ESA lib)
ISOFT
  Who is the architect of the tallest building in Japan?
  Are there man-made lakes in Australia that are deeper than 100 meters?
  In which city were Charlie Chaplin's half brothers born?
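The decomposition idea (contract the right-most phrases, answer that sub-question, substitute the answer, repeat) can be sketched as below. This is only an illustration under stated assumptions: the phrase chunking is supplied by hand, `answer_fn` is a hypothetical stand-in for ISOFT's sentence search, and `BUILDING_X` and `ARCHITECT_Y` are placeholder answers, not real data.

```python
from typing import Callable, List, Optional, Tuple


def decompose_and_answer(
    phrases: List[str],
    answer_fn: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], List[Tuple[str, Optional[str]]]]:
    """Repeatedly join the two right-most phrases into a sub-question,
    answer it, and put the answer back in place of those phrases."""
    phrases = list(phrases)  # work on a copy
    trace = []
    while len(phrases) > 1:
        right = phrases.pop()
        left = phrases.pop()
        sub_question = f"{left} {right}"
        answer = answer_fn(sub_question)
        trace.append((sub_question, answer))
        if answer is None:
            # No answer for this sub-question: give up (ISOFT would fall
            # back to SPARQL generation at this point).
            return None, trace
        phrases.append(answer)
    return phrases[0], trace


# Hypothetical lookup table standing in for the text search component.
TOY_ANSWERS = {
    "the tallest building in Japan": "BUILDING_X",
    "the architect of BUILDING_X": "ARCHITECT_Y",
}

# Hand-made chunking of "Who is the architect of the tallest building in Japan?"
chunks = ["the architect of", "the tallest building", "in Japan"]
final_answer, steps = decompose_and_answer(chunks, TOY_ANSWERS.get)
```

Here the first sub-question is "the tallest building in Japan"; its placeholder answer is then substituted to form "the architect of BUILDING_X".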
HAWK: ranking techniques
Authors: Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo
  University of Leipzig, Germany
  Axel-Cyrille Ngonga Ngomo is among the organizers of QALD-3, 4, 5 and 6
Main Idea:
  Generate hybrid queries by breadth-first search (BFS) on the dependency tree
    each node maps to a triple pattern (with at least one variable)
  Rank queries by training 3 ranking methods
Which anti-apartheid activist was born in Mvezo?
Training:
  Optimal Ranking (determine bad parts)
  Feature-based Ranking (rank queries)
  Overlap-based Ranking (rank answers)
Remove meaningless nodes in the tree by POS tags
  auxiliary tokens, e.g. "did"
Link classes and properties
  fuzzy string search
  use lexicon (lemon.dbpedia)
  born(anti-apartheid activist, dbr:Mvezo)
  "born": dbo:birthPlace, dbo:birthDate
  1. SELECT ?proj { ?proj text:query 'anti-apartheid activist' . ?proj dbo:birthPlace dbr:Mvezo . }
  2. SELECT ?proj { ?proj text:query 'anti-apartheid activist' . ?proj dbo:birthDate dbr:Mvezo . }
  3. SELECT ?proj { ?proj text:query 'anti-apartheid activist' . ?const dbo:birthPlace ?proj . }
BFS on the pruned tree: possible triple patterns according to node tag
Pruning by features:
  unbound triple pattern, e.g. (?s, ?p, ?o)
  unconnected
  cyclic
  no projection
  violating disjointness
Notes:
  ESWC 2015
  anti-apartheid; Mandela
  FUSEKI: supports retrieval of text information on specific URIs
  feature-based ranking: number of nodes, number of triples, number of types
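Three of the pruning features (unbound triple pattern, unconnected, no projection) can be sketched as simple checks over candidate queries. This is an illustrative sketch and not HAWK's implementation: the tuple representation and all function names are assumptions, connectivity is approximated by shared subject or object terms, and the cyclic and disjointness checks are left out.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def has_unbound_pattern(triples: List[Triple]) -> bool:
    """True if some pattern has variables in all three positions."""
    return any(all(is_var(term) for term in triple) for triple in triples)


def is_connected(triples: List[Triple]) -> bool:
    """True if all patterns are linked through shared subject/object terms."""
    if not triples:
        return False
    reached = {triples[0][0], triples[0][2]}
    remaining = list(triples[1:])
    changed = True
    while changed and remaining:
        changed = False
        for triple in list(remaining):
            if triple[0] in reached or triple[2] in reached:
                reached.update((triple[0], triple[2]))
                remaining.remove(triple)
                changed = True
    return not remaining


def has_projection(projection: str, triples: List[Triple]) -> bool:
    """True if the projection variable occurs as a subject or object."""
    return any(projection in (s, o) for s, _, o in triples)


def keep_query(projection: str, triples: List[Triple]) -> bool:
    return (
        not has_unbound_pattern(triples)
        and is_connected(triples)
        and has_projection(projection, triples)
    )


TEXT = "'anti-apartheid activist'"
# Candidate names are hypothetical labels for this example.
candidates: Dict[str, List[Triple]] = {
    "birthPlace": [("?proj", "text:query", TEXT), ("?proj", "dbo:birthPlace", "dbr:Mvezo")],
    "reverse": [("?proj", "text:query", TEXT), ("?const", "dbo:birthPlace", "?proj")],
    "unbound": [("?proj", "text:query", TEXT), ("?s", "?p", "?o")],
    "unconnected": [("?proj", "text:query", TEXT), ("?x", "dbo:birthPlace", "dbr:Mvezo")],
    "no_projection": [("?x", "dbo:birthPlace", "dbr:Mvezo")],
}
kept = [name for name, triples in candidates.items() if keep_query("?proj", triples)]
```

Candidates that survive pruning would then be passed on to the ranking step.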
HAWK
Optimal Ranking > overlap-based, feature-based
Excluding YAGO: 26/40, 8/10, train on QALD-4
F-measure
Errors:
  failing entity annotation, e.g. Jane_T._Austion, G8, Los_Alamos
  missing type information in the gold standard resource
YodaQA: search-based methods
Authors: Petr Baudiš and Jan Šedivý
  Czech Technical University, Czech Republic
  NLP, ML, speech recognition
Main Idea:
  A modular QA system pipeline
  Search-based methods
  generate answers by passage analysis
Tricks:
  Goal: factoid single-answer questions
  List questions: top 15
  Boolean questions: always return true
  Extract focus: 6 simple hand-crafted heuristics
Notes:
  originally based on unstructured data
  Czech
  passage (here, a sentence)
Who wrote Ender’s Game?
Focus: who, SV: wrote, LAT: person
Question clues: Ender’s Game, wrote
Pipeline annotations: (logistic regression) (get related sentences) (type coercion) (class, property mapping)
Search strategies:
  Title Text Search: title matching (first sentences in top 6 docs)
  Full-text Search: title + article sentences (top 3 passages in each doc in top 6 docs)
  Document Search: search in wiki doc (top 20 docs)
Notes:
  title: rank entities, then take sentences
  full-text: first rank entities and take the top 6, then score each sentence and take the top 3 within each document
  doc: take entities directly
  LAT: lexical answer type
  type coercion: WordNet hypernymy relation
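The full-text strategy (top 3 passages in each of the top 6 documents) amounts to a two-level top-k selection. A minimal sketch follows, assuming documents and passages already carry relevance scores; the function name, the data layout and all scores are hypothetical and are not taken from YodaQA.

```python
from typing import Dict, List, Tuple


def select_passages(
    doc_scores: Dict[str, float],
    passage_scores: Dict[str, List[Tuple[str, float]]],
    top_docs: int = 6,
    top_passages: int = 3,
) -> List[Tuple[str, str]]:
    """Keep the best passages of the best documents.

    doc_scores maps a document id to its score; passage_scores maps a
    document id to (passage text, score) pairs. Returns (doc id, passage)
    pairs, best documents first.
    """
    ranked_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_docs]
    selected = []
    for doc in ranked_docs:
        passages = sorted(
            passage_scores.get(doc, []), key=lambda pair: pair[1], reverse=True
        )
        selected.extend((doc, text) for text, _ in passages[:top_passages])
    return selected


# Hypothetical scores: 7 documents with 4 passages each.
doc_scores = {f"doc{i}": 1.0 - 0.1 * i for i in range(7)}
passage_scores = {
    doc: [(f"{doc}-p{j}", 0.9 - 0.2 * j) for j in range(4)] for doc in doc_scores
}
selected = select_passages(doc_scores, passage_scores)
```

With these toy scores, the lowest-ranked document and the weakest passage of every document are dropped, leaving 18 candidate passages for answer extraction.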
Errors: missing type information in the gold standard resource
Error Cases
Columns: id, ISOFT, HAWK
  1  Where was the "Father of Singapore" born? (unable to generate SPARQL query)
  2  Which Secretary of State was significantly involved in the United States' dominance of the Caribbean? (wrong answer)
  3  Who is the architect of the tallest building in Japan? √
  4  What is the name of the Viennese newspaper founded by the creator of the croissant? (predicate URI not found)
  5  In which city were Charlie Chaplin's half brothers born? partial
  6  Which German mathematicians were members of the von Braun rocket group?
  7  Which writers converted to Islam?
  8  Are there man-made lakes in Australia that are deeper than 100 meters?
  9  Which movie by the Coen brothers stars John Turturro in the role of a New York City playwright?
  10 Which of the volcanoes that erupted in 1550 is still active? (cannot generate appropriate query)

property

How many scientists graduated from an Ivy League university?
  SELECT DISTINCT count(?uri) WHERE {
    ?uri rdf:type dbo:Scientist .
    ?uri dbo:almaMater ?university .
    ?university dbo:affiliation dbr:Ivy_League .
  }

Which animals are critically endangered?
  SELECT DISTINCT ?uri WHERE {
    ?uri rdf:type dbo:Animal .
    ?uri dbo:conservationStatus 'CR' .
  }
Summary
Components: ISOFT / HAWK / YodaQA
  basic NLP, deep NLP: dependency tree; keywords, focus, LAT
  deep NLP: co-reference
  textual solution: phrase-based search / text:query / clue-based search
  structural solution: SPARQL template triggered by words / dependency tree node to triple / class, property mapping
  question decomposition: concatenating the two rightmost phrases and finding the answer
  errors: missing type info in gold standard resource; failing entity annotation
Thank you ~