Published by Jeremy Baker. Modified over 6 years ago.
1
Reading Report on Hybrid Question Answering System
System Reading: Sun Yawei (孙亚伟)
2
Articles
ISOFT at QALD-5: Hybrid question answering system over linked data and text data
Seonyeong Park, Pohang University of Science and Technology
In CLEF 2015 Working Notes Papers
3
Outline
Introduction
System Description
  System Architecture
  Basic Analysis
  Query Generation
  Semantic Answer Type
  Multi-information Tagged Text Database
  SPARQL query template generator
Experiment
Conclusion
4
Introduction
Complex Questions
  Where was the "Father of Singapore" born?
  Which Secretary of State was significantly involved in the United States' dominance of the Caribbean?
  Who is the architect of the tallest building in Japan?
  Which German mathematicians were members of the von Braun rocket group?
  How old was Steve Jobs' sister when she first met him?
Answering these questions needs information from both structured and unstructured data.
5
Introduction
QA over linked data and text data
Method: combine KBQA and IRQA
  Extract clues using IRQA
  If no clues are found, generate a SPARQL query
6
System Description
  System Architecture
  Basic Analysis
  Query Generation
  Semantic Answer Type
  Multi-information Tagged Text Database
  SPARQL query template generator
7
System Architecture
8
Example
9
Basic Analysis
Method: statistical and rule-based
Ordinary NLP techniques
  Tokenization, POS tagging, dependency parsing
  Keyword, term, NE
QA-system-oriented techniques
  Question to statement (Q2S)
  Lexical answer type (LAT)
  Phrase extraction
    Predicate phrase
    Prepositional phrase
10
Example
11
Query Generation
Question -> Sequential Phrases -> Queries
  First query: concatenate the two rightmost phrases
  Next query: concatenate the answer to the first query with the next rightmost phrase
  Repeat this process until the answer is found
Note: if no clues are found, generate a SPARQL query
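The sequential query procedure on this slide can be sketched in Python as follows. This is a minimal illustration, not the ISOFT implementation: the function name `answer_by_phrases`, the `search` callback (standing in for the IR component), and the toy index are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical IR lookup: takes a query string, returns an answer clue or None.
Search = Callable[[str], Optional[str]]


def answer_by_phrases(
    phrases: List[str], search: Search
) -> Tuple[Optional[str], List[str]]:
    """Chain queries from the rightmost phrases of a question to the leftmost.

    `phrases` are given in question order. Returns the final clue (None when
    a lookup fails) together with the list of queries that were issued.
    """
    if len(phrases) < 2:
        raise ValueError("need at least two phrases")
    remaining = list(phrases)
    right = remaining.pop()
    left = remaining.pop()
    query = f"{left} {right}"  # first query: the two rightmost phrases
    issued = [query]
    clue = search(query)
    while clue is not None and remaining:
        # next query: previous answer plus the next rightmost phrase
        query = f"{clue} {remaining.pop()}"
        issued.append(query)
        clue = search(query)
    return clue, issued


# Toy stand-in for the text index (assumed data, for illustration only).
toy_index = {
    "tallest building in Japan": "Tokyo Skytree",
    "Tokyo Skytree architect": "Nikken Sekkei",
}

answer, queries = answer_by_phrases(
    ["architect", "tallest building", "in Japan"], toy_index.get
)
print(queries)
print(answer)
```

A result of `None` corresponds to the case in the note above, where no clue is found and the system generates a SPARQL query instead.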
12
Semantic Answer Type
SAT classification
  Type set: from DBpedia
  Features: keywords, LAT, type of each NE, interrogative wh-word, predicate and its arguments
  Accuracy: 71.42% (3-level type ontology), 84.62% (2-level type ontology)
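One way to picture the feature set is as a sparse set of binary indicators, one per feature value. The sketch below is an assumption about the encoding, not something the slides specify; `sat_features` and the hand-written input values are hypothetical, and in practice the inputs would come from the basic analysis step.

```python
from typing import Dict, List


def sat_features(
    keywords: List[str],
    lat: str,
    ne_types: List[str],
    wh_word: str,
    predicate: str,
    arguments: List[str],
) -> Dict[str, int]:
    """Encode the listed feature groups as sparse binary indicators."""
    names = [f"kw={k.lower()}" for k in keywords]
    names.append(f"lat={lat.lower()}")
    names += [f"ne_type={t}" for t in ne_types]
    names.append(f"wh={wh_word.lower()}")
    names.append(f"pred={predicate.lower()}")
    names += [f"arg={a.lower()}" for a in arguments]
    return {name: 1 for name in names}


# Hand-written analysis of one question (assumed values, for illustration).
features = sat_features(
    keywords=["architect", "tallest", "building", "Japan"],
    lat="architect",
    ne_types=["Country"],
    wh_word="Who",
    predicate="is",
    arguments=["architect", "tallest building in Japan"],
)
print(sorted(features))
```

A dictionary like this would still need to be mapped to numeric feature indices before being passed to a classifier.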
13
Multi-information Tagged Text Database
Stored for each sentence in Wikipedia:
  Plain text
  Tagged text with co-reference resolution and disambiguation information
  Title of the Wikipedia page the sentence comes from
  POS tagging, dependency parsing, and SRL results
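The per-sentence record described above could be modeled roughly as below. The class name, field names, and the sample record are assumptions for illustration; the slides do not give the actual storage schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaggedSentence:
    """One sentence with the layers of annotation listed on the slide."""

    plain_text: str
    tagged_text: str  # co-reference resolved, entities disambiguated
    page_title: str   # Wikipedia page the sentence comes from
    pos_tags: List[Tuple[str, str]] = field(default_factory=list)  # (token, tag)
    # (head, relation, dependent) triples from the dependency parse
    dependencies: List[Tuple[str, str, str]] = field(default_factory=list)
    srl_frames: List[Dict[str, str]] = field(default_factory=list)


# Made-up sample record; the annotations are hand-written and truncated.
record = TaggedSentence(
    plain_text="It is the tallest structure in Japan.",
    tagged_text="[Tokyo_Skytree] is the tallest structure in [Japan].",
    page_title="Tokyo Skytree",
    pos_tags=[("It", "PRP"), ("is", "VBZ")],
    dependencies=[("structure", "nsubj", "It")],
    srl_frames=[{"predicate": "is", "ARG1": "It", "ARG2": "the tallest structure in Japan"}],
)
print(record.page_title, "|", record.tagged_text)
```

Keeping the resolved text next to the plain text lets a retrieval query match the entity name even when the original sentence only uses a pronoun.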
14
SPARQL query template generator
Detect cue words in each question to select the SPARQL template
Questions including arithmetic information
  Comparative words such as 'deeper', superlative words such as 'deepest'
Yes/No questions
  If the question is neither a wh-question nor a list question, treat it as a yes/no question
Simple questions
  Map the predicate to properties in the KB using lexical matching and semantic similarity
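The template selection rules above might look like this in Python. It is a sketch under assumptions: the input is a list of (token, POS tag) pairs with Penn Treebank tags, the wh-word and list-opener word lists are invented for the example, and the names `TemplateChoice` and `choose_template` are hypothetical. Relative clauses introduced by "which" would be misread as wh-questions by this simple word check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WH_WORDS = {"who", "whom", "whose", "what", "which", "where", "when", "why", "how"}
LIST_OPENERS = {"give", "list", "name", "show"}  # assumed cue words
COMPARATIVE_TAGS = {"JJR", "RBR"}  # Penn Treebank comparative tags
SUPERLATIVE_TAGS = {"JJS", "RBS"}  # Penn Treebank superlative tags


@dataclass
class TemplateChoice:
    form: str                  # "wh", "list", or "yes/no"
    arithmetic: Optional[str]  # "comparative", "superlative", or None


def choose_template(tagged: List[Tuple[str, str]]) -> TemplateChoice:
    words = [word.lower() for word, _ in tagged]
    tags = {tag for _, tag in tagged}
    if any(word in WH_WORDS for word in words):
        form = "wh"
    elif words and words[0] in LIST_OPENERS:
        form = "list"
    else:
        form = "yes/no"  # neither a wh-question nor a list question
    if tags & SUPERLATIVE_TAGS:
        arithmetic = "superlative"
    elif tags & COMPARATIVE_TAGS:
        arithmetic = "comparative"
    else:
        arithmetic = None
    return TemplateChoice(form, arithmetic)


# Hand-tagged versions of two QALD-5 questions quoted in these slides.
q58 = [("Are", "VBP"), ("there", "EX"), ("man-made", "JJ"), ("lakes", "NNS"),
       ("in", "IN"), ("Australia", "NNP"), ("that", "WDT"), ("are", "VBP"),
       ("deeper", "JJR"), ("than", "IN"), ("100", "CD"), ("meters", "NNS")]
q53 = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("architect", "NN"),
       ("of", "IN"), ("the", "DT"), ("tallest", "JJS"), ("building", "NN"),
       ("in", "IN"), ("Japan", "NNP")]

print(choose_template(q58))
print(choose_template(q53))
```

The question form and the arithmetic flag are kept separate here because a single question can be both a yes/no question and a comparative one.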
15
Experiment
Test: QALD-5 hybrid question test dataset
Data: Wikipedia, DBpedia 3.10
16
Success
Columns: Question / Query generation / Process

[53] Who is the architect of the tallest building in Japan? [Building]
  Query generation: in Japan / tallest building / architect
  Process:
    [1] Find that the tallest building in Japan is "Tokyo_Skytree" [using IR]
    [2] Map "architect" to the property "architect": Nikken_Sekkei [using SPARQL]

[55] In which city were Charlie Chaplin's half brothers born? [City]
  Query generation: Charlie Chaplin half brother / born
  Process:
    [1] Find the half brother of Charlie Chaplin [using IR]
    [2] Select ?uri {Sydney_Chaplin birthplace ?uri} [using SPARQL]; results: "England", "United Kingdom", and "London"
    [3] Using the semantic answer type, keep "London" [Filter]

[58] Are there man-made lakes in Australia that are deeper than 100 meters? [Place]
  Query generation: man-made lake Australia / deeper 100
  Process:
    [1] Extract answer clues for "man-made lake Australia" [using IR]
    [2] Compare the depth of each named entity found and check whether one or more is deeper than 100 meters [using SPARQL]
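Steps [2] and [3] of question 55 can be illustrated with a short Python sketch. It assumes the standard DBpedia prefixes and the `dbo:birthPlace` property; the entity identifier is copied from the slide, and the helper names and the type labels attached to each candidate are assumptions made for the example, not output from the system.

```python
from typing import Dict, List

PREFIXES = (
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


def build_simple_query(entity: str, prop: str) -> str:
    """Fill a one-triple SELECT template with an entity and a property."""
    return f"{PREFIXES}SELECT ?uri WHERE {{ dbr:{entity} dbo:{prop} ?uri }}"


def filter_by_answer_type(candidates: Dict[str, str], answer_type: str) -> List[str]:
    """Keep only the candidates whose type matches the semantic answer type."""
    return [name for name, kind in candidates.items() if kind == answer_type]


query = build_simple_query("Sydney_Chaplin", "birthPlace")
print(query)

# Candidate answers from the slide, with assumed type labels.
candidate_types = {"England": "Country", "United Kingdom": "Country", "London": "City"}
print(filter_by_answer_type(candidate_types, "City"))
```

The filter is what narrows the three birthplace values down to the single city the question asks for.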
17
Error: no answer clue found
Columns: Question / Query generation

[51] Where was the "Father of Singapore" born? [Place]
  Query generation: "Father of Singapore" born / Where
[52] Which Secretary of State was significantly involved in the United States' dominance of the Caribbean? [ADMINISTRATIVE REGION]
  Query generation: United States' dominance of the Caribbean / involve / Secretary
[56] Which German mathematicians were members of the von Braun rocket group? [Person]
  Query generation: member von Braun rocket group / German mathematician
[57] Which writers converted to Islam? [Person]
  Query generation: writer converted Islam
[59] Which movie by the Coen brothers stars John Turturro in the role of a New York City playwright? [Place]
  Query generation: role of a New York City playwright / Coen brother stars
[60] Which of the volcanoes that erupted in 1550 is still active? [Place]
  Query generation: volcanoes / erupted in 1550 / active
18
Error: query generation
Columns: Question / Query generation / Error

[54] What is the name of the Viennese newspaper founded by the creator of the croissant? [Person]
  Query generation: creator of croissant / Viennese newspaper founded
  Error: cannot map "founded" to a predicate in DBpedia
[60] Which of the volcanoes that erupted in 1550 is still active? [Place]
  Query generation: volcanoes / erupted in 1550 / active
  Error: cannot generate an appropriate query to extract the answer clue
19
Conclusion
Hybrid QA System
  Uses both the KBQA approach and the IRQA approach
  First search the text data
  If the results are not appropriate, or the question involves arithmetic, generate a SPARQL query
Future
  Extend the query to find relevant answer clues
  Predicate mapping
  Semantic parsing
  Information extraction
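The decision rule summarized above can be written as a small control-flow sketch. All four callbacks are hypothetical stand-ins for system components (text search, an appropriateness check, arithmetic detection, and SPARQL generation plus execution); the slides do not say how each is implemented.

```python
from typing import Callable, List


def hybrid_answer(
    question: str,
    search_text: Callable[[str], List[str]],
    is_appropriate: Callable[[List[str]], bool],
    is_arithmetic: Callable[[str], bool],
    run_sparql: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Search text first; hand over to the knowledge base when needed."""
    clues = search_text(question)  # text search always runs first
    if is_arithmetic(question) or not is_appropriate(clues):
        # clues found so far are passed along to help build the SPARQL query
        return run_sparql(question, clues)
    return clues


# Stub components (assumed behavior, for illustration only).
def kb_stub(question: str, clues: List[str]) -> List[str]:
    return ["answer from KB"]


from_text = hybrid_answer(
    "tallest building in Japan",
    search_text=lambda q: ["Tokyo Skytree"],
    is_appropriate=lambda clues: len(clues) > 0,
    is_arithmetic=lambda q: False,
    run_sparql=kb_stub,
)
from_kb = hybrid_answer(
    "lakes deeper than 100 meters",
    search_text=lambda q: ["Lake Argyle"],
    is_appropriate=lambda clues: len(clues) > 0,
    is_arithmetic=lambda q: True,
    run_sparql=kb_stub,
)
print(from_text, from_kb)
```

Passing the text clues into the SPARQL step reflects the success cases, where entities found by IR become the subjects of the generated query.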
20
Tools
Stanford co-reference tool: co-reference resolution
DBpedia Spotlight: map NEs in the question to entities in DBpedia
Predicate mapping: lexical matching; semantic similarity based on explicit semantic analysis (ESA)
ClearNLP: tokenization, POS tagging, and dependency parsing
WordNet: term extraction
Lucene: Wikipedia index
libSVM: SAT classifier
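As a rough illustration of the lexical matching half of predicate mapping, the sketch below matches a question phrase against a list of property labels using character-level similarity from Python's standard `difflib` module. This is a swapped-in technique chosen for brevity: the ESA-based semantic similarity is not modeled, and the function name, candidate labels, and cutoff value are assumptions.

```python
import difflib
from typing import List, Optional


def map_predicate(
    phrase: str, property_labels: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the property label closest to `phrase`, or None if none is close."""
    lowered = {label.lower(): label for label in property_labels}
    matches = difflib.get_close_matches(
        phrase.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


labels = ["architect", "birthPlace", "deathPlace"]  # assumed candidate properties
print(map_predicate("architects", labels))
print(map_predicate("Birthplace", labels))
print(map_predicate("croissant", labels))
```

String similarity alone cannot relate words with different surface forms, which is why a semantic similarity measure is listed alongside lexical matching.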
21
Acknowledgements
Questions from teachers and classmates are welcome!
22
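Discussion: ISOFT
  The question is decomposed into phrases
  Text retrieval starts from the rightmost phrases
  If the text retrieval results are irrelevant, or the question involves arithmetic, control passes to the SPARQL module
  Finally, candidates are filtered by answer type
What are the strengths of ISOFT?
What are the weaknesses of ISOFT?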
23
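Discussion: What is the key to a hybrid QA system?
  Question analysis? Predicate-argument structure?
  Splitting the question into sub-clauses? How is each sub-clause solved? What are the atomic operations?
  Phrase retrieval? Analysis of the underlying text data?
  Entity linking? Predicate mapping? Semantic matching?
  SPARQL construction?
  Candidate answer ranking? Filters?
  Textual entailment? Paraphrase?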