
1 Reading Report: A Survey on QA
Paper reading. Yuzhong Qu, Department of Computer Science, Nanjing University

2 Articles
Oleksandr Kolomiyets, Marie-Francine Moens: A survey on question answering technology from an information retrieval perspective. Inf. Sci. 181(24) (2011).
L. Hirschman, R. Gaizauskas: Natural language question answering: The view from here. Natural Language Engineering 7(4) (2001) 275–300.

3 Outline
Preliminary
A Short History
Performance Metrics
Methods
Applications of QA
Issues
The Future of QA

4 Preliminary
Question
  A natural language sentence (with an interrogative word)
  A statement (an imperative construct that starts with a verb)
Question type
  Confirmation
  Factoid
  List ("List/Name [me] [at least 3] ..."; a list of entities or facts)
  Relationship
  Definition ("What is ...", a descriptive question)
  Causal (explanation of an event or artifact, e.g. "Why ...")
  Procedural (a list of instructions for accomplishing a task)
  Opinion (about an entity or an event)
  Hypothetical ("What would happen if ...")

5 Preliminary
Information source
  A collection of information objects (documents, video, audio, text, files or databases) available to the question answering system for extracting answers
  Unstructured data: text, images
  Semi-structured data: XML
  Structured data: relational databases (RDB), knowledge bases (KB) / rules

6 Preliminary
RDQA, restricted-domain question answering
  Users in a specific domain of competence
  Relies on manually constructed data or knowledge sources
ODQA, open-domain question answering
  Answers questions regardless of the subject domain
  Extracts answers from a large corpus of textual documents
Retrieval model
  The representation of the information sources and of the information need
  The retrieval function, or ranking function

7 A Short History
NLIDB (natural language interfaces to databases) [8,95], 1960s–1970s
  BASEBALL (1961), LUNAR (1972)
  Manually built analysis patterns embedded in a domain-specific vocabulary
Deductive question answering, 1980s and 1990s: knowledge-base systems
  Reasoning and explanation
  MYCIN [132], SHRDLU [148]
The first Web-based QA system
  START, on the Web since 1993
Contemporary QA, since 1999
Related milestones: WWW, 1990; W3C, 1994; Semantic Web, 2001; SPARQL, 2008

8 A Short History -- Evaluations
The annual Text Retrieval Conference (TREC), 1992
  QA track
The Cross-Language Evaluation Forum (CLEF), 2000
  Multiple Language QA (2008)
  QALD since 2011; QALD hybrid QA track since 2014
The Text Analysis Conference (TAC), NIST, 2008
  QA (2008); KBP since 2009
Semantic Evaluation (SemEval), 1998, 2007
  QA since 2015
NII Test Collection for IR Systems (NTCIR) Workshop, 1998
  IR, QA, text summarization, extraction

9 A Short History -- SemEval
SemEval (Semantic Evaluation)
  Evaluations of computational semantic analysis systems
  Evolved from the Senseval series (WSD)
  *SEM conference
Senseval-1 (1998), Senseval-2 (2001), Senseval-3 (2004)
SemEval-2007, SemEval-2010, SemEval-2012, then yearly
  Co-located with NAACL 2012, NAACL 2013, COLING 2014, NAACL-HLT 2015

10 A Short History -- SemEval
SE12: 8 tasks. Areas: Common Sense Reasoning, Lexical Simplification, Relational Similarity, Spatial Role Labelling, Semantic Dependency Parsing, Semantic and Textual Similarity. Languages: Chinese, English.
SE13: 14 tasks. Areas: Temporal Annotation, Sentiment Analysis, Spatial Role Labeling, Noun Compounds, Phrasal Semantics, Textual Similarity, Response Analysis, Cross-lingual Textual Entailment, BioMedical Texts, Cross and Multilingual WSD, Word Sense Induction, and Lexical Sample. Languages: Catalan, French, German, English, Italian, Spanish.
SE14: 10 tasks. Areas: Compositional Distributional Semantics, Grammar Induction for Spoken Dialogue Systems, Cross-Level Semantic Similarity, Sentiment Analysis, L2 Writing Assistant, Supervised Semantic Parsing, Clinical Text Analysis, Semantic Dependency Parsing, Sentiment Analysis in Twitter, Multilingual Semantic Textual Similarity. Languages: English, Spanish, French, German, Dutch.
SE15: 18 tasks (incl. 1 cancelled). Areas: Text Similarity and Question Answering, Time and Space, Sentiment, Word Sense Disambiguation and Induction, Learning Semantic Relations. Languages: English, Spanish, Arabic, Italian.
SE16: Areas: Textual Similarity and Question Answering, Sentiment Analysis, Semantic Parsing, Semantic Analysis, Semantic Taxonomy.

11 A Short History -- TAC
TAC tracks, 2008–2014:
  Question Answering
  Recognizing Textual Entailment
  Summarization
  Knowledge Base Population

12 A Short History -- Contemporary QA
Key QA techniques
  Expected answer type [45,84,152]
  Syntactic and semantic structures for QA [51,7,27,49]
  Semantic-role labelling for QA [93,97,130]
  Discourse relationships and textual entailment for QA [29,44]
  Logic-based representations [85]
  Geographical question answering [40]
  Temporal question answering [86,43,6,124,112]

13 Performance Metrics Factual questions
Mean reciprocal rank (MRR): the average over a set of n queries of a score inversely proportional to the rank (rank_i) of the first correct answer in the answer list; a more fine-grained accuracy measure (see the formula below)
Questions for which there is no answer in the document collection: recall and precision of these NIL answers
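For reference, the standard MRR formula matching this description (written out here, not copied from the slides) is:

```latex
\mathrm{MRR} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\mathrm{rank}_i}
```

where rank_i is the rank of the first correct answer returned for query i; by convention the term is taken to be 0 when no correct answer appears in the list.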

14 Performance Metrics (TREC) list questions
Instance Precision (IP), Instance Recall (IR)
F-score (β = 1); the larger β is, the more the measure favours recall (see the formula below)
To distinguish systems that provide 'early' correct answers in the ranking: the confidence score
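As a reminder of the standard definition (my formulation, not reproduced from the slides), the F-score combining instance precision (IP) and instance recall (IR) is:

```latex
F_{\beta} = \frac{(1 + \beta^{2}) \cdot \mathrm{IP} \cdot \mathrm{IR}}{\beta^{2} \cdot \mathrm{IP} + \mathrm{IR}}
```

With β = 1, precision and recall are weighted equally; larger β values favour recall, as in the β = 5 setting mentioned on the next slide.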

15 Performance Metrics (TREC) Other questions
The Pyramid method [94] measures the overlap of words and phrases in the system-generated answer with the expert-generated answer, assuming a dictionary of equivalent paraphrases
F-score (β = 5)

16 Performance Metrics
The final score is called the pre-series score

17 Evaluation scores of three best systems
TREC QA 2007 and 2009, and NTCIR QA 2008 (results table omitted)

18 Methods
QA is a form of information retrieval, characterised by
  the representation of the information need
  the representation of the retrievable object
  the retrieval or ranking function
Seven kinds of methods
  Bag-of-words representations
  Morpho-syntactic analysis of natural language statements
  Semantic classification of the expected answer type (EAT)
  Semantic classification of all constituents of NL sentences
  Identifying the necessary discourse relationships
  Translation into and retrieval with a structured language
  Translation into and reasoning with a logical representation

19 Bag-of-words representations
Vector space model; language model [26]
Retrieval
  TF-IDF
  Probabilistic content model
  Lexico-semantic resources or thesauri (WordNet [79])
Evaluation
  Useful for the initial filtering of documents and sentences
  Mapping rules, exploiting the redundancy of information [17]
A minimal retrieval sketch follows below.
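A minimal sketch of bag-of-words passage retrieval with TF-IDF weighting and cosine ranking. It uses scikit-learn as an assumed dependency, and the passages and question are invented for illustration; it is not one of the surveyed systems.

```python
# Minimal bag-of-words passage retrieval sketch (illustrative only).
# Assumes scikit-learn is installed; passages and question are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Neil Armstrong was the first person to walk on the Moon in 1969.",
    "The Moon orbits the Earth roughly every 27 days.",
    "Apollo 11 landed on the Moon on July 20, 1969.",
]
question = "Who was the first person to walk on the Moon?"

# Represent question and passages as TF-IDF weighted bag-of-words vectors.
vectorizer = TfidfVectorizer(stop_words="english")
passage_vectors = vectorizer.fit_transform(passages)
question_vector = vectorizer.transform([question])

# Rank passages by cosine similarity to the question (the retrieval function).
scores = cosine_similarity(question_vector, passage_vectors)[0]
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.3f}  {passage}")
```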

20 Morpho-syntactic analysis
Morpho-syntactic analysis of natural language statements
  Part-of-speech (POS) tagging, phrase chunking
  Parse-based cues (constituent and dependency trees) [27]
Retrieval
  Syntactic tree kernels (dependency parse trees) [22,91,19]
  Recognizing textual entailments and paraphrases [47]: syntactic rewriting rules, tree edit models
Evaluation
  Up to 50–138% improvement in MRR and a precision level over 95% at rank 1
  Increasing the MRR value by up to 8.9%
  Time and space complexities
A small parsing example follows below.
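A small example of the morpho-syntactic cues mentioned above (POS tags, dependency relations, phrase chunks), sketched with spaCy as an assumed dependency; the cited systems use their own parsers and tree-kernel machinery.

```python
# Minimal morpho-syntactic analysis of a question with spaCy (illustrative;
# assumes `pip install spacy` and `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Who wrote the novel Moby-Dick?")

# Part-of-speech tags and dependency relations: the cues used for
# parse-based matching between questions and candidate answers.
for token in doc:
    print(f"{token.text:12} POS={token.pos_:6} dep={token.dep_:10} head={token.head.text}")

# Noun chunks correspond roughly to the phrase-chunking step.
print([chunk.text for chunk in doc.noun_chunks])
```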

21 Expected Answer Type (EAT)
For factoid questions, the question type class determines the EAT
Type distribution of 1000 TREC questions according to the answer type taxonomy [64]

22 Expected Answer Type (EAT)
Semantic classification of the expected answer type (EAT)
Answer type taxonomy
  140 text-surface patterns based on 17,000 questions [52]
  WordNet-based taxonomy of classes [45]
  The most famous EAT taxonomy [64]
  Symbolic approaches use hand-crafted rules [99,136,52]
  Machine learning: average precision 70–95.4%
Retrieval
  Deterministic approach and probabilistic retrieval approach
  Linguistic syntax-based patterns for questions/answers [54,135]
Evaluation
  36.4% of errors (QA TREC-8, 9, 10) are due to incorrect EAT estimation
A toy EAT classifier sketch follows below.
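A toy sketch of a machine-learned EAT classifier. The coarse classes, the tiny training set, and the scikit-learn pipeline are illustrative assumptions; real systems train on thousands of labelled questions over a taxonomy such as [64].

```python
# Toy expected-answer-type (EAT) classifier sketched with scikit-learn.
# Classes and training questions are invented for demonstration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_questions = [
    "Who invented the telephone?",
    "Who was the first president of the United States?",
    "Where is the Eiffel Tower located?",
    "Where was Mozart born?",
    "How many moons does Mars have?",
    "How many players are on a soccer team?",
    "When did World War II end?",
    "When was the Declaration of Independence signed?",
]
train_labels = ["PERSON", "PERSON", "LOCATION", "LOCATION",
                "NUMBER", "NUMBER", "DATE", "DATE"]

# Word uni/bigrams capture surface cues such as "who", "where", "how many".
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_questions, train_labels)

print(clf.predict(["Who wrote Hamlet?", "When was the Moon landing?"]))
```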

23 Semantic labelling/parsing
Semantic classification of all constituents of NL sentences
  Semantic roles and frames: FrameNet [10], PropBank [105]
  Temporal and spatial information: TimeBank
  Supervised machine learning, e.g. conditional random fields [77]
Retrieval
  Propagating algorithm with verb arguments [97]
  Graph similarity between question and answer graphs [130]
  Logical retrieval model to infer the answer [125]
Evaluation
  Answer type recognition of complex questions improves from 35% (baseline) to 73.5% (semantic topic-based approach) [93]
  63% performance gain (MRR) for answer candidate ranking in comparison to a bag-of-words approach [92]
A simple role-overlap matching sketch follows below.
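An illustrative sketch of matching a question to answer candidates through semantic-role structures. The (predicate, role, filler) tuples are hand-written stand-ins for SRL output, and the overlap score is a simplification of the graph-similarity and propagation methods cited above.

```python
# Illustrative matching of question and answer candidates via semantic roles.
# Frames are hand-written here; a real system would obtain them from an SRL tool.

def role_overlap(question_frame, candidate_frame):
    """Fraction of the question's (predicate, role, filler) tuples found in the candidate."""
    shared = question_frame & candidate_frame
    return len(shared) / len(question_frame) if question_frame else 0.0

# "Who discovered penicillin?" -> predicate 'discover', patient 'penicillin'
question = {("discover", "A1", "penicillin")}

candidates = {
    "Alexander Fleming discovered penicillin in 1928.":
        {("discover", "A0", "alexander fleming"), ("discover", "A1", "penicillin")},
    "Penicillin is an antibiotic.":
        {("be", "A1", "penicillin"), ("be", "A2", "antibiotic")},
}

for text, frame in candidates.items():
    print(f"{role_overlap(question, frame):.2f}  {text}")
```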

24 Discourse relationships
Important relationships
  Equivalence, hypernymy and hyponymy
  Temporal or spatial references
Recognition of rhetorical relations in expository texts [75]
Co-reference techniques for textual QA [90,153,49,127]
13 conceptual categories for questions [62]
  Causal antecedent, goal orientation, enablement, causal consequent, verification, request
Why questions [141]
  Text-based argumentation detection [104]
How questions [9]

25 Translating into a structured language
Translation into and retrieval with a structured language
  Uses a set of symbolic rules [37,101] and mappings [110]
  Entity-relation extraction [24,57,150]
A hypothetical question-to-SPARQL rule example follows below.
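A hypothetical sketch of symbolic-rule translation from a question into a structured query. The single regular-expression rule and the DBpedia-style property URI are illustrative assumptions on my part, not taken from the cited systems.

```python
# Hypothetical rule-based translation of a question into a SPARQL query.
import re

def question_to_sparql(question):
    # One hand-written symbolic rule: "Who wrote X?" -> author lookup.
    match = re.match(r"Who wrote (.+)\?", question)
    if match:
        work = match.group(1).strip().replace(" ", "_")
        return (
            "SELECT ?author WHERE { "
            f"<http://dbpedia.org/resource/{work}> "
            "<http://dbpedia.org/ontology/author> ?author . }"
        )
    return None  # no rule matched

print(question_to_sparql("Who wrote The Old Man and the Sea?"))
```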

26 Translation into and reasoning with logics
Translation into and reasoning with a logical representation
  Map sentences to lambda calculus [151]
  Integrate semantic role recognition
    Statistical relational learners [39]
    Probabilistic relational learners [117]
  Probabilistic retrieval model
    Probabilistic inference based on Bayesian networks [107]
    (Relaxed) subgraph matching [87]
A toy logical-form evaluation sketch follows below.
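A toy sketch of answering by evaluating a lambda-calculus-style logical form over a small fact base. The facts and the logical form are invented for illustration; real systems learn the mapping from text and reason probabilistically over large knowledge bases.

```python
# Toy evaluation of a logical form over a small, invented fact base.

facts = {
    ("borders", "oklahoma", "texas"),
    ("borders", "louisiana", "texas"),
    ("borders", "nevada", "california"),
    ("state", "oklahoma"), ("state", "louisiana"),
    ("state", "nevada"), ("state", "texas"), ("state", "california"),
}

# "Which states border Texas?"  ~  \x. state(x) AND borders(x, texas)
logical_form = lambda x: ("state", x) in facts and ("borders", x, "texas") in facts

entities = {f[1] for f in facts if f[0] == "state"}
answers = sorted(e for e in entities if logical_form(e))
print(answers)  # ['louisiana', 'oklahoma']
```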

27 Methods -- Conclusive findings
Essential parts of a QA system
  Estimation of the expected answer type
  The semantic classes of information in the question and candidate answers
Deeper semantic analysis (semantic role labelling and discourse analysis)
  Improves performance, but is time consuming
  For factual open-domain question answering, where the redundancy of information online can be exploited, such an analysis might not pay off the effort
  For more specific and complex questions, a deeper semantic analysis is a key feature of the system

28 Methods -- Conclusive findings
A QA system should employ a set of models depending on:
  Type of question answering (open domain, restricted domain): how can information redundancy be exploited?
  Types of questions (factoid, list, definitional vs. causal, relational, procedural): how deep should the semantic analysis be?
  Type of interrogated data: how deep should the semantic analysis of the source be? Salient temporal and spatial information [125,40]
  Response time: 'just-in-time' semantic representations [134]
  Value of a wrong answer, a partial answer and no answer
  Type of users and usability criteria

29 Applications of QA
Online question answering services
  AskMSR [18], START [55]
  AnswerBus, AskJeeves (Ask.com)
  Answer.com, Wondir
Question answering in restricted domains [88]
  Academic prototypes with small-scale KBs [11,30,37,102]
Others
  Using Wikipedia [53,2,137,35]
  Over European legislation [41,4]
  With geographical reasoning: GikiCLEF [123,32,61]

30 Issues
Recent advances in natural language understanding
Information retrieval from semi-structured collections
Probabilistic reasoning with content models
Information extraction techniques
  Precision/recall
Paraphrasing
  IR community [71,154,120]
  Lexical chaining for rephrasing the query [106]

31 The Future of QA
(Deeper) text analysis
  Linking of information within and across documents
  Learning different phrasings of a relation between entities [68]
  Eventually leading to structured/logical representations
Others
  Spoken questions
  Multimedia data, cross-lingual, cross-media (semantic search)
  Context identification, user-adapted answers, user interfaces
  Evaluation, question generation

32 Thinking Topics
Entity linking
Relation recognition (semantic roles)
EAT estimation
IR, search, query, (inexact) match
RTE (recognizing textual entailment)
Evaluation
  QALD
  SemEval
  KBP (TAC)

33 References
Ion Androutsopoulos: Natural language interfaces to databases – An introduction. Natural Language Engineering 1 (1995) 29–81.
Lucian Vlad Lita, Jaime Carbonell: Unsupervised question answering data acquisition from local corpora. CIKM '04.
Anh Kim Nguyen, Huong Thanh Le: Natural language interface construction using semantic grammars. PRICAI-08, 2008, pp. 728–739.
Valentin Tablan, Danica Damljanovic, Kalina Bontcheva: A natural language query interface to structured information. ESWC '08, 2008, pp. 361–375.

34 References
An analysis of the AskMSR question-answering system. EMNLP '02.
Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. NAACL-HLT 2010.
Using semantic roles to improve question answering. EMNLP '07.
Sharon Small, Tomek Strzalkowski: HITIQA: High-quality intelligence through interactive question answering. Natural Language Engineering 15(1) (2009) 31–54.

35 References
Vinitha Reddy, Kyle Neumeier, Joshua McFarlane, Jackson Cothren, Craig W. Thompson: Extending a natural language interface with geospatial queries. IEEE Internet Computing 11 (2007) 82–85.
Parisa Kordjamshidi, Martijn van Otterlo, Marie-Francine Moens: From language towards formal spatial calculi. In: Proceedings of Computational Models of Spatial Language Interpretation (COSLI), 2010, pp. 17–24.

36 Thanks
Questions are welcome!

