© Johan Bos April 2008 Question Answering (QA) Lecture 1 What is QA? Query Log Analysis Challenges in QA History of QA System Architecture Methods System Evaluation State-of-the-art Lecture 2 Question Analysis Background Knowledge Answer Typing Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Pronto architecture [figure: question → parsing (ccg) → boxing (drs) → query generation and answer typing → Indri retrieval over indexed documents → answer extraction → answer selection → answer reranking → answer; knowledge sources: WordNet, NomLex]

© Johan Bos April 2008 Lecture 3 [PRONTO architecture diagram, as on the previous slide]

© Johan Bos April 2008 Question Answering Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Architecture of PRONTO [PRONTO architecture diagram, as above]

© Johan Bos April 2008 Query Generation Once we have analysed the question, we need to retrieve appropriate documents Most QA systems use an off-the-shelf information retrieval system for this task Examples: –Lemur –Lucene –Indri (used by Pronto) The input of the IR system is a query; the output is a ranked set of documents
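
Pronto uses Indri for this step, but the interface is the same for any engine: index the documents once, then send a query and get back a ranked list. A minimal sketch with the Whoosh library standing in for Indri (illustrative only; assumes Whoosh is installed, and indexes the prion texts from Example 1 below):

# A toy index-and-retrieve round trip, standing in for an off-the-shelf
# IR engine such as Indri, Lucene or Lemur.
import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

schema = Schema(docid=ID(stored=True), content=TEXT(stored=True))
os.makedirs("qa_index", exist_ok=True)
ix = create_in("qa_index", schema)

writer = ix.writer()
writer.add_document(docid="A", content="Dr. Stanley Prusiner received the Nobel "
                                       "prize for the discovery of prions.")
writer.add_document(docid="B", content="Prions are a kind of protein.")
writer.commit()

# The input is a query; the output is a ranked set of documents.
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("prions")
    for hit in searcher.search(query, limit=10):
        print(hit["docid"], round(hit.score, 2))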

© Johan Bos April 2008 Queries Query generation depends on the way documents are indexed Based on –Semantic analysis of the question –Expected answer type –Background knowledge Computing a good query is hard – we don't want too few documents, and we don't want too many!

© Johan Bos April 2008 Generating Query Terms Example 1: –Question: Who discovered prions? –Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions. –Text B: Prions are a kind of protein that… Query terms?

© Johan Bos April 2008 Generating Query Terms Example 2: –Question: When did Franz Kafka die? –Text A: Kafka died in 1924. –Text B: Dr. Franz died in … Query terms?

© Johan Bos April 2008 Generating Query Terms Example 3: –Question: How did actor James Dean die? –Text: James Dean was killed in a car accident. Query terms?

© Johan Bos April 2008 Useful query terms Ranked by importance: –Named entities –Dates or time expressions –Expressions in quotes –Nouns –Verbs Queries can be expanded using the previously created local knowledge base
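
A minimal sketch of this ranking, assuming the question has already been annotated with part-of-speech and named-entity labels by the document-analysis tools described later (the triple format and the weights are made up for illustration):

# Rank candidate query terms by the importance of their annotation type.
# Input: (token, pos_tag, ne_type) triples; ne_type is None for non-entities.
TYPE_WEIGHT = {"NE": 5, "DATE": 4, "QUOTE": 3, "NOUN": 2, "VERB": 1}

def query_terms(annotated_tokens):
    ranked = []
    for token, pos, ne in annotated_tokens:
        if ne == "DATE":
            ranked.append((TYPE_WEIGHT["DATE"], token))
        elif ne is not None:                      # any other named entity
            ranked.append((TYPE_WEIGHT["NE"], token))
        elif pos == "QUOTE":
            ranked.append((TYPE_WEIGHT["QUOTE"], token))
        elif pos.startswith("NN"):                # nouns
            ranked.append((TYPE_WEIGHT["NOUN"], token))
        elif pos.startswith("VB"):                # verbs
            ranked.append((TYPE_WEIGHT["VERB"], token))
    return [t for _, t in sorted(ranked, reverse=True)]

# "Who discovered prions?" -> prions (noun) outranks discovered (verb)
print(query_terms([("Who", "WP", None),
                   ("discovered", "VBD", None),
                   ("prions", "NNS", None)]))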

© Johan Bos April 2008 Query expansion example TREC 44.6 (Sacajawea): How much is the Sacajawea coin worth? Query: sacajawea – returns only five documents Use synonyms in query expansion New query: sacajawea OR sagajawea – returns two hundred documents
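
A minimal sketch of this kind of expansion, using a hand-made table of variants (a real system would draw on WordNet synonyms or the local knowledge base; the variant table here is purely illustrative):

# Expand each query term with known synonyms / spelling variants,
# producing an OR-clause per term, AND-ed together.
VARIANTS = {
    "sacajawea": ["sacajawea", "sacagawea", "sagajawea"],  # hypothetical variant list
}

def expand_query(terms):
    clauses = []
    for term in terms:
        alternatives = VARIANTS.get(term.lower(), [term])
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses)

print(expand_query(["sacajawea", "coin"]))
# (sacajawea OR sacagawea OR sagajawea) AND (coin)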

© Johan Bos April 2008 Architecture of PRONTO [PRONTO architecture diagram, as above]

© Johan Bos April 2008 Question Answering Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Document Analysis – Why? The aim of QA is to output answers, not documents We need document analysis to –Find the correct type of answer in the documents –Calculate the probability that an answer is correct Semantic analysis is important to get valid answers

© Johan Bos April 2008 Document Analysis – When? After retrieval –token or word based index –keyword queries –low precision Before retrieval –semantic indexing –concept queries –high precision –More NLP required

© Johan Bos April 2008 Document Analysis – How? Ideally use the same NLP tools as for question analysis –This will make the semantic matching of Question and Answer easier –Not always possible: wide-coverage tools are usually good at analysing text, but not at analysing questions –Questions are often not part of large annotated corpora, on which NLP tools are trained

© Johan Bos April 2008 Documents vs Passages Split documents into smaller passages –This will make the semantic matching faster and more accurate –In Pronto the passage size is two sentences, implemented by a sliding window Passages that are too small risk losing important contextual information –Pronouns and referring expressions
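
A minimal sketch of two-sentence sliding-window passages, assuming sentence boundaries have already been detected (sentences are simply passed in as a list here):

# Split a document into overlapping passages of `size` consecutive sentences,
# sliding the window one sentence at a time (the slide states Pronto uses size = 2).
def sliding_passages(sentences, size=2):
    if len(sentences) <= size:
        return [" ".join(sentences)]
    return [" ".join(sentences[i:i + size])
            for i in range(len(sentences) - size + 1)]

doc = ["Kafka lived in Austria.", "He died in 1924.", "Max Brod edited his work."]
for p in sliding_passages(doc):
    print(p)
# Kafka lived in Austria. He died in 1924.
# He died in 1924. Max Brod edited his work.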

© Johan Bos April 2008 Document Analysis Tokenisation Part of speech tagging Lemmatisation Syntactic analysis (Parsing) Semantic analysis (Boxing) Named entity recognition Anaphora resolution
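
Pronto relies on its own components for the deeper steps (CCG parsing and Boxer for the semantics). As an illustrative stand-in for the shallower ones, a modern off-the-shelf pipeline such as spaCy covers tokenisation, POS tagging, lemmatisation, parsing and named entity recognition in a single call; semantic analysis (boxing) and anaphora resolution would still need separate components. This assumes spaCy and its small English model are installed:

# Illustrative document analysis with spaCy (not Pronto's actual tool chain).
import spacy

nlp = spacy.load("en_core_web_sm")          # small English pipeline
doc = nlp("Kafka lived in Austria. He died in 1924.")

for tok in doc:
    print(tok.text, tok.lemma_, tok.pos_, tok.dep_)   # tokens, lemmas, POS, parse

for ent in doc.ents:
    print(ent.text, ent.label_)             # named entities, e.g. a PERSON and a DATE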

© Johan Bos April 2008 Why semantics is important Example: –Question: When did Franz Kafka die? –Text A: The mother of Franz Kafka died in 1918.

© Johan Bos April 2008 Why semantics is important Example: –Question: When did Franz Kafka die? –Text A: The mother of Franz Kafka died in 1918. –Text B: Kafka lived in Austria. He died in 1924.

© Johan Bos April 2008 Why semantics is important Example: –Question: When did Franz Kafka die? –Text A: The mother of Franz Kafka died in 1918. –Text B: Kafka lived in Austria. He died in 1924. –Text C: Both Kafka and Lenin died in 1924.

© Johan Bos April 2008 Why semantics is important Example: –Question: When did Franz Kafka die? –Text A: The mother of Franz Kafka died in 1918. –Text B: Kafka lived in Austria. He died in 1924. –Text C: Both Kafka and Lenin died in 1924. –Text D: Max Brod, who knew Kafka, died in 1930.

© Johan Bos April 2008 Why semantics is important Example: –Question: When did Franz Kafka die? –Text A: The mother of Franz Kafka died in 1918. –Text B: Kafka lived in Austria. He died in 1924. –Text C: Both Kafka and Lenin died in 1924. –Text D: Max Brod, who knew Kafka, died in 1930. –Text E: Someone who knew Kafka died in 1930.

© Johan Bos April 2008 DRS for: The mother of Franz Kafka died in 1918.
 _____________________
| x3 x4 x2 x1         |
|                     |
| mother(x3)          |
| named(x4,kafka,per) |
| named(x4,franz,per) |
| die(x2)             |
| thing(x1)           |
| event(x2)           |
| of(x3,x4)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1918XXXX |
|_____________________|

© Johan Bos April 2008 DRS for: Kafka lived in Austria. He died in 1924.
 _______________________     _____________________
| x3 x1 x2              |   | x5 x4               |
|                       |   |                     |
| male(x3)              | + | die(x5)             |
| named(x3,kafka,per)   |   | thing(x4)           |
| live(x1)              |   | event(x5)           |
| agent(x1,x3)          |   | agent(x5,x3)        |
| named(x2,austria,loc) |   | in(x5,x4)           |
| event(x1)             |   | timex(x4)=+1924XXXX |
| in(x1,x2)             |   |_____________________|
|_______________________|

© Johan Bos April 2008 DRS for: Both Kafka and Lenin died in 1924.
 _____________________
| x6 x5 x4 x3 x2 x1   |
|                     |
| named(x6,kafka,per) |
| die(x5)             |
| event(x5)           |
| agent(x5,x6)        |
| in(x5,x4)           |
| timex(x4)=+1924XXXX |
| named(x3,lenin,per) |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1924XXXX |
|_____________________|

© Johan Bos April 2008 DRS for: Max Brod, who knew Kafka, died in 1930.
 _____________________
| x3 x5 x4 x2 x1      |
|                     |
| named(x3,brod,per)  |
| named(x3,max,per)   |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|

© Johan Bos April 2008 DRS for: Someone who knew Kafka died in 1930.
 _____________________
| x3 x5 x4 x2 x1      |
|                     |
| person(x3)          |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|

© Johan Bos April 2008 Document Analysis Tokenisation Part of speech tagging Lemmatisation Syntactic analysis (Parsing) Semantic analysis (Boxing) Named entity recognition Anaphora resolution

© Johan Bos April 2008 Recall the Answer-Type Taxonomy We divided questions according to their expected answer type Simple Answer-Type Taxonomy PERSON NUMERAL DATE MEASURE LOCATION ORGANISATION ENTITY

© Johan Bos April 2008 Named Entity Recognition In order to make use of the answer types, we need to be able to recognise named entities of the same types in the documents PERSON NUMERAL DATE MEASURE LOCATION ORGANISATION ENTITY

© Johan Bos April 2008 Example Text Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc. to become operations director of Arthur Andersen.

© Johan Bos April 2008 Named entities Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc. to become operations director of Arthur Andersen.

© Johan Bos April 2008 Named Entity Recognition Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc. to become operations director of Arthur Andersen.

© Johan Bos April 2008 NER difficulties Several types of entities are too numerous to include in dictionaries New names turn up every day Ambiguities –Paris, Lazio Different forms of the same entity in the same text –Brian Jones … Mr. Jones Capitalisation

© Johan Bos April 2008 NER approaches Rule-based approaches –Hand-crafted rules –Help from databases of known named entities [e.g. locations] Statistical approaches –Features –Machine learning
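
A minimal sketch of the rule-based flavour, combining a small gazetteer with one hand-crafted pattern (the gazetteer entries and the tag set are illustrative, not Pronto's):

import re

# Tiny rule-based named entity tagger: a gazetteer lookup for locations and
# organisations plus a regular expression for "Mr./Mrs./Dr. + capitalised name".
GAZETTEER = {
    "italy": "LOCATION",
    "milan": "LOCATION",
    "arthur andersen": "ORGANISATION",
}
TITLE_PATTERN = re.compile(r"\b(?:Mr|Mrs|Dr)\.\s+[A-Z][a-z]+")

def tag_entities(text):
    entities = []
    lowered = text.lower()
    for name, label in GAZETTEER.items():
        if name in lowered:
            entities.append((name, label))
    for match in TITLE_PATTERN.finditer(text):
        entities.append((match.group(), "PERSON"))
    return entities

text = ("Italy's business world was rocked by the announcement that "
        "Mr. Verdi would leave Music Masters of Milan.")
print(tag_entities(text))
# [('italy', 'LOCATION'), ('milan', 'LOCATION'), ('Mr. Verdi', 'PERSON')]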

© Johan Bos April 2008 Document Analysis Tokenisation Part of speech tagging Lemmatisation Syntactic analysis (Parsing) Semantic analysis (Boxing) Named entity recognition Anaphora resolution

© Johan Bos April 2008 What is anaphora? Relation between a pronoun and another element in the same or earlier sentence Anaphoric pronouns: –he, him, she, her, it, they, them Anaphoric noun phrases: –the country, –these documents, –his hat, her dress

© Johan Bos April 2008 Anaphora (pronouns) Question: What is the biggest sector in Andorra's economy? Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP. Answer: ?

© Johan Bos April 2008 Anaphora (definite descriptions) Question: What is the biggest sector in Andorra's economy? Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country's tiny, well-to-do economy, accounts for roughly 80% of the GDP. Answer: ?

© Johan Bos April 2008 Anaphora Resolution Anaphora Resolution is the task of finding the antecedents of anaphoric expressions Example system: –Mitkov, Evans & Orasan (2002) –
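
A naive sketch of pronoun resolution (nothing like the Mitkov, Evans & Orasan system; it simply links a third-person pronoun to the most recent preceding person-type entity, which already handles the Kafka example below):

# Naive anaphora resolution: resolve he/him/his (and she/her) to the most
# recently mentioned PERSON entity seen so far in the token stream.
PRONOUNS = {"he", "him", "his", "she", "her"}

def resolve_pronouns(tokens, entities):
    """tokens: list of words; entities: dict token-index -> (name, 'PERSON')."""
    resolved = {}
    last_person = None
    for i, tok in enumerate(tokens):
        if i in entities and entities[i][1] == "PERSON":
            last_person = entities[i][0]
        elif tok.lower() in PRONOUNS and last_person is not None:
            resolved[i] = last_person
    return resolved

tokens = "Kafka lived in Austria . He died in 1924 .".split()
entities = {0: ("Kafka", "PERSON")}
print(resolve_pronouns(tokens, entities))   # {5: 'Kafka'}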

© Johan Bos April 2008 Kafka lived in Austria. He died in 1924.
 _______________________     _____________________
| x3 x1 x2              |   | x5 x4               |
|                       |   |                     |
| male(x3)              | + | die(x5)             |
| named(x3,kafka,per)   |   | thing(x4)           |
| live(x1)              |   | event(x5)           |
| agent(x1,x3)          |   | agent(x5,x3)        |
| named(x2,austria,loc) |   | in(x5,x4)           |
| event(x1)             |   | timex(x4)=+1924XXXX |
| in(x1,x2)             |   |_____________________|
|_______________________|

© Johan Bos April 2008 Co-reference resolution Question: What is the biggest sector in Andorra's economy? Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra's tiny, well-to-do economy, accounts for roughly 80% of the GDP. Answer: Tourism

© Johan Bos April 2008 Question Answering Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Architecture of PRONTO [PRONTO architecture diagram, as above]

© Johan Bos April 2008 Semantic indexing If we index documents on the token level, we cannot search for specific semantic concepts If we index documents on semantic concepts, we can formulate more specific queries Semantic indexing requires a complete preprocessing of the entire document collection [can be costly]

© Johan Bos April 2008 Semantic indexing example Example NL question: When did Franz Kafka die? Term-based –query: kafka –Returns all passages containing the term "kafka"

© Johan Bos April 2008 Semantic indexing example Example NL question: When did Franz Kafka die? Term-based –query: kafka –Returns all passages containing the term "kafka" Concept-based –query: DATE & kafka –Returns all passages containing the term "kafka" and a date expression
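
A minimal sketch of concept-level indexing: alongside ordinary tokens, each passage is also indexed under a synthetic concept term such as DATE whenever it contains a time expression, so a query like DATE & kafka can be answered from the same inverted index. The concept detection here is a toy regular expression standing in for the named entity recognition above:

import re
from collections import defaultdict

# Build an inverted index over tokens plus a synthetic concept term (DATE).
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def index_passages(passages):
    index = defaultdict(set)
    for pid, text in passages.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(pid)
        if YEAR.search(text):
            index["DATE"].add(pid)       # concept term for time expressions
    return index

passages = {
    "p1": "Kafka lived in Austria.",
    "p2": "Kafka died in 1924.",
}
index = index_passages(passages)
print(index["kafka"] & index["DATE"])    # {'p2'}  -- query: DATE & kafka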

© Johan Bos April 2008 Question Answering Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Architecture of PRONTO [PRONTO architecture diagram, as above]

© Johan Bos April 2008 Answer extraction Passage retrieval gives us a set of ranked documents Match answer with question –DRS for question –DRS for each possible document –Score for amount of overlap Deep inference or shallow matching Use knowledge

© Johan Bos April 2008 Answer extraction: matching Given a question and an expression with a potential answer, calculate a matching score S = match(Q,A) that indicates how well Q matches A Example –Q: When was Franz Kafka born? –A1: Franz Kafka died in 1924. –A2: Kafka was born in 1883.

© Johan Bos April 2008 Using logical inference Recall that Boxer produces first order representations [DRSs] In theory we could use a theorem prover to check whether a retrieved passage entails or is inconsistent with a question In practice this is too costly, given the high number of possible answer + question pairs that need to be considered Also: theorem provers are precise – they don't give us information if they almost find a proof, although this would be useful for QA

© Johan Bos April 2008 Semantic matching Matching is an efficient approximation to the inference task Consider flat semantic representation of the passage and the question Matching gives a score of the amount of overlap between the semantic content of the question and a potential answer
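
A much-simplified sketch of such a matcher, using the Kafka example from the following slides: question and passage are flat lists of predicate–argument literals, upper-case arguments in the question are variables, and we greedily bind them to passage constants, scoring the fraction of question literals that find a counterpart. Pronto's actual matcher is weighted and uses background knowledge (see Matching Techniques below), so the exact numbers differ:

# Flat literals are (predicate, arg1, arg2, ...) tuples. Question arguments
# written in upper case are variables; passage arguments are constants.
def is_var(a):
    return a[0].isupper()

def match_score(question, passage):
    bindings, matched = {}, 0
    for q in question:
        for p in passage:
            if q[0] != p[0] or len(q) != len(p):
                continue                      # predicate name or arity differs
            trial = dict(bindings)
            ok = True
            for qa, pa in zip(q[1:], p[1:]):
                if is_var(qa):
                    if trial.setdefault(qa, pa) != pa:
                        ok = False            # variable already bound elsewhere
                        break
                elif qa != pa:
                    ok = False
                    break
            if ok:
                bindings = trial
                matched += 1
                break
    return matched / len(question)

Q  = [("answer", "X"), ("franz", "Y"), ("kafka", "Y"),
      ("born", "E"), ("patient", "E", "Y"), ("temp", "E", "X")]
A2 = [("kafka", "x1"), ("born", "x3"), ("patient", "x3", "x1"),
      ("in", "x3", "x2"), ("1883", "x2")]
print(match_score(Q, A2))   # 0.5 here: this naive matcher misses the in/temp paraphrase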

© Johan Bos April 2008 Matching Example Question: When was Franz Kafka born? Passage 1: Franz Kafka died in 1924. Passage 2: Kafka was born in 1883.

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(X) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,X)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(X) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,X)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)
X=x2

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(x2) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,x2)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(x2) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,x2)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)
Y=x1

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(x2) franz(x1) kafka(x1) born(E) patient(E,x1) temp(E,x2)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(x2) franz(x1) kafka(x1) born(E) patient(E,x1) temp(E,x2)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)

© Johan Bos April 2008 Semantic Matching [1]
Q:  answer(x2) franz(x1) kafka(x1) born(E) patient(E,x1) temp(E,x2)
A1: franz(x1) kafka(x1) die(x3) agent(x3,x1) in(x3,x2) 1924(x2)
Match score = 3/6 = 0.50

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(X) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,X)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(X) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,X)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)
X=x2

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(Y) kafka(Y) born(E) patient(E,Y) temp(E,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)
Y=x1

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(x1) kafka(x1) born(E) patient(E,x1) temp(E,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(x1) kafka(x1) born(E) patient(E,x1) temp(E,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)
E=x3

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(x1) kafka(x1) born(x3) patient(x3,x1) temp(x3,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(x1) kafka(x1) born(x3) patient(x3,x1) temp(x3,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)

© Johan Bos April 2008 Semantic Matching [2]
Q:  answer(x2) franz(x1) kafka(x1) born(x3) patient(x3,x1) temp(x3,x2)
A2: kafka(x1) born(x3) patient(x3,x1) in(x3,x2) 1883(x2)
Match score = 4/6 = 0.67

© Johan Bos April 2008 Matching Example Question: When was Franz Kafka born? Passage 1: Franz Kafka died in 1924. Match score = 0.50 Passage 2: Kafka was born in 1883. Match score = 0.67

© Johan Bos April 2008 Matching Techniques Weighted matching –Higher weight for named entities –Estimate weights using machine learning Incorporate background knowledge –WordNet [hyponyms] –NomLex –Paraphrases: BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)

© Johan Bos April 2008 Question Answering Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Architecture of PRONTO [PRONTO architecture diagram, as above]

© Johan Bos April 2008 Answer selection Rank answers –Group duplicates –Syntactically or semantically equivalent –Sort by frequency How specific should an answer be? –Semantic relations between answers –Hyponyms, synonyms –Answer modelling [PhD thesis Dalmas 2007] Answer cardinality

© Johan Bos April 2008 Answer selection example 1 Where did Franz Kafka die? –In his bed –In a sanatorium –In Kierling –Near Vienna –In Austria –In Berlin –In Germany

© Johan Bos April 2008 Answer selection example 2 Where is 3M based? –In Maplewood –In Maplewood, Minn. –In Minnesota –In the U.S. –In Maplewood, Minn., USA –In San Francisco –In the Netherlands

© Johan Bos April 2008 Architecture of PRONTO [PRONTO architecture diagram, as above]

© Johan Bos April 2008 Reranking Most QA systems first produce a list of possible answers… This is usually followed by a process called reranking Reranking promotes correct answers to a higher rank

© Johan Bos April 2008 Factors in reranking Matching score –The better the match with the question, the more likely the answer is correct Frequency –If the same answer occurs many times, it is likely to be correct
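
A minimal sketch of combining these two factors: equivalent answer strings are grouped, their frequency counted, and each group scored by a weighted sum of its best match score and its normalised frequency. The weights and the crude string-based equivalence test are illustrative only:

from collections import defaultdict

# Group equivalent answers, then rerank by a weighted combination of the
# best matching score in each group and how often the answer occurs.
def rerank(candidates, w_match=0.7, w_freq=0.3):
    groups = defaultdict(list)
    for answer, score in candidates:
        groups[answer.strip().lower()].append(score)   # crude equivalence test
    total = sum(len(v) for v in groups.values())
    ranked = [(w_match * max(scores) + w_freq * len(scores) / total, answer)
              for answer, scores in groups.items()]
    return sorted(ranked, reverse=True)

candidates = [("1924", 0.67), ("1924", 0.60), ("1918", 0.50), ("1930", 0.40)]
print([answer for _, answer in rerank(candidates)])
# ['1924', '1918', '1930']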

© Johan Bos April 2008 Answer Validation –check whether an answer is likely to be correct using an expensive method Tie breaking –Deciding between two answers with similar probability Methods: –Inference check –Sanity checking –Googling

© Johan Bos April 2008 Inference check Use first-order logic [FOL] to check whether a potential answer entails the question This can be done with the use of a theorem prover –Translate Q into FOL –Translate A into FOL –Translate background knowledge into FOL –If ((BKfol & Afol) → Qfol) is a theorem, we have a likely answer

© Johan Bos April 2008 Sanity Checking Answer should be informative, that is, not part of the question Q: Who is Tom Cruise married to? A: Tom Cruise Q: Where was Florence Nightingale born? A: Florence
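
A minimal sketch of this sanity check: reject an answer whose content words are all already contained in the question (the stopword list is illustrative):

import re

STOPWORDS = {"the", "a", "an", "of", "in", "to", "is", "was", "who", "where"}

def content_words(text):
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}

def is_informative(question, answer):
    """An answer is uninformative if it adds no content word to the question."""
    return not content_words(answer) <= content_words(question)

print(is_informative("Who is Tom Cruise married to?", "Tom Cruise"))        # False
print(is_informative("Who is Tom Cruise married to?", "Nicole Kidman"))     # some other candidate -> True
print(is_informative("Where was Florence Nightingale born?", "Florence"))   # False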

© Johan Bos April 2008 Googling Given a ranked list of answers, some of these might not make sense at all Promote answers that make sense How? Use an even larger corpus! –Sloppy approach –Strict approach

© Johan Bos April 2008 The World Wide Web

© Johan Bos April 2008 Answer validation (sloppy) Given a question Q and a set of answers A1…An For each i, generate the query Q Ai Count the number of hits for each i Choose the Ai with the highest number of hits Use existing search engines –Google, AltaVista –Magnini et al. (CCP)

© Johan Bos April 2008 Corrected Conditional Probability Treat Q and A as a bag of words –Q = content words of the question –A = answer CCP(Qsp,Asp) = hits(A NEAR Q) / (hits(A) x hits(Q)) Accept answers above a certain CCP threshold
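
A minimal sketch of the CCP computation; the hits() function would normally query a web search engine, so here it is stubbed with a hypothetical table of hit counts:

# Corrected Conditional Probability (after Magnini et al.):
#   CCP(Q, A) = hits(A NEAR Q) / (hits(A) * hits(Q))
# Accept an answer if its CCP exceeds a chosen threshold.
FAKE_HITS = {                       # hypothetical hit counts, standing in for a search engine
    "1924 NEAR Kafka die": 900,
    "1924": 5_000_000,
    "Kafka die": 120_000,
}

def hits(query):
    return FAKE_HITS[query]

def ccp(question, answer):
    return hits(f"{answer} NEAR {question}") / (hits(answer) * hits(question))

print(ccp("Kafka die", "1924"))     # accept if above a chosen threshold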

© Johan Bos April 2008 Answer validation (strict) Given a question Q and a set of answers A1…An Create a declarative sentence with the focus of the question replaced by Ai Use the strict search option in Google –High precision –Low recall Any terms of the target not in the sentence are added to the query
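
A minimal sketch of how such strict queries can be generated, mirroring the Woody Guthrie example on the following slides (the declarative template and the copula variants come from those slides; the quoted phrase stands for Google's strict/phrase search, and sending the query to the engine is left out):

# Build strict validation queries: a declarative sentence containing the answer,
# with morphological variants of the copula and any missing target words appended.
def strict_queries(target, template, answers):
    queries = []
    for answer in answers:
        sentence = template.format(answer=answer)
        # allow morphological variants of the copula
        variant = sentence.replace("was", "is OR was OR are OR were", 1)
        # add target words that do not yet occur in the sentence
        extra = [w for w in target.split() if w.lower() not in sentence.lower()]
        queries.append(f'"{variant}" ' + " ".join(extra))
    return queries

target = "Woody Guthrie"
template = "Guthrie was born in {answer}"
for q in strict_queries(target, template, ["Britain", "Okemah, Okla.", "Oklahoma"]):
    print(q)
# "Guthrie is OR was OR are OR were born in Britain" Woody
# "Guthrie is OR was OR are OR were born in Okemah, Okla." Woody
# "Guthrie is OR was OR are OR were born in Oklahoma" Woody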

© Johan Bos April 2008 Example TREC 99.3 Target: Woody Guthrie. Question: Where was Guthrie born? Top-5 Answers: 1) Britain * 2) Okemah, Okla. 3) Newport * 4) Oklahoma 5) New York

© Johan Bos April 2008 Example: generate queries TREC 99.3 Target: Woody Guthrie. Question: Where was Guthrie born? Generated queries: 1) Guthrie was born in Britain 2) Guthrie was born in Okemah, Okla. 3) Guthrie was born in Newport 4) Guthrie was born in Oklahoma 5) Guthrie was born in New York

© Johan Bos April 2008 Example: add target words TREC 99.3 Target: Woody Guthrie. Question: Where was Guthrie born? Generated queries: 1) Guthrie was born in Britain Woody 2) Guthrie was born in Okemah, Okla. Woody 3) Guthrie was born in Newport Woody 4) Guthrie was born in Oklahoma Woody 5) Guthrie was born in New York Woody

© Johan Bos April 2008 Example: morphological variants TREC 99.3 Target: Woody Guthrie. Question: Where was Guthrie born? Generated queries: Guthrie is OR was OR are OR were born in Britain Woody Guthrie is OR was OR are OR were born in Okemah, Okla. Woody Guthrie is OR was OR are OR were born in Newport Woody Guthrie is OR was OR are OR were born in Oklahoma Woody Guthrie is OR was OR are OR were born in New York Woody

© Johan Bos April 2008 Example: google hits TREC 99.3 Target: Woody Guthrie. Question: Where was Guthrie born? Generated queries: Guthrie is OR was OR are OR were born in Britain Woody 0 Guthrie is OR was OR are OR were born in Okemah, Okla. Woody 10 Guthrie is OR was OR are OR were born in Newport Woody 0 Guthrie is OR was OR are OR were born in Oklahoma Woody 42 Guthrie is OR was OR are OR were born in New York Woody 2

© Johan Bos April 2008 Example: reranked answers TREC 99.3 Target: Woody Guthrie. Question: Where was Guthrie born? Original answers 1) Britain * 2) Okemah, Okla. 3) Newport * 4) Oklahoma 5) New York Reranked answers * 4) Oklahoma * 2) Okemah, Okla. 5) New York 1) Britain 3) Newport

© Johan Bos April 2008 Question Answering (QA) Lecture 1 What is QA? Query Log Analysis Challenges in QA History of QA System Architecture Methods System Evaluation State-of-the-art Lecture 2 Question Analysis Background Knowledge Answer Typing Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

© Johan Bos April 2008 Where to go from here Producing answers in real-time Improve accuracy Answer explanation User modelling Speech interfaces Dialogue (interactive QA) Multi-lingual QA Non sequential architectures