
© Johan Bos April 2008 Question Answering (QA)
Lecture 1: What is QA?; Query Log Analysis; Challenges in QA; History of QA; System Architecture; Methods; System Evaluation; State-of-the-art
Lecture 2: Question Analysis; Background Knowledge; Answer Typing
Lecture 3: Query Generation; Document Analysis; Semantic Indexing; Answer Extraction; Selection and Ranking

© Johan Bos April 2008 Pronto QA System [architecture diagram: question → parsing (ccg) → boxing (drs) → answer typing, supported by a knowledge component (WordNet, NomLex); query → Indri over Indexed Documents; answer extraction → answer selection → answer reranking → answer]

© Johan Bos April 2008 Lecture 2 [the Pronto architecture diagram again; Lecture 2 covers the parsing, boxing, knowledge, and answer typing components]

© Johan Bos April 2008 Question Answering (QA) Lecture 2
→ Question Analysis
Background Knowledge
Answer Typing

© Johan Bos April 2008 Question Analysis – Why? The aim of QA is to output answers, not documents. We need question analysis to: –Determine the type of answer we are trying to find –Estimate the number of answers we want to return –Calculate the probability that an answer is correct

© Johan Bos April 2008 Natural Language Processing We need ways to automate the process of manipulating natural language: –Punctuation –The way words are composed –The relationships between words –The structure of phrases –The meaning of phrases This is where NLP comes in! (NLP = Natural Language Processing)

© Johan Bos April 2008 How to use NLP tools? There is a large set of tools available on the web, most of them free for research. Examples of integrated text processing environments: –GATE (University of Sheffield) –TTT (University of Edinburgh) –LingPipe –C&C (used by the Pronto QA system) For a general overview of NLP tools, see

© Johan Bos April 2008 Architecture of PRONTO [diagram: question → parsing (ccg) → boxing (drs) → answer typing, supported by a knowledge component (WordNet, NomLex); query → Indri over Indexed Documents; answer extraction → answer selection → answer reranking → answer]

© Johan Bos April 2008 Question Analysis
Tokenisation
Part of speech tagging
Lemmatisation
Syntactic analysis (Parsing)
Semantic analysis (Boxing)
Named entity recognition
Anaphora resolution

© Johan Bos April 2008 Tokenisation Tokenisation is the task of splitting words from punctuation: –semicolons, colons ; : –exclamation marks, question marks ! ? –commas and full stops , . –quotes “ ‘ ` Tokens are normally split by spaces; in the following slides, we use | to mark token boundaries.

© Johan Bos April 2008 Tokenisation: Example 1 Input (9 tokens): When was the Buckingham Palace built in London, England?

© Johan Bos April 2008 Tokenisation: Example 1 Input (9 tokens): When | was | the | Buckingham | Palace | built | in | London, | England?

© Johan Bos April 2008 Tokenisation: Example 1 Input (9 tokens): When | was | the | Buckingham | Palace | built | in | London, | England? Output (11 tokens): When | was | the | Buckingham | Palace | built | in | London |, | England | ?

© Johan Bos April 2008 Tokenisation: Example 2 Input (7 tokens): What year did "Snow White" come out?

© Johan Bos April 2008 Tokenisation: Example 2 Input (7 tokens): What | year | did | "Snow | White" | come | out?

© Johan Bos April 2008 Tokenisation: Example 2 Input (7 tokens): What | year | did | "Snow | White" | come | out? Output (10 tokens): What | year | did | “ | Snow | White | " | come | out | ?

© Johan Bos April 2008 Tokenisation: combined words Combined words are split: –I’d → I | ’d –country’s → country | ’s –won’t → wo | n’t –“don’t!” → “ | do | n’t | ! | ” Some Italian examples: –gliel’ha detto → glie | l’ | ha | detto –posso prenderlo → posso | prender | lo
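A minimal tokeniser along these lines can be written as a single regular expression — a sketch for illustration only (this is not Pronto's tokeniser, and the clitic rule handles only the English n't case, not the Italian examples):

import re

# Sketch: split off words, n't and '-clitics, and punctuation as separate tokens.
# \w+(?=n't) peels "wo" off "won't" so that n't becomes its own token.
TOKEN = re.compile(r"\w+(?=n't)|n't|'\w+|\w+|[^\w\s]")

def tokenise(text):
    return TOKEN.findall(text)

print(' | '.join(tokenise("When was the Buckingham Palace built in London, England?")))
# When | was | the | Buckingham | Palace | built | in | London | , | England | ?
print(' | '.join(tokenise("“don't!”")))
# “ | do | n't | ! | ”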

© Johan Bos April 2008 Difficulties with tokenisation Abbreviations, acronyms –When was the U.S. invasion of Haiti? In particular if the abbreviation or acronym is the last word of a sentence –Look at the next word: if it starts with an uppercase letter, then assume the full stop marks the end of the sentence –But think of cases such as Mr. Jones

© Johan Bos April 2008 Why is tokenisation important? Required for all subsequent stages of processing –Parsing –Named entity recognition –Lemmatisation –To look up a word in an electronic dictionary (such as WordNet)

© Johan Bos April 2008 Question Analysis
Tokenisation
→ Part of speech tagging
Named Entity Recognition
Lemmatisation
Syntactic analysis (Parsing)
Semantic analysis (Boxing)

© Johan Bos April 2008 Traditional parts of speech Verb Noun Pronoun Adjective Adverb Preposition Conjunction Interjection

© Johan Bos April 2008 Parts of speech in NLP
CLAWS1 (132 tags). Examples:
NN singular common noun (boy, pencil, …)
NN$ genitive singular common noun (boy's, parliament's, …)
NNP singular common noun with word initial capital (Austrian, American, Sioux, Eskimo, …)
NNP$ genitive singular common noun with word initial capital (Sioux', Eskimo's, Austrian's, American's, …)
NNPS plural common noun with word initial capital (Americans, …)
NNPS$ genitive plural common noun with word initial capital (Americans', …)
NNS plural common noun (pencils, skeletons, days, weeks, …)
NNS$ genitive plural common noun (boys', weeks', …)
NNU abbreviated unit of measurement unmarked for number (in, cc, kg, …)
Penn Treebank (45 tags). Examples:
JJ adjective (green, …)
JJR adjective, comparative (greener, …)
JJS adjective, superlative (greenest, …)
MD modal (could, will, …)
NN noun, singular or mass (table, …)
NNS noun, plural (tables, …)
NNP proper noun, singular (John, …)
NNPS proper noun, plural (Vikings, …)
PDT predeterminer (both the boys)
POS possessive ending (friend's)
PRP personal pronoun (I, he, it, …)
PRP$ possessive pronoun (my, his, …)
RB adverb (however, usually, naturally, here, good, …)
RBR adverb, comparative (better, …)

© Johan Bos April 2008 POS tagged example What year did “ Snow White " come out ?

© Johan Bos April 2008 POS tagged example What/WP year/NN did/VBD “/`` Snow/NNP White/NNP "/'' come/VB out/IN ?/.

© Johan Bos April 2008 Why is POS-tagging important? To disambiguate words. For instance, to distinguish “book” used as a noun from “book” used as a verb: –Where can I find a book on cooking? –Where can I book a room? It is also a prerequisite for further processing stages, such as parsing.
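For illustration, the same disambiguation with NLTK's off-the-shelf Penn Treebank tagger — an assumption for this sketch, not the tagger of the lecture's pipeline (one-time setup: nltk.download('averaged_perceptron_tagger')):

import nltk

# 'book' should get a noun tag in the first question and a verb tag in the second.
print(nltk.pos_tag("Where can I find a book on cooking ?".split()))
# e.g. [..., ('a', 'DT'), ('book', 'NN'), ('on', 'IN'), ...]
print(nltk.pos_tag("Where can I book a room ?".split()))
# e.g. [..., ('I', 'PRP'), ('book', 'VB'), ('a', 'DT'), ...]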

© Johan Bos April 2008 Question Analysis
Tokenisation
Part of speech tagging
→ Lemmatisation
Syntactic analysis (Parsing)
Semantic analysis (Boxing)

© Johan Bos April 2008 Lemmatisation Lemmatising means grouping morphological variants of words under a single headword. For example, you could group the words am, was, are, is, were, and been together under the word be.

© Johan Bos April 2008 Lemmatisation Using linguistic terminology, the variants taken together form the lemma of a lexeme. Lexeme: a “lexical unit”, an abstraction over specific constructions. Other examples: dying, die, died, dies → die; car, cars → car; man, men → man
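In code, lemmatisation is essentially a dictionary-backed lookup. A sketch with NLTK's WordNet-based lemmatiser (assumes the 'wordnet' corpus is downloaded); note that it needs a part of speech, which is one more reason tagging comes first:

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
print(wnl.lemmatize('was', pos='v'))    # be
print(wnl.lemmatize('dying', pos='v'))  # die
print(wnl.lemmatize('men', pos='n'))    # man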

© Johan Bos April 2008 Question Analysis
Tokenisation
Part of speech tagging
Lemmatisation
→ Syntactic analysis (Parsing)
Semantic analysis (Boxing)

© Johan Bos April 2008 What is Parsing Parsing is the process of assigning a syntactic structure to a sequence of words The syntactic structure is defined using a grammar A grammar consists of a set of symbols (terminal and non-terminal symbols) and production rules (grammar rules) The lexicon is built over the terminal symbols (i.e., the words)

© Johan Bos April 2008 Syntactic Categories The non-terminal symbols correspond to syntactic categories –Det (determiner) –N (noun) –IV (intransitive verb) –TV (transitive verb) –PN (proper name) –Prep (preposition) –NP (noun phrase) the car –PP (prepositional phrase) at the table –VP (verb phrase) saw a car –S (sentence) Mia likes Vincent

© Johan Bos April 2008 Example Grammar
Lexicon:
Det: which, a, the, …
N: rock, singer, …
IV: die, walk, …
TV: kill, write, …
PN: John, Lithium, …
Prep: on, from, to, …
Grammar Rules:
S → NP VP
NP → Det N
NP → PN
N → N N
N → N PP
VP → TV NP
VP → IV
PP → Prep NP
VP → VP PP

© Johan Bos April 2008 The Parser A parser automates the process of parsing The input of the parser is a string of words (annotated with POS-tags) The output of a parser is a parse tree, connecting all the words The way a parse tree is constructed is also called a derivation
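Before stepping through the derivation by hand on the next slides, here is the toy grammar above in runnable form — a sketch using NLTK's CFG notation and chart parser (for illustration only; Pronto itself uses the C&C CCG parser):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | PN
N -> N N | N PP
VP -> TV NP | IV | VP PP
PP -> Prep NP
Det -> 'Which' | 'a' | 'the'
N -> 'rock' | 'singer'
IV -> 'die' | 'walk'
TV -> 'kill' | 'wrote'
PN -> 'John' | 'Lithium'
Prep -> 'on' | 'from' | 'to'
""")

# The chart parser finds the derivation that the slides build step by step.
for tree in nltk.ChartParser(grammar).parse('Which rock singer wrote Lithium'.split()):
    print(tree)
# (S (NP (Det Which) (N (N rock) (N singer))) (VP (TV wrote) (NP (PN Lithium))))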

© Johan Bos April 2008 Derivation Example Which rock singer wrote Lithium

© Johan Bos April 2008 Lexical stage Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Use rule: NP → Det N NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Use rule: NP → PN NP NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Use rule: VP → TV NP VP NP NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Backtracking VP NP NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Use rule: N → N N VP N NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Use rule: NP → Det N NP VP N NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Use rule: S → NP VP S NP VP N NP Det N N TV PN Which rock singer wrote Lithium

© Johan Bos April 2008 Wide coverage parsers Normally expect tokenised and POS-tagged input Examples of wide-coverage parsers: –Charniak parser –Collins parser –RASP (Carroll & Briscoe) –CCG parser (Clark & Curran – used in Pronto)

© Johan Bos April 2008 Output C&C parser
ba('S[wq]',
   fa('S[wq]',
      fa('S[wq]/(S[q]/PP)',
         fc('(S[wq]/(S[q]/PP))/N',
            lf(1,'(S[wq]/(S[q]/PP))/(S[wq]/(S[q]/NP))'),
            lf(2,'(S[wq]/(S[q]/NP))/N')),
         lf(3,'N')),
      fc('S[q]/PP',
         fa('S[q]/(S[b]\NP)',
            lf(4,'(S[q]/(S[b]\NP))/NP'),
            lex('N','NP',
                lf(5,'N'))),
         lf(6,'(S[b]\NP)/PP'))),
   lf(7,'S[wq]\S[wq]')).

w(1,'For', for, 'IN', 'O', '(S[wq]/(S[q]/PP))/(S[wq]/(S[q]/NP))').
w(2,which, which, 'WDT', 'O', '(S[wq]/(S[q]/NP))/N').
w(3,newspaper, newspaper, 'NN', 'O', 'N').
w(4,does, do, 'VBZ', 'O', '(S[q]/(S[b]\NP))/NP').
w(5,'Krugman', krugman, 'NNP', 'I-PER', 'N').
w(6,write, write, 'VB', 'O', '(S[b]\NP)/PP').
w(7,?, ?, '.', 'O', 'S[wq]\S[wq]').

© Johan Bos April 2008 Question Analysis
Tokenisation
Part of speech tagging
Lemmatisation
Syntactic analysis (Parsing)
→ Semantic analysis (Boxing)

© Johan Bos April 2008 Architecture of PRONTO [diagram: question → parsing (ccg) → boxing (drs) → answer typing, supported by a knowledge component (WordNet, NomLex); query → Indri over Indexed Documents; answer extraction → answer selection → answer reranking → answer]

© Johan Bos April 2008 Boxing (Semantic Analysis) Providing a semantic analysis on the basis of the syntactic analysis A semantic analysis of a question offers an abstract representation of the meaning of the question Boxer uses a particular semantic theory: Discourse Representation Theory

© Johan Bos April 2008 Discourse Representation Theory The meaning of natural language expressions is represented in first-order logic Not formulas but a box representation (without explicit quantification and conjunction) DRT covers a wide range of linguistic phenomena (Kamp & Reyle)
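A taste of DRT in code — a sketch using NLTK's DRT module (not Boxer, whose output is richer): a hand-written DRS for "a singer writes" in box notation, and its translation to a first-order formula:

from nltk.sem.drt import DrtExpression

drs = DrtExpression.fromstring(r'([x, e], [singer(x), write(e), agent(e, x)])')
drs.pretty_print()  # draws the box: referents x, e on top, conditions below
print(drs.fol())    # e.g. exists x e.(singer(x) & write(e) & agent(e,x))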

© Johan Bos April 2008 Output of Boxer
DRS (Discourse Representation Structure) for: Paul Krugman. For which newspaper does Krugman write?
  _______________________     ______________________________________
 | x0                    |   | x1                                   |
 |_______________________|   |______________________________________|
(| named(x0,krugman,per) | + | write(x1)                            |)
 | named(x0,paul,per)    |   | event(x1)                            |
 |                       |   | agent(x1,x0)                         |
 |_______________________|   |   _______________     ____________   |
                             |  | x2            |   |            |  |
                             |  |_______________|   |____________|  |
                             |  | newspaper(x2) | ? | event(x1)  |  |
                             |  |_______________|   | for(x1,x2) |  |
                             |                      |____________|  |
                             |______________________________________|

© Johan Bos April 2008 Focus and Topic Information expressed in a question can be structured into two parts: –the focus: the information that is asked for –the topic: the information about the focus Example: How many inhabitants (FOCUS) does Rome have (TOPIC)?

© Johan Bos April 2008 Focus in DRS [the DRS from the previous slide again, with the focus — the questioned referent x2 with newspaper(x2) — highlighted]

© Johan Bos April 2008 Question Answering (QA) Lecture 2
Question Analysis
→ Background Knowledge
Answer Typing

© Johan Bos April 2008 Architecture of PRONTO [diagram: question → parsing (ccg) → boxing (drs) → answer typing, supported by a knowledge component (WordNet, NomLex); query → Indri over Indexed Documents; answer extraction → answer selection → answer reranking → answer]

© Johan Bos April 2008 Knowledge Construction The knowledge component in Pronto constructs a local knowledge base for the question under consideration –This knowledge is used in subsequent components The task of the knowledge component is to find all relevant knowledge that might be used –And as little as possible, to ensure efficiency

© Johan Bos April 2008 Manually Constructed Knowledge Linguistic knowledge –WordNet –NomLex –FrameNet General knowledge –CYC –CIA Factbook –Gazetteers

© Johan Bos April 2008 WordNet Electronic dictionary Not only words and definitions, but also relations between words Four parts of speech –Nouns –Verbs –Adjectives –Adverbs

© Johan Bos April 2008 WordNet SynSets Words are organised in SynSets A SynSet is a group of words with the same meaning --- in other words, a set of synonyms Example: { Rome, Roma, Eternal City, Italian Capital, capital of Italy }

© Johan Bos April 2008 Senses A word can have several different meanings Example: plant –A building for industrial labour –A living organism lacking the power of locomotion The different meanings of a word are called senses Therefore, one word can occur in more than one SynSet in WordNet

© Johan Bos April 2008 SynSet Example -{mug, mugful} = the quantity that can be held in a mug -{chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug} = a person who is gullible and easy to take advantage of -{countenance, physiognomy, phiz, visage, kisser, smiler, mug} = the human face

© Johan Bos April 2008 Hypernyms and Hyponyms Hypernymy is a WordNet relation defined between two SynSets –If A is a hypernym of B, then A is more generic than B The inverse of hypernymy is hyponymy –If A is a hyponym of B, then A is more specific than B Take the transitive closure of these relations Examples: –“cow” and “horse” are hyponyms of “animal” –“publication” is a hypernym of “book”

© Johan Bos April 2008 Examples using WordNet Which rock singer wrote Lithium? –WordNet: singer is a hyponym of person –Knowledge: ∀x(singer(x) → person(x)) What is the population of Andorra? –WordNet: population is a hyponym of number –Knowledge: ∀x(population(x) → number(x))
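The lookup behind such facts, as a sketch with NLTK's WordNet interface (assumes the 'wordnet' corpus is downloaded; checking every noun sense of the word for simplicity):

from nltk.corpus import wordnet as wn

def is_hyponym_of(word, ancestor):
    # True if any noun sense of `word` has `ancestor` in the transitive
    # closure of its hypernym relation.
    target = wn.synset(ancestor)
    return any(target in s.closure(lambda h: h.hypernyms())
               for s in wn.synsets(word, pos=wn.NOUN))

print(is_hyponym_of('singer', 'person.n.01'))  # True → ∀x(singer(x) → person(x))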

© Johan Bos April 2008 NomLex NomLex is a database of nominalisation paraphrases –A nominalisation is a “verb promoted to a noun” –A paraphrase links the noun to the root verb Example: –X is an invention by Y → Y invented X –the killing of X → X was killed

© Johan Bos April 2008 Harvesting Knowledge Often existing knowledge bases are incomplete for particular applications There are various ways to automatically construct knowledge bases: –Instances and Hyponyms [e.g. Hearst] –Paraphrases [e.g. Lin & Pantel]

© Johan Bos April 2008 Hyponyms (X such as Y) WordNet has no instances of airlines. TREC 20.2 (Concorde): What airlines have Concorde in their fleets?

© Johan Bos April 2008 Hyponyms (X such as Y) Search for “Xs such as Y” patterns in large corpora, such as the web Here: X = airline, Y a hyponym of X Corpus: …airlines such as Continental and United now fly… TREC 20.2 (Concorde) What airlines have Concorde in their fleets?

© Johan Bos April 2008 Hyponyms (X such as Y) Knowledge (AQUAINT corpus): Air Asia, Air Canada, Air France, Air Mandalay, Air Zimbabwe, Alaska, Aloha, American Airlines, Angel Airlines, Ansett, Asiana, Bangkok Airways, Belgian Carrier Sabena, British Airways, Canadian, Cathay Pacific, China Eastern Airlines, China Xinhua Airlines, Continental, Garuda, Japan Airlines, Korean Air, Lai, Lao Aviation, Lufthansa, Malaysia Airlines, Maylasian Airlines, Midway, Northwest, Orient Thai Airlines, Qantas, Seage Air, Shanghai Airlines, Singapore Airlines, Skymark Airlines Co., South Africa, Swiss Air, US Airways, United, Virgin, Yangon Airways TREC 20.2 (Concorde): What airlines have Concorde in their fleets?
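A minimal version of the pattern matcher, as a sketch (real harvesting runs over a large corpus and filters the candidates, e.g. with a named entity tagger; the regex here is deliberately naive):

import re

sentence = "...airlines such as Continental and United now fly..."
m = re.search(r"(\w+)s such as ([A-Z]\w+(?:(?:,| and| or) [A-Z]\w+)*)", sentence)
if m:
    hypernym = m.group(1)                              # 'airline'
    hyponyms = re.split(r", | and | or ", m.group(2))  # ['Continental', 'United']
    print(hypernym, hyponyms)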

© Johan Bos April 2008 Paraphrases Several methods have been developed for automatically finding paraphrases in large corpora This usually proceeds by starting with seed patterns of known positive instances Through bootstrapping, new patterns are found, which in turn yield new seeds

© Johan Bos April 2008 Seed example Start: Oswald killed JFK Search for "Oswald * JFK" Results: –Oswald assassinated JFK –Oswald shot JFK Use these new patterns to find other pairs and start again
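The bootstrapping loop in miniature — a sketch over an invented three-sentence corpus (not TREC data): fill the "*" slot for the seed pair, then apply the found patterns to harvest a new pair:

import re

corpus = [
    "Oswald assassinated JFK in Dallas.",
    "Oswald shot JFK from a warehouse window.",
    "Booth shot Lincoln at Ford's Theatre.",
]

# Step 1: what fills the slot in "Oswald * JFK"?
verbs = {v for text in corpus for v in re.findall(r"\bOswald (\w+) JFK\b", text)}
# Step 2: apply each pattern "X <verb> Y" to find new pairs (new seeds).
pairs = {p for v in verbs for text in corpus
           for p in re.findall(rf"\b(\w+) {v} (\w+)\b", text)}
print(verbs)  # {'assassinated', 'shot'}
print(pairs)  # {('Oswald', 'JFK'), ('Booth', 'Lincoln')} — a new seed pair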

© Johan Bos April 2008 Paraphrase Example Knowledge: ∀x∀t(∃e(kill(e) & theme(e,x) & in(e,t)) → ∃e(die(e) & agent(e,x) & in(e,t))) TREC 4.2 (James Dean) When did James Dean die?

© Johan Bos April 2008 Paraphrase Example Knowledge: ∀x∀t(∃e(kill(e) & theme(e,x) & in(e,t)) → ∃e(die(e) & agent(e,x) & in(e,t))) TREC 4.2 (James Dean) When did James Dean die? APW: In 1955, actor James Dean was killed in a two-car collision near Cholame, Calif.

© Johan Bos April 2008 Question Answering (QA) Lecture 2
Question Analysis
Background Knowledge
→ Answer Typing

© Johan Bos April 2008 Architecture of PRONTO [diagram: question → parsing (ccg) → boxing (drs) → answer typing, supported by a knowledge component (WordNet, NomLex); query → Indri over Indexed Documents; answer extraction → answer selection → answer reranking → answer]

© Johan Bos April 2008 Answer Typing Providing information on the expected answer type –Type of question –Type (sortal ontology or taxonomy) –Answer cardinality Issues –Ambiguities –Vagueness –Classification problems

© Johan Bos April 2008 Question Types Wh-questions: –Where was Franz Kafka born? –How many countries are members of OPEC? –Who is Thom Yorke? –Why did David Koresh ask the FBI for a word processor? –How did Frank Zappa die? –Which boxer beat Muhammad Ali?

© Johan Bos April 2008 Question Types Yes-no questions: –Does light have weight? –Scotland is part of England – true or false? Choice-questions: –Did Italy or Germany win the world cup in 1982? –Who is Harry Potter’s best friend – Ron, Hermione or Sirius?

© Johan Bos April 2008 Indirect Questions Imperative mood: –Name four European countries that produce wine. –Give the date of birth of Franz Kafka. Declarative mood: –I would like to know when Jim Morrison was born.

© Johan Bos April 2008 Answer Type Taxonomies Simple Answer-Type Taxonomy: PERSON NUMERAL DATE MEASURE LOCATION ORGANISATION

© Johan Bos April 2008 Expected Answer Types PERSON: –Who won the Nobel prize for Peace? –Which rock singer wrote Lithium?

© Johan Bos April 2008 Expected Answer Types NUMERAL: –How many inhabitants does Rome have? –What’s the population of Scotland?

© Johan Bos April 2008 Expected Answer Types DATE: –When was JFK killed? –In what year did Rome become the capital of Italy?

© Johan Bos April 2008 Expected Answer Types MEASURE: –How much does a 125 gallon fish tank cost? –How tall is an African elephant? –How heavy is a Boeing 777?

© Johan Bos April 2008 Expected Answer Types LOCATION: –Where does Angus Young of AC/DC live? –What city gives a Christmas tree to Westminster every year as a gift?

© Johan Bos April 2008 Expected Answer Types ORGANISATION: –Which company invented the compact disk? –Who purchased Gilman Paper Company?

© Johan Bos April 2008 Using background knowledge Which rock singer … –singer is a hyponym of person, therefore expected answer type is PERSON What is the population of … –population is a hyponym of number, hence answer type NUMERAL

© Johan Bos April 2008 Answer type tagging Simple rule-based systems: Who … → PERSON Where … → LOCATION When … → DATE How many … → NUMERAL …often fail: –Who launched the iPod? –Where in the human body is the liver? –When is it time to go to bed?
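The slide's naive rules, as code — a sketch that also shows the failure mode (the rules see only the wh-word, not the semantics of the question):

import re

RULES = [(r"^who\b", "PERSON"), (r"^where\b", "LOCATION"),
         (r"^when\b", "DATE"), (r"^how many\b", "NUMERAL")]

def expected_answer_type(question):
    q = question.lower()
    for pattern, answer_type in RULES:
        if re.match(pattern, q):
            return answer_type
    return "UNKNOWN"

print(expected_answer_type("When was JFK killed?"))    # DATE — correct
print(expected_answer_type("Who launched the iPod?"))  # PERSON — wrong: should be ORGANISATION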

© Johan Bos April 2008 Complex taxonomies Simple ontologies cannot account for the large variety of questions An example of a more complex ontology is proposed by Li & Roth Pronto uses its own complex ontology Machine learning approaches are often used to automatically tag questions with answer types

© Johan Bos April 2008 Taxonomy of Li & Roth (1/3)
ENTITY
–animal: animals
–body: organs of body
–color: colors
–creative: inventions, books and other creative pieces
–currency: currency names
–dis.med.: diseases and medicine
–event: events
–food: food
–instrument: musical instruments
–lang: languages
–letter: letters like a-z
–other: other entities
–plant: plants
–product: products
–religion: religions
–sport: sports
–substance: elements and substances
–symbol: symbols and signs
–technique: techniques and methods
–term: equivalent terms
–vehicle: vehicles
–word: words with a special property

© Johan Bos April 2008 Taxonomy of Li & Roth (2/3)
DESCRIPTION: description and abstract concepts
–definition: definition of sth.
–description: description of sth.
–manner: manner of an action
–reason: reasons
HUMAN: human beings
–group: a group or organization of persons
–ind: an individual
–title: title of a person
–description: description of a person
LOCATION: locations
–city: cities
–country: countries
–mountain: mountains
–other: other locations
–state: states

© Johan Bos April 2008 Taxonomy of Li & Roth (3/3)
NUMERIC: numeric values
–code: postcodes or other codes
–count: number of sth.
–date: dates
–distance: linear measures
–money: prices
–order: ranks
–other: other numbers
–period: the lasting time of sth.
–percent: fractions
–speed: speed
–temp: temperature
–size: size, area and volume
–weight: weight
ABBREVIATION
–abb: abbreviation
–exp: expansion

© Johan Bos April 2008 Pronto Answer Type Taxonomy [diagram of Pronto's own answer type taxonomy]

© Johan Bos April 2008 Answer typing: problems Ambiguities: –How long → distance or duration Vague wh-words: –What do penguins eat? –What is the length of a football pitch? Taxonomy gaps: –Which alien race featured in Star Trek? –What is the cultural capital of Italy?

© Johan Bos April 2008 Answer Cardinality How many distinct answers does a question have? Examples: –When did Louis Braille die? → 1 answer –Who won a Nobel prize in chemistry? → 1 or more answers –What are the seven wonders of the world? → exactly 7 answers

© Johan Bos April 2008 Class activity: answer typing
1. How many islands does Italy have?
2. When did Inter win the Scudetto?
3. What are the colours of the Lithuanian flag?
4. Where is St. Andrews located?
5. Why does oil float in water?
6. How did Frank Zappa die?
7. Name the Baltic countries.
8. Which seabird was declared extinct in the 1840s?
9. Who is Noam Chomsky?
10. List names of Russian composers.
11. Edison is the inventor of what?
12. How far is the moon from the sun?
13. What is the distance from New York to Boston?
14. How many planets are there?
15. What is the exchange rate of the Euro to the Dollar?
16. What does SPQR stand for?
17. What is the nickname of Totti?
18. What does the Scottish word “bonnie” mean?
19. Who wrote the song “Paranoid Android”?

© Johan Bos April 2008 Lecture 3 [the Pronto architecture diagram again; Lecture 3 covers query generation, retrieval with Indri over the indexed documents, answer extraction, selection and reranking]

© Johan Bos April 2008 Question Answering (QA)
Lecture 1: What is QA?; Query Log Analysis; Challenges in QA; History of QA; System Architecture; Methods; System Evaluation; State-of-the-art
Lecture 2: Question Analysis; Background Knowledge; Answer Typing
Lecture 3: Query Generation; Document Analysis; Semantic Indexing; Answer Extraction; Selection and Ranking