© Johan Bos November 2005 Pub Quiz. © Johan Bos November 2005 Question Answering Lecture 1 (Last week): Introduction; History of QA; Architecture of a.

© Johan Bos November 2005 Pub Quiz

Question Answering
Lecture 1 (Last week): Introduction; History of QA; Architecture of a QA system; Evaluation.
Lecture 2 (Today): Question Classification; NLP techniques for question analysis; Tokenisation; Lemmatisation; POS-tagging; Parsing; WordNet.
Lecture 3 (Next lecture): Retrieving Answers; Document pre-processing; Named Entity Recognition; Anaphora Resolution; Matching; Reranking; Sanity checking.

Architecture of a QA system (diagram)
–Components: Question Analysis, IR, Document Analysis, Answer Extraction
–Question Analysis turns the question into a query, an answer-type and a question representation
–IR uses the query to retrieve documents/passages from the corpus
–Document Analysis turns the documents/passages into a passage representation
–Answer Extraction combines the answer-type, the question representation and the passage representation to produce the answers

Syntactically Distinguishing Questions
Wh-questions:
–Where was Franz Kafka born?
–How many countries are members of OPEC?
–Who is Thom Yorke?
–Why did David Koresh ask the FBI for a word processor?
–How did Frank Zappa die?
–Which boxer beat Muhammad Ali?

Syntactically Distinguishing Questions
Yes-no questions:
–Does light have weight?
–Scotland is part of England – true or false?
Choice-questions:
–Did Italy or Germany win the world cup in 1982?
–Who is Harry Potter’s best friend – Ron, Hermione or Sirius?

Syntactically Distinguishing Questions
Imperative:
–Name four European countries that produce wine.
–Give the date of birth of Franz Kafka.
Declarative:
–I would like to know when Jim Morrison was born.

Semantically Distinguishing Questions
Divide questions according to their expected answer type
Simple Answer-Type Typology: PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY

Expected Answer Types
DATE:
–When was JFK killed?
–In what year did Rome become the capital of Italy?
PERSON:
–Who won the Nobel prize for Peace?
–Which rock singer wrote Lithium?
NUMERAL:
–How many inhabitants does Rome have?
–What’s the population of Scotland?
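A first approximation of answer-type detection can be sketched with surface patterns on the opening words of the question. The cue patterns below are assumptions made for illustration (the slides list only the answer types, not these rules), and anything that matches no pattern falls back to ENTITY.

```python
import re

# Hypothetical cue patterns: the slides give the answer types, not these rules.
ANSWER_TYPE_RULES = [
    (r"^(when|in what year)\b", "DATE"),
    (r"^who\b", "PERSON"),
    (r"^how many\b", "NUMERAL"),
    (r"^where\b", "LOCATION"),
]


def expected_answer_type(question: str) -> str:
    """Return the answer type of the first matching cue, or ENTITY as a fallback."""
    q = question.strip().lower()
    for pattern, answer_type in ANSWER_TYPE_RULES:
        if re.search(pattern, q):
            return answer_type
    return "ENTITY"


for q in ["When was JFK killed?",
          "Who won the Nobel prize for Peace?",
          "How many inhabitants does Rome have?",
          "Which rock singer wrote Lithium?"]:
    print(q, "->", expected_answer_type(q))
```

Questions such as "Which rock singer wrote Lithium?" and "What's the population of Scotland?" fall through to ENTITY here, because their answer type depends on the noun after the question word rather than on the question word itself.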

Focus and Topic
Information expressed in a question can be structured into two parts:
–the focus: information that is asked for
–the topic: information about the focus
Example: How many inhabitants does Rome have?
–FOCUS: How many inhabitants
–TOPIC: Rome

We need to know how to process natural language!

Generating Query Terms
Example 1:
–Question: Who discovered prions?
–Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.
–Text B: Prions are a kind of protein that…
Query terms?

Generating Query Terms
Example 2:
–Question: When did Franz Kafka die?
–Text A: Kafka died in
–Text B: Dr. Franz died in
Query terms?

Generating Query Terms
Example 3:
–Question: How did actor James Dean die?
–Text: James Dean was killed in a car accident.
Query terms?
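A naive baseline for the three examples above is to lowercase the question, drop function words, and keep the rest as query terms. The stop list here is a small hypothetical one chosen for these examples, not a list from the lecture.

```python
import re

# Hypothetical stop list, just large enough for the three example questions.
STOP_WORDS = {"who", "when", "how", "did", "the", "a", "an", "of", "in", "was", "is"}


def query_terms(question: str) -> list:
    """Lowercase the question, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[A-Za-z]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(query_terms("Who discovered prions?"))          # ['discovered', 'prions']
print(query_terms("When did Franz Kafka die?"))       # ['franz', 'kafka', 'die']
print(query_terms("How did actor James Dean die?"))   # ['actor', 'james', 'dean', 'die']
```

The baseline exposes the problems the examples point at: "discovered" does not match "discovery", "die" does not match "died" or "killed", and "franz" alone also matches the text about Dr. Franz.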

We need to know how to process natural language!

Difference in structure
Example:
–Question: When did Franz Kafka die?
–Text A: The mother of Franz Kafka died in 1918.
–Text B: Kafka died in 1924.
–Text C: Both Kafka and Lenin died in 1924.
–Text D: Max Brod, a friend of Kafka, died in 1930.

We need to know how to process natural language!

Natural Language is messy!
We need ways to automate the process of manipulating natural language
–Punctuation
–The way words are composed
–The relationship between wordforms
–The relationship between words
–The structure of phrases
This is where NLP (Natural Language Processing) comes in!

NLP Techniques
–Tokenisation
–Lemmatisation
–Part of Speech Tagging
–Syntactic analysis (parsing)
–WordNet

Tokenisation
Tokenisation is the task that splits words from punctuation
–semicolons, colons ; :
–exclamation marks, question marks ! ?
–commas and full stops , .
–quotes “ ‘ `
Tokens are normally split by spaces

Tokenisation: Example 1
Input (9 tokens): When was the Buckingham Palace built in London, England?
Output (11 tokens): When was the Buckingham Palace built in London , England ?

Tokenisation: Example 2
Input (7 tokens): What year did "Snow White" come out?
Output (10 tokens): What year did " Snow White " come out ?

Tokenisation: combined words
Combined words are split
–I’d → I ’d
–country’s → country ’s
–won’t → will n’t
–“don’t!” → “ do n’t ! ”
Some Italian examples
–gliel’ha detto → gli lo ha detto
–posso prenderlo → posso prendere lo
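The splitting of punctuation and of English combined words can be sketched with one regular expression plus a few rules for clitics. This is a minimal sketch under stated assumptions: the `IRREGULAR` table and the clitic rules are illustrative, only the English cases from the slides are covered, and the Italian examples would need a lexicon.

```python
import re

# Irregular contractions need a lookup table; this one-entry table is illustrative.
IRREGULAR = {"won't": ["will", "n't"]}

# A word (optionally followed by an apostrophe clitic) or one punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]")


def tokenise(text: str) -> list:
    """Split punctuation from words and separate English clitics."""
    text = text.replace("’", "'")  # treat the curly apostrophe like the straight one
    tokens = []
    for raw in TOKEN_RE.findall(text):
        low = raw.lower()
        if low in IRREGULAR:
            tokens.extend(IRREGULAR[low])
        elif low.endswith("n't") and len(raw) > 3:
            tokens.extend([raw[:-3], "n't"])       # don't -> do n't
        elif "'" in raw and len(raw) > 1:
            stem, clitic = raw.split("'", 1)
            tokens.extend([stem, "'" + clitic])    # I'd -> I 'd
        else:
            tokens.append(raw)
    return tokens


print(tokenise("When was the Buckingham Palace built in London, England?"))
print(tokenise('What year did "Snow White" come out?'))
print(tokenise("won't"), tokenise("country's"))
```

On the two example questions this yields 11 and 10 tokens, matching the outputs on the slides. An abbreviation such as U.S. is split into four tokens by this sketch, which is exactly the difficulty with abbreviations and acronyms.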

Difficulties with tokenisation
Abbreviations, acronyms
–When was the U.S. invasion of Haiti?
In particular if the abbreviation or acronym is the last word of a sentence
–Look at the next word: if it starts with an uppercase letter, assume the sentence has ended
–But think of cases such as Mr. Jones
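The uppercase heuristic and its Mr. Jones exception can be sketched as a small function over whitespace-separated tokens. The `TITLES` set is a hypothetical exception list added for this sketch; a real system would need a much larger one.

```python
# Hypothetical exception list: titles that are normally followed by a capitalised name.
TITLES = {"Mr.", "Mrs.", "Dr.", "Prof."}


def is_sentence_end(tokens: list, i: int) -> bool:
    """Guess whether the full stop at the end of tokens[i] closes a sentence."""
    token = tokens[i]
    if not token.endswith("."):
        return False
    if i + 1 == len(tokens):
        return True                       # last token of the text
    if token in TITLES:
        return False                      # "Mr. Jones": capital follows, no boundary
    return tokens[i + 1][0].isupper()     # capitalised next word: assume a boundary


question = "When was the U.S. invasion of Haiti ?".split()
print(is_sentence_end(question, 3))       # False: "invasion" is lowercase

text = "He lives in the U.S. She does not .".split()
print(is_sentence_end(text, 4))           # True: "She" is capitalised
```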

Why is tokenisation important?
To look up a word in an electronic dictionary (such as WordNet)
For all subsequent stages of processing
–Lemmatisation
–Parsing

Lemmatisation
Lemmatising means grouping morphological variants of words under a single headword
For example, you could take the words am, was, are, is, were, and been together under the word be

Lemmatisation
Using linguistic terminology, the variants taken together form the lemma of a lexeme
Lexeme: a “lexical unit”, an abstraction over specific constructions
Other examples:
–dying, die, died, dies → die
–car, cars → car
–man, men → man
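The grouping of variants under a headword can be sketched as a lookup table with a fallback rule. The table holds only the examples from the slides, and the plural-stripping fallback is a naive assumption of this sketch; a real lemmatiser relies on a full lexicon and morphological rules.

```python
# Variant -> headword pairs taken from the slide examples.
IRREGULAR_LEMMAS = {
    "am": "be", "was": "be", "are": "be", "is": "be", "were": "be", "been": "be",
    "dying": "die", "died": "die", "dies": "die",
    "men": "man",
}


def lemmatise(word: str) -> str:
    """Map a word form to its headword using the table, then a naive plural rule."""
    w = word.lower()
    if w in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[w]
    # Naive fallback (an assumption): strip a final -s from regular plurals.
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


print([lemmatise(w) for w in ["was", "died", "cars", "men", "car"]])
# ['be', 'die', 'car', 'man', 'car']
```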

Traditional parts of speech: Verb, Noun, Pronoun, Adjective, Adverb, Preposition, Conjunction, Interjection

Parts of speech in NLP
CLAWS1 (132 tags). Examples:
–NN: singular common noun (boy, pencil, ...)
–NN$: genitive singular common noun (boy's, parliament's, ...)
–NNP: singular common noun with word initial capital (Austrian, American, Sioux, Eskimo, ...)
–NNP$: genitive singular common noun with word initial capital (Sioux', Eskimo's, Austrian's, American's, ...)
–NNPS: plural common noun with word initial capital (Americans, ...)
–NNPS$: genitive plural common noun with word initial capital (Americans', …)
–NNS: plural common noun (pencils, skeletons, days, weeks, ...)
–NNS$: genitive plural common noun (boys', weeks', ...)
–NNU: abbreviated unit of measurement unmarked for number (in, cc, kg, …)
Penn Treebank (45 tags). Examples:
–JJ: adjective (green, …)
–JJR: adjective, comparative (greener, …)
–JJS: adjective, superlative (greenest, …)
–MD: modal (could, will, …)
–NN: noun, singular or mass (table, …)
–NNS: noun, plural (tables, …)
–NNP: proper noun, singular (John, …)
–NNPS: proper noun, plural (Vikings, …)
–PDT: predeterminer (both the boys)
–POS: possessive ending (friend's)
–PRP: personal pronoun (I, he, it, …)
–PRP$: possessive pronoun (my, his, …)
–RB: adverb (however, usually, naturally, here, good, …)
–RBR: adverb, comparative (better, …)

POS tagged example
Input: What year did “ Snow White " come out ?
Tagged: What/WP year/NN did/VBD “/`` Snow/NNP White/NNP "/'' come/VB out/IN ?/.

Why is POS-tagging important?
To disambiguate words
For instance, to distinguish “book” used as a noun from “book” used as a verb
–I like that book
–Did you book a room?
Prerequisite for further processing stages, such as parsing
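The noun/verb distinction for "book" can be illustrated with a toy contextual rule that looks only at the preceding word. The word lists and the rule are assumptions made for this illustration; real taggers are trained on annotated corpora and use much richer context.

```python
# Hypothetical context word lists for the toy rule below.
DETERMINERS = {"a", "an", "the", "that", "this"}
PRONOUNS = {"i", "you", "we", "they"}


def tag_ambiguous(tokens: list, i: int) -> str:
    """Tag tokens[i] as NN (noun) or VB (verb) from the preceding word only."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    if prev in DETERMINERS:
        return "NN"     # "that book"
    if prev in PRONOUNS or prev == "to":
        return "VB"     # "you book", "to book"
    return "NN"         # default guess


print(tag_ambiguous("I like that book".split(), 3))        # NN
print(tag_ambiguous("Did you book a room ?".split(), 2))   # VB
```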

What is Parsing
Parsing is the process of assigning a syntactic structure to a sequence of words
The syntactic structure is defined using a grammar
A grammar consists of a set of symbols (terminal and non-terminal symbols) and production rules (grammar rules)
The lexicon is built over the terminal symbols (i.e., the words)

Syntactic Categories
The non-terminal symbols correspond to syntactic categories
–Det (determiner)
–N (noun)
–IV (intransitive verb)
–TV (transitive verb)
–PN (proper name)
–Prep (preposition)
–NP (noun phrase): the car
–PP (prepositional phrase): at the table
–VP (verb phrase): saw a car
–S (sentence): Mia likes Vincent

Example Grammar
Lexicon
–Det: which, a, the, …
–N: rock, singer, …
–IV: die, walk, …
–TV: kill, write, …
–PN: John, Lithium, …
–Prep: on, from, to, …
Grammar Rules
–S → NP VP
–NP → Det N
–NP → PN
–N → N N
–N → N PP
–VP → TV NP
–VP → IV
–PP → Prep NP
–VP → VP PP

The Parser
A parser automates the process of parsing
The input of the parser is a string of words (possibly annotated with POS tags)
The output of a parser is a parse tree, connecting all the words
The way a parse tree is constructed is also called a derivation

Derivation Example
Which rock singer wrote Lithium

Lexical stage
[Det Which] [N rock] [N singer] [TV wrote] [PN Lithium]

Use rule: NP → Det N
[NP [Det Which] [N rock]] [N singer] [TV wrote] [PN Lithium]

Use rule: NP → PN
[NP [Det Which] [N rock]] [N singer] [TV wrote] [NP [PN Lithium]]

Use rule: VP → TV NP
[NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]

Backtracking
No rule combines the NP over “Which rock” with the N that follows it, so that NP is undone:
[Det Which] [N rock] [N singer] [VP [TV wrote] [NP [PN Lithium]]]

Use rule: N → N N
[Det Which] [N [N rock] [N singer]] [VP [TV wrote] [NP [PN Lithium]]]

Use rule: NP → Det N
[NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]

Use rule: S → NP VP
[S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]
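The derivation above can be reproduced with a small bottom-up chart parser over the example grammar. This is a sketch under stated assumptions: it uses a CKY-style chart instead of the backtracking walk-through, so every span is explored once and no step has to be undone, and the lexicon lists the surface forms of the example sentence (e.g. "wrote") where the slide lists base forms.

```python
from collections import defaultdict

# Lexicon for the example sentence; surface forms are an assumption of this sketch.
LEXICON = {"which": "Det", "rock": "N", "singer": "N", "wrote": "TV", "lithium": "PN"}

# Grammar rules from the example grammar, split into unary and binary rules.
UNARY = {"PN": ["NP"], "IV": ["VP"]}
BINARY = {
    ("NP", "VP"): ["S"], ("Det", "N"): ["NP"], ("N", "N"): ["N"],
    ("N", "PP"): ["N"], ("TV", "NP"): ["VP"], ("Prep", "NP"): ["PP"],
    ("VP", "PP"): ["VP"],
}


def parse(words):
    """Return one parse tree for the words as nested tuples, or None."""
    n = len(words)
    chart = defaultdict(dict)  # (start, end) -> {category: tree}

    def add(span, cat, tree):
        if cat in chart[span]:
            return                      # keep the first tree found per category
        chart[span][cat] = tree
        for parent in UNARY.get(cat, []):
            add(span, parent, (parent, tree))

    # Lexical stage: one category per word.
    for i, word in enumerate(words):
        cat = LEXICON[word.lower()]
        add((i, i + 1), cat, (cat, word))

    # Combine adjacent spans, shortest spans first.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for lcat, ltree in list(chart[(start, mid)].items()):
                    for rcat, rtree in list(chart[(mid, end)].items()):
                        for parent in BINARY.get((lcat, rcat), []):
                            add((start, end), parent, (parent, ltree, rtree))
    return chart[(0, n)].get("S")


print(parse("Which rock singer wrote Lithium".split()))
```

The chart still builds the NP over "Which rock", but it also builds the N over "rock singer", and only the latter takes part in the complete S.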

Syntactic “head”
[S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]

Parse Tree (another example)
[S [NP [Det The] [N [N mother] [PP [Prep of] [NP [PN Franz] [PN Kafka]]]]] [VP [VP [IV died]] [PP [Prep in] [NP 1918]]]]

Syntactic head
[S [NP [Det The] [N [N mother] [PP [Prep of] [NP [PN Franz] [PN Kafka]]]]] [VP [VP [IV died]] [PP [Prep in] [NP 1918]]]]

Using a parser
Normally expects tokenised and POS-tagged input
Examples of wide-coverage parsers:
–Charniak parser
–Collins parser
–RASP (Carroll & Briscoe)
–CCG parser (Clark & Curran)

WordNet
Electronic dictionary
Not only words and definitions, but also relations between words
Four parts of speech
–Nouns
–Verbs
–Adjectives
–Adverbs

WordNet SynSets
Words are organised in SynSets
A SynSet is a group of words with the same meaning, in other words, a set of synonyms
Example: {Rome, Roma, Eternal City, Italian Capital, capital of Italy}

Senses
A word can have several different meanings
Example: plant
–A building for industrial labour
–A living organism lacking the power of locomotion
The different meanings of a word are called senses
Therefore, one word can occur in more than one SynSet in WordNet

SynSet Example
–{mug, mugful} = the quantity that can be held in a mug
–{chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug} = a person who is gullible and easy to take advantage of
–{countenance, physiognomy, phiz, visage, kisser, smiler, mug} = the human face
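The relation between words, senses and SynSets can be sketched with plain Python data. The three SynSets are copied from the example above into a hand-built list; this is not the WordNet database or any WordNet API.

```python
# Hand-built (SynSet, gloss) pairs from the "mug" example; not the real WordNet data.
SYNSETS = [
    ({"mug", "mugful"},
     "the quantity that can be held in a mug"),
    ({"chump", "fool", "gull", "mark", "patsy", "fall guy", "sucker", "soft touch", "mug"},
     "a person who is gullible and easy to take advantage of"),
    ({"countenance", "physiognomy", "phiz", "visage", "kisser", "smiler", "mug"},
     "the human face"),
]


def senses(word: str) -> list:
    """Return the gloss of every SynSet that contains the word."""
    return [gloss for members, gloss in SYNSETS if word in members]


print(len(senses("mug")))    # 3: "mug" occurs in three SynSets
print(senses("visage"))      # ['the human face']
```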

Hypernyms
Hypernymy is a WordNet relation defined between two SynSets
–If A is a hypernym of B, then A is more generic than B
The inverse of hypernymy is hyponymy
–If A is a hyponym of B, then A is more specific than B
Use these relations transitively
Examples:
–“cow” and “horse” are hyponyms of “animal”
–“publication” is a hypernym of “book”

Examples using WordNet
Which rock singer …
–singer is a hyponym of person, therefore expected answer type is PERSON
What is the population of …
–population is a hyponym of number, hence answer type NUMERAL
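Using the hypernym relation transitively to find an answer type can be sketched by walking up a chain of hypernym links. The links below are a toy table: the intermediate step "musician" is an illustrative assumption, and real WordNet chains are longer and defined over SynSets rather than single words.

```python
# Toy hypernym links (word -> more generic word); "musician" is an assumed step.
HYPERNYM = {
    "singer": "musician", "musician": "person",
    "population": "number",
    "cow": "animal", "horse": "animal",
    "book": "publication",
}

# Concepts that correspond to an answer type in the simple typology.
ANSWER_TYPES = {"person": "PERSON", "number": "NUMERAL"}


def hypernym_chain(word: str) -> list:
    """Follow hypernym links upwards and return every concept passed."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def answer_type(head_noun: str) -> str:
    """Return the answer type of the first typed concept on the chain."""
    for concept in [head_noun] + hypernym_chain(head_noun):
        if concept in ANSWER_TYPES:
            return ANSWER_TYPES[concept]
    return "ENTITY"


print(answer_type("singer"))       # PERSON
print(answer_type("population"))   # NUMERAL
```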

How to use NLP tools?
There is a large set of tools available on the web, most of them free for research
Examples of integrated text processing environments
–GATE (University of Sheffield)
–TTT (University of Edinburgh)
–LingPipe
–For a general overview of NLP tools, see
