QuALiM – Michael Kaisser The QuALiM Question Answering system Question Answering by Searching Large Corpora with Linguistic Methods.



Talk Outline What does a QA system do? QuALiM's two answer strategies: –Fallback mechanism –Rephrasing algorithm TREC evaluation results Post-TREC evaluation results

Question Answering - Definition Definition from Wikipedia: Question Answering (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web), the system should be able to retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval, such as document retrieval, and it is sometimes regarded as the next step beyond search engines.

Question Answering - Example Start is MIT's QA system:


Question Answering - Example Start is MIT's QA system: The system should actually return a complete English sentence expressing the desired fact. Better, however, would be: "Albert Einstein was born on March 14th, 1879."

The Fallback Mechanism (representative of common answer-finding techniques)

Fallback Mechanism The fallback mechanism creates queries based upon keywords and key phrases from the question. Three queries are sent to Google: The first query contains all non-stop words from the question. The second contains all NPs from the question (that contain at least one non-stop word). The third query contains all NPs and all non-stop words that do not occur in the NPs.

Fallback Mechanism So "When was Jim Inhofe first elected to the senate?" becomes Jim Inhofe senate first elected "Jim Inhofe" "the senate" "Jim Inhofe" "the senate" first elected Note: The results from the last query are weighted twice as high as the results from the first two queries.
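As a rough sketch of this query construction (the function name and stopword list are my own, not from the QuALiM source; tokens and NP chunks are assumed to come from an external tokenizer and chunker):

```python
# Minimal sketch of the fallback mechanism's three-query construction.
# STOPWORDS and all names here are illustrative assumptions.

STOPWORDS = {"when", "was", "were", "did", "do", "to", "the", "a", "an",
             "of", "in", "on", "is", "?"}

def build_fallback_queries(tokens, noun_phrases):
    """Build the three Google queries described on the slide.

    tokens       -- question tokens in order
    noun_phrases -- NP chunks as token lists, e.g. [["Jim", "Inhofe"], ...]
    """
    content = [t for t in tokens if t.lower() not in STOPWORDS]
    # Query 1: all non-stop words from the question.
    q1 = " ".join(content)
    # Query 2: all NPs containing at least one non-stop word, quoted as phrases.
    nps = [np for np in noun_phrases
           if any(t.lower() not in STOPWORDS for t in np)]
    q2 = " ".join('"%s"' % " ".join(np) for np in nps)
    # Query 3: the NPs plus every non-stop word not already inside an NP.
    in_np = {t.lower() for np in nps for t in np}
    rest = [t for t in content if t.lower() not in in_np]
    q3 = (q2 + " " + " ".join(rest)).strip()
    return q1, q2, q3
```

For the Jim Inhofe question this reproduces the second and third queries shown above; the word order of the first query may differ from the slide, since the slide does not specify an ordering.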

Fallback Mechanism The results from the queries when placed in a Weighted Sequence Bag: 72.0: "senator" 42.0: "senator jim inhofe" "senator jim" 41.25: "r" (abbreviation for republican) 32.25: "oklahoma" 30.0: "r-okla" (abbreviation for republican-oklahoma) 26.25: "1994" 25.0: "the leading conservative voices" "of the leading conservative voices" "leading conservative voices" 24.0: "us senator" 23.25: "republican" 21.0: "okla" (abbreviation for oklahoma)

Fallback Mechanism But we know that we are looking for a date, so the answer is "1994": 72.0: "senator" 42.0: "senator jim inhofe" "senator jim" 41.25: "r" (abbreviation for republican) 32.25: "oklahoma" 30.0: "r-okla" (abbreviation for republican-oklahoma) 26.25: "1994" 25.0: "the leading conservative voices" "of the leading conservative voices" "leading conservative voices" 24.0: "us senator" 23.25: "republican" 21.0: "okla" (abbreviation for oklahoma)
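The idea of a Weighted Sequence Bag plus a type filter can be sketched as follows (a rough stand-in: the n-gram range, tokenization, and function names are my own assumptions, not QuALiM's actual implementation):

```python
import re
from collections import Counter

def weighted_sequence_bag(snippets):
    """Sum the weights of word n-grams (n = 1..3) over (weight, text) snippets.
    Illustrative approximation of the Weighted Sequence Bag on the slide."""
    bag = Counter()
    for weight, text in snippets:
        words = re.findall(r"[\w-]+", text.lower())
        for n in (1, 2, 3):
            for i in range(len(words) - n + 1):
                bag[" ".join(words[i:i + n])] += weight
    return bag

def best_answer_of_type(bag, type_regex):
    """Return the highest-weighted sequence matching the expected answer type."""
    candidates = [(w, s) for s, w in bag.items() if re.fullmatch(type_regex, s)]
    return max(candidates)[1] if candidates else None
```

The type filter is what lets "1994" win here even though "senator" has a higher total weight.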

Definition Questions Query: "Florence Nightingale" 20.0: "may 12, 1820" 16.0: "may 12" "nursing" 15.0: "august 13, 1910" 14.0: "" 13.0: "born" 12.0: "august 13" "museum" 11.0: "history" 10.0: "modern nursing" "lady with the lamp" "florence nightingale museum" "the lady with the lamp" 9.0: "italy" 8.0: "of modern nursing" "nurses" "london" 7.5: "on may 12, 1820" 7.0: "2 lambeth palace road london"

Definition Questions 20.0: "may 12, 1820" 16.0: "may 12" "nursing" 15.0: "august 13, 1910" 14.0: "" 13.0: "born" 12.0: "august 13" "museum" 11.0: "history" 10.0: "modern nursing" "lady with the lamp" "florence nightingale museum" "the lady with the lamp" 9.0: "italy" 8.0: "of modern nursing" "nurses" "london" 7.5: "on may 12, 1820" 7.0: "2 lambeth palace road london" Answer sentences in AQUAINT corpus: "on may 12, 1820, the founder of modern nursing, florence nightingale, was born in florence, italy." "on aug. 13, 1910, florence nightingale, the founder of modern nursing, died in london."

The Rephrasing Algorithm

Pattern Layout Sequence: When did NP V_INF NP|PP ? Targets: in NP / in NP, (more targets...) AnswerTypes: dateComplete / date / year|in_year Sequences are matched against questions. Targets describe (flat) syntactic structures of potential answer sentences. AnswerTypes place restrictions on the expected answer type.

Sequences When did NP V_INF NP|PP ? This sequence matches all questions beginning with "When", followed by "did", followed by an NP, followed by a verb in its infinitive form, followed by an NP or a PP, followed by a question mark (which has to be the last element in the question): question start – word: When – word: did – phrase: NP – POS: V_INF – phrase: NP or PP – punctuation: ? – question end

Sequences When did NP V_INF NP|PP ? In the TREC 2005 question set this particular sequence matched 5 questions: "When did Floyd Patterson win the title?" "When did Amtrak begin operations?" "When did Jack Welch become chairman of General Electric?" "When did Jack Welch retire from GE?" "When did the Khmer Rouge come into power?"
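A minimal matcher for this sequence might look as follows, assuming the question arrives pre-chunked as (label, text) pairs from some parser; both the representation and the names here are my own simplification, not QuALiM's:

```python
# Sequence elements: ("word", literal) matches the token text,
# ("label", {...}) matches the chunk/POS label assigned by the parser.
SEQUENCE = [
    ("word", "When"),
    ("word", "did"),
    ("label", {"NP"}),
    ("label", {"V_INF"}),        # verb in its infinitive form
    ("label", {"NP", "PP"}),
    ("word", "?"),               # must be the last element of the question
]

def matches_sequence(chunks, sequence=SEQUENCE):
    """True iff the chunked question matches the sequence element-for-element."""
    if len(chunks) != len(sequence):
        return False
    for (label, text), (kind, expected) in zip(chunks, sequence):
        if kind == "word":
            if text != expected:
                return False
        elif label not in expected:
            return False
    return True
```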

Targets in NP / in NP, If a question matches a sequence, the targets are used to propose templates for potential answer sentences. For the question "When did Amtrak begin operations?", these would be: "Amtrak began operations in ANSWER[NP]" "In ANSWER[NP] (,) Amtrak began operations"

Targets in NP / in NP, answer sentence start – Amtrak began operations in – answer (NP) – answer sentence end answer sentence start – In – answer (NP) – (,) – Amtrak began operations – answer sentence end

Targets in NP / in NP, The information from the targets can be used to create Google queries: "Amtrak began operations in" "In" "Amtrak began operations"
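Instantiating the two targets into templates and queries could be sketched like this (the slot names are my own; QuALiM derives them from the parsed question, and I also assume the infinitive-to-past conversion, e.g. "begin" to "began", has already been done elsewhere):

```python
def instantiate_targets(subject_np, verb_past, object_np):
    """Turn the two targets (in NP / in NP,) into answer-sentence templates
    and the corresponding quoted Google queries. Illustrative sketch only."""
    clause = "%s %s %s" % (subject_np, verb_past, object_np)
    templates = [
        "%s in ANSWER[NP]" % clause,          # target: ... in NP
        "In ANSWER[NP] (,) %s" % clause,      # target: In NP (,) ...
    ]
    queries = [
        '"%s in"' % clause,
        '"In" "%s"' % clause,
    ]
    return templates, queries
```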

Snippet Retrieval For the first query "Amtrak began operations in" the first five sentences Google returns are: "Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion." "Amtrak began operations in 1971." "Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971." "Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970." "A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state."

Answer Extraction The sentences are parsed and tagged, and by matching them to the targets once more the exact position of the potential answer can be located: "Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion." "Amtrak began operations in 1971." "Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971." "Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970." "A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state."
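QuALiM re-parses the snippets and re-matches the targets; as a crude stand-in for that step, a regex anchored on the literal part of the first target can locate the candidate span (the function name and the comma/period stopping rule are my own simplifications):

```python
import re

def extract_answers(sentences, prefix="Amtrak began operations in"):
    """Return the text right after the target's literal part in each sentence.
    A plain-regex approximation of QuALiM's parse-and-match extraction."""
    pattern = re.compile(re.escape(prefix) + r"\s+([^,.]+)")
    answers = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            answers.append(match.group(1).strip())
    return answers
```

On the five snippets above this yields "1971" four times and "the state" once, which is why a type check is still needed afterwards.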

QuALiM – Type Checking dateComplete / date / year|in_year The answerType element in the pattern tells us that we are looking for a date. We would like to have a complete date in standard form, e.g. "May 1st, 1971", or some form of a date, e.g. "5/1/1971". If we cannot have that, a year specification will also do (e.g. "1971").
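The dateComplete / date / year|in_year preference order can be sketched with regular expressions; the patterns below are my own approximation of the hierarchy, not QuALiM's actual type checkers:

```python
import re

MONTH = (r"(January|February|March|April|May|June|July|August"
         r"|September|October|November|December)")

# Checked in order: the first (most specific) matching type wins.
TYPE_CHECKS = [
    ("dateComplete", MONTH + r" \d{1,2}(st|nd|rd|th)?, \d{4}"),
    ("date",         r"\d{1,2}/\d{1,2}/\d{4}"),
    ("year",         r"(in )?(1[6-9]|20)\d\d"),
]

def check_answer_type(candidate):
    """Return the most specific date type the candidate satisfies, or None."""
    for name, pattern in TYPE_CHECKS:
        if re.fullmatch(pattern, candidate):
            return name
    return None
```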

QuALiM – Type Checking dateComplete / date / year|in_year An answerType may contain the following elements: NamedEntity WordNetCategory Built-in (date, year, percentage etc.) Measure ("15 meters", "100 mph") List (e.g. a list of movies) WebHypernym other

Excursus: WordNet


Excursus: Named Entity Recognition The task: identify atomic elements of information in text person names company/organization names locations dates & times percentages monetary amounts

Excursus: Named Entity Recognition Task of a NE system: delimit the named entities in a text and tag them with NE categories: Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen. "Milan" is part of an organization name "Arthur Andersen" is a company "Italy" is sentence-initial => capitalization useless


Excursus: Named Entity Recognition How does it work? Basically quite simple: the system accesses huge lists of first names, last names, cities, countries... and knows about special words/abbreviations like Mr., Dr., Prof., Inc., Blvd. etc. It knows the names of weekdays, months etc.
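A toy gazetteer-based tagger in this spirit might look as follows; the lists are tiny illustrations (real systems use far larger resources), and all names are my own:

```python
# Trigger words and gazetteers; purely illustrative.
TITLE_TRIGGERS = {"mr.", "mrs.", "dr.", "prof."}
COMPANY_SUFFIXES = {"inc", "inc.", "co.", "ltd."}
CITIES = {"milan", "london"}
COUNTRIES = {"italy"}
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}

def tag_tokens(tokens):
    """Assign a coarse NE tag to each token from lists and trigger words."""
    tags = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in TITLE_TRIGGERS:
            tags.append("TITLE")
        elif i > 0 and tokens[i - 1].lower() in TITLE_TRIGGERS:
            tags.append("PERSON")            # word right after a title
        elif low in COMPANY_SUFFIXES:
            tags.append("ORG")
        elif low in CITIES or low in COUNTRIES:
            tags.append("LOC")
        elif low in WEEKDAYS:
            tags.append("DATE")
        else:
            tags.append("O")
    return tags
```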

Excursus: Named Entity Recognition Some systems use hand-written context-sensitive reduction rules: 1) title capitalized_word => title person_name (compare "Mr. Jones" vs. "Mr. Ten-Percent" => no rule without exceptions) 2) person_name, "the" adj* "CEO of" organization ("Fred Smith, the young dynamic CEO of BlubbCo") => ability to grasp non-local patterns, plus help from databases of known named entities
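Rule (1) above, with an exception list guarding against cases like "Mr. Ten-Percent", could be sketched like this (the exception list and names are illustrative assumptions):

```python
TITLES = {"Mr.", "Mrs.", "Dr.", "Prof."}
KNOWN_NON_NAMES = {"Ten-Percent"}        # hypothetical exception list

def apply_title_rule(tokens):
    """Reduce (title, Capitalized-word) pairs to person_name entities,
    skipping known exceptions. Sketch of rule (1) on the slide."""
    entities = []
    for i in range(len(tokens) - 1):
        nxt = tokens[i + 1]
        if (tokens[i] in TITLES and nxt[:1].isupper()
                and nxt not in KNOWN_NON_NAMES):
            entities.append(("person_name", tokens[i] + " " + nxt))
    return entities
```

The exception list is exactly the "no rule without exceptions" point: a purely local pattern needs help from databases of known entities.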


QuALiM – Type Checking When the answers are checked for their correct semantic type, the first four sentences pass the test; the last one is ruled out: "Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion." "Amtrak began operations in 1971." "Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971." "Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970." "A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state."

TREC 2004 Results and Post-TREC Evaluation

TREC Results – factoid questions

TREC Results – combined score

Post-TREC Evaluation Purpose: what is the performance and behavior of the different algorithms implemented? Performed with resolved questions ("When was Franz Kafka born?" instead of "When was he born?"). No document localization, thus: –no NIL answers returned –no "unsupported" judgments

Post-TREC Evaluation
