
1 Question-Answering via the Web: the AskMSR System
Note: these viewgraphs were originally developed by Professor Nick Kushmerick, University College Dublin, Ireland. These copies are intended only for review use in ICS 278.

2 Question-Answering
Users want answers, not documents
Related technologies: Databases, Information Retrieval, Information Extraction, Question Answering, Intelligent Personal Electronic Librarian
Active research over the past few years, coordinated by US government "TREC" competitions
Recent intense interest from security services ("What is Bin Laden's bank account number?")

3 Question-Answering on the Web
Web = a potentially enormous "data set" for data mining
– e.g., >8 billion Web pages indexed by Google
Example: the AskMSR Web question-answering system ("answer mining")
Users pose relatively simple questions
– e.g., "Who killed Abraham Lincoln?"
Simple parsing is used to reformulate the question as a "template answer"
Search-engine results are then used to find answers (redundancy helps)
The system is surprisingly accurate (on simple questions)
A key contributor to the system's success is massive data (rather than better algorithms)
Reference: Dumais et al., 2002. "Web question answering: is more always better?" In Proceedings of SIGIR'02.

4 AskMSR
Web Question Answering: Is More Always Better?
– Dumais, Banko, Brill, Lin, Ng (Microsoft, MIT, Berkeley)
Q: "Where is the Louvre located?"
Want "Paris" or "France" or "75058 Paris Cedex 01" or a map
Don't just want URLs
Lecture 5 adapted from: COMP-4016 ~ Computer Science Department ~ University College Dublin ~ © Nicholas Kushmerick 2002

5 "Traditional" approach (straw man?)
Traditional deep natural-language processing approach:
– Full parse of documents and question
– Rich knowledge of vocabulary, cause/effect, and common sense enables sophisticated semantic analysis
E.g., in principle this answers the "Who killed Lincoln?" question: "The non-Canadian, non-Mexican president of a North American country whose initials are AL and who was killed by John Wilkes Booth died ten revolutions of the earth around the sun after 1855."

6 AskMSR: Shallow approach Just ignore those documents, and look for ones like this instead:

7 AskMSR: Details

8 Step 1: Rewrite queries
Intuition: the user's question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
– The Louvre Museum is located in Paris.
– Who created the character of Scrooge?
– Charles Dickens created the character of Scrooge.

9 Query rewriting
Classify the question into one of seven categories:
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Apply category-specific transformation rules, e.g., "For Where questions, move 'is' to all possible locations":
"Where is the Louvre Museum located?" →
– "is the Louvre Museum located"
– "the is Louvre Museum located"
– "the Louvre is Museum located"
– "the Louvre Museum is located"
– "the Louvre Museum located is"
(The paper does not give full details!)
b. Assign an expected answer "datatype" (e.g., Date, Person, Location, …):
"When was the French Revolution?" → DATE
The classification/rewrite/datatype rules are hand-crafted (could they be learned automatically?)
Some rewrites are nonsense, but who cares? It's only a few more queries to Google. (A small rewrite sketch follows below.)
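A minimal sketch (not from the paper) of the "move 'is'" transformation for Where-questions; the function name and the handled verb forms are illustrative assumptions, not the system's actual rule set.

```python
def rewrite_where_question(question):
    """Generate candidate rewrites of a Where-question by moving its verb
    to every possible position, as in the Louvre example above.
    Illustrative sketch only; not the AskMSR rule set."""
    words = question.rstrip("?").split()
    # Only handle "Where is/are/was/were ..." questions in this sketch.
    if words[0].lower() != "where" or words[1].lower() not in ("is", "are", "was", "were"):
        return []
    verb, rest = words[1], words[2:]
    rewrites = []
    for i in range(len(rest) + 1):
        rewrites.append(" ".join(rest[:i] + [verb] + rest[i:]))
    return rewrites

print(rewrite_where_question("Where is the Louvre Museum located?"))
# ['is the Louvre Museum located', 'the is Louvre Museum located', ...,
#  'the Louvre Museum is located', 'the Louvre Museum located is']
```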

10 Query Rewriting - weights
One wrinkle: some query rewrites are more reliable than others.
For "Where is the Louvre Museum located?":
– +"the Louvre Museum is located" → weight 5 (if we get a match, it's probably right)
– +Louvre +Museum +located → weight 1 (lots of non-answers could come back too)
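A sketch of how the rewrites could be turned into weighted search queries; the weights 5 and 1 come from this slide, while the helper name and the crude stop-word list are assumptions.

```python
def build_weighted_queries(question, rewrites):
    """Pair each rewrite with a reliability weight: quoted exact-phrase
    rewrites are trusted (weight 5); a bag-of-words backoff gets weight 1."""
    queries = [('"%s"' % r, 5) for r in rewrites]
    content_words = [w for w in question.rstrip("?").split()
                     if w.lower() not in {"where", "is", "the"}]  # toy stop-word list
    queries.append(("+" + " +".join(content_words), 1))
    return queries

for query, weight in build_weighted_queries("Where is the Louvre Museum located?",
                                            ["the Louvre Museum is located"]):
    print(weight, query)
# 5 "the Louvre Museum is located"
# 1 +Louvre +Museum +located
```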

11 Step 2: Query search engine
Throw all the rewrites at a Web-wide search engine
Retrieve the top N results (100?)
For speed, rely only on the search engine's "snippets", not the full text of the actual documents

12 Step 3: Mining N-Grams
Unigram, bigram, trigram, …: an N-gram is a list of N adjacent terms in a sequence
E.g., "Web Question Answering: Is More Always Better"
– Unigrams: Web, Question, Answering, Is, More, Always, Better
– Bigrams: Web Question, Question Answering, Answering Is, Is More, More Always, Always Better
– Trigrams: Web Question Answering, Question Answering Is, Answering Is More, Is More Always, More Always Better

13 Mining N-Grams
Simple: enumerate all N-grams (N = 1, 2, 3, say) in all retrieved snippets
Use a hash table and other fancy footwork to make this efficient
Weight of an n-gram: its occurrence count, with each occurrence weighted by the "reliability" (weight) of the rewrite that fetched the document
Example: "Who created the character of Scrooge?"
– Dickens
– Christmas Carol - 78
– Charles Dickens - 75
– Disney - 72
– Carl Banks - 54
– A Christmas - 41
– Christmas Carol - 45
– Uncle - 31
(A small mining sketch follows below.)
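A minimal sketch of the n-gram mining step: enumerate all 1- to 3-grams in each retrieved snippet and accumulate, for each n-gram, the weight of the rewrite that fetched it. The snippet data and function name are illustrative assumptions.

```python
from collections import Counter

def mine_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: list of (snippet_text, rewrite_weight) pairs.
    Returns a Counter mapping each n-gram (n = 1..max_n) to its weighted count."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

# Made-up snippets with the rewrite weights that retrieved them.
snippets = [("Charles Dickens created the character of Scrooge", 5),
            ("Scrooge first appeared in A Christmas Carol", 1)]
print(mine_ngrams(snippets).most_common(5))
```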

14 Step 4: Filtering N-Grams
Each question type (When…, Where…, What…, Who…) is associated with one or more "data-type filters" = regular expressions for the expected answer type (Date, Location, Person, …)
Boost the score of n-grams that match the regexp
Lower the score of n-grams that don't match the regexp
(Details omitted from the paper…)
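A sketch of the data-type filtering idea: each question type maps to a regular expression over candidate answers, and n-gram scores are boosted or lowered according to whether they match. The patterns and the boost/penalty factors below are assumptions; the paper omits the actual details.

```python
import re

# Illustrative answer-type patterns; the real AskMSR filters are not published.
DATATYPE_FILTERS = {
    "When":  re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),        # plausible year -> Date
    "Who":   re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),  # capitalized name -> Person
    "Where": re.compile(r"\b[A-Z][a-z]+\b"),                   # capitalized token -> Location
}

def filter_ngrams(question_type, ngram_scores, boost=2.0, penalty=0.5):
    """Boost n-grams matching the expected answer type; demote the rest."""
    pattern = DATATYPE_FILTERS[question_type]
    return {ng: score * (boost if pattern.search(ng) else penalty)
            for ng, score in ngram_scores.items()}

print(filter_ngrams("Who", {"Charles Dickens": 75, "the character": 40}))
# {'Charles Dickens': 150.0, 'the character': 20.0}
```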

15 Step 5: Tiling the Answers
Overlapping n-grams are tiled together, e.g., "Charles Dickens" + "Mr Charles" → "Mr Charles Dickens"
Start from the highest-scoring n-gram; when n-grams are tiled, their scores are merged and the old n-grams are discarded
Repeat until no more n-grams overlap
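A simplified sketch of the greedy tiling step: repeatedly take the highest-scoring answer, merge in any other n-gram that overlaps it at a word boundary (adding the scores and discarding the old n-grams), and stop when nothing overlaps. The exact merging rules of AskMSR are not spelled out on the slide, so this is only an illustration.

```python
def tile_answers(ngram_scores):
    """Greedy tiling of overlapping n-grams, e.g. 'Charles Dickens' + 'Mr Charles'
    -> 'Mr Charles Dickens'. Simplified sketch of the idea on this slide."""
    def merge(a, b):
        """Return the tiled string if a and b overlap at a word boundary, else None."""
        aw, bw = a.split(), b.split()
        for k in range(min(len(aw), len(bw)), 0, -1):
            if aw[-k:] == bw[:k]:
                return " ".join(aw + bw[k:])   # b extends a on the right
            if bw[-k:] == aw[:k]:
                return " ".join(bw + aw[k:])   # b extends a on the left
        return None

    answers = dict(ngram_scores)
    merged = True
    while merged:
        merged = False
        best = max(answers, key=answers.get)
        for other in list(answers):
            if other == best:
                continue
            tiled = merge(best, other)
            if tiled:
                # Merge the scores and discard the old n-grams.
                answers[tiled] = answers.pop(best) + answers.pop(other)
                merged = True
                break
    return answers

print(tile_answers({"Charles Dickens": 75, "Mr Charles": 45, "Dickens": 30}))
# {'Mr Charles Dickens': 150}
```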

16 Experiments
Used the TREC-9 standard query data set
Standard performance metric: MRR (mean reciprocal rank)
– Systems give their "top 5 answers"
– Score per question = 1/R, where R is the rank of the first right answer
– Rank 1: 1; 2: 0.5; 3: 0.33; 4: 0.25; 5: 0.2; 6+: 0
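A small sketch of the MRR computation described above, assuming we already know the rank of the first correct answer for each question (None if it does not appear in the top 5).

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/R over all questions; a question with no correct
    answer in the top 5 contributes 0."""
    return sum(1.0 / r if r is not None else 0.0
               for r in first_correct_ranks) / len(first_correct_ranks)

print(mean_reciprocal_rank([1, 2, None, 3]))  # (1 + 0.5 + 0 + 0.333) / 4 ≈ 0.458
```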

17 Results [summary]
Standard TREC contest test-bed: ~1M documents, 900 questions
– E.g., "who is president of Bolivia"
– E.g., "what is the exchange rate between England and the US"
On this test-bed the technique doesn't do too well (though it would have placed in the top 9 of ~30 participants!)
– MRR is low: the right answer is ranked only about #4-#5 on average
– Why? Because the technique relies on the enormity of the Web!
Using the Web as a whole, not just TREC's 1M documents: MRR = 0.42 (i.e., on average the right answer is ranked about #2-#3)

18 Example
Question: what is the longest word in the English language?
– Answer = pneumonoultramicroscopicsilicovolcanokoniosis (!)
– Answers returned by AskMSR:
1: "1909 letters long"
2: the correct answer above
3: "screeched" (longest 1-syllable word in English)

19 Open Issues
In many scenarios (e.g., monitoring Bin Laden's e-mail) we only have a small set of documents!
Works best (or only) for "Trivial Pursuit"-style fact-based questions
Limited/brittle repertoire of:
– question categories
– answer data types/filters
– query rewriting rules