CSC 9010, Spring 2011. Paula Matuszek. A Brief Overview of Watson.


1. A Brief Overview of Watson. CSC 9010, Spring 2011, Paula Matuszek.

2. Watson

QA system developed by IBM and collaborators: a "massively parallel probabilistic evidence-based architecture".
Hardware is a high-end IBM system, the IBM Power7 platform:
– 10 Power7 server blades
– 90 servers
– 4 processors/server
– 8 cores/processor (90 × 4 × 8 = 2,880 cores in all)
A robotic arm presses the buzzer.
Input is text only: no speech recognition, no visual input.

3. Watson

Software is built on top of UIMA (Unstructured Information Management Architecture), a framework built by IBM and since open-sourced.
The information corpus was downloaded and indexed offline; there was no web access during the game.
The corpus was developed from a large variety of text sources:
– a baseline from Wikipedia, Project Gutenberg, newspaper articles, thesauri, etc.
– extended by web retrieval: extract potentially relevant text "nuggets", score them for informativeness, and merge the best into the corpus (sketched below)
The primary corpus is unstructured text, not a semantically tagged or formal knowledge base; only about 2% of Jeopardy! answers can be looked up directly.
Watson also leverages semistructured and structured sources such as WordNet and YAGO.
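The corpus-expansion step above is essentially a filter-and-merge loop. A minimal, self-contained Python sketch, using a toy word-overlap measure for "informativeness" (an assumption for illustration, not DeepQA's actual scorer):

    def informativeness(nugget, corpus):
        """Toy score: fraction of the nugget's words not already in the corpus."""
        seen = {w for doc in corpus for w in doc.split()}
        words = nugget.split()
        return sum(w not in seen for w in words) / max(len(words), 1)

    def expand_corpus(corpus, candidate_nuggets, threshold=0.5):
        """Merge only the nuggets that add enough new information."""
        for nugget in candidate_nuggets:
            if informativeness(nugget, corpus) > threshold:
                corpus.add(nugget)
        return corpus

    corpus = {"watson is a question answering system"}
    nuggets = ["watson runs on power7 hardware", "watson is a system"]
    print(expand_corpus(corpus, nuggets))  # keeps the first nugget, drops the second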

4. Components of DeepQA

About 100 different techniques overall. The major stages (a skeleton in code follows this list):
– Content acquisition: corpus and sample games (offline, before the game itself)
– Preprocessing
– Natural language tools
– Retrieve possible answers
– Score answers
– Buzz in
– Game strategies
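A high-level skeleton of how these stages fit together. Every stage function below is a trivial placeholder for illustration, not IBM's implementation; the real system runs its stages massively in parallel under UIMA:

    def preprocess(clue):
        # Placeholder: category + lexical answer type (LAT) detection
        return {"text": clue, "lat": "person"}

    def retrieve_candidates(question):
        # Placeholder: recall-oriented search over the indexed corpus
        return ["candidate A", "candidate B"]

    def score(candidate, question):
        # Placeholder standing in for the 50+ evidence scorers
        return 0.9 if candidate == "candidate A" else 0.4

    def answer_clue(clue, buzz_threshold=0.5):
        question = preprocess(clue)
        scored = [(c, score(c, question)) for c in retrieve_candidates(question)]
        best, conf = max(scored, key=lambda cs: cs[1])
        return (best, conf) if conf > buzz_threshold else (None, conf)

    print(answer_clue("This course surveys Watson"))  # ('candidate A', 0.9)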

5. Preprocessing

Determine the question category:
– factoid
– decomposable
– puzzle
(Questions with audio/video components and "special instruction" categories were excluded.)
Determine the lexical answer type (LAT):
– film? person? place? novel? song?
– about 2,500 distinct LATs appeared in a sample of 20,000 questions, and about 12% of clues do not indicate a type
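Jeopardy! clues often signal the LAT with a phrase like "this film" or "these presidents". A toy detector for just that one pattern (real DeepQA used full parsing and learned rules, not a regex):

    import re

    def guess_lat(clue):
        """Return the noun after 'this'/'these', or None if no type is indicated."""
        match = re.search(r"\bth(?:is|ese)\s+([a-z]+)", clue.lower())
        return match.group(1) if match else None

    print(guess_lat("This film won Best Picture in 1998"))  # 'film'
    print(guess_lat("Famous for an 1863 address"))          # None, like ~12% of clues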

6. Initial Natural Language Processing

– Parse the question
– Semantically tag the components of the question
– Reference and coreference resolution
– Named entity recognition
– Relation detection
– Decomposition into subqueries
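Watson's NLP stack was IBM-internal and UIMA-based; just to make the first few steps concrete, here is the same kind of parsing and named entity recognition done with the open-source spaCy library (assumes the en_core_web_sm model is installed):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("This US president delivered the Gettysburg Address in 1863.")

    for ent in doc.ents:                  # named entity recognition
        print(ent.text, ent.label_)
    for token in doc:                     # dependency parse
        print(token.text, token.dep_, token.head.text)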

7. Retrieve Relevant Text

The component most similar to a web search; the focus is on recall.
Search uses engines such as Indri and Lucene over text, plus SPARQL queries over structured sources.
For some "closed" LATs (all US states, presidents, etc.) the candidate list can be generated directly (see the sketch below).
Otherwise, extract the actual candidate answer from the retrieved text:
– a document title?
– a person mentioned? etc.
Several hundred hypotheses are typically generated.
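A toy illustration of the closed-LAT shortcut: when the answer type has a small, enumerable extension, candidates come straight from a list instead of a search index. The lists and the fallback index are illustrative stubs:

    CLOSED_LATS = {
        "us state": ["Alabama", "Alaska", "Arizona"],      # ...and 47 more
        "president": ["Washington", "Adams", "Jefferson"], # ...and so on
    }

    def generate_candidates(lat, search_index=None):
        if lat in CLOSED_LATS:
            return list(CLOSED_LATS[lat])    # enumerate the type directly
        if search_index is not None:
            return search_index.query(lat)   # otherwise, recall-oriented search
        return []

    print(generate_candidates("us state"))   # ['Alabama', 'Alaska', 'Arizona']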

8. Score Hypotheses

Evaluate candidate answers:
– soft filtering: fast, lightweight filters prune the candidates to about 100
– evidence retrieval: additional structured or unstructured queries for each surviving candidate
Score the answers:
– LOTS of algorithms: more than 50 scoring components
– ranging from simple word counts to complex spatial and temporal reasoning
– together they create an evidence profile: taxonomic, geospatial, temporal, source reliability, etc.
Merge equivalent answers.
Determine the final ranking and confidence estimation (one way to combine scores is sketched below).
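DeepQA learned how to weigh its many scorers; the weights and the logistic combination below are illustrative assumptions, not IBM's trained model, but they show the shape of turning an evidence profile into a single confidence:

    import math

    def confidence(evidence, weights, bias=-2.0):
        """Weighted sum of evidence features, squashed to a probability-like score."""
        z = bias + sum(weights[k] * evidence.get(k, 0.0) for k in weights)
        return 1.0 / (1.0 + math.exp(-z))

    weights = {"taxonomic": 2.0, "geospatial": 1.0, "temporal": 1.5, "source": 0.5}
    evidence = {"taxonomic": 0.9, "temporal": 0.8, "source": 1.0}
    print(round(confidence(evidence, weights), 3))  # ~0.818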

9. And Game Strategies!

Picking a category:
– tries to find the Daily Double
– goes for lower-value clues first, to help learn the category
When to buzz in? (a toy policy is sketched below)
– normally buzzes in if more than 50% certain
– will buzz at lower confidence if that is the only way to win
– will not buzz if it cannot lose except by making a mistake
How much to bet?
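A toy version of that buzz policy; the lower "desperate" threshold is an assumption, and none of this is Watson's actual strategy code:

    def should_buzz(conf, must_buzz_to_win=False, can_only_lose_by_mistake=False):
        if can_only_lose_by_mistake:
            return False          # staying silent locks in the win
        if must_buzz_to_win:
            return conf > 0.2     # assumed lower bar when buzzing is the only path
        return conf > 0.5         # normal case: buzz if more than 50% certain

    print(should_buzz(0.6))                         # True
    print(should_buzz(0.3, must_buzz_to_win=True))  # True: must gamble to win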

10. References

A clip of the end of the Jeopardy! game: 0yE
A good high-level overview: theswimmingsubmarine.blogspot.com/2011/02/how-ibms-deep-question-answering.html
A detailed description: DeepQA.pdf
Many clips, blogs, and links: ibmwatson.com