
IBM’s DeepQA, or Watson

A little history: a Carnegie Mellon (CMU) collaboration. OpenEphyra (2002), Piquant (2004). Initially about 15% accuracy. 15% is not very good, is it?

OpenEphyra, Piquant, & Jeopardy! Source: [1]

Principles
- Massive parallelism: exploit massive parallelism in the consideration of multiple interpretations and hypotheses.
- Many experts: facilitate the integration, application, and contextual evaluation of a wide range of loosely coupled probabilistic question and content analytics.
- Pervasive confidence estimation: no component commits to an answer; all components produce features and associated confidences, scoring different question and content interpretations. An underlying confidence-processing substrate learns how to stack and combine the scores.
- Integrate shallow and deep knowledge: balance the use of strict semantics and shallow semantics, leveraging many loosely formed ontologies.

Source: [4] Randall Munroe (CC BY-NC 2.5)

20 researchers, 3 years later (2008) Source: [1]

What's Watson's source of information?
- Structured content: databases, taxonomies, ontologies
- Domain data: encyclopedias, dictionaries, thesauri, newswire articles, literary works
- Machine learning? Test-question training sets

Learning framework: trained with a set of approximately 25,000 Jeopardy! questions comprising 5.7 million question-answer pairs (instances), where each instance had 550 features. Implemented machine learning techniques such as transfer learning, stacking, and successive refinement.
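To make "instance" concrete: each (question, candidate answer) pair is one row of a feature matrix, labeled by whether the candidate was correct. A tiny illustrative sketch (the real shapes from the slide appear only in comments; all data here is synthetic):

```python
import numpy as np

# Scale from the slide: ~25,000 questions expand into ~5.7 million
# (question, candidate-answer) instances, i.e. on average roughly
# 5_700_000 / 25_000 = 228 candidate answers per question,
# each described by 550 features.
rng = np.random.default_rng(0)

# Tiny stand-in for one question's slice of the real 5_700_000 x 550 matrix:
X = rng.normal(size=(228, 550))  # feature vectors, one per candidate answer
y = np.zeros(228, dtype=int)     # label: 1 = correct answer, 0 = incorrect
y[0] = 1                         # few candidates per question are correct
```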

The learning framework is phase-based and configurable; the Jeopardy! configuration uses 7 phases.

Phases
1. Hitlist normalization
2. Base
3. Transfer learning
4. Answer merging
5. Elite
6. Evidence diffusion
7. Multi-answers

Within each phase there are 3 main steps:
1. Evidence merging
2. Postprocessing
3. Classifier training/application

1. Hitlist normalization: merge identical strings from different sources, and partition into question classes. Different classes of questions, such as multiple choice, useless-LAT (e.g., "it", "this"), and date questions, may require different weighting of evidence. The DeepQA confidence-estimation framework supports this through the concept of routes: in the Jeopardy! system, question classes that profited from specialized routing were manually identified. Routes are archetypes.
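A minimal sketch of these two operations, string-identical merging and routing, under illustrative assumptions (the normalization rule, score summing, and route names are stand-ins, not DeepQA's actual ones):

```python
from collections import defaultdict

def normalize(answer: str) -> str:
    # Illustrative normalization rule: case-fold and strip whitespace.
    return answer.strip().lower()

def merge_identical(candidates):
    """Merge candidates whose normalized strings are identical, summing
    scores as a simple stand-in for combining their evidence."""
    merged = defaultdict(float)
    for answer, score in candidates:
        merged[normalize(answer)] += score
    return dict(merged)

def route(question_class: str) -> str:
    """Pick a route; each route gets its own specialized model downstream.
    Classes follow the slide's examples (multiple choice, useless LATs
    like "it"/"this", date questions)."""
    known_routes = {"multiple_choice", "useless_lat", "date"}
    return question_class if question_class in known_routes else "default"

print(merge_identical([("J.F.K.", 0.5), ("j.f.k. ", 0.25), ("Kennedy", 0.2)]))
# -> {'j.f.k.': 0.75, 'kennedy': 0.2}
print(route("date"), route("anagram"))  # -> date default
```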

2. Base: weed out extremely bad candidates; only the top 100 candidates after hitlist normalization are passed to later phases. With at most 100 answers per question, the standardized features are recomputed. This recomputation is the primary reason the hitlist normalization phase exists: by eliminating the large volume of junk answers (i.e., ones that were never remotely close to being considered), the remaining answers provide a more useful distribution of feature values to compare each answer against.
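The point about recomputing standardized features can be made concrete with z-scores (an assumption; the slide does not say which standardization is used): once the junk is gone, every feature's mean and standard deviation change, so the surviving candidates spread out on a useful scale.

```python
import numpy as np

def standardize(values: np.ndarray) -> np.ndarray:
    """Z-score one feature across a question's candidate answers."""
    return (values - values.mean()) / values.std()

rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(5.0, 1.0, size=20),     # plausible answers
                         rng.normal(-3.0, 0.5, size=300)])  # junk answers

# Standardized against all 320 candidates, the 20 plausible answers are
# all extreme outliers bunched together, hence hard to tell apart:
print(standardize(scores)[:5])

# After hitlist normalization drops the junk, recomputing the same
# standardized feature over the survivors spreads them out usefully:
print(standardize(scores[:20])[:5])
```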

3. Transfer learning: for uncommon question classes, i.e., adding more routed models. The phase-based framework supports a straightforward parameter-transfer approach to transfer learning: the output of one phase's general model is passed into the next phase as a feature of a specialized model. Because logistic regression scores are a linear combination of weighted features, the weights learned in the transfer phase can be roughly interpreted as an update to the parameters learned on the general task.
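A minimal sketch of this parameter-transfer idea, assuming synthetic data and scikit-learn: a general logistic-regression model is trained on plentiful data, and its score is appended as an extra feature for a small specialized model on a rare question class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# General task: plenty of training data.
X_gen = rng.normal(size=(5000, 10))
y_gen = (X_gen[:, 0] + 0.1 * rng.normal(size=5000) > 0).astype(int)
general = LogisticRegression(max_iter=1000).fit(X_gen, y_gen)

# Rare question class: too few examples to train a good model from scratch.
X_rare = rng.normal(size=(60, 10))
y_rare = (X_rare[:, 0] - X_rare[:, 1] > 0).astype(int)

# Transfer: feed the general model's score in as one extra feature, so the
# specialized model only has to learn a correction to the general model.
score = general.decision_function(X_rare).reshape(-1, 1)
specialized = LogisticRegression(max_iter=1000).fit(
    np.hstack([X_rare, score]), y_rare)
```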

Logistic regression. The research group experimented with:
- Logistic regression
- Support vector machines (SVMs) with linear and nonlinear kernels
- Single and multilayer neural nets
- Boosting
- Decision trees
- Locally weighted learning
- etc.
Logistic regression was found to be the best method for classification and for gauging weights, and is used in all phases/steps.
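To make the role of logistic regression concrete, a sketch with made-up data: each candidate's feature vector is mapped to p(correct | features), and the question's candidates are ranked by that probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Synthetic training data: one row per candidate answer, labeled 1 if correct.
X_train = rng.normal(size=(1000, 5))
y_train = (X_train @ np.array([2.0, 1.0, 0.0, 0.0, -1.0]) > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# At answer time, rank a question's candidates by p(correct | features):
candidates = ["Excalibur", "sword", "Lady of the Lake"]
X_cand = rng.normal(size=(3, 5))
confidence = model.predict_proba(X_cand)[:, 1]
for answer, p in sorted(zip(candidates, confidence), key=lambda t: -t[1]):
    print(f"{answer}: {p:.2f}")
```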

4. Answer merging: merge evidence between equivalent answers and select a canonical form, e.g., "John F. Kennedy" / "J.F.K." / "Kennedy". This needs robust natural language processing (NLP) methods.

Watson can also merge answers that are connected by a relation other than equivalence; in particular, it merges answers when it detects a more_specific relation between them. "MYTHING IN ACTION: One legend says this was given by the Lady of the Lake & thrown back in the lake on King Arthur's death." Watson merged the two answers "sword" and "Excalibur" and selected "sword" as the canonical form because it had the higher initial score.
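A sketch of this merge under stated assumptions: relation detection is stubbed out (in Watson it is a substantial NLP component), and evidence is combined by simple score addition, which is a simplification.

```python
def merge_related(candidates, relations):
    """candidates: {answer: score}; relations: pairs (specific, general)
    detected elsewhere, e.g. ("Excalibur", "sword")."""
    merged = dict(candidates)
    for specific, general in relations:
        if specific in merged and general in merged:
            combined = merged.pop(specific) + merged.pop(general)
            # Canonical form = whichever surface form had the higher
            # initial score ("sword" in the Excalibur example).
            canonical = max([specific, general], key=candidates.get)
            merged[canonical] = combined
    return merged

print(merge_related({"sword": 0.5, "Excalibur": 0.25, "shield": 0.25},
                    [("Excalibur", "sword")]))
# -> {'shield': 0.25, 'sword': 0.75}
```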

5. Elite: near the end of the learning pipeline; trains on and applies to only the top five answers as ranked by the previous phase. Otherwise similar to phase 2 (Base).

6. Evidence diffusion: diffuse evidence between related answers, subject to diffusion criteria. Similar to the answer merging phase, but combines evidence from related answers, not equivalent ones.

"WORLD TRAVEL: If you want to visit this country, you can fly into Sunan International Airport or... or not visit this country." (Correct answer: North Korea.) Most sources cite Pyongyang as the location of the airport, overwhelming the answer North Korea. In this phase, evidence may be diffused from the source (Pyongyang) to the target (North Korea), provided that:
1. The target matches the expected answer type (is a country).
2. There is a semantic relation between them (located_in).
3. The transitivity of the relation allows for meaningful diffusion given the question.
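A sketch of the diffusion step for this example; the type table and relation store are illustrative stand-ins for Watson's actual sources:

```python
# Illustrative knowledge snippets (stand-ins for Watson's real sources).
types = {"North Korea": "country", "Pyongyang": "city"}
located_in = [("Pyongyang", "North Korea")]  # transitive semantic relation

def diffuse(evidence, expected_type):
    """Move evidence from a source answer to a related target answer when
    the target matches the expected answer type (criterion 1) and a
    semantic relation links them (criterion 2); criterion 3, that the
    relation meaningfully transfers evidence, is assumed for located_in."""
    updated = dict(evidence)
    for source, target in located_in:
        if types.get(target) == expected_type and source in updated:
            updated[target] = updated.get(target, 0.0) + updated[source]
    return updated

# Passages mostly mention the airport's city, so the city dominates:
evidence = {"Pyongyang": 0.8, "North Korea": 0.1}
print(diffuse(evidence, expected_type="country"))
# -> {'Pyongyang': 0.8, 'North Korea': 0.9}
```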

7. Multi-answers: join answer candidates for questions requiring multiple answers.

The 3 steps within each phase:
1. Evidence merging: combines evidence for a given question-answer pair across different occurrences (e.g., different passages containing a given answer).
2. Postprocessing: transforms the matrix of question-answer pairs and their feature values (e.g., removing answers and/or features, deriving new features from existing features) to improve sensitivity and dynamic range; feature values are relative to the other candidates, not absolute.
3. Classifier training/application: runs in either training mode, in which a model is produced over training data, or application mode, in which previously trained models are used to rank and estimate confidence in answers for a given question.
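Putting the steps together, a skeleton of how one such configurable phase might look; the class shape, the re-standardizing postprocess, and the top-k cutoff are illustrative assumptions, not DeepQA's actual architecture:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class Phase:
    """One phase = evidence merging -> postprocessing -> classifier,
    run in training mode (fit) or application mode (apply)."""

    def __init__(self, top_k):
        self.top_k = top_k                  # e.g. 100 for Base, 5 for Elite
        self.model = LogisticRegression(max_iter=1000)

    def postprocess(self, X):
        # Re-standardize each feature over the surviving candidates.
        return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

    def fit(self, X, y):                    # training mode
        self.model.fit(self.postprocess(X), y)
        return self

    def apply(self, X):                     # application mode
        p = self.model.predict_proba(self.postprocess(X))[:, 1]
        keep = np.argsort(-p)[: self.top_k]
        return keep, p[keep]                # survivors + their confidences

# The Jeopardy! configuration chains seven such phases; each phase's
# surviving candidates (and their scores) feed the next phase.
```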

3. Application classifier: after merging, it is time to rank answers by confidence based on the merged scores. Watson uses machine learning to assign each merged answer a confidence level reflecting how likely it is to be correct. Ensemble methods used:
- Mixture of experts
- Stacked generalization (a metalearner)
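A minimal sketch of stacked generalization in this setting, with illustrative models and synthetic data: several base "experts" each produce a confidence, and a metalearner (logistic regression, per the slides) learns how much to trust each one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(800, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
X_train, y_train = X[:500], y[:500]
X_held, y_held = X[500:], y[500:]

# Level 0: diverse base "experts", each fit on the training split.
experts = [LogisticRegression(max_iter=1000),
           RandomForestClassifier(n_estimators=50, random_state=0),
           SVC(probability=True, random_state=0)]
for e in experts:
    e.fit(X_train, y_train)

# Level 1: the metalearner is trained on the experts' held-out confidences,
# so it learns which expert to trust and by how much.
meta_features = np.column_stack([e.predict_proba(X_held)[:, 1]
                                 for e in experts])
metalearner = LogisticRegression(max_iter=1000).fit(meta_features, y_held)
```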

This is just the learning framework.

Sources
[1] David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty. "Building Watson: An Overview of the DeepQA Project." AI Magazine, Fall 2010. © 2010 AAAI.
[2] D. C. Gondek, A. Lally, A. Kalyanpur, J. W. Murdock, P. A. Duboue, L. Zhang, Y. Pan, Z. M. Qiu, and C. Welty. "A framework for merging and ranking of answers in DeepQA." IBM Journal of Research and Development. (merging%20and%20ranking%20answers.pdf)
[3] "Free Won't" blog. (thinks-on-jeopardy/)
[4] Randall Munroe (CC BY-NC 2.5)

Questions? Further reading: linguistic_programming