Decision Making in IBM Watson™ Question Answering (Dr. J. William Murdock)


Decision Making in IBM Watson™ Question Answering. Dr. J. William Murdock, IBM Watson Research Center. IBM, the IBM logo, ibm.com, Smarter Planet and the planet icon are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Jeopardy! Elements TM & © 2015 Jeopardy Productions, Inc. All Rights Reserved. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml. © International Business Machines Corporation 2015.

IBM Watson IBM Watson is an automated question answering system. It competed against Jeopardy!’s two all-time greatest champions. This match appeared on television in February of 2011. Watson won the match, outscoring both opponents combined. More recent work on IBM Watson focuses on business applications such as medicine and customer service.

Decision Making in Question Answering: choosing answers to questions!
- IBM Watson generates many candidate answers. For each answer, how confident are we that the answer is right?
- Deciding whether to answer: based on how confident we are that the answer is right, and on the cost/benefit of right answers and wrong answers.
- Deciding how many answers to provide.
- Deciding whether to hedge.

Jeopardy! Watson. [Architecture diagram] A statistical and rule-based NLP pipeline: Question & Topic Analysis feeds Hypothesis Generation (Primary Search and Candidate Answer Generation), followed by Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval, and Deep Evidence Scoring over evidence sources and answer sources), and finally Final Confidence Merging & Ranking, which produces an answer and a confidence. Statistical machine learning creates instances and features, with each stage adding more features; learned models help combine and weigh the evidence.

Watson Paths (e.g., Medical Diagnosis) (Lally et al., 2014). [Diagram] A scenario is decomposed into assertions; each assertion is posed as a question to basically the same QA system (with different details) and basically the same machine learning, and the resulting answers and confidences feed further assertions until a final answer and confidence are produced.

Watson Engagement Advisor. [Diagram] Basically the same statistical ML and pipeline, run over two answer sources in parallel: known questions and answers, and answer documents. Each branch performs Question & Topic Analysis, Hypothesis Generation (Primary Search, Candidate Answer Generation), and Hypothesis and Evidence Scoring (Answer Scoring, Deep Evidence Scoring), using many of the same features; learned models help combine and weigh the evidence. Feature name conflicts are resolved before Final Confidence Merging & Ranking, and an Off-Topic Classification step contributes to the final answer and confidence.

Final Confidence Merging & Ranking
Start with:
- Answers in isolation, with feature values.
- Answers embedded in evidence, with feature values.
Configured for each feature:
- What to do when it is missing (typically, if F is missing, set F=0 and F-Missing=1).
- Standardization (see next slide).
- How to merge across instances (max? min? sum? decaying sum?).
Combine "equivalent" answers, merging feature values:
- "frog": f1=3, f2=4, f3=1
- "order Anura": f3=3
- "frog" / "Frogs like ponds": f4=0.8, f5=4.5
- "frog" / "I saw a frog in a pond": f4=0.6, f5=2.5
- merged "frog": f1=3, f2=4, f3=1, f4=1.1, f5=4.5
Run ML:
- At training time, record features and a class label (right or wrong) for training a model.
- At run time, apply the model to the features to compute a confidence score.
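The merge step above can be sketched in a few lines. This is a minimal illustration, not Watson's implementation: the function names are hypothetical, and the choice of a decaying sum for f4 and max for f5 is inferred from the example's numbers (0.8 + 0.6/2 = 1.1, max(4.5, 2.5) = 4.5).

```python
# Sketch of per-feature merging across instances of one "equivalent" answer.
# Merge policies per feature are hypothetical; max is used as a default here.

def decaying_sum(values):
    """Sum values in descending order, halving the weight at each step."""
    return sum(v / 2**i for i, v in enumerate(sorted(values, reverse=True)))

MERGE_FNS = {"f4": decaying_sum, "f5": max}

def merge_instances(instances):
    """Combine feature dicts for all instances of one merged answer."""
    by_feature = {}
    for inst in instances:
        for name, value in inst.items():
            by_feature.setdefault(name, []).append(value)
    return {name: MERGE_FNS.get(name, max)(values)
            for name, values in by_feature.items()}

# The "frog" instances from the example above:
frog = merge_instances([
    {"f1": 3, "f2": 4, "f3": 1},   # answer in isolation
    {"f4": 0.8, "f5": 4.5},        # evidence: "Frogs like ponds"
    {"f4": 0.6, "f5": 2.5},        # evidence: "I saw a frog in a pond"
])
# f4 = 0.8 + 0.6/2 = 1.1 (decaying sum); f5 = max(4.5, 2.5) = 4.5
```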

Standardization. [Scatter plots: features F1 and F2 for Questions 1, 2, and 3, shown per question, combined, and combined after standardization]
- For each question, F1 and F2 are very useful for separating right answers from wrong answers.
- Across questions, it is much harder to use the raw features to separate right from wrong.
- Standardization subtracts the mean for that question and divides by the standard deviation for that question.
- Subtracting the mean drags the points to the middle; dividing by the standard deviation spreads them out or squeezes them in.
- The combined standardized features separate right from wrong well across questions.
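Per-question standardization can be sketched as follows; this is a minimal z-score illustration of the idea, not Watson's code, and the question IDs and raw scores are made up.

```python
# Per-question standardization: within each question, subtract the mean of a
# feature and divide by its standard deviation, so answers from questions
# with very different raw score scales become comparable.
from statistics import mean, pstdev

def standardize_by_question(scores_by_question):
    """scores_by_question: {question_id: [raw feature values, one per answer]}"""
    out = {}
    for qid, values in scores_by_question.items():
        mu, sigma = mean(values), pstdev(values)
        # Guard against a zero standard deviation (all answers scored alike).
        out[qid] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return out

raw = {"q1": [10.0, 12.0, 14.0], "q2": [100.0, 120.0, 140.0]}
std = standardize_by_question(raw)
# The two questions have very different raw scales, but after
# standardization their answers land on the same z-score scale.
```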

Phases and Routes
- Merging and ranking occurs in multiple phases. All answers to all questions are processed in all phases.
- Phases are used for sequential learning operations: cases where the outputs of earlier learning should influence later learning (example on next slide).
- Within each phase are alternative routes. Within a phase, all answers to a single question are processed by only one route.
- Routes are generally used for special kinds of questions. In Jeopardy! this includes puzzles, multiple-choice questions, etc. In business applications it can include factoid, definition, why, how-to, and many more.

Phases: Answer Merging (Made-Up) Example
Clue: "1976: This Kansas legislator was Gerald Ford's running mate."
Candidate answers: "Bob Dole", "Robert Dole", "Senator Dole", "Dole Bananas from Kansas", "Ford's running mate", "a legislator available in Kansas".
- "Bob Dole" and "Robert Dole" are definitely the same: merge these two first.
- "Senator Dole" might be the same; "Dole Bananas from Kansas" might also be the same.
- Then rank the answers. Then merge the others.

(Gondek et al., 2012)

Learning to rank?
- Jeopardy! Watson (1.0) and all the other configurations described so far rank answers one at a time using logistic regression. Learning is instance-based: one instance is a single answer to a single question.
- It should be possible for a system to do a better job by looking at the answers as a group and learning to rank them, instead of just scoring them one at a time in isolation. Learning to rank is very successful in other applications.
- We tried learning to rank for Jeopardy! Watson and found it was not more effective than multi-phase logistic regression:
- We are not really scoring answers in isolation: standardization and multiple phases are effective ways to let answers influence each other.
- We design features and test them using logistic regression, discarding the ones that are not effective. As a result, the surviving features are well suited to logistic regression.
- However, the field of learning to rank has continued to progress. Stay tuned!
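Instance-based scoring with logistic regression can be sketched as follows. The feature names and weights here are invented for illustration; Watson's actual features and learned weights are far more numerous.

```python
# Sketch of instance-based confidence scoring: each answer's feature vector
# is mapped through a logistic (sigmoid) function to a probability of
# correctness, and answers are ranked by that probability.
import math

def confidence(features, weights, bias):
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights over hypothetical features:
weights = {"search_rank": -0.8, "type_match": 1.5, "evidence": 2.0}
bias = -1.0

answers = {
    "Bob Dole":    {"search_rank": 1.0, "type_match": 1.0, "evidence": 1.2},
    "Gerald Ford": {"search_rank": 2.0, "type_match": 1.0, "evidence": 0.3},
}
ranked = sorted(answers, key=lambda a: confidence(answers[a], weights, bias),
                reverse=True)
```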

Decision Making in Question Answering: choosing answers to questions!
- IBM Watson generates many candidate answers. For each answer, how confident are we that the answer is right?
- Deciding whether to answer: based on how confident we are that the answer is right, and on the cost/benefit of right answers and wrong answers.
- Deciding how many answers to provide.
- Deciding whether to hedge.

Deciding whether to answer
- In Jeopardy!, if you answer wrong, you lose money (hence the name). It is very important not to answer when you don't know the answer. A big part of what motivated us to pursue Jeopardy!: systems that know what they don't know.
- There is lots of data in Jeopardy! for optimizing win rate based on outcomes. Jeopardy! Watson (1.0) had extensive game-playing capability for key Jeopardy! decisions such as what category to select, how much to wager, and whether to try to answer a question (see upcoming slides).
- The core Watson question answering capability is responsible for selecting an answer AND deciding how likely the answer is to be correct. The Watson Jeopardy!-playing application decides whether to answer using the confidence AND the game state (e.g., if Watson is far behind, it may take bigger risks).
- Similarly in business applications, core Watson QA ranks answers and assigns confidence scores; each application has its own logic for what to do with them. For example, applications with very casual users may want to avoid wasting the user's time on answers with a low probability of correctness, while applications used by highly motivated users may be more aggressive.

Stochastic Process Model of Jeopardy! Clues (Tesauro et al., 2012)
For Jeopardy! we had lots of data about human opponents' behavior. This, plus our confidence in our own answer, allows us to precisely model the expected outcomes of answering or not answering.
[Diagram: a stochastic process over the initial buzz, a first rebound, and a second rebound. At each stage a clue can be answered right (79.7%, 5.9%, 0.4%), answered wrong (8.6%, 4.1%, 1.1%), or go unanswered, with 0.2% of clues receiving no buzz at all.]
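A highly simplified sketch of the attempt decision follows. The real model (Tesauro et al., 2012) simulates full game state and opponent behavior; here we only compare expected score changes on a single clue, with hypothetical function names and a hypothetical risk margin.

```python
# Simplified expected-value sketch: buzzing wins the clue value if right and
# loses it if wrong, so the expected gain is p*V - (1-p)*V. Not answering
# changes nothing (rebound effects and game state are ignored here).

def expected_gain_from_buzzing(p_right, clue_value):
    return p_right * clue_value - (1.0 - p_right) * clue_value

def should_buzz(p_right, clue_value, risk_margin=0.0):
    # risk_margin > 0 models conservative play (e.g., protecting a lead).
    return expected_gain_from_buzzing(p_right, clue_value) > risk_margin

# With 80% confidence on a $1000 clue the expected gain is $600: buzz.
# With 40% confidence the expected gain is -$200: stay silent.
```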

Stochastic Process Model in Business Applications?
We could do something similar in many business applications. For example, when a customer asks a question about a product: if Watson tries to answer and is right, the customer buys the product 7% of the time; if Watson answers wrong, 2%; if Watson admits it does not know the answer, 4%. (The customer is most likely to buy the product if Watson answers correctly, but more likely to buy if Watson doesn't answer than if it answers wrong.)
However, typically you don't really have data like that. Core Watson just provides answers plus confidences, and applications must decide what to do with them; mostly they just apply a predefined confidence threshold (a subjective user-experience choice).
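Working through the made-up numbers above: answering is worthwhile whenever the expected purchase rate from answering exceeds the 4% rate from declining, which puts the break-even confidence at p*0.07 + (1-p)*0.02 = 0.04, i.e. p = 0.4. A sketch, using only the slide's illustrative probabilities:

```python
# Expected-value decision with the slide's made-up conversion rates:
# 7% purchase after a right answer, 2% after a wrong one, 4% after declining.
P_BUY_RIGHT, P_BUY_WRONG, P_BUY_DECLINE = 0.07, 0.02, 0.04

def should_answer(confidence):
    ev_answer = confidence * P_BUY_RIGHT + (1 - confidence) * P_BUY_WRONG
    return ev_answer > P_BUY_DECLINE

# Break-even confidence: answer whenever confidence exceeds 0.4.
break_even = (P_BUY_DECLINE - P_BUY_WRONG) / (P_BUY_RIGHT - P_BUY_WRONG)
```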

Decision Making in Question Answering: choosing answers to questions!
- IBM Watson generates many candidate answers. For each answer, how confident are we that the answer is right?
- Deciding whether to answer: based on how confident we are that the answer is right, and on the cost/benefit of right answers and wrong answers.
- Deciding how many answers to provide.
- Deciding whether to hedge.

Deciding how many answers + hedging
- In Jeopardy!, you can give only one answer. However, in real life you can show multiple answers to a user.
- Showing several is generally not as good as providing the one correct answer. However, sometimes IBM Watson has several answers that all have reasonably good scores and no single answer with a very good score. Show one answer when you are very confident of one answer, and show multiple answers when several have similar confidences.
- A system can hedge an answer by saying something like "I don't really know, but I think the answer might be..." before answering. It would do this for medium-confidence answers: high enough to be worth showing a user but not high enough to state outright. Users may be more forgiving of wrong answers if they are hedged.
- Like deciding whether to answer, these choices are addressed at the application level. The core IBM Watson QA system provides many ranked answers with confidences, and the application decides what to do with that information.
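An application-level presentation policy along these lines might look as follows. The thresholds and the "similar confidence" band are hypothetical knobs, not Watson's actual values.

```python
# Sketch of an application-level presentation policy over Watson's ranked
# answers: show one, show several, hedge, or decline to answer.

def presentation_mode(ranked, high=0.8, medium=0.4, similar_band=0.1):
    """ranked: list of (answer, confidence), sorted by confidence descending."""
    if not ranked:
        return "decline"
    top_conf = ranked[0][1]
    runners_up = [a for a, c in ranked[1:] if top_conf - c <= similar_band]
    if top_conf >= high:
        # Several near-equal high-confidence answers: show them all.
        return "show_several" if runners_up else "show_one"
    if top_conf >= medium:
        return "hedge"  # "I don't really know, but I think it might be..."
    return "decline"

mode = presentation_mode([("frog", 0.85), ("toad", 0.80), ("newt", 0.30)])
```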

References
- A. Lally, S. Bagchi, M. A. Barborak, D. W. Buchanan, J. Chu-Carroll, D. A. Ferrucci, M. R. Glass, A. Kalyanpur, E. T. Mueller, J. W. Murdock, S. Patwardhan, J. M. Prager, and C. A. Welty. WatsonPaths: Scenario-based Question Answering and Inference over Unstructured Information. IBM Research Report RC25489, 2014.
- D. C. Gondek, A. Lally, A. Kalyanpur, J. W. Murdock, P. Duboue, L. Zhang, Y. Pan, Z. M. Qiu, and C. A. Welty. A framework for merging and ranking of answers in DeepQA. IBM Journal of Research and Development 56(3/4), 14, 2012.
- J. W. Murdock and G. Tesauro. Statistical Approaches to Question Answering in Watson. Mathematics Awareness Month theme essay, Joint Policy Board for Mathematics (JPBM), 2012.
- G. Tesauro, D. Gondek, J. Lenchner, J. Fan, and J. Prager. Simulation, Learning and Optimization Techniques in Watson's Game Strategies. IBM Journal of Research and Development 56(3/4), 14, 2012.
- D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty. Building Watson: An overview of the DeepQA project. AI Magazine 31(3), 59-79, American Association for Artificial Intelligence, 2010.