CS 9633 Machine Learning Explanation Based Learning

Presentation transcript:

CS 9633 Machine Learning Explanation Based Learning

Analytical Learning
- Inductive learning: given a large set of examples, generalize to find features that distinguish positive from negative examples. Examples include neural networks, genetic algorithms, decision trees, support vector machines, etc.
- The problem is that purely inductive methods perform poorly with very small training sets.
- Analytical learning: combines training examples with a domain theory (prior knowledge).

Learning by People
- People can often learn a concept from a single example. They appear to do this by analyzing the example in terms of prior knowledge to determine the most relevant features.
- Some inductive algorithms use domain knowledge to increase the size of the hypothesis space.
- Explanation-based learning uses domain knowledge to decrease the size of the hypothesis space.

Example
- Positive example of the target concept: chess positions in which black will lose its queen within two moves.

Inductive versus Analytical Learning
- Inductive learning: given a hypothesis space H and a set of training examples D, the desired output is a hypothesis consistent with the training examples.
- Analytical learning: given a hypothesis space H, a set of training examples D, and a domain theory B, the desired output is a hypothesis consistent with both B and D.

SafeToStack Problem: Instances
- Instance space: each instance describes a pair of objects, represented by the predicates Type (e.g., Box, Endtable, ...), Color, Volume, Owner, Material, Density, and On.

SafeToStack Hypothesis Space
- The hypothesis space H is a set of Horn clause rules.
- The head of each rule is a literal containing the target predicate SafeToStack.
- The body of each rule is a conjunction of literals based on:
  - the predicates used to describe the instances
  - additional general-purpose predicates such as LessThan, Equal, GreaterThan
  - additional general-purpose functions such as plus, minus, times
- Example hypothesis:
  SafeToStack(x,y) ← Volume(x,vx) ∧ Volume(y,vy) ∧ LessThan(vx,vy)
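To make the representation concrete, here is a minimal Python sketch (an illustrative encoding of my own, not one prescribed by the slides) of a Horn clause as a (head, body) pair: a literal is written as a tuple (Predicate, arg, ...) and strings starting with "?" are variables.

# The example hypothesis above:
#   SafeToStack(x,y) <- Volume(x,vx) ^ Volume(y,vy) ^ LessThan(vx,vy)
example_hypothesis = (
    ("SafeToStack", "?x", "?y"),        # head literal
    [("Volume", "?x", "?vx"),           # body: a conjunction of literals
     ("Volume", "?y", "?vy"),
     ("LessThan", "?vx", "?vy")],
)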

SafeToStack Target Concept
- SafeToStack(x,y)

SafeToStack Training Examples
- SafeToStack(Obj1, Obj2) (positive example)
- On(Obj1, Obj2)
- Type(Obj1, Box)
- Type(Obj2, Endtable)
- Color(Obj1, Red)
- Color(Obj2, Blue)
- Volume(Obj1, 2)
- Owner(Obj1, Fred)
- Owner(Obj2, Louise)
- Density(Obj1, 0.3)
- Material(Obj1, Cardboard)
- Material(Obj2, Wood)
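In the same tuple notation as the sketch above, the training example can be encoded as a set of ground facts; again, this is an illustrative encoding rather than a format used by any particular system.

training_example = {
    ("On", "Obj1", "Obj2"),
    ("Type", "Obj1", "Box"),
    ("Type", "Obj2", "Endtable"),
    ("Color", "Obj1", "Red"),
    ("Color", "Obj2", "Blue"),
    ("Volume", "Obj1", 2),
    ("Owner", "Obj1", "Fred"),
    ("Owner", "Obj2", "Louise"),
    ("Density", "Obj1", 0.3),
    ("Material", "Obj1", "Cardboard"),
    ("Material", "Obj2", "Wood"),
}
# Label for this instance: SafeToStack(Obj1, Obj2) is true.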

SafeToStack Domain Theory B
- SafeToStack(x,y) ← ¬Fragile(y)
- SafeToStack(x,y) ← Lighter(x,y)
- Lighter(x,y) ← Weight(x,wx) ∧ Weight(y,wy) ∧ LessThan(wx,wy)
- Weight(x,w) ← Volume(x,v) ∧ Density(x,d) ∧ Equal(w, times(v,d))
- Weight(x,5) ← Type(x,Endtable)
- Fragile(x) ← Material(x,Glass)
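The domain theory can be written down in the same tuple notation used in the other sketches in these notes: each rule is a (head, body) pair, strings starting with "?" are variables, and times(v,d) is a nested function term. The "not" wrapper marking the negated antecedent of the first rule is my own convention, introduced only for illustration.

domain_theory = [
    (("SafeToStack", "?x", "?y"), [("not", ("Fragile", "?y"))]),
    (("SafeToStack", "?x", "?y"), [("Lighter", "?x", "?y")]),
    (("Lighter", "?x", "?y"),     [("Weight", "?x", "?wx"),
                                   ("Weight", "?y", "?wy"),
                                   ("LessThan", "?wx", "?wy")]),
    (("Weight", "?x", "?w"),      [("Volume", "?x", "?v"),
                                   ("Density", "?x", "?d"),
                                   ("Equal", "?w", ("times", "?v", "?d"))]),
    (("Weight", "?x", 5),         [("Type", "?x", "Endtable")]),
    (("Fragile", "?x"),           [("Material", "?x", "Glass")]),
]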

Analytical Learning Problem
- We must provide a domain theory sufficient to explain why observed positive examples satisfy the target concept.
- The domain theory is a set of Horn clauses.

Learning with Perfect Domain Theories
- Prolog-EBG is an example system.
- The domain theory must be:
  - correct
  - complete with respect to the target concept and instance space

Reasonableness of Perfect Domain Theories
- In some cases it is feasible to develop a perfect domain theory (chess is an example). EBL can then help improve the performance of search-intensive planning and optimization problems.
- It is often not feasible to develop a perfect domain theory. In that case the learner must be able to generate plausible explanations.

Prolog-EBG (see Mitchell, Table 11.2, for details)
For each new positive training example not yet covered by a learned Horn clause, form a new Horn clause by:
1. Explaining the new positive training example by "proving" its truth.
2. Analyzing this explanation to determine an appropriate generalization.
3. Refining the current hypothesis by adding a new Horn clause that covers this positive example as well as other similar instances.
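The Python sketch below paraphrases this outer loop. It is not a transcription of Mitchell's Table 11.2; the helpers covers, explain, and weakest_preimage are hypothetical placeholders for the explain/analyze/refine sub-procedures discussed on the following slides, so they are passed in as parameters.

def prolog_ebg(target_concept, positive_examples, domain_theory,
               covers, explain, weakest_preimage):
    learned_clauses = []                       # the current hypothesis
    for example in positive_examples:
        if any(covers(clause, example) for clause in learned_clauses):
            continue                           # already covered: nothing to learn
        # 1. Explain: prove that the example satisfies the target concept.
        explanation = explain(target_concept, example, domain_theory)
        # 2. Analyze: regress the target concept through the explanation to
        #    obtain its weakest preimage (most general sufficient condition).
        body = weakest_preimage(target_concept, explanation)
        # 3. Refine: add the new Horn clause  target_concept <- body.
        learned_clauses.append((target_concept, tuple(body)))
    return learned_clauses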

1. Explaining the Training Example
- Provide a proof that the training example satisfies the target concept.
- If the domain theory is correct and complete, use a proof procedure such as resolution.
- If the domain theory is not correct and complete, the "proof procedure" must be extended to allow plausible, approximate arguments.

Explanation (proof tree) of the training example:

SafeToStack(Obj1,Obj2)
  ← Lighter(Obj1,Obj2)
      ← Weight(Obj1,0.6) ∧ LessThan(0.6,5) ∧ Weight(Obj2,5)
          Weight(Obj1,0.6) ← Volume(Obj1,2) ∧ Density(Obj1,0.3) ∧ Equal(0.6, times(2,0.3))
          Weight(Obj2,5) ← Type(Obj2,Endtable)
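One way to hand the explanation to the analysis step is as a tree in which each node pairs a proved literal with the sub-proofs of the rule body used to prove it (leaves have empty child lists). The encoding below mirrors the proof tree above and is illustrative only.

explanation = (
    ("SafeToStack", "Obj1", "Obj2"),
    [
        (("Lighter", "Obj1", "Obj2"), [
            (("Weight", "Obj1", 0.6), [
                (("Volume", "Obj1", 2), []),
                (("Density", "Obj1", 0.3), []),
                (("Equal", 0.6, ("times", 2, 0.3)), []),
            ]),
            (("LessThan", 0.6, 5), []),
            (("Weight", "Obj2", 5), [
                (("Type", "Obj2", "Endtable"), []),
            ]),
        ]),
    ],
)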

[Figure: graph of the training example. Obj1 (Type Box, Material Cardboard, Color Red, Owner Fred, Volume 2, Density 0.3) is On Obj2 (Type Endtable, Material Wood, Color Blue, Owner Louise).]

2. Generating a General Rule
- A general rule obtained from the explanation (replacing Obj1 and Obj2 by the variables x and y):
  SafeToStack(x,y) ← Volume(x,2) ∧ Density(x,0.3) ∧ Type(y,Endtable)
- Note that we omitted the leaf nodes that are always satisfied independent of x and y: Equal(0.6, times(2,0.3)) and LessThan(0.6,5).
- However, we would like an even more general rule.

Weakest Preimage
- The goal is to compute the most general rule that can be justified by the explanation. We do this by computing the weakest preimage.
- Definition: the weakest preimage of a conclusion C with respect to a proof P is the most general set of assertions A such that A entails C according to P.

Most General Rule
- The most general rule that can be justified by the explanation is:
  SafeToStack(x,y) ← Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,5) ∧ Type(y,Endtable)
- A general procedure called regression is used to generate this rule:
  - Start with the target concept and the final proof step in the explanation.
  - At each step, compute the weakest preimage of the current frontier with respect to the preceding proof step.
  - Terminate after iterating over all steps in the explanation.

Regression trace (each step shows the explanation nodes together with the corresponding general frontier):

1. Target concept:
   SafeToStack(Obj1,Obj2)
   Frontier: SafeToStack(x,y)

2. Regress through SafeToStack(x,y) ← Lighter(x,y):
   Lighter(Obj1,Obj2)
   Frontier: Lighter(x,y)

3. Regress through Lighter(x,y) ← Weight(x,wx) ∧ Weight(y,wy) ∧ LessThan(wx,wy):
   Weight(Obj1,0.6), LessThan(0.6,5), Weight(Obj2,5)
   Frontier: Weight(x,wx) ∧ LessThan(wx,wy) ∧ Weight(y,wy)

4. Regress Weight(x,wx) through Weight(x,w) ← Volume(x,v) ∧ Density(x,d) ∧ Equal(w, times(v,d)):
   Volume(Obj1,2), Density(Obj1,0.3), Equal(0.6, times(2,0.3))
   Frontier: Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,wy) ∧ Weight(y,wy)

5. Regress Weight(y,wy) through Weight(x,5) ← Type(x,Endtable):
   Type(Obj2,Endtable)
   Frontier: Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,5) ∧ Type(y,Endtable)
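The following self-contained Python sketch implements a single regression step. It is a simplified rendering of the weakest-preimage computation traced above, not the exact procedure of Mitchell's Table 11.3: literals are tuples, strings starting with "?" are variables, and the rule passed in is assumed to have already been standardized apart (its variables renamed so they do not clash with the frontier). The demo at the end reproduces step 5 of the trace.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Dereference a variable through the substitution.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def substitute(t, subst):
    # Apply a substitution to a (possibly nested) term or literal.
    t = walk(t, subst)
    if isinstance(t, tuple):
        return tuple(substitute(a, subst) for a in t)
    return t

def unify(t1, t2, subst=None):
    # Syntactic unification (no occurs check); returns a substitution dict
    # or None if the terms cannot be unified.
    subst = {} if subst is None else subst
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return {**subst, t1: t2}
    if is_var(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

def regress(frontier, literal, rule_head, rule_body):
    # One regression step: replace `literal` (the frontier member proved by
    # the rule in the explanation) with the rule body, under the most general
    # unifier of the rule head and that literal.
    theta = unify(rule_head, literal)
    assert theta is not None, "rule head must unify with the chosen literal"
    rest = [lit for lit in frontier if lit != literal] + list(rule_body)
    return [substitute(lit, theta) for lit in rest]

# Step 5 of the trace: regress Weight(y,wy) through Weight(z,5) <- Type(z,Endtable)
# (the rule variable is renamed to ?z to keep it apart from the frontier).
frontier = [("Volume", "?x", "?vx"), ("Density", "?x", "?dx"),
            ("Equal", "?wx", ("times", "?vx", "?dx")),
            ("LessThan", "?wx", "?wy"), ("Weight", "?y", "?wy")]
print(regress(frontier, ("Weight", "?y", "?wy"),
              ("Weight", "?z", 5), [("Type", "?z", "Endtable")]))
# -> Volume(?x,?vx), Density(?x,?dx), Equal(?wx,times(?vx,?dx)),
#    LessThan(?wx,5), Type(?y,Endtable)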

3. Refine the Current Hypothesis
- The current hypothesis is the set of Horn clauses learned so far.
- At each stage, a new positive example is picked that is not yet covered by the current hypothesis, and a new rule is developed to cover it.
- Only positive examples are covered by the rules. Instances not covered by any rule are classified as negative (a negation-as-failure approach).
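As a concrete illustration, the sketch below hand-specializes the single clause learned above into a Python predicate and classifies instances by negation as failure: anything not provably positive is labelled negative. It is not a general rule matcher; the function names and fact encoding are assumptions of this sketch.

def learned_clause_covers(facts, x, y):
    # SafeToStack(x,y) <- Volume(x,vx) ^ Density(x,dx)
    #                     ^ Equal(wx, times(vx,dx)) ^ LessThan(wx,5)
    #                     ^ Type(y, Endtable)
    volume  = {obj: v for (p, obj, v) in facts if p == "Volume"}
    density = {obj: d for (p, obj, d) in facts if p == "Density"}
    types   = {obj: t for (p, obj, t) in facts if p == "Type"}
    if x not in volume or x not in density:
        return False
    wx = volume[x] * density[x]            # Equal(wx, times(vx, dx))
    return wx < 5 and types.get(y) == "Endtable"

def classify(facts, x, y, clauses=(learned_clause_covers,)):
    # Negation as failure: covered by no learned clause => classified negative.
    return "positive" if any(c(facts, x, y) for c in clauses) else "negative"

facts = {("Volume", "Obj1", 2), ("Density", "Obj1", 0.3),
         ("Type", "Obj1", "Box"), ("Type", "Obj2", "Endtable")}
print(classify(facts, "Obj1", "Obj2"))     # -> positive
print(classify(facts, "Obj2", "Obj1"))     # -> negative (not provably positive)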

EBL Summary
- Individual examples are explained (proven) using prior knowledge.
- Attributes included in the proof are considered relevant.
- Regression is used to generalize the rule.
- The generality of the learned clauses depends on the formulation of the domain theory, the order in which examples are encountered, and which other instances share the same explanation.
- Assumes the domain theory is correct and complete.

Different Perspectives on EBL
- EBL is a theory-guided generalization of examples.
- EBL is an example-guided reformulation of theories: the rules created follow deductively from the domain theory and classify the observed training examples in a single inference step.
- EBL is just a restating of what the learner already knows (knowledge compilation).

Inductive Bias of EBL
- The domain theory.
- The algorithm (sequential covering) used to choose among alternative Horn clauses.
- The generalization procedure favors small sets of Horn clauses.

EBL for Search Strategies
- The requirement of a correct and complete domain theory is often difficult to meet, but it can often be met in complex search tasks.
- This type of learning is called speedup learning.
- EBL can be used to learn efficient sequences of operators (evolve meta-operators).