1 Chapter 19 Knowledge in Learning Version spaces examples Additional sources used in preparing the slides: Jean-Claude Latombe’s CS121 slides: robotics.stanford.edu/~latombe/cs121

2 A learning agent [Diagram: a learning agent interacting with its environment through sensors and actuators; a learning element updates the KB, guided by a critic.]

3 A learning game with playing cards I would like to show what a full house is. I give you several examples. Some are full houses, some are not:
6 6 6 9 9 is a full house
6 6 6 6 9 is not a full house
3 3 3 6 6 is a full house
1 1 1 6 6 is a full house
Q Q Q 6 6 is a full house
1 2 3 4 5 is not a full house
1 1 3 4 5 is not a full house
1 1 1 4 5 is not a full house
1 1 1 4 4 is a full house

4 A learning game with playing cards The concept of a full house can be described as: three of a kind and a pair of another kind.
6 6 6 | 9 9 is a full house
6 6 6 6 | 9 is not a full house
3 3 3 | 6 6 is a full house
1 1 1 | 6 6 is a full house
Q Q Q | 6 6 is a full house
1 2 3 | 4 5 is not a full house
1 1 3 | 4 5 is not a full house
1 1 1 | 4 5 is not a full house
1 1 1 | 4 4 is a full house

5 Intuitively, I’m asking you to describe a set. This set is the concept I want you to learn. This is called inductive learning, i.e., learning a generalization from a set of examples. Concept learning is a typical inductive learning problem: given examples of some concept, such as “cat,” “soybean disease,” or “good stock investment,” we attempt to infer a definition that will allow the learner to correctly recognize future instances of that concept.

6 Supervised learning This is called supervised learning because we assume that there is a teacher who classified the training data: the learner is told whether an instance is a positive or negative example of a target concept.

7 Supervised learning – the question This definition might seem counterintuitive. If the teacher knows the concept, why doesn’t s/he tell us directly and save us all the work?

8 Supervised learning – the answer The teacher only knows the classification of each example; the learner has to find out what characterizes the concept. Imagine an online store: there is a lot of data concerning whether a customer returns to the store. The information is there in terms of attributes and whether the customer comes back or not. However, it is up to the learning system to characterize the concept, e.g., If a customer bought more than 4 books, s/he will return. If a customer spent more than $50, s/he will return.

9 Rewarded card example Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded” Background knowledge in the KB:
((r=1) ∨ … ∨ (r=10)) ⇒ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE(r)
((s=S) ∨ (s=C)) ⇒ BLACK(s)
((s=D) ∨ (s=H)) ⇒ RED(s)
Training set: REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])

10 Rewarded card example Training set: REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
Card | In the target set?
4♣ | yes
7♣ | yes
2♠ | yes
5♥ | no
J♠ | no
Possible inductive hypothesis h: h = (NUM(r) ∧ BLACK(s)) ⇔ REWARD([r,s])
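As a quick sanity check (a sketch of my own, not from the slides; J is encoded as rank 11), the hypothesis h can be verified against the five training examples in a few lines of Python:

def num(r):   return r in range(1, 11)     # ranks 1..10
def black(s): return s in ("S", "C")       # Spades, Clubs
def h(r, s):  return num(r) and black(s)   # hypothesized REWARD

training = [((4, "C"), True), ((7, "C"), True), ((2, "S"), True),
            ((5, "H"), False), ((11, "S"), False)]   # 11 stands for J
assert all(h(r, s) == label for (r, s), label in training)
print("h agrees with all five training examples")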

11 Learning a predicate Set E of objects (e.g., cards, drinking cups, writing instruments). Goal predicate CONCEPT(X), where X is an object in E, that takes the value True or False (e.g., REWARD, MUG, PENCIL, BALL). Observable predicates A(X), B(X), … (e.g., NUM, RED, HAS-HANDLE, HAS-ERASER). Training set: values of CONCEPT for some combinations of values of the observable predicates. Find a representation of CONCEPT of the form CONCEPT(X) ⇔ A(X) ∧ (¬B(X) ∨ C(X))

12 How can we do this? Go with the most general hypothesis possible: “any card is a rewarded card.” This will cover all the positive examples, but will not be able to eliminate any negative examples. Go with the most specific hypothesis possible: “the rewarded cards are 4♣, 7♣, 2♠.” This will correctly sort all the examples in the training set, but it is overly specific and will not generalize to any new examples. But the above two are good starting points.

13 Version space algorithm What we want to do is start with the most general and most specific hypotheses, and then: when we see a positive example, we minimally generalize the most specific hypothesis; when we see a negative example, we minimally specialize the most general hypothesis. When the most general hypothesis and the most specific hypothesis are the same, the algorithm has converged: this is the target concept.

14 Pictorially [Figure: the version space drawn as the region between the boundary of S and the boundary of G; the hypotheses in between, marked “?”, are the potential target concepts.]

15 Hypothesis space When we shrink G or enlarge S, we are essentially conducting a search in the hypothesis space. A hypothesis is any sentence h of the form CONCEPT(X) ⇔ A(X) ∧ (¬B(X) ∨ C(X)) where the right-hand side is built with observable predicates. The set of all hypotheses is called the hypothesis space, or H. A hypothesis h agrees with an example if it gives the correct value of CONCEPT.

16 Size of the hypothesis space With n observable predicates, there are 2^n entries in the truth table. A hypothesis corresponds to an arbitrary subset of these entries, so there are 2^(2^n) hypotheses to choose from: BIG! For n = 6, that is 2^64 ≈ 1.8 × 10^19. BIG! Generate-and-test won’t work.
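The arithmetic is easy to reproduce (a throwaway snippet, not from the slides):

for n in range(1, 7):
    rows = 2 ** n         # truth-table entries
    hyps = 2 ** rows      # subsets of entries = hypotheses
    print(f"n={n}: {rows} rows, {hyps:.3e} hypotheses")
# n=6 prints 64 rows and about 1.845e+19 hypotheses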

17 Simplified Representation for the card problem For simplicity, we represent a concept by rs, with: r = a, n, f, 1, …, 10, j, q, k and s = a, b, r, ♠, ♥, ♦, ♣. For example: n♥ represents NUM(r) ∧ (s=♥) ⇒ REWARD([r,s]); aa represents ANY-RANK(r) ∧ ANY-SUIT(s) ⇒ REWARD([r,s]).

18 Extension of an hypothesis The extension of an hypothesis h is the set of objects that verifies h. For instance, the extension of f♥ is {j♥, q♥, k♥}, and the extension of aa is the set of all cards.

19 More general/specific relation Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of the extension of h2. For instance, aa is more general than f♥, f♥ is more general than q♥, and fr and nr are not comparable.
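As a sketch (helper names my own, not from the slides), the “more general” relation can be computed directly from extensions in the simplified rs representation:

RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["s", "h", "d", "c"]          # spades, hearts, diamonds, clubs

def rank_matches(rv, r):
    if rv == "a": return True                   # any rank
    if rv == "n": return r in range(1, 11)      # number card
    if rv == "f": return r in ("j", "q", "k")   # face card
    return rv == r                              # a specific rank

def suit_matches(sv, s):
    if sv == "a": return True                   # any suit
    if sv == "b": return s in ("s", "c")        # black
    if sv == "r": return s in ("h", "d")        # red
    return sv == s                              # a specific suit

def extension(h):
    rv, sv = h
    return {(r, s) for r in RANKS for s in SUITS
            if rank_matches(rv, r) and suit_matches(sv, s)}

def more_general(h1, h2):
    return extension(h1) > extension(h2)        # proper superset of extensions

print(more_general(("a", "a"), ("f", "h")))     # True: aa is more general than f-hearts
print(more_general(("f", "r"), ("n", "r")))     # False: fr and nr are not comparable
print(more_general(("n", "r"), ("f", "r")))     # False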

20 More general/specific relation (cont’d) The inverse of the “more general” relation is the “more specific” relation. The “more general” relation defines a partial ordering on the hypotheses in H.

21 A subset of the partial order for cards [Figure: lattice with aa at the top; na, ab, 4a below it; nb, n♣, 4b below those; and 4♣ at the bottom.]

22 G-Boundary / S-Boundary of V An hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: set of most general hypotheses in V. An hypothesis in V is most specific iff no hypothesis in V is more specific. S-boundary S of V: set of most specific hypotheses in V.

23 Example: the starting hypothesis space [Figure: the card lattice with G = {aa} at the top and S, the set of all single-card hypotheses, at the bottom.]

24 4♣ is a positive example We replace every hypothesis in S whose extension does not contain 4♣ by its generalization set. The generalization set of a hypothesis h is the set of the hypotheses that are immediately more general than h. [Figure: the lattice, highlighting the generalization set of 4♣ and the specialization set of aa.]

25 7♣ is the next positive example Minimally generalize the most specific hypothesis set: we replace every hypothesis in S whose extension does not contain 7♣ by its generalization set. [Figure: the lattice, with the G and S boundaries marked.]

26 7♣ is positive (cont’d) Minimally generalize the most specific hypothesis set. [Figure: the lattice after the generalization step.]

27 7♣ is positive (cont’d) Minimally generalize the most specific hypothesis set. [Figure: S has been generalized to n♣.]

28 5♥ is a negative example Minimally specialize the most general hypothesis set. [Figure: the lattice, highlighting the specialization set of aa.]

29 5♥ is negative (cont’d) Minimally specialize the most general hypothesis set. [Figure: G has been specialized to ab.]

30 ab nb nn aa G and S, and all hypotheses in between form exactly the version space 1. If an hypothesis between G and S disagreed with an example x, then an hypothesis G or S would also disagree with x, hence would have been removed After 3 examples (2 positive,1 negative)

31 ab nb nn aa G and S, and all hypotheses in between form exactly the version space After 3 examples (2 positive,1 negative) 2. If there were an hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G – but then it would be in G – or no more general than some member of S – but then it would be in S

32 ab nb nn aa Do 8 , 6 , j  satisfy CONCEPT? Yes No Maybe At this stage

33 ab nb nn aa 2  is the next positive example Minimally generalize the most specific hypothesis set

34 j♠ is the next negative example Minimally specialize the most general hypothesis set. [Figure: G moves from ab down to nb.]

35 Result After the positive examples 4♣, 7♣, 2♠ and the negative examples 5♥, j♠, G and S have both converged to nb: (NUM(r) ∧ BLACK(s)) ⇔ REWARD([r,s])

36 The version space algorithm
Begin
  Initialize G to be the most general concept in the space
  Initialize S to the first positive training instance
  For each example x:
    If x is positive, then (G,S) ← POSITIVE-UPDATE(G,S,x)
    else (G,S) ← NEGATIVE-UPDATE(G,S,x)
  If G = S and both are singletons, then the algorithm has found a single concept that is consistent with all the data and the algorithm halts (the version space converged)
  If G and S become empty, then there is no concept that covers all the positive instances and none of the negative instances (the version space collapsed)
End

37 The version space algorithm (cont’d)
POSITIVE-UPDATE(G,S,p)
Begin
  Delete all members of G that fail to match p
  For every s ∈ S, if s does not match p, replace s with its most specific generalizations that match p
  Delete from S any hypothesis that is more general than some other hypothesis in S
  Delete from S any hypothesis that is neither more specific than nor equal to a hypothesis in G
End

38 The version space algorithm (cont’d)
NEGATIVE-UPDATE(G,S,n)
Begin
  Delete all members of S that match n
  For every g ∈ G that matches n, replace g with its most general specializations that do not match n
  Delete from G any hypothesis that is more specific than some other hypothesis in G
  Delete from G any hypothesis that is neither more general than nor equal to a hypothesis in S
End
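Below is a minimal runnable sketch of the algorithm above (in Python, with helper names of my own; not part of the original slides), applied to the card example in the simplified rs representation. For this particular training sequence a single lattice step always suffices to generalize or specialize; a full implementation would iterate the step until the example is covered or excluded.

from itertools import product

NUM_RANKS, FACE_RANKS = list(range(1, 11)), ["j", "q", "k"]
BLACK, RED = ["s", "c"], ["h", "d"]

def extension(h):
    # Set of cards (rank, suit) matched by hypothesis h = (rank_value, suit_value).
    rv, sv = h
    ranks = {"a": NUM_RANKS + FACE_RANKS, "n": NUM_RANKS, "f": FACE_RANKS}.get(rv, [rv])
    suits = {"a": BLACK + RED, "b": BLACK, "r": RED}.get(sv, [sv])
    return set(product(ranks, suits))

def matches(h, card):
    return card in extension(h)

def ge(h1, h2):
    # h1 is more general than or equal to h2 (its extension is a superset).
    return extension(h1) >= extension(h2)

def generalizations(h):
    # Hypotheses one lattice step more general than h.
    rv, sv = h
    up_r = [] if rv == "a" else (["a"] if rv in ("n", "f") else
            ["n"] if rv in NUM_RANKS else ["f"])
    up_s = [] if sv == "a" else (["a"] if sv in ("b", "r") else
            ["b"] if sv in BLACK else ["r"])
    return [(r, sv) for r in up_r] + [(rv, s) for s in up_s]

def specializations(h):
    # Hypotheses one lattice step more specific than h.
    rv, sv = h
    down_r = {"a": ["n", "f"], "n": NUM_RANKS, "f": FACE_RANKS}.get(rv, [])
    down_s = {"a": ["b", "r"], "b": BLACK, "r": RED}.get(sv, [])
    return [(r, sv) for r in down_r] + [(rv, s) for s in down_s]

def positive_update(G, S, p):
    G = {g for g in G if matches(g, p)}           # delete members of G that fail to match p
    S2 = set()
    for s in S:                                   # minimally generalize members of S
        S2 |= {s} if matches(s, p) else {h for h in generalizations(s) if matches(h, p)}
    S2 = {s for s in S2 if not any(extension(s) > extension(t) for t in S2)}
    S2 = {s for s in S2 if any(ge(g, s) for g in G)}   # keep only s below some g in G
    return G, S2

def negative_update(G, S, n):
    S = {s for s in S if not matches(s, n)}       # delete members of S that match n
    G2 = set()
    for g in G:                                   # minimally specialize members of G
        G2 |= {h for h in specializations(g) if not matches(h, n)} if matches(g, n) else {g}
    G2 = {g for g in G2 if not any(extension(t) > extension(g) for t in G2)}
    G2 = {g for g in G2 if any(ge(g, s) for s in S)}   # keep only g above some s in S
    return G2, S

G, S = {("a", "a")}, {(4, "c")}                   # most general concept; first positive instance
for card, positive in [((7, "c"), True), ((5, "h"), False),
                       ((2, "s"), True), (("j", "s"), False)]:
    G, S = positive_update(G, S, card) if positive else negative_update(G, S, card)
print("G =", G, "S =", S)                         # both converge to {('n', 'b')}: NUM and BLACK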

39 Comments on Version Space Learning (VSL) It is a bi-directional search. One direction is specific to general and is driven by positive instances. The other direction is general to specific and is driven by negative instances. It is an incremental learning algorithm: the examples do not have to be given all at once (as opposed to learning decision trees). The version space is meaningful even before it converges. The order of the examples matters for the speed of convergence. As is, it cannot tolerate noise (misclassified examples): the version space might collapse.

40 More on generalization operators Replacing constants with variables. For example, color(ball, red) generalizes to color(X, red). Dropping conditions from a conjunctive expression. For example, shape(X, round) ∧ size(X, small) ∧ color(X, red) generalizes to shape(X, round) ∧ color(X, red).

41 More on generalization operators (cont’d) Adding a disjunct to an expression. For example, shape(X, round) ∧ size(X, small) ∧ color(X, red) generalizes to shape(X, round) ∧ size(X, small) ∧ (color(X, red) ∨ color(X, blue)). Replacing a property with its parent in a class hierarchy. If we know that primary_color is a superclass of red, then color(X, red) generalizes to color(X, primary_color).

42 Another example sizes = {large, small}, colors = {red, white, blue}, shapes = {sphere, brick, cube}; an object is described by object(size, color, shape). If the target concept is a “red ball,” then size should not matter, color should be red, and shape should be sphere. If the target concept is “ball,” then neither size nor color should matter, and shape should be sphere.

43 A portion of the concept space

44 Learning the concept of a “red ball”
G: { obj(X, Y, Z) }   S: { }
positive: obj(small, red, sphere)
G: { obj(X, Y, Z) }   S: { obj(small, red, sphere) }
negative: obj(small, blue, sphere)
G: { obj(large, Y, Z), obj(X, red, Z), obj(X, white, Z), obj(X, Y, brick), obj(X, Y, cube) }   S: { obj(small, red, sphere) }
delete from G every hypothesis that is neither more general than nor equal to a hypothesis in S:
G: { obj(X, red, Z) }   S: { obj(small, red, sphere) }

45 Learning the concept of a “red ball” (cont’d)
G: { obj(X, red, Z) }   S: { obj(small, red, sphere) }
positive: obj(large, red, sphere)
G: { obj(X, red, Z) }   S: { obj(X, red, sphere) }
negative: obj(large, red, cube)
G: { obj(small, red, Z), obj(X, red, sphere), obj(X, red, brick) }   S: { obj(X, red, sphere) }
delete from G every hypothesis that is neither more general than nor equal to a hypothesis in S:
G: { obj(X, red, sphere) }   S: { obj(X, red, sphere) }
converged to a single concept
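The same run can be scripted. Below is a self-contained sketch (my own encoding, not the slides’): hypotheses are attribute vectors over (size, color, shape), with “?” meaning “any value”; for brevity it omits the pruning of redundant members of S, which stays a singleton here.

DOMAINS = {"size": ["large", "small"],
           "color": ["red", "white", "blue"],
           "shape": ["sphere", "brick", "cube"]}
ATTRS = list(DOMAINS)

def matches(h, x):
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def more_general_eq(h1, h2):
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    # Minimal generalization of s that covers the positive example x.
    return tuple(sv if sv == xv else "?" for sv, xv in zip(s, x))

def specializations(g, x):
    # Minimal specializations of g that exclude the negative example x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, attr in enumerate(ATTRS) if g[i] == "?"
            for v in DOMAINS[attr] if v != x[i]]

G, S = {("?", "?", "?")}, {("small", "red", "sphere")}   # first positive example
for x, positive in [(("small", "blue", "sphere"), False),
                    (("large", "red", "sphere"), True),
                    (("large", "red", "cube"), False)]:
    if positive:
        G = {g for g in G if matches(g, x)}
        S = {s for s in (generalize(s, x) for s in S)
             if any(more_general_eq(g, s) for g in G)}
    else:
        S = {s for s in S if not matches(s, x)}
        G2 = set()
        for g in G:
            G2 |= ({h for h in specializations(g, x)
                    if any(more_general_eq(h, s) for s in S)}
                   if matches(g, x) else {g})
        G = {g for g in G2
             if not any(more_general_eq(t, g) and t != g for t in G2)}

print("G =", G, "S =", S)    # both {('?', 'red', 'sphere')}: a red ball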

46 LEX: a program that learns heuristics Learns heuristics for symbolic integration problems. Typical transformations used in performing integration include:
OP1: ∫ r f(x) dx → r ∫ f(x) dx
OP2: ∫ u dv → uv - ∫ v du
OP3: 1 * f(x) → f(x)
OP4: ∫ (f1(x) + f2(x)) dx → ∫ f1(x) dx + ∫ f2(x) dx
A heuristic tells when an operator is particularly useful: if a problem state matches ∫ x transcendental(x) dx, then apply OP2 with bindings u = x, dv = transcendental(x) dx

47 A portion of LEX’s hierarchy of symbols

48 The overall architecture A generalizer that uses candidate elimination to find heuristics. A problem solver that attempts to solve problems using the current heuristics and produces a trace of each attempt. A critic that produces positive and negative instances from a problem trace (the credit assignment problem). A problem generator that produces new candidate problems.

49 A version space for OP2 (Mitchell et al., 1983)

50 Comments on LEX The evolving heuristics are not guaranteed to be admissible: the solution path found by the problem solver may not actually be a shortest-path solution. Empirical studies: before training, 5 problems were solved in an average of 200 steps; after training with 12 problems, the same 5 were solved in an average of 20 steps.

51 More comments on VSL It uses breadth-first search, which might be inefficient: one might need to use beam search to prune hypotheses from G and S if they grow excessively; another alternative is to use an inductive bias and restrict the concept language. How to address the noise problem? Maintain several G and S sets.