1 Inductive Learning (continued) Chapter 19 Slides for Ch. 19 by J.C. Latombe

Learning logical descriptions The process of constructing a decision tree can be seen as searching the hypothesis space H. The goal is to construct a hypothesis H that explains the data in the training set. The hypothesis H is a logical description of the form: H: H1 ∨ H2 ∨ … ∨ Hn, where each Hi has the form ∀x Q(x) ⇔ Ci(x), Q(x) is the goal predicate, and the Ci(x) are candidate definitions.
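For instance, one candidate definition in this form (the WillWait goal predicate and its particular conjuncts are purely illustrative, not taken from these slides) could be:

Hi: ∀x WillWait(x) ⇔ Patrons(x, Some) ∨ (Patrons(x, Full) ∧ Hungry(x))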

Current-best-hypothesis Search Key idea: – Maintain a single hypothesis throughout. – Update the hypothesis to maintain consistency as each new example comes in.

Definitions Positive example: an instance of the hypothesis. Negative example: not an instance of the hypothesis. False negative example: the hypothesis predicts it should be negative but it is in fact positive. False positive example: the hypothesis predicts it should be positive but it is in fact negative.

Current best learning algorithm

function Current-Best-Learning(examples) returns hypothesis H
  H := hypothesis consistent with the first example
  for each remaining example e in examples do
    if e is a false positive for H then
      H := choose a specialization of H consistent with examples
    else if e is a false negative for H then
      H := choose a generalization of H consistent with examples
    if no consistent generalization/specialization is found then fail
  end
  return H

Note: the choose operations are nondeterministic and indicate backtracking points.

Specialization and generalization (figure): + indicates positive examples, – indicates negative examples; a circled + or – indicates the example being added.

How to Generalize
a) Replacing Constants with Variables: Object(Animal,Bird) → Object(X,Bird)
b) Dropping Conjuncts: Object(Animal,Bird) & Feature(Animal,Wings) → Object(Animal,Bird)
c) Adding Disjuncts: Feature(Animal,Feathers) → Feature(Animal,Feathers) v Feature(Animal,Fly)
d) Generalizing Terms: Feature(Bird,Wings) → Feature(Bird,Primary-Feature)

How to Specialize
a) Replacing Variables with Constants: Object(X,Bird) → Object(Animal,Bird)
b) Adding Conjuncts: Object(Animal,Bird) → Object(Animal,Bird) & Feature(Animal,Wings)
c) Dropping Disjuncts: Feature(Animal,Feathers) v Feature(Animal,Fly) → Feature(Animal,Fly)
d) Specializing Terms: Feature(Bird,Primary-Feature) → Feature(Bird,Wings)
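To make the Current-Best-Learning procedure above concrete, here is a minimal Python sketch that uses two of these operations: dropping conjuncts to generalize and adding conjuncts to specialize. It is an illustration only; the hypothesis representation (sets of attribute=value conjuncts), the attribute names, and the example data are invented, and the greedy choices below do not backtrack the way the algorithm's nondeterministic choose steps would.

# A minimal, greedy sketch of Current-Best-Learning for purely conjunctive
# hypotheses over discrete attributes. A hypothesis is a frozenset of
# (attribute, value) conjuncts; examples are (attribute-dict, label) pairs.

def covers(h, x):
    """A conjunction h covers example x if every conjunct holds in x."""
    return all(x[attr] == val for attr, val in h)

def generalize(h, x):
    """Generalize by dropping the conjuncts violated by false-negative x."""
    return frozenset(c for c in h if x[c[0]] == c[1])

def specialize(h, x, positives):
    """Specialize by adding one conjunct that excludes false-positive x
    but still holds in every positive example seen so far."""
    for attr in x:
        shared = {p[attr] for p in positives}
        if len(shared) == 1 and x[attr] not in shared:
            return h | {(attr, shared.pop())}
    return None  # the full algorithm would backtrack here

def current_best_learning(examples):
    (x0, label0), rest = examples[0], examples[1:]
    assert label0, "this sketch starts from a positive example"
    h = frozenset(x0.items())     # most specific hypothesis for the first example
    positives = [x0]
    for x, label in rest:
        if label:
            positives.append(x)
        if label and not covers(h, x):         # false negative
            h = generalize(h, x)
        elif not label and covers(h, x):       # false positive
            h = specialize(h, x, positives)
            if h is None:
                raise RuntimeError("no consistent greedy specialization found")
    return h

if __name__ == "__main__":
    data = [
        ({"cover": "feathers", "flies": "yes", "legs": "2"}, True),
        ({"cover": "feathers", "flies": "no",  "legs": "2"}, True),
        ({"cover": "fur",      "flies": "no",  "legs": "4"}, False),
    ]
    print(current_best_learning(data))  # e.g. frozenset({('cover', 'feathers'), ('legs', '2')})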

Discussion The choice of the initial hypothesis and of each specialization and generalization is nondeterministic (use heuristics). Problems:
– The extensions made do not necessarily lead to the simplest hypothesis.
– May lead to an unrecoverable situation where no simple modification of the hypothesis is consistent with all of the examples.
– Which heuristics to use.
– Can be inefficient (consistency must be checked against all examples), and may require backtracking or restarting.
– Handling noise.

10 Version spaces READING: Russell & Norvig, Sect. 19.1; alternatively Mitchell, Machine Learning, Ch. 2 (through Section 2.5). Hypotheses are represented by a set of logical sentences. Incremental construction of the hypothesis. Prior “domain” knowledge can be included/used. Enables using the full power of logical inference. Version space slides adapted from Jean-Claude Latombe.

11 Predicate-Learning Methods: decision tree, version space. Explicit representation of hypothesis space H; need to provide H with some “structure”.

12 Version Spaces The “version space” is the set of all hypotheses that are consistent with the training instances processed so far. An algorithm:
– V := H   ;; the version space V starts as ALL hypotheses in H
– For each example e: eliminate any member of V that disagrees with e; if V is empty, FAIL
– Return V as the set of consistent hypotheses
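Run literally, this eliminate-inconsistent-members loop is easy to write for a tiny domain. The sketch below (the attribute count and example data are invented for illustration) enumerates every hypothesis as an arbitrary set of instances over three boolean attributes and filters against labeled examples; it works, but note that |H| = 2^(2^3) = 256 already for this toy case, which leads directly to the problem on the next slide.

# Naive version-space learning by explicit enumeration, for a toy domain of
# N = 3 boolean attributes. A hypothesis is just the set of instances it
# labels positive, so |H| = 2**(2**N). The example data are invented.
from itertools import product, combinations

N = 3
instances = list(product([0, 1], repeat=N))                # 2**N = 8 instances

H = [frozenset(c) for r in range(len(instances) + 1)
                  for c in combinations(instances, r)]     # 2**8 = 256 hypotheses

examples = [((0, 1, 1), True), ((1, 1, 1), True), ((0, 0, 1), False)]

V = H
for instance, label in examples:
    # Eliminate any member of V that disagrees with the example.
    V = [h for h in V if (instance in h) == label]

print(len(H), "hypotheses initially,", len(V), "still consistent")   # 256 ..., 32 ...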

13 Version Spaces: The Problem PROBLEM: V is huge!! Suppose you have N attributes, each with k possible values, and you allow a hypothesis to be any disjunction of instances. There are k^N possible instances, so |H| = 2^(k^N). If N=5 and k=2, |H| = 2^32 !! How many boolean functions can you write for 1 attribute? 2^2 = 4.
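The blow-up is easy to check numerically (this just restates the slide's arithmetic):

# |H| = 2**(k**N) when a hypothesis may be any set (disjunction) of instances.
for N, k in [(1, 2), (5, 2), (10, 2)]:
    instances = k ** N
    print(f"N={N}, k={k}: {instances} instances, |H| = 2**{instances} = {2 ** instances}")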

14 Number of hypotheses With a single boolean attribute x there are 2^2 = 4 possible hypotheses:
  x | h1 h2 h3 h4
  0 |  0  0  1  1
  1 |  0  1  0  1
(h1 = constant 0, h2 = x, h3 = ¬x, h4 = constant 1)

15 Version Spaces: The Tricks
First Trick: Don’t allow arbitrary disjunctions.
– Organize the feature values into a hierarchy of allowed disjunctions, e.g. any-color splits into pale (white, yellow) and dark (blue, black).
– Now there are only 7 “abstract values” instead of 16 disjunctive combinations (e.g., “black or white” isn’t allowed).
Second Trick: Define a partial ordering on H (“general to specific”) and only keep track of the upper bound and lower bound of the version space.
RESULT: An incremental, efficient algorithm!
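A small Python illustration of the first trick (the hierarchy follows the slide's color example): only the seven abstract values are legal hypotheses about color, versus the 2^4 = 16 arbitrary disjunctions of the four leaf colors.

# First trick: restrict disjunctions to a fixed hierarchy of abstract values.
hierarchy = {
    "any-color": {"white", "yellow", "blue", "black"},
    "pale":      {"white", "yellow"},
    "dark":      {"blue", "black"},
    "white":     {"white"},
    "yellow":    {"yellow"},
    "blue":      {"blue"},
    "black":     {"black"},
}
leaves = hierarchy["any-color"]
print(len(hierarchy), "abstract values vs", 2 ** len(leaves), "arbitrary disjunctions")
# -> 7 abstract values vs 16 arbitrary disjunctions
# Note that e.g. {"black", "white"} is not one of the allowed abstract values.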

16 Why partial ordering?

17 Rewarded Card Example
(r=1) v … v (r=10) v (r=J) v (r=Q) v (r=K) ⇔ ANY-RANK(r)
(r=1) v … v (r=10) ⇔ NUM(r)
(r=J) v (r=Q) v (r=K) ⇔ FACE(r)
(s=♠) v (s=♣) v (s=♦) v (s=♥) ⇔ ANY-SUIT(s)
(s=♠) v (s=♣) ⇔ BLACK(s)
(s=♦) v (s=♥) ⇔ RED(s)
A hypothesis is any sentence of the form R(r) ∧ S(s), where R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=x), and S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=y).

18 Simplified Representation For simplicity, we represent a concept by rs, with: r ∈ {a, n, f, 1, …, 10, j, q, k} and s ∈ {a, b, r, ♠, ♣, ♦, ♥}. For example: n♣ represents NUM(r) ∧ (s=♣); aa represents ANY-RANK(r) ∧ ANY-SUIT(s).

19 Extension of a Hypothesis The extension of a hypothesis h is the set of objects that satisfy h. Examples: the extension of f♣ is {j♣, q♣, k♣}; the extension of aa is the set of all cards.

20 More General/Specific Relation Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of the extension of h2. Examples: aa is more general than f♣; f♣ is more general than q♣; fr and nr are not comparable.

21 More General/Specific Relation Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of the extension of h2. The inverse of the “more general” relation is the “more specific” relation. The “more general” relation defines a partial ordering on the hypotheses in H.
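The rs representation, extensions, and the more-general test can be prototyped directly, since the card domain is tiny. In the sketch below the suit symbols are written as the letters S, C, D, H; everything else follows the slides' simplified representation.

# Card-domain hypotheses as (rank value, suit value) pairs, following the
# slides' simplified rs representation (suit symbols written as letters).
# A hypothesis' extension is the set of cards it covers; h1 is more general
# than h2 iff its extension is a proper superset of h2's extension.
RANKS = [str(n) for n in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]
RANK_VALUES = {"a": set(RANKS), "n": set(RANKS[:10]), "f": {"j", "q", "k"},
               **{r: {r} for r in RANKS}}
SUIT_VALUES = {"a": set(SUITS), "b": {"S", "C"}, "r": {"D", "H"},
               **{s: {s} for s in SUITS}}

def extension(h):
    """All cards (rank, suit) covered by hypothesis h."""
    rank_value, suit_value = h
    return {(r, s) for r in RANK_VALUES[rank_value] for s in SUIT_VALUES[suit_value]}

def more_general(h1, h2):
    """True iff h1 is strictly more general than h2."""
    return extension(h1) > extension(h2)

print(sorted(extension(("f", "C"))))           # the three club face cards
print(more_general(("a", "a"), ("f", "C")))    # True: aa is more general than fC
print(more_general(("f", "r"), ("n", "r")),    # False, False: fr and nr
      more_general(("n", "r"), ("f", "r")))    # are not comparable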

22 Example: Subset of Partial Order (lattice diagram showing the nine hypotheses between the most specific hypothesis 4♣ and the most general hypothesis aa: 4♣, 4b, 4a, n♣, nb, na, a♣, ab, aa)

23 G-Boundary / S-Boundary of V A hypothesis in V is most general iff no hypothesis in V is more general G-boundary G of V: Set of most general hypotheses in V

24 G-Boundary / S-Boundary of V A hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: set of most general hypotheses in V. A hypothesis in V is most specific iff no hypothesis in V is more specific. S-boundary S of V: set of most specific hypotheses in V. (Diagram: hypotheses G1, G2, G3 on the G-boundary and S1, S2, S3 on the S-boundary; everything above G or below S is inconsistent.)
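For a small hypothesis space, the two boundaries can be found by brute force: keep the hypotheses consistent with the examples, then take the most general and most specific elements under the generality order. A self-contained toy sketch (hypotheses represented extensionally as sets of instances; the instance names and examples are invented):

# Brute-force G- and S-boundaries of a version space over a toy instance set.
# Hypotheses are frozensets of covered instances, so "more general" is simply
# a proper-superset test.
from itertools import combinations

instances = ["i1", "i2", "i3", "i4"]
H = [frozenset(c) for r in range(len(instances) + 1)
                  for c in combinations(instances, r)]

examples = [("i1", True), ("i3", False)]            # i1 positive, i3 negative
V = [h for h in H if all((x in h) == label for x, label in examples)]

G = [h for h in V if not any(g > h for g in V)]     # no strictly more general hypothesis in V
S = [h for h in V if not any(h > s for s in V)]     # no strictly more specific hypothesis in V

print("G-boundary:", [sorted(h) for h in G])        # [['i1', 'i2', 'i4']]
print("S-boundary:", [sorted(h) for h in S])        # [['i1']]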

25 Example: G-/S-Boundaries of V Initially, G = {aa} and S contains all the most specific hypotheses (1♠, …, k♥). Now suppose that 4♣ is given as a positive example. We replace every hypothesis in S whose extension does not contain 4♣ by its generalization set.

26 Example: G-/S-Boundaries of V (Diagram: G = {aa}, S = {4♣}.) Here, both G and S have size 1. This is not the case in general!

27 Example: G-/S-Boundaries of V Let 7♣ be the next (positive) example. (Diagram: the generalization set of 4♣.) The generalization set of a hypothesis h is the set of hypotheses that are immediately more general than h.

28 Example: G-/S-Boundaries of V Let 7♣ be the next (positive) example. (Diagram: S is generalized from 4♣ up to n♣; G is still {aa}.)

29 Example: G-/S-Boundaries of V Let 5♥ be the next (negative) example. (Diagram: the specialization set of aa; G must be specialized to exclude 5♥.)

30 Example: G-/S-Boundaries of V (Diagram: G = {ab}, S = {n♣}, with nb and a♣ in between.) G and S, and all hypotheses in between, form exactly the version space.

31 Example: G-/S-Boundaries of V At this stage, do 8♣, 6♥, j♠ satisfy CONCEPT? 8♣: yes (it is covered by every hypothesis in the version space); 6♥: no (it is covered by none); j♠: maybe (it is covered by some but not all).

32 Example: G-/S-Boundaries of V Let 2♠ be the next (positive) example. (Diagram: S is generalized from n♣ to nb.)

33 Example: G-/S-Boundaries of V (Diagram: the version space is now {nb, ab}.) Let j♠ be the next (negative) example.

34 Example: G-/S-Boundaries of V The version space converges to the single hypothesis nb. Examples: + 4♣, 7♣, 2♠; – 5♥, j♠. Learned concept: NUM(r) ∧ BLACK(s).

35 Example: G-/S-Boundaries of V Let us return to the version space after the first three examples (diagram: {n♣, nb, a♣, ab}) … and let 8♣ be the next (negative) example. The only most specific hypothesis disagrees with this example, so no hypothesis in H agrees with all examples.

36 Example: G-/S-Boundaries of V Let us return to the version space after the first three examples … and let j♥ be the next (positive) example. The only most general hypothesis disagrees with this example, so no hypothesis in H agrees with all examples.

37 Version Space Update
1. x ← new example
2. If x is positive then (G,S) ← POSITIVE-UPDATE(G,S,x)
3. Else (G,S) ← NEGATIVE-UPDATE(G,S,x)
4. If G or S is empty then return failure

38 POSITIVE-UPDATE(G,S,x) 1. Eliminate all hypotheses in G that do not agree with x

39 POSITIVE-UPDATE(G,S,x) 1. Eliminate all hypotheses in G that do not agree with x 2. Minimally generalize all hypotheses in S until they are consistent with x (using the generalization sets of the hypotheses)

40 POSITIVE-UPDATE(G,S,x) 1. Eliminate all hypotheses in G that do not agree with x 2. Minimally generalize all hypotheses in S until they are consistent with x 3. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G (this step was not needed in the card example)

41 POSITIVE-UPDATE(G,S,x)
1. Eliminate all hypotheses in G that do not agree with x
2. Minimally generalize all hypotheses in S until they are consistent with x
3. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G
4. Remove from S every hypothesis that is more general than another hypothesis in S
5. Return (G,S)

42 NEGATIVE-UPDATE(G,S,x)
1. Eliminate all hypotheses in S that agree with x
2. Minimally specialize all hypotheses in G until they are consistent with (i.e., exclude) x
3. Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S
4. Remove from G every hypothesis that is more specific than another hypothesis in G
5. Return (G,S)
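Here is a Python sketch of the whole update cycle (slides 37, 41, 42) on the rewarded-card example. It is a brute-force reformulation: instead of walking generalization and specialization sets, it searches the entire (small) hypothesis space for the minimal generalizations and maximal specializations, which yields the same boundaries at the cost of efficiency. Suits are written as the letters S, C, D, H, and the training sequence mirrors the example worked through above (4♣, 7♣, 2♠ positive; 5♥, j♠ negative).

# Brute-force candidate elimination on the rewarded-card example.
from itertools import product

RANKS = [str(n) for n in range(1, 11)] + ["j", "q", "k"]
SUITS = ["S", "C", "D", "H"]
RANK_VALUES = {"a": set(RANKS), "n": set(RANKS[:10]), "f": {"j", "q", "k"},
               **{r: {r} for r in RANKS}}
SUIT_VALUES = {"a": set(SUITS), "b": {"S", "C"}, "r": {"D", "H"},
               **{s: {s} for s in SUITS}}
ALL_H = list(product(RANK_VALUES, SUIT_VALUES))        # the 112 hypotheses R(r) ∧ S(s)

def ext(h):
    """Extension of hypothesis h = (rank value, suit value)."""
    return {(r, s) for r in RANK_VALUES[h[0]] for s in SUIT_VALUES[h[1]]}

def ge(h1, h2):
    """h1 is more general than or equal to h2."""
    return ext(h1) >= ext(h2)

def minimal(hs):
    hs = list(hs)
    return [h for h in hs if not any(h != h2 and ge(h, h2) for h2 in hs)]

def maximal(hs):
    hs = list(hs)
    return [h for h in hs if not any(h != h2 and ge(h2, h) for h2 in hs)]

def positive_update(G, S, x):
    """Slide 41: generalize S minimally to cover positive example x, then prune."""
    G = [g for g in G if x in ext(g)]                               # step 1
    S = minimal({h for s in S for h in ALL_H                        # steps 2 + 4
                 if ge(h, s) and x in ext(h)})
    S = [s for s in S if any(ge(g, s) for g in G)]                  # step 3
    return G, S

def negative_update(G, S, x):
    """Slide 42: specialize G minimally to exclude negative example x, then prune."""
    S = [s for s in S if x not in ext(s)]                           # step 1
    G = maximal({h for g in G for h in ALL_H                        # steps 2 + 4
                 if ge(g, h) and x not in ext(h)})
    G = [g for g in G if any(ge(g, s) for s in S)]                  # step 3
    return G, S

def candidate_elimination(examples):
    G, S = [("a", "a")], minimal(ALL_H)      # most general / all most specific hypotheses
    for x, positive in examples:
        G, S = positive_update(G, S, x) if positive else negative_update(G, S, x)
        if not G or not S:
            raise RuntimeError("version space collapsed (G or S is empty)")
    return G, S

if __name__ == "__main__":
    data = [(("4", "C"), True), (("7", "C"), True), (("5", "H"), False),
            (("2", "S"), True), (("j", "S"), False)]
    print(candidate_elimination(data))        # -> ([('n', 'b')], [('n', 'b')])

On the data above it converges to G = S = {('n','b')}, i.e. NUM(r) ∧ BLACK(s); an inconsistent extra example empties one of the boundaries and triggers the collapse error, as in slides 35 and 36.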

43 Example-Selection Strategy Suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick the object such that, whether the example turns out positive or negative, it will eliminate one half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps.

44 Example (diagram: the version space lattice, with three candidate query cards, a 9 and two jacks, and the question of which one to ask about next)

45 Example-Selection Strategy Suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick the object such that, whether the example turns out positive or negative, it will eliminate one half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps. But picking the object that eliminates half the version space may be expensive.
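A brute-force illustration of this selection rule on a toy hypothesis space (hypotheses as explicit sets of instances; the instance names and the single positive example are invented): query the instance whose worst-case answer still eliminates the largest share of the current version space.

# Query selection: ask about the instance whose answer, whichever way it goes,
# leaves the smallest worst-case version space (ideally half of it).
from itertools import combinations

instances = ["i1", "i2", "i3", "i4"]
H = [frozenset(c) for r in range(len(instances) + 1)
                  for c in combinations(instances, r)]
V = [h for h in H if "i1" in h]        # version space after one positive example, i1

def worst_case_remaining(x, V):
    if_positive = sum(1 for h in V if x in h)   # hypotheses kept if x turns out positive
    return max(if_positive, len(V) - if_positive)

best = min((x for x in instances if x != "i1"),
           key=lambda x: worst_case_remaining(x, V))
print("query", best, "-> worst case keeps", worst_case_remaining(best, V), "of", len(V))

Here every candidate query happens to split the space exactly in half; in general the best split is less even, and evaluating every candidate against every remaining hypothesis is exactly the expense the slide warns about.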

46 Noise If some examples are misclassified, the version space may collapse Possible solution: Maintain several G- and S-boundaries, e.g., consistent with all examples, all examples but one, etc…

47 VSL vs DTL
– Decision tree learning (DTL) is more efficient if all examples are given in advance; otherwise, it may produce successive hypotheses, each poorly related to the previous one.
– Version space learning (VSL) is incremental.
– DTL can produce simplified hypotheses that do not agree with all examples.
– DTL has been more widely used in practice.