Concept Learning and the General-to-Specific Ordering


Concept Learning and the General-to-Specific Ordering. Machine Learning Seminar, 2010-01-07, Seoul National University

Overview
• Concept Learning
• Find-S Algorithm
• Version Space (List-Then-Eliminate Algorithm)
• Candidate-Elimination Learning Algorithm
• Inductive Bias

Concept Learning
• Concept – e.g. ‘car’, ‘bird’, or ‘situations in which I should study more in order to pass the exam’
• Concept Learning – inferring a boolean-valued function from training examples of its input and output

Concept Learning Task
• Example target concept – days on which my friend Aldo enjoys his favorite water sport
• Hypothesis representation – a conjunction of attribute constraints, where each constraint is:
  – ? : any value is acceptable for this attribute
  – a single required value (e.g. Warm, Strong)
  – Ф : no value is acceptable
• Examples:
  – < ?, Cold, High, ?, ?, ? >
  – < ?, ?, ?, ?, ?, ? >  (most general)
  – < Ф, Ф, Ф, Ф, Ф, Ф >  (most specific)
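The representation can be made concrete with a short Python sketch. The names ANY, NONE, and satisfies are illustrative choices rather than part of any standard library, and instances and hypotheses are assumed to be fixed-length tuples of attribute strings:

```python
# A minimal sketch of the EnjoySport hypothesis representation: a hypothesis
# is a tuple of six attribute constraints, where '?' accepts any value,
# 'Ф' accepts no value, and a literal string accepts only that value.
ANY = '?'    # most general constraint: any value is acceptable
NONE = 'Ф'   # most specific constraint: no value is acceptable

def satisfies(hypothesis, instance):
    """Return True iff the instance meets every attribute constraint."""
    return all(c == ANY or (c != NONE and c == v)
               for c, v in zip(hypothesis, instance))

# < ?, Cold, High, ?, ?, ? > accepts any cold day with high humidity
h = (ANY, 'Cold', 'High', ANY, ANY, ANY)
x = ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change')
print(satisfies(h, x))   # True
```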

Concept Learning Task: the EnjoySport learning task
• Given:
  – Instances X : possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast
  – Hypotheses H : each hypothesis is a conjunction of constraints on the attributes; each constraint may be “?”, “Ф”, or a specific value
  – Target concept c : EnjoySport : X → {0, 1}
    • c(x) = 1 : positive example
    • c(x) = 0 : negative example
  – Training examples D :
    Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
    1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
    2        Sunny  Warm     High      Strong  Warm   Same      Yes
    3        Rainy  Cold     High      Strong  Warm   Change    No
    4        Sunny  Warm     High      Strong  Cool   Change    Yes

Concept Learning Task (cont.)
• Determine:
  – A hypothesis h in H such that h(x) = c(x) for all x in X.
Inductive learning hypothesis
• Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

General-to-Specific Ordering
(Diagram: the instance space X on the left, the hypothesis space H on the right, ordered from specific to general.)
x1 = < Sunny, Warm, High, Strong, Cool, Same >
x2 = < Sunny, Warm, High, Light, Warm, Same >
h1 = < Sunny, ?, ?, Strong, ?, ? >
h2 = < Sunny, ?, ?, ?, ?, ? >
h3 = < Sunny, ?, ?, ?, Cool, ? >
Note that the subset of instances characterized by h2 subsumes the subset characterized by h1, hence h2 is more_general_than h1.

Let hj and hk be boolean-valued functions defined over X. Then:
• hj is more_general_than_or_equal_to hk ( hj ≥g hk ) if and only if (∀x∈X)[(hk(x)=1) → (hj(x)=1)]
• hj is more_general_than hk ( hj >g hk ) if and only if ( hj ≥g hk ) ∧ ¬( hk ≥g hj )
The ≥g relation defines a partial order over the hypothesis space H (the relation is reflexive, antisymmetric, and transitive). “The structure is a partial order” ⇒ there may be pairs of hypotheses, such as h1 and h3, for which neither h1 ≥g h3 nor h3 ≥g h1.
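For conjunctive attribute constraints, the ≥g relation can be tested attribute-wise. A minimal sketch, reusing the ANY and NONE constants from the representation sketch above:

```python
def more_general_or_equal(hj, hk):
    """Test hj >=_g hk for conjunctive hypotheses: every instance that
    satisfies hk must also satisfy hj."""
    if NONE in hk:            # hk covers no instance, so hj >=_g hk trivially
        return True
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))

h1 = ('Sunny', ANY, ANY, 'Strong', ANY, ANY)
h2 = ('Sunny', ANY, ANY, ANY, ANY, ANY)
print(more_general_or_equal(h2, h1))   # True:  h2 >=_g h1
print(more_general_or_equal(h1, h2))   # False: h2 is strictly more general than h1
```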

Find-S Algorithm
• Algorithm
  1. Initialize h to the most specific hypothesis in H
  2. For each positive training instance x
     – For each attribute constraint ai in h:
       • If the constraint ai is satisfied by x, then do nothing
       • Else replace ai in h by the next more general constraint that is satisfied by x
  3. Output hypothesis h
• Steps on the EnjoySport data
  – h ← < Ф, Ф, Ф, Ф, Ф, Ф >  (most specific)
  – h ← < Sunny, Warm, Normal, Strong, Warm, Same >
  – h ← < Sunny, Warm, ?, Strong, Warm, Same >
  – h ← < Sunny, Warm, ?, Strong, ?, ? >
• Find-S simply ignores every negative example!
• Find-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
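A minimal Python sketch of Find-S for this representation, reusing ANY and NONE from the earlier sketch; the training data D below is the EnjoySport table from the slides:

```python
def find_s(examples, n_attributes=6):
    """Find-S: start from the most specific hypothesis and minimally
    generalize it for each positive example; negative examples are ignored."""
    h = [NONE] * n_attributes
    for x, label in examples:
        if not label:                 # Find-S simply skips negatives
            continue
        for i, value in enumerate(x):
            if h[i] == NONE:
                h[i] = value          # first positive example: copy the value
            elif h[i] != value:
                h[i] = ANY            # conflicting values: generalize to '?'
    return tuple(h)

# EnjoySport training data from the table above
D = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```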

Key Property of the Find-S Algorithm
– For hypothesis spaces described by conjunctions of attribute constraints (such as H for the EnjoySport task), Find-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
– Its final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H, and provided the training examples are correct.

Problems
• Has the learner converged to the correct target concept?
• Why prefer the most specific hypothesis?
• Are the training examples consistent?
• What if there are several maximally specific consistent hypotheses?

Version Space and the Candidate-Elimination Algorithm
• Key idea – output a description of the set of all hypotheses consistent with the training examples.
• Limitation – both the Candidate-Elimination algorithm and Find-S perform poorly when given noisy training data.
Representation
• Consistent – a hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D.
  – Consistent(h, D) ≡ (∀<x, c(x)>∈D) h(x) = c(x)
• Version space – VS_H,D = { h ∈ H | Consistent(h, D) }
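The Consistent predicate translates directly into code. A small sketch building on the satisfies helper and the training data D defined earlier, where each example is assumed to be an (instance, boolean label) pair:

```python
def consistent(h, examples):
    """Consistent(h, D): h(x) == c(x) for every <x, c(x)> in D, where
    satisfies(h, x) plays the role of h(x)."""
    return all(satisfies(h, x) == label for x, label in examples)

print(consistent(('Sunny', 'Warm', ANY, 'Strong', ANY, ANY), D))   # True
print(consistent((ANY,) * 6, D))   # False: the all-? hypothesis covers the negative example
```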

List-Then-Eliminate Algorithm
• Initialize the version space to contain all hypotheses in H, then eliminate any hypothesis found inconsistent with any training example.
• Applicable only when the hypothesis space H is finite.
• Exhaustive: it enumerates every hypothesis in H.
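A sketch of List-Then-Eliminate for the conjunctive EnjoySport hypotheses, reusing consistent from above. The attribute domains in attribute_values are my assumption of the usual EnjoySport domains, and the single all-Ф hypothesis is omitted for brevity:

```python
from itertools import product

def list_then_eliminate(examples, attribute_values):
    """Enumerate every conjunctive hypothesis (each attribute constraint is
    either a specific value or '?') and keep those consistent with the data.
    Exhaustive, so practical only for small, finite hypothesis spaces."""
    candidates = product(*[values + [ANY] for values in attribute_values])
    return [h for h in candidates if consistent(h, examples)]

# Assumed attribute domains for EnjoySport
attribute_values = [
    ['Sunny', 'Cloudy', 'Rainy'],   # Sky
    ['Warm', 'Cold'],               # AirTemp
    ['Normal', 'High'],             # Humidity
    ['Strong', 'Light'],            # Wind
    ['Warm', 'Cool'],               # Water
    ['Same', 'Change'],             # Forecast
]
version_space = list_then_eliminate(D, attribute_values)
print(len(version_space))   # 6: the hypotheses shown in the diagram on the next slide
```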

Version Space (Diagram)
S: { < Sunny, Warm, ?, Strong, ?, ? > }
   < Sunny, ?, ?, Strong, ?, ? >   < Sunny, Warm, ?, ?, ?, ? >   < ?, Warm, ?, Strong, ?, ? >
G: { < Sunny, ?, ?, ?, ?, ? > , < ?, Warm, ?, ?, ?, ? > }

Compact Representation of the Version Space
• Represented by its most general and least general members (the general and specific boundaries).
• The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
  – G ≡ { g ∈ H | Consistent(g, D) ∧ (¬∃g'∈H)[(g' >g g) ∧ Consistent(g', D)] }
• The specific boundary S, with respect to hypothesis space H and training data D, is the set of maximally specific members of H consistent with D.
  – S ≡ { s ∈ H | Consistent(s, D) ∧ (¬∃s'∈H)[(s >g s') ∧ Consistent(s', D)] }
• Version space representation theorem
  – VS_H,D = { h ∈ H | (∃s∈S)(∃g∈G) (g ≥g h ≥g s) }
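The representation theorem gives a simple membership test using only the two boundary sets. A sketch reusing more_general_or_equal from the earlier sketch, with boundary sets matching the diagram above:

```python
def in_version_space(h, S, G):
    """Representation theorem: h is in VS_H,D exactly when some member of G
    is at least as general as h and h is at least as general as some member of S."""
    return (any(more_general_or_equal(g, h) for g in G) and
            any(more_general_or_equal(h, s) for s in S))

S_boundary = {('Sunny', 'Warm', ANY, 'Strong', ANY, ANY)}
G_boundary = {('Sunny', ANY, ANY, ANY, ANY, ANY), (ANY, 'Warm', ANY, ANY, ANY, ANY)}
print(in_version_space(('Sunny', ANY, ANY, 'Strong', ANY, ANY), S_boundary, G_boundary))  # True
print(in_version_space((ANY, ANY, ANY, 'Strong', ANY, ANY), S_boundary, G_boundary))      # False
```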

Candidate-Elimination Learning Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
  If d is a positive example
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is not consistent with d
      Remove s from S
      Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is not consistent with d
      Remove g from G
      Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
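A sketch of these boundary-set updates for conjunctive attribute hypotheses, reusing satisfies and more_general_or_equal from the earlier sketches. The minimal generalization and specialization operators below are specific to this representation, and the function names are my own:

```python
def min_generalize(s, x):
    """Minimal generalization of hypothesis s so that it covers positive instance x."""
    return tuple(v if c == NONE else (c if c == v else ANY)
                 for c, v in zip(s, x))

def min_specializations(g, x, attribute_values):
    """Minimal specializations of g that exclude negative instance x."""
    specs = []
    for i, c in enumerate(g):
        if c == ANY:
            for v in attribute_values[i]:
                if v != x[i]:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(examples, attribute_values):
    """Maintain only the S and G boundary sets of the version space."""
    n = len(attribute_values)
    S = {(NONE,) * n}                  # most specific boundary
    G = {(ANY,) * n}                   # most general boundary
    for x, label in examples:
        if label:                      # positive example
            G = {g for g in G if satisfies(g, x)}
            for s in [s for s in S if not satisfies(s, x)]:
                S.remove(s)
                h = min_generalize(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    S.add(h)
            # drop S members that are more general than another S member
            S = {s for s in S if not any(
                s != t and more_general_or_equal(s, t) for t in S)}
        else:                          # negative example
            S = {s for s in S if not satisfies(s, x)}
            for g in [g for g in G if satisfies(g, x)]:
                G.remove(g)
                for h in min_specializations(g, x, attribute_values):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.add(h)
            # drop G members that are less general than another G member
            G = {g for g in G if not any(
                g != t and more_general_or_equal(t, g) for t in G)}
    return S, G
```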

Process of Building the Version Space (trace on the EnjoySport data)
S0:    { < Ф, Ф, Ф, Ф, Ф, Ф > }
S1:    { < Sunny, Warm, Normal, Strong, Warm, Same > }
S2,3:  { < Sunny, Warm, ?, Strong, Warm, Same > }
S4:    { < Sunny, Warm, ?, Strong, ?, ? > }
G0,1,2: { < ?, ?, ?, ?, ?, ? > }
G3:    { < Sunny, ?, ?, ?, ?, ? > , < ?, Warm, ?, ?, ?, ? > , < ?, ?, ?, ?, ?, Same > }
G4:    { < Sunny, ?, ?, ?, ?, ? > , < ?, Warm, ?, ?, ?, ? > }
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
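Running the candidate_elimination sketch from the previous slide on the training data D and attribute_values defined earlier reproduces the final boundaries of this trace:

```python
# Reproducing the trace above with the candidate_elimination sketch.
S_final, G_final = candidate_elimination(D, attribute_values)
print(S_final)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}                       == S4
print(G_final)   # {('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')}      == G4
```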

Remarks on Version Space and Candidate-Elimination
• Will the C-E algorithm converge to the correct hypothesis?
• What training example should the learner request next?
• How can partially learned concepts be used?

Will the C-E algorithm converge to the correct hypothesis?
• It converges if...
  – there are no errors in the training examples
  – there is some hypothesis in H that correctly describes the target concept
• An erroneous training example may result in an empty version space.
• A similar symptom occurs when the target concept cannot be described in the hypothesis representation.
What Training Example Should the Learner Request Next?
• The term ‘query’ refers to such instances constructed by the learner, which are then classified by an external oracle.
• To discriminate among the hypotheses in the version space, a query should be classified as positive by some hypotheses in the version space but negative by others.
• The optimal query strategy is to generate instances that satisfy exactly half of the hypotheses in the version space.
  – In that case only ⌈log2 |VS|⌉ experiments are required to find the correct target concept.
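A tiny sketch of how a learner might score candidate queries against an enumerated version space (reusing the version_space list from the List-Then-Eliminate sketch); a score near 0.5 means the query splits the version space roughly in half:

```python
def query_split(x, version_space):
    """Fraction of the version space that classifies x as positive; a value
    near 0.5 means x is close to the ideal query, since either answer from
    the oracle eliminates about half of the remaining hypotheses."""
    positive_votes = sum(satisfies(h, x) for h in version_space)
    return positive_votes / len(version_space)

query = ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same')
print(query_split(query, version_space))   # 0.5: satisfies 3 of the 6 hypotheses
```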

How Can Partially Learned Concepts Be Used?
• An instance is classified as positive if and only if it satisfies every member of S.
• An instance is classified as negative if and only if it satisfies none of the members of G.
• When an instance is classified as positive by some members of the version space and as negative by others:
  – the learner does not know (note that in this case the Find-S algorithm would output “negative”)
  – majority voting gives only a probabilistic, not an exact, classification
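This classification rule uses only the boundary sets. A sketch using the S_final and G_final boundaries computed earlier; the string labels are illustrative:

```python
def classify(x, S, G):
    """Classify x from the boundary sets alone: positive if x satisfies every
    member of S, negative if it satisfies no member of G, unknown otherwise."""
    if all(satisfies(s, x) for s in S):
        return 'positive'
    if not any(satisfies(g, x) for g in G):
        return 'negative'
    return 'unknown'   # members of the version space disagree

print(classify(('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), S_final, G_final))  # positive
print(classify(('Rainy', 'Cold', 'Normal', 'Light', 'Warm', 'Same'), S_final, G_final))     # negative
print(classify(('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same'), S_final, G_final))     # unknown
```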

Inductive Bias
Question
• So far we have assumed that the initial hypothesis space contains the target concept.
• What if the target concept is not in the hypothesis space?
A Biased Hypothesis Space
• The EnjoySport hypothesis space biases the learner to consider only conjunctive hypotheses.
• This hypothesis space is unable to represent even simple disjunctive target concepts such as “Sky = Sunny or Sky = Cloudy”.
• So we need a more expressive hypothesis space.

An Unbiased Learner
• Extend the hypothesis space to the power set of X (every teachable concept!)
  – e.g. <Sunny,?,?,?,?,?> ∨ <Cloudy,?,?,?,?,?>
• Problem: unable to generalize beyond the observed examples.
  – Positive examples: x1, x2, x3; negative examples: x4, x5
  – S: {(x1 ∨ x2 ∨ x3)}, G: {¬(x4 ∨ x5)}
  – The S boundary will always be simply the disjunction of the observed positive examples, while the G boundary will always be the negated disjunction of the observed negative examples.
  – The only instances that will be classified unambiguously by S and G are the observed training examples themselves.
  – In order to converge to a single, final target concept, we would have to present every single instance in X as a training example!
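A toy illustration of why the unbiased (power-set) learner cannot generalize; the instance names below are made up, and hypotheses are represented simply as sets of instances:

```python
# The unbiased learner over a small made-up instance space: a hypothesis is
# an arbitrary subset of X, so S and G collapse onto the observed examples.
X = {f'x{i}' for i in range(1, 9)}        # a small instance space
positives = {'x1', 'x2', 'x3'}            # observed positive examples
negatives = {'x4', 'x5'}                  # observed negative examples
S_unbiased = set(positives)               # S: the disjunction of the positives
G_unbiased = X - negatives                # G: the negated disjunction of the negatives
unseen = X - positives - negatives
# Every unseen instance lies strictly between S and G, so the version space
# cannot commit to a label for it: no generalization beyond the data occurs.
print(all(x in G_unbiased and x not in S_unbiased for x in unseen))   # True
```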

The Futility of Bias-Free Learning
• Property of inductive inference:
  – A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
• (Dc ∧ xi) > L(xi, Dc)
  – y > z : z is inductively inferred from y
• (B ∧ Dc ∧ xi) ├ L(xi, Dc)
  – y ├ z : z follows deductively from y
• Notation:
  – Dc : an arbitrary set of training data
  – xi : a new instance
  – L : an inductive learning algorithm
  – L(xi, Dc) : the classification that L assigns to xi after learning from the training data Dc

Inductive Bias
• Consider
  – a concept learning algorithm L
  – instances X, target concept c
  – training examples Dc = {<x, c(x)>}
  – Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on the data Dc.
• Definition
  – The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc:
  – (∀xi ∈ X)[(B ∧ Dc ∧ xi) ├ L(xi, Dc)]
• Inductive bias of the Candidate-Elimination algorithm
  – The target concept c is contained in the given hypothesis space H.

• Advantages of characterizing inductive bias
  – It provides a nonprocedural means of characterizing a learner’s policy for generalizing beyond the observed data.
  – It allows comparison of different learners according to the strength of their inductive bias.
• Consider three learning algorithms, listed from weakest to strongest bias.
  1. Rote learning: store each observed training example in memory; if an instance is found in memory, return the stored classification. Inductive bias: none (the weakest bias).
  2. Candidate-Elimination algorithm: new instances are classified only when all members of the current version space agree on the classification. Inductive bias: the target concept can be represented in its hypothesis space.

  3. Find-S: find the most specific hypothesis consistent with the training examples, then use this hypothesis to classify all subsequent instances. Inductive bias: the target concept can be represented in its hypothesis space, and all instances are negative unless the opposite is entailed by its other knowledge (the strongest bias).
More strongly biased methods make more inductive leaps, classifying a greater proportion of unseen instances.