Chapter 2: Concept Learning and the General-to-Specific Ordering

Concept of Concepts
- Examples of concepts: "birds", "cars", "situations in which I should study more in order to pass the exam".
- A concept is:
  - some subset of objects or events defined over a larger set, or
  - a boolean-valued function defined over this larger set.
- For example, the concept "birds" is the subset of animals that are birds.

Concept Learning
- Learning: inducing general functions from specific training examples.
- Concept learning:
  - acquiring the definition of a general category given a sample of positive and negative training examples of the category;
  - inferring a boolean-valued function from training examples of its input and output.

A Concept Learning Task
- Target concept EnjoySport: "days on which Aldo enjoys his favorite water sport".
- Hypothesis representation: a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
  - For each attribute, the constraint is either "?" (any value is acceptable), a single required value (e.g. Warm), or "0" (the empty constraint: no value is acceptable).
  - For example, the hypothesis ⟨?, Cold, High, ?, ?, ?⟩ expresses that Aldo enjoys his favorite sport only on cold days with high humidity.
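To make the representation concrete, here is a minimal Python sketch (illustrative, not the book's code): a hypothesis is a tuple with one constraint per attribute, and a small helper tests whether an instance satisfies it. The tuple layout and the function name satisfies are assumptions.

# Illustrative sketch: a hypothesis is a tuple with one constraint per attribute
# (Sky, AirTemp, Humidity, Wind, Water, Forecast). "?" accepts any value, a
# literal string (e.g. "Warm") accepts only that value, and "0" accepts nothing.

def satisfies(hypothesis, instance):
    """True iff the instance meets every attribute constraint ("0" never matches)."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))

h = ("?", "Cold", "High", "?", "?", "?")                 # cold days with high humidity
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
print(satisfies(h, x))                                   # True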

Training Examples for EnjoySport

Instance  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
A         Sunny  Warm     Normal    Strong  Warm   Same      Yes
B         Sunny  Warm     High      Strong  Warm   Same      Yes
C         Rainy  Cold     High      Strong  Warm   Change    No
D         Sunny  Warm     High      Strong  Cool   Change    Yes

Training examples for the target concept EnjoySport.

The Learning Task
- Given:
  - Instances X: the set of items over which the concept is defined.
  - Hypotheses H: conjunctions of constraints on the attributes.
  - Target concept c: X → {0, 1}.
  - Training examples: positive and negative examples of the target concept, each of the form ⟨x, c(x)⟩.
  - Training set D: the set of available training examples.
- Determine:
  - A hypothesis h in H such that h(x) = c(x) for all x in X.

Inductive Learning Hypothesis
- The learning task is to determine an h identical to c over the entire set of instances X.
- But the only information available about c is its value over the training set D.
- Inductive learning algorithms can at best guarantee that the induced h fits c over D.
- The underlying assumption is that the hypothesis that best fits the observed data in D is also the best hypothesis for unseen instances.
- The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.

Concept Learning as Search
- Concept learning is a search: find the hypothesis that best fits the training examples, searching efficiently through a (possibly very large or infinite) hypothesis space.
- The search space for EnjoySport:
  - 3*2*2*2*2*2 = 96 distinct instances (e.g. Sky = {Sunny, Cloudy, Rainy}).
  - 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses within H (each attribute may additionally take "?" or "0").
  - 1 + 4*3*3*3*3*3 = 973 semantically distinct hypotheses (every hypothesis containing one or more "0" constraints classifies all instances as negative, so they are counted as a single empty hypothesis).
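The counts above can be reproduced in a few lines of Python (a sketch; the attribute sizes are the ones listed on this slide):

from math import prod

sizes = [3, 2, 2, 2, 2, 2]                 # Sky has 3 values, the other five have 2
instances = prod(sizes)                    # 3*2*2*2*2*2 = 96 distinct instances
syntactic = prod(s + 2 for s in sizes)     # each attribute also allows "?" and "0": 5120
semantic = 1 + prod(s + 1 for s in sizes)  # "?" only, plus one all-"0" empty hypothesis: 973
print(instances, syntactic, semantic)      # 96 5120 973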

General-to-Specific Ordering
- An instance x satisfies a hypothesis h if and only if h(x) = 1.
- More_general_than_or_equal_to relation: h_j ≥_g h_k if and only if every instance that satisfies h_k also satisfies h_j, i.e. (∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)].
- (Strictly) more_general_than relation: h_j >_g h_k if and only if (h_j ≥_g h_k) and not (h_k ≥_g h_j).
- The ≥_g relation defines a partial ordering over the hypothesis space H.
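A minimal sketch (illustrative, not from the book) of these relations for the conjunctive tuple representation used earlier, where "?" is the any-value constraint and "0" the empty constraint:

# h1 >=_g h2 iff every instance that satisfies h2 also satisfies h1.
def more_general_or_equal(h1, h2):
    if any(c == "0" for c in h2):      # h2 matches no instance at all
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def strictly_more_general(h1, h2):
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

h1 = ("Sunny", "?", "?", "?", "?", "?")
h2 = ("Sunny", "Warm", "?", "?", "?", "?")
print(strictly_more_general(h1, h2))   # True: h1 covers every instance h2 covers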

More_General_Than Relation

Find-S: Finding a Maximally Specific Hypothesis

1. Initialize h to the most specific hypothesis in H.
2. For each positive training example x:
   For each attribute constraint a_i in h:
     If the constraint a_i is satisfied by x, then do nothing;
     else replace a_i in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
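A runnable sketch of Find-S under the tuple representation used earlier (illustrative, not the book's code); the training data are the four EnjoySport examples from the table above.

def find_s(examples):
    """examples: list of (instance, label) pairs; labels are 'Yes'/'No'."""
    h = ["0"] * len(examples[0][0])                # most specific hypothesis
    for x, label in examples:
        if label != "Yes":                         # Find-S ignores negative examples
            continue
        for i, (c, v) in enumerate(zip(h, x)):
            if c == "0":
                h[i] = v                           # first positive example: copy its values
            elif c != v and c != "?":
                h[i] = "?"                         # generalize the conflicting constraint
    return tuple(h)

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')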

Hypothesis Space Search by Find-S

Properties of Find-S
- It ignores every negative example (no revision to h is required in response to negative examples). Why is this safe, and what assumptions does it rely on?
- It is guaranteed to output the most specific hypothesis consistent with the positive training examples (for a conjunctive hypothesis space).
- The final h is also consistent with the negative examples, provided the target concept c is in H and there are no errors in D.

Weaknesses of Find-S
- Has the learner converged to the correct target concept? There is no way to know whether the solution is unique.
- Why prefer the most specific hypothesis? Why not the most general hypothesis?
- Are the training examples consistent? Training sets containing errors or noise can severely mislead Find-S.
- What if there are several maximally specific consistent hypotheses? Find-S cannot backtrack to explore a different branch of the partial ordering.

Version Spaces (VSs)
- Idea: output all hypotheses consistent with the training examples.
- Version space:
  - Consistent(h, D) ⇔ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)
  - VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}
- List-Then-Eliminate algorithm: list all hypotheses in H, then eliminate any hypothesis found inconsistent with some training example. Applicable only when H is finite.
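As a concrete illustration (a sketch, not the book's code), the following enumerates the 972 non-empty conjunctive hypotheses for EnjoySport and keeps those consistent with the training data; the attribute value lists are assumptions read off this chapter's examples.

from itertools import product

VALUES = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
          ["Strong", "Light"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # 4*3^5 = 972 hypotheses; the all-"0" empty hypothesis is omitted since it
    # is inconsistent with any positive example.
    hypotheses = product(*[vals + ["?"] for vals in VALUES])
    return [h for h in hypotheses
            if all(satisfies(h, x) == (label == "Yes") for x, label in examples)]

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(len(list_then_eliminate(data)))   # 6 consistent hypotheses (Fig. 2.3)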

Compact Representation of VSs
- A version space can be represented more compactly by its two boundary sets:
  - the general boundary G: the set of maximally general hypotheses in H consistent with D;
  - the specific boundary S: the set of maximally specific hypotheses in H consistent with D.
- The version space can then be redefined in terms of S and G: VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) g ≥_g h ≥_g s}.

A Version Space with S and G Boundaries

CE: Candidate-Elimination Algorithm

Initialize G to the set of maximally general hypotheses in H.
Initialize S to the set of maximally specific hypotheses in H.
For each training example d, do:
  If d is a positive example:
    - Remove from G any hypothesis inconsistent with d.
    - For each hypothesis s in S that is not consistent with d:
      - Remove s from S.
      - Add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h.
      - Remove from S any hypothesis that is more general than another hypothesis in S.

  If d is a negative example:
    - Remove from S any hypothesis inconsistent with d.
    - For each hypothesis g in G that is not consistent with d:
      - Remove g from G.
      - Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h.
      - Remove from G any hypothesis that is less general than another hypothesis in G.
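The following compact Python sketch follows the algorithm above for the conjunctive EnjoySport representation; the attribute value lists, helper names, and data layout are illustrative assumptions rather than the book's code.

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Light"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def geq(h1, h2):                       # h1 >=_g h2
    return "0" in h2 or all(a == "?" or a == b for a, b in zip(h1, h2))

def min_generalizations(s, x):         # smallest change to s that covers positive x
    return [tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))]

def min_specializations(g, x):         # smallest changes to g that exclude negative x
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    S = {("0",) * len(DOMAINS)}                         # most specific boundary
    G = {("?",) * len(DOMAINS)}                         # most general boundary
    for x, label in examples:
        if label == "Yes":                              # positive example: generalize S
            G = {g for g in G if satisfies(g, x)}
            for s in [s for s in S if not satisfies(s, x)]:
                S.remove(s)
                S |= {h for h in min_generalizations(s, x)
                      if any(geq(g, h) for g in G)}
            S = {s for s in S if not any(geq(s, t) and s != t for t in S)}
        else:                                           # negative example: specialize G
            S = {s for s in S if not satisfies(s, x)}
            for g in [g for g in G if satisfies(g, x)]:
                G.remove(g)
                G |= {h for h in min_specializations(g, x)
                      if any(geq(h, s) for s in S)}
            G = {g for g in G if not any(geq(t, g) and t != g for t in G)}
    return S, G

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}

Run on the four training examples, this sketch reproduces the S and G boundaries traced in the next few slides.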

Given First Two Examples

After the Third Example

After the Fourth Example

The Concept Learned

Remarks on Candidate-Elimination
- Will the CE algorithm converge to the correct hypothesis?
- What training example should the learner request next?
- How can partially learned concepts be used?

When Does CE Converge?
- Will the Candidate-Elimination algorithm converge to the correct hypothesis?
- It will, provided that:
  1. the training examples contain no errors, and
  2. H contains a hypothesis that correctly describes the target concept c(x).
- If the S and G boundary sets converge to an empty set, there is no hypothesis in H consistent with the observed examples.

Who Provides Examples?
- What training example should the learner request next?
- Two settings:
  - Fully supervised learning: an external teacher provides all training examples (inputs together with their correct outputs).
  - Learning by query: the learner generates instances (queries) by conducting experiments, then obtains the correct classification for each instance from an external oracle (nature or a teacher).
- Negative training examples specialize G; positive ones generalize S.

Optimal Query Strategies
- What would be a good query? The learner should attempt to discriminate among the competing hypotheses in its current version space.
- A good query is one that is classified as positive by some of these hypotheses and as negative by others.
- In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half of the hypotheses in the current version space.
- With such queries, the number of experiments needed to find the correct target concept is about ⌈log₂ |VS|⌉.
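One way to realize this strategy is to score each candidate query by how far its positive/negative split of the current version space is from an even split, and ask the best-scoring one. A sketch (illustrative; version_space is any collection of hypothesis tuples, and satisfies is as in the earlier sketches):

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def best_query(version_space, candidates):
    """Pick the candidate instance whose split of the version space is closest to 50/50."""
    def imbalance(x):
        pos = sum(satisfies(h, x) for h in version_space)
        return abs(2 * pos - len(version_space))   # 0 means a perfect half-and-half split
    return min(candidates, key=imbalance)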

How to Use Partially Learned Concepts?
- Suppose the learner is asked to classify the four new instances shown in the following table.

Instance  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
A         Sunny  Warm     Normal    Strong  Cool   Change    ?
B         Rainy  Cold     Normal    Light   Warm   Same      ?
C         Sunny  Warm     Normal    Light   Warm   Same      ?
D         Sunny  Cold     Normal    Strong  Warm   Same      ?

- A: classified as positive by all hypotheses in the current version space (Fig. 2.3).
- B: classified as negative by all hypotheses.
- C: 3 hypotheses vote positive, 3 vote negative.
- D: 2 vote positive, 4 vote negative (this could be decided by majority vote, for example).
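A sketch of how such votes could be computed (illustrative; hypothesis tuples and the satisfies test are as in the earlier sketches):

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def classify_by_vote(version_space, x):
    """Return (answer, positive_votes, negative_votes) for instance x."""
    pos = sum(satisfies(h, x) for h in version_space)
    neg = len(version_space) - pos
    if neg == 0:
        return "Yes", pos, neg   # unanimous positive (instance A above)
    if pos == 0:
        return "No", pos, neg    # unanimous negative (instance B above)
    return "?", pos, neg         # split vote (instances C and D); majority vote is optional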

Partially Learned VS Revisited

Fundamental Questions for Inductive Inference
- CE will converge toward the target concept provided that the target concept is contained in the initial hypothesis space and the training examples contain no errors.
- What if the target concept is not contained in the hypothesis space?
- One apparent solution: use a hypothesis space that includes every possible hypothesis (a more expressive hypothesis space).
- New problem: such a learner generalizes poorly, or does not generalize at all.

Inductive Bias
- In EnjoySport, H contains only conjunctions of attribute values.
- Such an H is unable to represent even simple disjunctive target concepts such as "Sky = Sunny ∨ Sky = Cloudy".
- Given the following three training examples of this disjunctive concept, CE would find that there are zero hypotheses in the version space.

A Biased Hypothesis Space
- The problem is that we have biased the learner to consider only conjunctive hypotheses; we require a more expressive hypothesis space.

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

An Unbiased Learner
- One solution: let H contain every teachable concept, i.e. every possible subset of the instance space X.
  - The power set of X is the set of all subsets of X. For EnjoySport, |X| = 96, so there are 2^|X| = 2^96 ≈ 10^28 distinct target concepts definable over X.
  - In contrast, our conjunctive H contains only 973 (semantically distinct) of these.
- New problem: such a learner is unable to generalize beyond the observed examples.
  - Only the observed examples themselves are unambiguously classified.
  - For every unobserved instance, exactly half of the hypotheses in the version space vote positive and half vote negative, so voting produces no majority.

Futility of Bias-Free Learning
- A fundamental property of inductive inference: "A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances."

Inductive Inference
- L: an arbitrary learning algorithm.
- c: some arbitrary target concept.
- D_c = {⟨x, c(x)⟩}: an arbitrary set of training data.
- L(x_i, D_c): the classification that L assigns to instance x_i after training on D_c.
- The inductive inference step performed by L: (D_c ∧ x_i) ≻ L(x_i, D_c), where y ≻ z denotes that z is inductively inferred from y.

Inductive Bias Formally Defined
- Because L is an inductive learning algorithm, its output L(x_i, D_c) will not in general be provably correct; L(x_i, D_c) need not follow deductively from D_c and x_i.
- However, additional assumptions can be added to D_c ∧ x_i so that L(x_i, D_c) does follow deductively.
- Definition: the inductive bias of L is any minimal set of assertions B (assumptions, background knowledge, etc.) such that, for any target concept c and corresponding training examples D_c, (∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)], where ⊢ denotes deductive entailment.

Inductive Bias of the CE Algorithm
- Given the assumption c ∈ H, the inductive inference performed by the CE algorithm can be justified deductively. Why?
  - If we assume c ∈ H, it follows deductively that c ∈ VS_{H,D_c}.
  - Since we defined L(x_i, D_c) to be the unanimous vote of all hypotheses in the version space, if L outputs the classification L(x_i, D_c), then every hypothesis in VS_{H,D_c} must produce this classification, including the hypothesis c.
- Inductive bias of CE: the target concept c is contained in the given hypothesis space H.

Inductive & Deductive Systems

Strength of Inductive Biases
(1) Rote-Learner: weakest (no bias)
(2) Candidate-Elimination algorithm
(3) Find-S: strongest bias of the three

Inductive Bias of Rote-Learner
- Simply stores each observed training example in memory.
- New instances are classified by looking them up in memory:
  - if the instance is found in memory, the stored classification is returned;
  - otherwise, the system refuses to classify the new instance.
- No inductive bias: the classifications of new instances follow deductively from D, with no additional assumptions required.

Inductive Bias of Candidate-Elimination
- New instances are classified only if all hypotheses in the version space agree; otherwise, it refuses to classify.
- Inductive bias: the target concept can be represented in its hypothesis space.
- This inductive bias is stronger than that of the rote-learner, since CE will classify some instances that the rote-learner will not.

Inductive Bias of Find-S
- Finds the most specific hypothesis consistent with D and uses this hypothesis to classify new instances.
- Its inductive bias is even stronger:
  - the target concept can be described in its hypothesis space, and
  - all instances are negative unless the opposite is entailed by its other knowledge (default reasoning).

Summary (1/3)
- Concept learning can be cast as a problem of searching through a large predefined space of potential hypotheses.
- The general-to-specific partial ordering of hypotheses provides a useful structure for the search.
- The Find-S algorithm performs a specific-to-general search to find the most specific hypothesis.

Summary (2/3)
- The Candidate-Elimination algorithm computes the version space incrementally, maintaining the sets of maximally specific (S) and maximally general (G) hypotheses.
- S and G delimit the entire set of hypotheses consistent with the data.
- Version spaces and the Candidate-Elimination algorithm provide a useful conceptual framework for studying concept learning.

Summary (3/3)
- The Candidate-Elimination algorithm is not robust to noisy data or to situations where the unknown target concept is not expressible in the provided hypothesis space.
- The inductive bias of the Candidate-Elimination algorithm is that the target concept exists in H.
- If the hypothesis space is enriched until it contains every possible hypothesis (the power set of the instances), the inductive bias of CE is removed, and with it the ability to classify any instance beyond the observed examples.

Homework
- Exercise 2.1
- Exercise 2.5