Concept Learning and the General-to-Specific Ordering


Content
The learning task
Notation
The inductive learning hypothesis
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
Remarks
Inductive Bias
Summary

The learning task
Concept learning: inferring a boolean-valued function from training examples of its input and output.

Notation
Example task: days on which my friend Aldo enjoys his favorite water sport
Representation: a hypothesis consists of constraints on the instance attributes
For each attribute, the constraint is one of
  ? : any value is acceptable
  a specific value (e.g. Warm)
  ∅ : no value is acceptable
Example hypothesis: <?, Cold, High, ?, ?, ?>
Most general hypothesis, every day is a positive example: <?, ?, ?, ?, ?, ?>
Most specific hypothesis, no day is a positive example: <∅, ∅, ∅, ∅, ∅, ∅>

Notation 2
Given
  Instances X: the possible days, each described by the attributes Sky (with possible values Sunny, Cloudy and Rainy), AirTemp (Warm, Cold), and similarly Humidity, Wind, Water and Forecast
  Hypotheses H: each hypothesis is a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast; each constraint is '?', '∅' or a specific value
  Target concept c: EnjoySport
  Training examples D: positive and negative examples of the target function
Determine
  A hypothesis h in H such that h(x) = c(x) for all x in X

Notation 3
Set of training examples: each consists of an instance x from X along with its target concept value c(x), written <x, c(x)>
  c(x) = 1: positive example, a member of the target concept
  c(x) = 0: negative example, a non-member of the target concept
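
The instance and hypothesis notation above maps directly onto a tuple encoding. A minimal sketch in Python, assuming instances are tuples of attribute values, '?' stands for "any value" and None stands for the ∅ constraint; the function name classify is illustrative, not from the slides:

```python
# A hypothesis is a tuple of constraints, one per attribute:
#   '?'   -> any value is acceptable
#   None  -> no value is acceptable (stands in for the ∅ constraint)
#   other -> the attribute must equal that specific value

def classify(h, x):
    """Return 1 if instance x satisfies hypothesis h, else 0."""
    for constraint, value in zip(h, x):
        if constraint is None or (constraint != '?' and constraint != value):
            return 0
    return 1

# Example hypothesis from the slides: <?, Cold, High, ?, ?, ?>
h = ('?', 'Cold', 'High', '?', '?', '?')
x = ('Sunny', 'Cold', 'High', 'Strong', 'Warm', 'Same')
print(classify(h, x))  # -> 1
```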

The inductive learning hypothesis
The only information available about c is its value over the training examples
Therefore an inductive learning algorithm can at best guarantee that the output hypothesis fits the target concept over the training data
Fundamental assumption: the best hypothesis regarding unseen instances is the hypothesis that best fits the observed data
The Inductive Learning Hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Content
The learning task
Concept learning as search
General-to-Specific Ordering of Hypotheses
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
Remarks
Inductive Bias
Summary

Concept learning as search
Note: by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn
Homework:
  Example (EnjoySport): 3*2*2*2*2*2 = 96 distinct instances
  Example: 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses (each attribute may take any of its values, '?' or '∅')
  Example: 1 + (4*3*3*3*3*3) = 973 semantically distinct hypotheses (every hypothesis containing one or more '∅' constraints classifies every instance as negative)
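
These counts follow from the attribute value sets of the task (Sky has 3 possible values, every other attribute has 2). A small sketch of the arithmetic, assuming those domain sizes:

```python
from math import prod

# Number of values per attribute: Sky, AirTemp, Humidity, Wind, Water, Forecast
domain_sizes = [3, 2, 2, 2, 2, 2]

instances = prod(domain_sizes)                    # 3*2*2*2*2*2 = 96
syntactic = prod(n + 2 for n in domain_sizes)     # each value plus '?' and '∅' -> 5120
semantic = 1 + prod(n + 1 for n in domain_sizes)  # one all-negative hypothesis plus '?'-or-value combinations -> 973

print(instances, syntactic, semantic)  # 96 5120 973
```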

General-to-Specific Ordering of Hypotheses
Example of the general-to-specific ordering: of h1 = <Sunny, ?, ?, Strong, ?, ?> and h2 = <Sunny, ?, ?, ?, ?, ?>, the second is less constrained -> it classifies more instances as positive
In detail: for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1
Let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j >=_g h_k) if and only if (for all x in X) [(h_k(x) = 1) -> (h_j(x) = 1)]
Let h_j and h_k be boolean-valued functions defined over X. Then h_j is (strictly) more_general_than h_k (written h_j >_g h_k) if and only if (h_j >=_g h_k) and not (h_k >=_g h_j)
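
For the conjunctive representation used here, the >=_g relation can be decided attribute by attribute. A minimal sketch, assuming the tuple encoding from the sketch above ('?' for any value, None for ∅); the function name is illustrative:

```python
def more_general_or_equal(hj, hk):
    """True if hj >=_g hk: every instance that satisfies hk also satisfies hj."""
    if any(c is None for c in hk):
        return True  # hk covers no instance at all, so the condition holds vacuously
    # Otherwise compare attribute by attribute: each constraint of hj must
    # allow at least the values allowed by the corresponding constraint of hk.
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

# <Sunny, ?, ?, ?, ?, ?> is more general than <Sunny, ?, ?, Strong, ?, ?>
h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # -> True
print(more_general_or_equal(h1, h2))  # -> False
```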

General-to-Specific Ordering of Hypotheses 2
The relations >=_g and >_g are defined independently of the target concept
The relation more_general_than_or_equal_to defines a partial order over the hypothesis space H
Informally: the order is partial rather than total because there may be pairs of hypotheses h1 and h2 such that neither h1 >=_g h2 nor h2 >=_g h1

Example Each hypothesis corresponds to some subset of X (the subset of instances that it classifies positive) The arrow represents the more_general_than relation with the arrow pointing toward the less general hypothesis

Find-S: finding a maximally specific hypothesis
Use the more_general_than partial ordering:
  Begin with the most specific possible hypothesis in H
  Generalise this hypothesis each time it fails to cover an observed positive example
The FIND-S algorithm:
  1. Initialise h to the most specific hypothesis in H
  2. For each positive training instance x
       For each attribute constraint a_i in h
         If the constraint a_i is satisfied by x
         Then do nothing
         Else replace a_i in h by the next more general constraint that is satisfied by x
  3. Output hypothesis h
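
A minimal runnable sketch of FIND-S for this representation, assuming the tuple encoding used in the earlier sketches and the four EnjoySport training examples that appear in these slides; the name find_s is illustrative:

```python
def find_s(examples, n_attributes):
    """FIND-S: return the maximally specific conjunctive hypothesis
    consistent with the positive examples."""
    h = [None] * n_attributes           # most specific hypothesis <∅, ..., ∅>
    for x, positive in examples:
        if not positive:                # FIND-S simply ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] is None:            # first positive example: copy its values
                h[i] = value
            elif h[i] != value:         # conflicting value: generalise to '?'
                h[i] = '?'
    return tuple(h)

# EnjoySport training data (Sky, AirTemp, Humidity, Wind, Water, Forecast), label
training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]

print(find_s(training_data, 6))  # -> ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```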

Find-S: finding a maximally specific hypothesis (example)
Initial step: h0 = <∅, ∅, ∅, ∅, ∅, ∅>
1st (positive) example: h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
2nd (positive) example: substituting a '?' in place of any attribute value in h that is not satisfied by the new example gives h2 = <Sunny, Warm, ?, Strong, Warm, Same>
3rd (negative) example: the FIND-S algorithm simply ignores every negative example, so h3 = h2
4th (positive) example: h4 = <Sunny, Warm, ?, Strong, ?, ?>

Remarks
In the general case, as long as we assume that the hypothesis space H contains a hypothesis that describes the true target concept c and that the training data contains no errors, the current hypothesis h never requires a revision in response to a negative example
Why?
  The current hypothesis h is the most specific hypothesis in H consistent with the observed positive examples
  The target concept c must therefore be more_general_than_or_equal_to h
  But the target concept c never covers a negative example, and thus neither does h
  Therefore no revision to h is required in response to any negative example
In the literature there are many other algorithms that use the same more_general_than partial ordering

Content
The learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination Algorithm
  Key idea: output a description of the set of all hypotheses consistent with the training examples
  Representation
  The list-then-eliminate algorithm
  A more compact representation of the version space
  Candidate-Elimination learning algorithm
  Example
Remarks
Inductive Bias
Summary

Representation
Definition: a hypothesis h is consistent with the set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D
Difference between consistent and satisfies:
  x satisfies h <=> h(x) = 1, no matter whether x is a positive or negative example of the target concept
  h is consistent with <x, c(x)> <=> h(x) = c(x); the hypothesis h has to classify the example correctly with respect to the target concept
Definition: the version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D

The List-Then-Eliminate algorithm
Simplest way to represent the version space: list all of its elements
The List-Then-Eliminate algorithm:
  1. VersionSpace <- a list containing every hypothesis in H
  2. For each training example <x, c(x)>, remove from VersionSpace any hypothesis h for which h(x) != c(x)
  3. Output the list of hypotheses in VersionSpace
This algorithm cannot be applied whenever the hypothesis space H is infinite
Advantage: it is guaranteed to output all hypotheses consistent with the training data
Disadvantage: it is not efficiently computable (it enumerates all hypotheses in H)
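
A brute-force sketch of List-Then-Eliminate for the EnjoySport task, assuming the attribute domains of the example and the tuple encoding used earlier; it enumerates all 5120 syntactic hypotheses and keeps the consistent ones:

```python
from itertools import product

# Attribute domains; the second Wind value is assumed, only 'Strong' appears in the slides
domains = [
    ('Sunny', 'Cloudy', 'Rainy'),   # Sky
    ('Warm', 'Cold'),               # AirTemp
    ('Normal', 'High'),             # Humidity
    ('Strong', 'Weak'),             # Wind
    ('Warm', 'Cool'),               # Water
    ('Same', 'Change'),             # Forecast
]

def classify(h, x):
    """1 if instance x satisfies hypothesis h, else 0 ('?' = any, None = ∅)."""
    return int(all(c == '?' or (c is not None and c == v) for c, v in zip(h, x)))

def consistent(h, examples):
    return all(classify(h, x) == label for x, label in examples)

def list_then_eliminate(examples):
    all_hypotheses = product(*[values + ('?', None) for values in domains])
    return [h for h in all_hypotheses if consistent(h, examples)]

training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   1),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   1),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 0),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 1),
]

version_space = list_then_eliminate(training_data)
print(len(version_space))   # 6 consistent hypotheses remain
for h in version_space:
    print(h)
```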

A more compact representation of the version space
Representation: the version space is represented by its most general and its least general members
These two sets form the general and specific boundary sets, which delimit the version space within the partially ordered hypothesis space

Example of the List-Then-Eliminate algorithm
The result for the four EnjoySport training examples is a version space of six hypotheses, with specific boundary S = {<Sunny, Warm, ?, Strong, ?, ?>} and general boundary G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}; arrows in the diagram indicate the more_general_than relation
Candidate-Elimination represents the version space by storing only its most general members (labeled G) and its most specific ones (labeled S)
With these two sets it is possible to enumerate all members of the version space as needed
The hypothesis we are looking for lies between these two sets in the general-to-specific partial ordering over hypotheses

Definition of the Boundary Sets
Definition: the general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D
Definition: the specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D

Definition of the Boundary Sets (2)
Theorem (version space representation theorem): let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c: X -> {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c and D such that S and G are well defined,
  VS_H,D = {h in H | (there exists s in S)(there exists g in G) such that g >=_g h >=_g s}
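
In standard notation the two boundary sets and the representation theorem read as follows; this is a restatement of the definitions above in LaTeX form, not additional material:

```latex
S \equiv \{\, s \in H \mid \mathrm{Consistent}(s, D) \wedge
        \neg\exists s' \in H\,[\,\mathrm{Consistent}(s', D) \wedge (s >_g s')\,] \,\}

G \equiv \{\, g \in H \mid \mathrm{Consistent}(g, D) \wedge
        \neg\exists g' \in H\,[\,\mathrm{Consistent}(g', D) \wedge (g' >_g g)\,] \,\}

VS_{H,D} = \{\, h \in H \mid \exists s \in S\ \exists g \in G\ (g \ge_g h \ge_g s) \,\}
```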

Candidate-Elimination Learning algorithm
The algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples
Begin: the version space is initialised to the set of all hypotheses in H, represented by its boundary sets
  G0 <- {<?, ?, ?, ?, ?, ?>}: the most general hypothesis in H
  S0 <- {<∅, ∅, ∅, ∅, ∅, ∅>}: the most specific hypothesis in H
  Together they delimit the entire hypothesis space
As each training example is considered, the S and G boundary sets are generalised and specialised respectively, to eliminate from the version space any hypothesis found inconsistent with the new training example
After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples, and only these hypotheses

Candidate-Elimination Learning algorithm 2
1. Initialise G to the set of maximally general hypotheses in H
2. Initialise S to the set of maximally specific hypotheses in H
3. For each training example d, do
   If d is a positive example
     Remove from G any hypothesis inconsistent with d
     For each hypothesis s in S that is not consistent with d
       Remove s from S
       Add to S all minimal generalisations h of s such that h is consistent with d and some member of G is more general than h
       Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example
     Remove from S any hypothesis inconsistent with d
     For each hypothesis g in G that is not consistent with d
       Remove g from G
       Add to G all minimal specialisations h of g such that h is consistent with d and some member of S is more specific than h
       Remove from G any hypothesis that is less general than another hypothesis in G
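
A compact runnable sketch of Candidate-Elimination for this conjunctive representation, again assuming the tuple encoding and the EnjoySport training data from the slides; the minimal generalisation and specialisation steps are specific to attribute-value conjunctions, and helper names are illustrative:

```python
# Attribute domains; the second Wind value is assumed, only 'Strong' appears in the slides
domains = [
    ('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
    ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'),
]

def covers(h, x):
    """True if instance x satisfies hypothesis h ('?' = any, None = ∅)."""
    return all(c == '?' or (c is not None and c == v) for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """True if hj >=_g hk for the conjunctive representation."""
    if any(c is None for c in hk):
        return True
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

def min_generalisation(s, x):
    """Minimal generalisation of s that covers the positive instance x."""
    return tuple(v if c is None else (c if c == v else '?') for c, v in zip(s, x))

def min_specialisations(g, x):
    """All minimal specialisations of g that exclude the negative instance x."""
    result = []
    for i, c in enumerate(g):
        if c == '?':
            for value in domains[i]:
                if value != x[i]:
                    result.append(g[:i] + (value,) + g[i + 1:])
    return result

def candidate_elimination(examples):
    G = {('?',) * len(domains)}      # maximally general boundary
    S = {(None,) * len(domains)}     # maximally specific boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            for s in list(S):
                if not covers(s, x):
                    S.remove(s)
                    h = min_generalisation(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        S.add(h)
            # keep only the minimally general members of S
            S = {s for s in S
                 if not any(s != s2 and more_general_or_equal(s, s2) for s2 in S)}
        else:
            S = {s for s in S if not covers(s, x)}
            for g in list(G):
                if covers(g, x):
                    G.remove(g)
                    for h in min_specialisations(g, x):
                        if any(more_general_or_equal(h, s) for s in S):
                            G.add(h)
            # keep only the maximally general members of G
            G = {g for g in G
                 if not any(g != g2 and more_general_or_equal(g2, g) for g2 in G)}
    return S, G

training_data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]

S, G = candidate_elimination(training_data)
print('S =', S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print('G =', G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')} (order may vary)
```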

An illustrative example
The boundary sets are initialised to S0 = {<∅, ∅, ∅, ∅, ∅, ∅>} and G0 = {<?, ?, ?, ?, ?, ?>}
1st example (Sunny, Warm, Normal, Strong, Warm, Same, Yes):
  S0 is overly specific: it fails to cover this example
  The boundary is moved to the least more general hypothesis: S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
  No update of the G boundary is needed
2nd example (Sunny, Warm, High, Strong, Warm, Same, Yes):
  Generalising S to S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}, leaving G again unchanged

An illustrative example 2
3rd example (Rainy, Cold, High, Strong, Warm, Change, No):
  It reveals that G is overly general: it incorrectly predicts this example as positive, so G must be specialised
  The result is G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
  There are 6 attributes that could be specialised, so why are there only 3 new hypotheses?
  For example, h = <?, ?, Normal, ?, ?, ?> is a minimal specialisation that correctly labels the new example as negative, but it is not included. Why?
  This hypothesis is inconsistent with the previously encountered positive examples
  The algorithm determines this by noting that h is not more general than the current specific boundary S2

An illustrative example 3

An illustrative example 4
4th example (Sunny, Warm, High, Strong, Cool, Change, Yes):
  Generalises S to S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
  One member of G, <?, ?, ?, ?, ?, Same>, is removed because it fails to cover the new positive example. Why can it not simply be revised?
    It cannot be specialised (that would not make it cover the new example)
    It cannot be generalised (by the definition of G, any more general hypothesis would cover at least one negative training example)
  Therefore the hypothesis must be dropped from G, leaving G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

An illustrative example 5
S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples; in this example it contains six hypotheses
This learned version space is independent of the sequence in which the training examples are presented

Content
The learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
Remarks
  Does the Candidate-Elimination algorithm converge to a correct hypothesis?
  What training examples should the learner request next?
Inductive Bias
Summary

Does the Candidate-Elimination algorithm converge to the correct hypothesis?
The algorithm converges toward the target concept provided that
  there are no errors in the training examples, and
  there is some hypothesis in H that correctly describes the target concept
The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis
What happens if the training data contains errors?
  Assume example 2 is incorrectly presented as negative => the correct target concept is removed from the version space (every hypothesis inconsistent with the training examples is removed)
  Given sufficient additional training data, the learner will eventually detect the inconsistency by noticing that S and G converge to an empty version space

What training example should the learner request next?
So far, we assumed that the training examples are provided to the learner by some external teacher
Definition: query: the learner is allowed to conduct experiments in which it chooses the next instance and then obtains the correct classification for this instance from an external oracle
Example (EnjoySport): what would be a good query strategy?
  Clearly the learner should choose an instance that would be classified positive by some hypotheses in the version space, but negative by others
  Such a query shrinks the version space from six hypotheses to half this number
In general the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space (see the sketch below)
When this is possible, the correct target concept can be found with only about log2 |VS| experiments
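
A small illustration of this query strategy: it scores each candidate instance by how evenly it splits the current version space. The helpers and domains are restated so the sketch runs standalone; the second Wind value is assumed, and the function names are illustrative:

```python
from itertools import product

# Attribute domains; the second Wind value is assumed
domains = [
    ('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
    ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'),
]

def classify(h, x):
    return int(all(c == '?' or c == v for c, v in zip(h, x)))

# The six-hypothesis version space from the EnjoySport example
version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?',    '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?',      '?', '?'),
    ('?',     'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?',    '?', '?',      '?', '?'),
    ('?',     'Warm', '?', '?',      '?', '?'),
]

def best_query(version_space):
    """Pick the instance whose positive/negative split of the version space is
    closest to half; learning its true label eliminates close to half the hypotheses."""
    best, best_score = None, None
    for x in product(*domains):
        positives = sum(classify(h, x) for h in version_space)
        score = abs(positives - len(version_space) / 2)
        if best_score is None or score < best_score:
            best, best_score = x, score
    return best

x = best_query(version_space)
print(x, [classify(h, x) for h in version_space])  # an instance splitting the space 3 / 3
```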

Content
The learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
Remarks
Inductive Bias
  A biased hypothesis space
  An unbiased learner
Summary

Inductive Bias
Questions:
  What if the target concept is not contained in the hypothesis space?
  Can this difficulty be avoided by using a hypothesis space that includes every possible hypothesis?
  How does the size of this hypothesis space influence the ability of the algorithm to generalise to unobserved instances?
  How does the size of the hypothesis space influence the number of training examples that must be observed?

A biased hypothesis space
Suppose we wish to assure that the hypothesis space contains the unknown target concept
  Obvious solution: enrich the hypothesis space to include every possible hypothesis (explored on the next slide)
Example: the EnjoySport hypothesis space is restricted to conjunctions of attribute values, so it cannot represent a disjunctive target concept such as "Sky = Sunny or Sky = Cloudy"
In fact, for training examples of such a disjunctive concept the algorithm finds that there are zero hypotheses in the version space
Why? The most specific hypothesis consistent with the first two (positive) examples and representable in the given hypothesis space H is <?, Warm, Normal, Strong, Cool, Change>
It is overly general: it incorrectly covers the third (negative) example
Problem: the bias of admitting only conjunctive hypotheses

An Unbiased Learner
Goal: ensure that the target concept is in the hypothesis space
Solution: provide a hypothesis space capable of representing every teachable concept => capable of representing every possible subset of the instances X, i.e. the power set of X
Example (EnjoySport): the size of the instance space X is 96 => 2^96 (approximately 10^28) distinct target concepts
Reformulate the example by defining a hypothesis space H' corresponding to the power set of X: allow arbitrary disjunctions, conjunctions and negations of the earlier hypotheses
In such an H' the target concept "Sky = Sunny or Sky = Cloudy" can be expressed as <Sunny, ?, ?, ?, ?, ?> v <Cloudy, ?, ?, ?, ?, ?>

An Unbiased Learner 2
The Candidate-Elimination algorithm can still be used, BUT there is a new problem: the concept learning algorithm is now completely unable to generalize beyond the observed examples
Why? Suppose we present 3 positive examples (x1, x2, x3) and 2 negative examples (x4, x5) to the learner
  S will contain the single hypothesis S = {x1 v x2 v x3}, just the disjunction of the positive examples
  G will consist of the single hypothesis that rules out only the negative examples, G = {not(x4 v x5)}
  S and G will always be the disjunction of the positive examples and the negated disjunction of the negative examples
=> the only examples that will be unambiguously classified by the version space are the observed training examples themselves (see the sketch below)
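
A tiny illustration of why the unbiased learner cannot generalize, representing each hypothesis in the power-set space simply as the set of instances it labels positive; the instance names x1 ... x7 are made up for the illustration:

```python
# In the unbiased space H' = power set of X, a hypothesis is just a subset of X.
X = {'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'}
positives = {'x1', 'x2', 'x3'}      # observed positive examples
negatives = {'x4', 'x5'}            # observed negative examples

# Version-space boundaries in the power-set space:
S = positives                       # most specific consistent hypothesis
G = X - negatives                   # most general consistent hypothesis

# An instance is unanimously classified positive only if every hypothesis
# between S and G contains it (i.e. it is in S), and unanimously negative
# only if no such hypothesis contains it (i.e. it lies outside G).
unanimous_positive = S
unanimous_negative = X - G

print(sorted(unanimous_positive))   # ['x1', 'x2', 'x3']  -> the observed positives
print(sorted(unanimous_negative))   # ['x4', 'x5']        -> the observed negatives
# Every other instance (x6, x7) is classified positive by exactly half of the
# version space and negative by the other half: no generalization is possible.
```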

Summary
Concept learning can be seen as a problem of searching through a large predefined space of potential hypotheses
The general-to-specific partial ordering of hypotheses provides a useful structure for organizing the search through the hypothesis space
FIND-S performs a specific-to-general search along one branch of the partial ordering to find the most specific hypothesis consistent with the training examples
Candidate-Elimination incrementally computes the sets of maximally specific (S) and maximally general (G) hypotheses; S and G delimit the entire set of hypotheses consistent with the data and thereby characterise the target concept
By looking at S and G one can determine whether the learner has converged to the target concept, whether the training data is inconsistent, and which query would be most useful to refine the version space next
The version space approach and the Candidate-Elimination algorithm are not robust to noisy data
If the unknown target concept is not expressible in the provided hypothesis space, many problems arise