Version Spaces Learning

Presentation transcript:

Machine Learning: Version Spaces Learning

Introduction: there are very many approaches to machine learning:
- Neural net approaches
- Symbolic approaches: version spaces, decision trees, knowledge discovery, data mining, speed-up learning, inductive learning, …

Version Spaces: a concept learning technique based on refining models of the world.

Concept Learning Example: a student has the following observations about having an allergic reaction after meals:

  Restaurant   Meal        Day        Cost        Reaction
  Alma 3       breakfast   Friday     cheap       Yes  (+)
  De Moete     lunch       Friday     expensive   No   (-)
  Alma 3       lunch       Saturday   cheap       Yes  (+)
  Sedes        breakfast   Sunday     cheap       No   (-)
  Alma 3       breakfast   Sunday     expensive   No   (-)

Concept to learn: under which circumstances does the student get an allergic reaction after meals?

In general: there is a set of all possible events. Example: Restaurant x Meal x Day x Cost, with 3 x 3 x 7 x 2 = 126 possible events. There is a boolean function (implicitly) defined on this set. Example: Reaction: Restaurant x Meal x Day x Cost --> Bool. We have the value of this function for SOME examples only. Find an inductive 'guess' of the concept that covers all the examples!

Pictured: given some positive and negative examples in the set of all possible events, find a concept that covers all the positive examples and none of the negative ones.

Non-determinism: there are many different ways to solve this within the set of all possible events. How to choose?

An obvious, but bad choice: the concept IS (Alma 3 and breakfast and Friday and cheap) OR (Alma 3 and lunch and Saturday and cheap). This does NOT generalize the examples at all!

Pictured: only the positive examples themselves are classified positive; everything else in the set of all possible events is negative.

Equally bad is: the concept is anything EXCEPT (De Moete and lunch and Friday and expensive) AND (Sedes and breakfast and Sunday and cheap) AND (Alma 3 and breakfast and Sunday and expensive).

Pictured: everything in the set of all possible events except the negative examples is classified positive.

Solution: fix a language of hypotheses. We introduce a fixed language of concept descriptions (the hypothesis space). The concept can only be identified as being one of the hypotheses in this language. This avoids the problem of having 'useless' conclusions and forces some generalization/induction, so that more than just the given examples is covered.

Reaction example: every hypothesis is a 4-tuple over (Restaurant, Meal, Day, Cost):
- most general hypothesis: [?, ?, ?, ?]
- maximally specific hypotheses, e.g.: [Sedes, lunch, Monday, cheap]
- combinations of ? and values are allowed, e.g.: [De Moete, ?, ?, expensive] or [?, lunch, ?, ?]
- one more hypothesis: ⊥ (bottom, denoting no example)

Hypotheses relate to sets of possible events. Example:
x1 = <Alma 3, lunch, Monday, expensive>
x2 = <Sedes, lunch, Sunday, cheap>
h1 = [?, lunch, Monday, ?] (covers x1 but not x2)
h2 = [?, lunch, ?, cheap] (covers x2 but not x1)
h3 = [?, lunch, ?, ?] (covers both)
h3 is more general than h1 and h2: the general-to-specific ordering of hypotheses corresponds to set inclusion of the events they cover.

Expressive power of this hypothesis language: conjunctions of explicit, individual properties. Examples:
[?, lunch, Monday, ?] : Meal = lunch ∧ Day = Monday
[?, lunch, ?, cheap] : Meal = lunch ∧ Cost = cheap
[?, lunch, ?, ?] : Meal = lunch
In addition to the 2 special hypotheses:
[?, ?, ?, ?] : anything
⊥ : nothing
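To make this concrete, here is a minimal Python sketch of the 4-tuple hypothesis language and its general-to-specific ordering, using the events and hypotheses from the slide above; the function names covers and more_general_or_equal are my own, not from the original slides.

```python
def covers(h, x):
    """True if hypothesis h covers event x (both 4-tuples; '?' matches anything)."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 is at least as general as h2: every value h1 fixes, h2 fixes identically."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

x1 = ('Alma 3', 'lunch', 'Monday', 'expensive')
x2 = ('Sedes', 'lunch', 'Sunday', 'cheap')
h1 = ('?', 'lunch', 'Monday', '?')
h2 = ('?', 'lunch', '?', 'cheap')
h3 = ('?', 'lunch', '?', '?')

assert covers(h1, x1) and not covers(h1, x2)
assert covers(h2, x2) and not covers(h2, x1)
assert covers(h3, x1) and covers(h3, x2)
assert more_general_or_equal(h3, h1) and more_general_or_equal(h3, h2)
```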

Other languages of hypotheses are allowed. Example: identify the color of a given sequence of colored objects. The examples: red : +, purple : -, blue : +. A useful language of hypotheses is a tree of values: any_color at the top, mono_color and poly_color below it, the individual colors (red, blue, green, orange, purple) at the leaves, and ⊥ at the bottom. (A sketch of such a language follows below.)
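A small Python sketch of such a tree-structured language. The parent links below are an assumption about how the slide's tree is meant (red/blue/green under mono_color, orange/purple under poly_color); a hypothesis covers a colour if it equals that colour or is one of its ancestors.

```python
PARENT = {
    'red': 'mono_color', 'blue': 'mono_color', 'green': 'mono_color',
    'orange': 'poly_color', 'purple': 'poly_color',
    'mono_color': 'any_color', 'poly_color': 'any_color',
    'any_color': None,
}

def covers(h, colour):
    """Walk up the tree from the observed colour; h covers it if h is on the path."""
    while colour is not None:
        if colour == h:
            return True
        colour = PARENT[colour]
    return False

# Consistent with the examples red:+, blue:+, purple:-
assert covers('mono_color', 'red') and covers('mono_color', 'blue')
assert not covers('mono_color', 'purple')
```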

Important about hypothesis languages: they should have a specific <-> general ordering, corresponding to the set-inclusion of the events they cover.

Defining Concept Learning:
Given:
- A set X of possible events. Ex.: Eat-events: <Restaurant, Meal, Day, Cost>
- An (unknown) target function c: X -> {-, +}. Ex.: Reaction: Eat-events -> {-, +}
- A language of hypotheses H. Ex.: conjunctions such as [?, lunch, Monday, ?]
- A set of training examples D, with their value under c. Ex.: (<Alma 3, breakfast, Friday, cheap>, +), …
Find: a hypothesis h in H such that for all x in D: x is covered by h <=> c(x) = +. (A small consistency check is sketched below.)
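For illustration, a tiny Python sketch (helper names assumed, not from the slides) of this "Find" condition: a hypothesis is a solution iff it covers exactly the positive examples in D.

```python
def covers(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

def consistent(h, D):
    """True iff for every (x, positive) in D: h covers x  <=>  x is positive."""
    return all(covers(h, x) == positive for x, positive in D)

D = [  # the allergy observations, with True for '+' and False for '-'
    (('Alma 3', 'breakfast', 'Friday',   'cheap'),     True),
    (('De Moete', 'lunch',   'Friday',   'expensive'), False),
    (('Alma 3', 'lunch',     'Saturday', 'cheap'),     True),
    (('Sedes', 'breakfast',  'Sunday',   'cheap'),     False),
    (('Alma 3', 'breakfast', 'Sunday',   'expensive'), False),
]
print(consistent(('Alma 3', '?', '?', 'cheap'), D))  # True
print(consistent(('Alma 3', '?', '?', '?'), D))      # False: covers a negative example
```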

The inductive learning hypothesis: If a hypothesis approximates the target function well over a sufficiently large number of examples, then the hypothesis will also approximate the target function well on other unobserved examples.

Find-S: a naïve algorithm.
Initialize: h := ⊥
For each positive training example x in D do:
  If h does not cover x:
    Replace h by a minimal generalization of h that covers x
Return h
(A runnable sketch follows below.)
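The following is a small runnable Python sketch of Find-S for this conjunctive 4-tuple language; the helper names (covers, minimal_generalization, find_s) and the boolean labels are my own choices, not from the slides. In this particular language the minimal generalization of h is unique: keep the attributes that match and replace the others by '?'.

```python
BOTTOM = None  # stands for the bottom hypothesis, which covers no event

def covers(h, x):
    return h is not BOTTOM and all(a == '?' or a == v for a, v in zip(h, x))

def minimal_generalization(h, x):
    """Smallest change to h that makes it cover the positive example x."""
    if h is BOTTOM:
        return tuple(x)                     # first positive example: copy it
    return tuple(a if a == v else '?' for a, v in zip(h, x))

def find_s(examples):
    """examples: list of (event, label) pairs, label True for '+'."""
    h = BOTTOM
    for x, positive in examples:
        if positive and not covers(h, x):
            h = minimal_generalization(h, x)
    return h

D = [  # the allergy observations from the slides
    (('Alma 3', 'breakfast', 'Friday',   'cheap'),     True),
    (('De Moete', 'lunch',   'Friday',   'expensive'), False),
    (('Alma 3', 'lunch',     'Saturday', 'cheap'),     True),
    (('Sedes', 'breakfast',  'Sunday',   'cheap'),     False),
    (('Alma 3', 'breakfast', 'Sunday',   'expensive'), False),
]
print(find_s(D))  # ('Alma 3', '?', '?', 'cheap')
```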

Reaction example (Find-S). Initially: h = ⊥. Example 1, (Alma 3, breakfast, Friday, cheap) +: h = [Alma 3, breakfast, Friday, cheap] (the minimal generalizations of ⊥ are the individual events). Example 2, (Alma 3, lunch, Saturday, cheap) +: h = [Alma 3, ?, ?, cheap] (generalization = replace a mismatching value by ?). No more positive examples: return h.

Properties of Find-S. Non-deterministic: depending on H, there may be several minimal generalizations. Example: in a hypothesis language H with the concepts Hacker, Scientist and Footballplayer above the individuals Beth, Jo and Alex, the hypothesis Beth can be minimally generalized in 2 ways to include a new example Jo.

Properties of Find-S (2). It may pick an incorrect hypothesis with respect to the negative examples: in the Beth/Jo/Alex illustration, a minimal generalization covering the positive examples may also cover a negative one and is then a wrong solution, because Find-S never checks the negative examples.

Properties of Find-S (3). It cannot detect inconsistency of the training data, nor the inability of the language H to learn the concept (in the Beth/Jo/Alex illustrations, no hypothesis in H fits the given labels).

Nice about Find-S: it doesn't have to remember previous examples! If the previous h already covered all previous examples, then a minimal generalization h' will too: if h already covered the first 20 examples, then h' covers them as well, since h' is more general than h.

Dual Find-S:
Initialize: h := [?, ?, .., ?]
For each negative training example x in D do:
  If h does cover x:
    Replace h by a minimal specialization of h that does not cover x
Return h
(A runnable sketch follows below.)
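Below is a runnable Python sketch of Dual Find-S for the same 4-tuple language; the attribute domains and helper names are my own encoding, not from the slides. Note that the choice among the minimal specializations is non-deterministic: this sketch simply takes the first candidate, whereas the worked example on the next slide picks different, equally minimal ones.

```python
DOMAINS = [
    ['Alma 3', 'De Moete', 'Sedes'],                    # Restaurant
    ['breakfast', 'lunch', 'dinner'],                   # Meal
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
     'Friday', 'Saturday', 'Sunday'],                   # Day
    ['cheap', 'expensive'],                             # Cost
]

def covers(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

def minimal_specializations(h, x):
    """All single-attribute refinements of h that do not cover the negative x."""
    out = []
    for i, a in enumerate(h):
        if a == '?':
            out += [h[:i] + (v,) + h[i + 1:] for v in DOMAINS[i] if v != x[i]]
    return out

def dual_find_s(examples):
    h = ('?', '?', '?', '?')
    for x, positive in examples:
        if not positive and covers(h, x):
            # non-deterministic choice: simply pick the first candidate
            h = minimal_specializations(h, x)[0]
    return h
```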

Reaction example (Dual Find-S):
Initially: h = [?, ?, ?, ?]
Negative example 1 (De Moete, lunch, Friday, expensive): h = [?, breakfast, ?, ?]
Negative example 2 (Sedes, breakfast, Sunday, cheap): h = [Alma 3, breakfast, ?, ?]
Negative example 3 (Alma 3, breakfast, Sunday, expensive): h = [Alma 3, breakfast, ?, cheap]

Version Spaces, the idea: perform both Find-S and Dual Find-S. Find-S deals with the positive examples; Dual Find-S deals with the negative examples. BUT: do NOT select 1 minimal generalization or specialization at each step; instead keep track of ALL minimal generalizations or specializations.

Version spaces: initialization. G = { [?, ?, …, ?] }, S = { ⊥ }. The boundary sets G and S are initialized to contain only the largest and the smallest hypothesis, respectively.

Negative examples: G = { [?, ?, …, ?] } becomes G = {h1, h2, …, hn}; S = { ⊥ }. Replace the top hypothesis by ALL minimal specializations that DO NOT cover the negative example. Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.

Positive examples: S = { ⊥ } becomes S = {h1', h2', …, hm'}; G = {h1, h2, …, hn}. Replace the bottom hypothesis by ALL minimal generalizations that DO cover the positive example. Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.

Later: negative examples. G = {h1, h2, …, hn} becomes G = {h1, h21, h22, …, hn}; S = {h1', h2', …, hm'}. Replace all the hypotheses in G that cover the next negative example by ALL minimal specializations that DO NOT cover the negative example. Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.

Later: positive examples. S = {h1', h2', …, hm'} becomes S = {h11', h12', h13', h2', …, hm'}; G = {h1, h21, h22, …, hn}. Replace all the hypotheses in S that do not cover the next positive example by ALL minimal generalizations that DO cover the example. Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.

Optimization for negative examples: only consider specializations of elements in G that are still more general than some specific hypothesis in S. (The invariant on S is used here.)

Optimization for positive examples: only consider generalizations of elements in S that are still more specific than some general hypothesis in G. (The invariant on G is used here.)

Pruning on negative examples: the new negative example can also be used to prune away all S-hypotheses that cover it. (The invariant only accounts for the previous examples, not the newest one.)

Pruning on positive examples: the new positive example can also be used to prune away all G-hypotheses that do not cover it.

Eliminate redundant hypotheses: if a hypothesis in G is more specific than another hypothesis in G, eliminate it! Reason: the invariant acts as a wave front, anything above G is not allowed, so the most general elements of G define the real boundary. Obviously the dual holds for S: eliminate any S-hypothesis that is more general than another S-hypothesis.

Convergence: eventually G and S MAY get a common element; then Version Spaces has converged to a solution. The remaining examples only need to be verified against this solution.

Reaction example. Initialization: G = { [?, ?, ?, ?] } (most general), S = { ⊥ } (most specific).

Alma3, breakfast, Friday, cheap: +. Positive example: the minimal generalization of ⊥ is [Alma3, breakfast, Friday, cheap], so S = { [Alma3, breakfast, Friday, cheap] }; G remains { [?, ?, ?, ?] }.

DeMoete, lunch, Friday, expensive: -. Negative example: minimal specialization of [?, ?, ?, ?]. There are 15 possible specializations (one per attribute value): [Alma3, ?, ?, ?], [DeMoete, ?, ?, ?], [Sedes, ?, ?, ?], [?, breakfast, ?, ?], [?, lunch, ?, ?], [?, dinner, ?, ?], [?, ?, Monday, ?], [?, ?, Tuesday, ?], [?, ?, Wednesday, ?], [?, ?, Thursday, ?], [?, ?, Friday, ?], [?, ?, Saturday, ?], [?, ?, Sunday, ?], [?, ?, ?, cheap], [?, ?, ?, expensive]. The ones that match the negative example ([DeMoete, ?, ?, ?], [?, lunch, ?, ?], [?, ?, Friday, ?], [?, ?, ?, expensive]) are eliminated, and so are the ones that do not generalize the specific model [Alma3, breakfast, Friday, cheap] (e.g. [Sedes, ?, ?, ?], [?, dinner, ?, ?] and the days other than Friday). Only [Alma3, ?, ?, ?], [?, breakfast, ?, ?] and [?, ?, ?, cheap] remain.

Result after example 2: S = { [Alma3, breakfast, Friday, cheap] }, G = { [Alma3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap] }.

Alma3, lunch, Saturday, cheap: +. Positive example: the minimal generalization of [Alma3, breakfast, Friday, cheap] is [Alma3, ?, ?, cheap], so S = { [Alma3, ?, ?, cheap] }. In G, [?, breakfast, ?, ?] does not match the new example and is pruned, leaving G = { [Alma3, ?, ?, ?], [?, ?, ?, cheap] }.

Sedes, breakfast, Sunday, cheap: -. Negative example: minimal specialization of the general models [Alma3, ?, ?, ?] and [?, ?, ?, cheap]. [Alma3, ?, ?, ?] does not cover the example and stays. The only surviving specialization of [?, ?, ?, cheap] is [Alma3, ?, ?, cheap], and it is pruned because it is more specific than another general hypothesis, so G = { [Alma3, ?, ?, ?] }; S stays { [Alma3, ?, ?, cheap] }.

Alma 3, breakfast, Sunday, expensive: -. Negative example: minimal specialization of [Alma3, ?, ?, ?]. The only surviving specialization is [Alma3, ?, ?, cheap], the same hypothesis as the one in S! The version space has converged: cheap food at Alma3 produces the allergy!

Version Space Algorithm:
Initially: G := { the hypothesis that covers everything }, S := { ⊥ }
For each new positive example:
  Generalize all hypotheses in S that do not cover the example yet, but ensure the following:
  - Only introduce minimal changes on the hypotheses.
  - Each new specific hypothesis is a specialization of some general hypothesis.
  - No new specific hypothesis is a generalization of some other specific hypothesis.
  Prune away all hypotheses in G that do not cover the example.

Version Space Algorithm (2):
For each new negative example:
  Specialize all hypotheses in G that cover the example, but ensure the following:
  - Only introduce minimal changes on the hypotheses.
  - Each new general hypothesis is a generalization of some specific hypothesis.
  - No new general hypothesis is a specialization of some other general hypothesis.
  Prune away all hypotheses in S that cover the example.
Repeat until there are no more examples: then report S and G.
If S or G becomes empty: report failure.
(A runnable sketch of the complete algorithm follows below.)
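Putting the two halves together, here is a compact runnable Python sketch of the whole algorithm for the conjunctive 4-tuple language; it is an assumed implementation written to mirror the steps above (including the invariants and the pruning of redundant boundary elements), not the original course code.

```python
DOMAINS = [
    ['Alma 3', 'De Moete', 'Sedes'],
    ['breakfast', 'lunch', 'dinner'],
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    ['cheap', 'expensive'],
]

def covers(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def minimal_generalizations(s, x):
    """Generalize s just enough to cover the positive example x."""
    if s is None:                                   # bottom hypothesis
        return [tuple(x)]
    return [tuple(a if a == v else '?' for a, v in zip(s, x))]

def minimal_specializations(g, x):
    """All single-attribute refinements of g that exclude the negative x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, a in enumerate(g) if a == '?'
            for v in DOMAINS[i] if v != x[i]]

def keep_maximal(hyps):   # eliminate redundant (more specific) G-hypotheses
    hyps = list(dict.fromkeys(hyps))
    return [g for g in hyps
            if not any(h != g and more_general_or_equal(h, g) for h in hyps)]

def keep_minimal(hyps):   # eliminate redundant (more general) S-hypotheses
    hyps = list(dict.fromkeys(hyps))
    return [s for s in hyps
            if not any(h != s and more_general_or_equal(s, h) for h in hyps)]

def candidate_elimination(examples):
    G = [('?', '?', '?', '?')]
    S = [None]                                      # None stands for bottom
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]      # prune G
            new_S = []
            for s in S:
                if s is not None and covers(s, x):
                    new_S.append(s)
                    continue
                new_S += [s2 for s2 in minimal_generalizations(s, x)
                          if any(more_general_or_equal(g, s2) for g in G)]
            S = keep_minimal(new_S)
        else:
            S = [s for s in S if s is None or not covers(s, x)]   # prune S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                new_G += [g2 for g2 in minimal_specializations(g, x)
                          if any(s is None or more_general_or_equal(g2, s)
                                 for s in S)]
            G = keep_maximal(new_G)
    return S, G

D = [  # the allergy observations from the slides
    (('Alma 3', 'breakfast', 'Friday',   'cheap'),     True),
    (('De Moete', 'lunch',   'Friday',   'expensive'), False),
    (('Alma 3', 'lunch',     'Saturday', 'cheap'),     True),
    (('Sedes', 'breakfast',  'Sunday',   'cheap'),     False),
    (('Alma 3', 'breakfast', 'Sunday',   'expensive'), False),
]
S, G = candidate_elimination(D)
print(S, G)   # both converge to [('Alma 3', '?', '?', 'cheap')]
```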

Properties of VS. Symmetry: positive and negative examples are dealt with in a completely dual way. It does not need to remember previous examples. Noise: VS cannot deal with noise! If a positive example is wrongly given as negative, then VS eliminates the desired hypothesis from the version space.

Termination: if it terminates because of "no more examples", then all remaining hypotheses, and all intermediate hypotheses, are still correct descriptions covering the given examples. Example (version space on termination): G = { [Alma3, ?, ?, ?], [?, ?, Monday, ?] }, S = { [Alma3, ?, Monday, cheap] }, with the intermediate hypotheses [Alma3, ?, ?, cheap], [?, ?, Monday, cheap] and [Alma3, ?, Monday, ?] in between. VS makes NO unnecessary choices!

Termination (2): if it terminates because S or G becomes empty, then either the data is inconsistent (noise?) or the target concept cannot be represented in the hypothesis language H. Example: the target concept is [Alma3, breakfast, ?, cheap] ∨ [Alma3, lunch, ?, cheap]; given examples like <Alma3, breakfast, Sunday, cheap> +, <Alma3, lunch, Sunday, cheap> +, <Alma3, dinner, Sunday, cheap> -, this cannot be learned in our language H.

Which example next? VS can itself decide which example would be most useful next: it can 'query' a user for the most relevant additional classification. Example: with the version space { [Alma3, ?, ?, ?], [?, ?, Monday, ?], [Alma3, ?, ?, cheap], [?, ?, Monday, cheap], [Alma3, ?, Monday, ?], [Alma3, ?, Monday, cheap] }, the event <Alma3, lunch, Monday, expensive> is classified positive by 3 hypotheses and negative by 3 hypotheses, so it is the most informative new example. (A small counting sketch follows below.)
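A quick sketch of this counting argument in Python; the covers helper and the explicit listing of the six hypotheses are my own encoding for illustration.

```python
def covers(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

VERSION_SPACE = [
    ('Alma 3', '?', '?', '?'),      ('?', '?', 'Monday', '?'),
    ('Alma 3', '?', '?', 'cheap'),  ('?', '?', 'Monday', 'cheap'),
    ('Alma 3', '?', 'Monday', '?'), ('Alma 3', '?', 'Monday', 'cheap'),
]

query = ('Alma 3', 'lunch', 'Monday', 'expensive')
positive_votes = sum(covers(h, query) for h in VERSION_SPACE)
print(positive_votes, len(VERSION_SPACE) - positive_votes)   # 3 3: an even split, most informative
```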

Use of partially learned concepts: with the same version space, the example <Alma3, lunch, Monday, cheap> can be classified as positive: it is covered by all remaining hypotheses! It is enough to check that it is covered by the hypotheses in S (all others generalize these).

Use of partially learned concepts (2): the example <Sedes, lunch, Sunday, cheap> can be classified as negative: it is not covered by any remaining hypothesis! It is enough to check that it is not covered by any hypothesis in G (all others specialize these).

Use of partially learned concepts (3): the example <Alma3, lunch, Monday, expensive> cannot be classified: it is covered by 3 hypotheses and not covered by the other 3, so no conclusion can be drawn.

Use of partially learned concepts (4): the example <Sedes, lunch, Monday, expensive> can only be classified with a certain degree of precision: it is covered by 1 hypothesis and not covered by the other 5, so it probably does not belong to the concept (ratio: 1/6). (A classification sketch follows below.)
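The first three cases can be decided from the boundary sets alone. A small Python sketch (helper names assumed, not from the slides), using the S and G of the example version space:

```python
def covers(h, x):
    return all(a == '?' or a == v for a, v in zip(h, x))

def classify(S, G, x):
    if all(covers(s, x) for s in S):       # then every hypothesis in the VS covers x
        return '+'
    if not any(covers(g, x) for g in G):   # then no hypothesis in the VS covers x
        return '-'
    return '?'                             # the version space disagrees

S = [('Alma 3', '?', 'Monday', 'cheap')]
G = [('Alma 3', '?', '?', '?'), ('?', '?', 'Monday', '?')]

print(classify(S, G, ('Alma 3', 'lunch', 'Monday', 'cheap')))      # '+'
print(classify(S, G, ('Sedes', 'lunch', 'Sunday', 'cheap')))       # '-'
print(classify(S, G, ('Alma 3', 'lunch', 'Monday', 'expensive')))  # '?'
```

For the fourth case, <Sedes, lunch, Monday, expensive>, classify also returns '?': the 1-out-of-6 vote ratio needs the full version space, not just the boundary sets.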

The relevance of inductive BIAS: choosing H. Our hypothesis language H fails to learn some concepts, see the example [Alma3, breakfast, ?, cheap] ∨ [Alma3, lunch, ?, cheap]. What about choosing a more expressive language H'? Assume H' allows conjunctions (as before) and also allows disjunction and negation. Example: Restaurant = Alma3 ∧ ~(Day = Monday).

Inductive BIAS (2): this language H' allows us to represent ANY subset of the complete set of all events X. But X has 3 x 3 x 7 x 2 = 126 elements (Restaurant x Meal x Day x Cost), so we can now express 2^126 different hypotheses!

Inductive BIAS (3): Version Spaces using H' on the same data. After each negative example, G becomes the conjunction of the negations of the negative examples seen so far: {~[DeMoete, lunch, Friday, expensive]}, then {~[DeMoete, lunch, Friday, expensive] ∧ ~[Sedes, breakfast, Sunday, cheap]}, then {~[DeMoete, lunch, Friday, expensive] ∧ ~[Sedes, breakfast, Sunday, cheap] ∧ ~[Alma 3, breakfast, Sunday, expensive]}. After each positive example, S becomes the disjunction of the positive examples seen so far: {[Alma 3, breakfast, Friday, cheap]}, then {[Alma 3, breakfast, Friday, cheap] ∨ [Alma 3, lunch, Saturday, cheap]}.

Inductive BIAS (4). Resulting version spaces: G = {~[DeMoete, lunch, Friday, expensive] ∧ ~[Sedes, breakfast, Sunday, cheap] ∧ ~[Alma 3, breakfast, Sunday, expensive]}, S = {[Alma 3, breakfast, Friday, cheap] ∨ [Alma 3, lunch, Saturday, cheap]}. We haven't learned anything: we merely restated our positive and negative examples! In general, in order to be able to learn, we need an inductive BIAS (= an assumption). Example: "The desired concept CAN be described as a conjunction of features."

Shift of Bias: a practical approach to the bias problem. Start VS with a very weak hypothesis language. If the concept is learned: o.k. Else: refine your language and restart VS. This avoids the choice of bias up front and gives the most general concept that can be learned.