Machine Learning Chapter 11

Machine Learning What is learning?

Machine Learning What is learning? “That is what learning is. You suddenly understand something you've understood all your life, but in a new way.” (Doris Lessing – 2007 Nobel Prize in Literature)

Machine Learning How to construct programs that automatically improve with experience.

Machine Learning How to construct programs that automatically improve with experience. Learning problem: Task T Performance measure P Training experience E

Machine Learning Chess game: Task T: playing chess games Performance measure P: percent of games won against opponents Training experience E: playing practice games against itself

Machine Learning Handwriting recognition: Task T: recognizing and classifying handwritten words Performance measure P: percent of words correctly classified Training experience E: handwritten words with given classifications

Designing a Learning System Choosing the training experience: Direct or indirect feedback Degree of learner's control Representative distribution of examples

Designing a Learning System Choosing the target function: Type of knowledge to be learned Function approximation

Designing a Learning System Choosing a representation for the target function: Expressive representation for a close function approximation Simple representation for simple training data and learning algorithms

Designing a Learning System Choosing a function approximation algorithm (learning algorithm)

Designing a Learning System Chess game: Task T: playing chess games Performance measure P: percent of games won against opponents Training experience E: playing practice games against itself Target function: V: Board → R

Designing a Learning System Chess game: Target function representation: V^(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6 x1: the number of black pieces on the board x2: the number of red pieces on the board x3: the number of black kings on the board x4: the number of red kings on the board x5: the number of black pieces threatened by red x6: the number of red pieces threatened by black
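
A minimal sketch of this linear representation in Python; the weight values below are purely illustrative, and extracting the six features from an actual board position is assumed to happen elsewhere.

```python
# Linear target-function representation: V^(b) = w0 + w1*x1 + ... + w6*x6,
# where x1..x6 are the six board features listed on the slide.

def v_hat(weights, features):
    """Estimate a board's value: weights = (w0, w1, ..., w6), features = (x1, ..., x6)."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, features))

# Example: 3 black pieces, 1 black king, nothing threatened (illustrative weights only).
weights = (0.5, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5)
print(v_hat(weights, (3, 0, 1, 0, 0, 0)))
```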

Designing a Learning System Chess game: Function approximation algorithm: (<x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0>, 100) x1: the number of black pieces on the board x2: the number of red pieces on the board x3: the number of black kings on the board x4: the number of red kings on the board x5: the number of black pieces threatened by red x6: the number of red pieces threatened by black
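
The slide gives only the format of a training example, (<x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0>, 100). One standard choice for the weight-update step in this setting is the LMS rule, which the slide does not spell out; a hedged sketch, with an illustrative learning rate:

```python
# LMS update: w_i <- w_i + eta * (V_train(b) - V^(b)) * x_i, with x0 = 1 for the bias w0.

def lms_update(weights, features, v_train, eta=0.01):
    """One least-mean-squares step toward the training value v_train."""
    prediction = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    error = v_train - prediction
    w0 = weights[0] + eta * error
    rest = tuple(w + eta * error * x for w, x in zip(weights[1:], features))
    return (w0,) + rest

weights = (0.5, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5)            # illustrative values
weights = lms_update(weights, (3, 0, 1, 0, 0, 0), v_train=100)
```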

Designing a Learning System What is learning?

Designing a Learning System Learning is an (endless) generalization or induction process.

Designing a Learning System Final design, four modules in a loop: the Experiment Generator proposes a new problem (initial board); the Performance System plays it using the current hypothesis (V^) and outputs a solution trace (game history); the Critic turns the trace into training examples {(b1, V1), (b2, V2), ...}; the Generalizer produces an updated hypothesis (V^) for the next round.

Issues in Machine Learning What learning algorithms should be used? How much training data is sufficient? When and how can prior knowledge guide the learning process? What is the best strategy for choosing the next training experience? What is the best way to reduce the learning task to one or more function approximation problems? How can the learner automatically alter its representation to improve its learning ability?

Example Experience and prediction (EnjoySport):

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
Experience:
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
Prediction:
5        Rainy  Cold     High      Strong  Warm   Change    ?
6        Sunny           Normal                   Same      ?
7                        Low       Weak    Cool             ?

Example Learning problem: Task T: classifying days on which my friend enjoys water sport Performance measure P: percent of days correctly classified Training experience E: days with given attributes and classifications

Concept Learning Inferring a boolean-valued function from training examples of its input (instances) and output (classifications).

Concept Learning Learning problem: Target concept: a subset of the set of instances X, c: X → {0, 1} Target function: Sky × AirTemp × Humidity × Wind × Water × Forecast → {Yes, No} Hypothesis: characteristics of all instances of the concept to be learned ⇒ constraints on instance attributes, h: X → {0, 1}

Concept Learning Satisfaction: h(x) = 1 iff x satisfies all the constraints of h, h(x) = 0 otherwise. Consistency: h(x) = c(x) for every instance x of the training examples. Correctness: h(x) = c(x) for every instance x of X.

Concept Learning How to represent a hypothesis function?

Concept Learning Hypothesis representation (constraints on instance attributes): <Sky, AirTemp, Humidity, Wind, Water, Forecast>, where each constraint is either "?" (any value is acceptable), a single required value, or "∅" (no value is acceptable).

Concept Learning General-to-specific ordering of hypotheses: hj ≥g hk iff ∀x ∈ X: hk(x) = 1 → hj(x) = 1. For example, with h1 = <Sunny, ?, ?, Strong, ?, ?>, h2 = <Sunny, ?, ?, ?, ?, ?>, h3 = <Sunny, ?, ?, ?, Cool, ?>, the hypothesis h2 is more general than both h1 and h3. The relation ≥g is a partial order (a lattice) on H, running from specific to general.
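
A minimal sketch of this representation and the ≥g test in Python. The encoding is my own choice, not from the slides: '?' stands for "any value", a plain string for a single required value, and None for the empty constraint ∅.

```python
# Hypotheses are 6-tuples over (Sky, AirTemp, Humidity, Wind, Water, Forecast).
# '?'  : any value is acceptable
# None : no value is acceptable (the empty constraint written as an empty-set symbol on the slides)
# other: the single required value

def satisfies(x, h):
    """h(x) = 1 iff every attribute value in instance x meets the constraint in h."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """hj >=g hk iff every instance that satisfies hk also satisfies hj."""
    def covers(cj, ck):
        # constraint cj admits everything constraint ck admits
        return ck is None or cj == '?' or cj == ck
    return all(covers(cj, ck) for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
```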

FIND-S Trace on the training examples 1-4 above:
h = <∅, ∅, ∅, ∅, ∅, ∅>
After example 1: h = <Sunny, Warm, Normal, Strong, Warm, Same>
After example 2: h = <Sunny, Warm, ?, Strong, Warm, Same>
After example 4: h = <Sunny, Warm, ?, Strong, ?, ?> (example 3 is negative and is ignored)

FIND-S
Initialize h to the most specific hypothesis in H.
For each positive training instance x:
  For each attribute constraint ai in h:
    If the constraint is not satisfied by x,
    then replace ai by the next more general constraint that is satisfied by x.
Output hypothesis h.
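
A minimal sketch of FIND-S under the encoding above (None for ∅, '?' for "any value"); the training data is the EnjoySport table from the earlier slide.

```python
def find_s(examples, n_attrs=6):
    """FIND-S: start from the most specific hypothesis and minimally generalize it
    over the positive examples; negative examples are ignored."""
    h = [None] * n_attrs                 # the all-empty hypothesis <None, ..., None>
    for x, label in examples:
        if label != 'Yes':
            continue
        for i, (ci, vi) in enumerate(zip(h, x)):
            if ci is None:
                h[i] = vi                # first positive example: copy its values
            elif ci != vi:
                h[i] = '?'               # next more general constraint
    return tuple(h)

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(train))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```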

FIND-S Prediction with h = <Sunny, Warm, ?, Strong, ?, ?>:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
5        Rainy  Cold     High      Strong  Warm   Change    No
6        Sunny           Normal                   Same      Yes
7                        Low               Cool

FIND-S The output hypothesis is the most specific one that satisfies all positive training examples.

FIND-S The result is consistent with the positive training examples.

FIND-S Is the result consistent with the negative training examples?

FIND-S h = <Sunny, Warm, ?, Strong, ?, ?>, checked against the training examples 1-4 above, in particular the negative example 3 <Rainy, Cold, High, Strong, Warm, Change>, which h does not cover.

FIND-S The result is consistent with the negative training examples if the target concept is contained in H (and the training examples are correct).

FIND-S The result is consistent with the negative training examples if the target concept is contained in H (and the training examples are correct). Sizes of the spaces: size of the instance space |X| = 3·2·2·2·2·2 = 96; size of the concept space |C| = 2^|X| = 2^96; size of the hypothesis space |H| = (4·3·3·3·3·3) + 1 = 973 << 2^96 ⇒ the target concept (in C) may not be contained in H.

FIND-S Questions: Has the learner converged to the target concept, given that there can be several consistent hypotheses (with both the positive and the negative training examples)? Why is the most specific hypothesis preferred? What if there are several maximally specific consistent hypotheses? What if the training examples are not correct?

List-then-Eliminate Algorithm Version space: the set of all hypotheses that are consistent with the training examples. Algorithm: Initial version space = the set containing every hypothesis in H For each training example <x, c(x)>, remove from the version space any hypothesis h for which h(x) ≠ c(x) Output the hypotheses in the version space
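
Because this hypothesis space has only 973 semantically distinct members, List-then-Eliminate can literally be run here. A sketch, reusing the satisfies helper and the train examples from the sketches above; the attribute value sets follow Mitchell's EnjoySport task (e.g. Sky ∈ {Sunny, Cloudy, Rainy}), which is an assumption since the slides list only the sizes.

```python
from itertools import product

# Attribute domains assumed from Mitchell's EnjoySport task: 3*2*2*2*2*2 = 96 instances.
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def all_hypotheses():
    """The 972 conjunctive hypotheses (a value or '?' per attribute) plus the all-empty one: 973."""
    yield (None,) * 6
    for h in product(*[vals + ('?',) for vals in DOMAINS]):
        yield h

def list_then_eliminate(examples):
    version_space = list(all_hypotheses())
    for x, label in examples:
        version_space = [h for h in version_space
                         if satisfies(x, h) == (label == 'Yes')]   # keep only consistent h
    return version_space

vs = list_then_eliminate(train)
print(len(vs))   # 6 hypotheses remain after the four EnjoySport examples
```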

List-then-Eliminate Algorithm Requires an exhaustive enumeration of all hypotheses in H

Compact Representation of Version Space G (the general boundary): the set of the most general hypotheses of H consistent with the training data D: G = {g ∈ H | Consistent(g, D) ∧ ¬∃g' ∈ H: (g' >g g) ∧ Consistent(g', D)} S (the specific boundary): the set of the most specific hypotheses of H consistent with the training data D: S = {s ∈ H | Consistent(s, D) ∧ ¬∃s' ∈ H: (s >g s') ∧ Consistent(s', D)}

Compact Representation of Version Space Version space = <G, S> = {h ∈ H | ∃g ∈ G, ∃s ∈ S: g ≥g h ≥g s}
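
The boundary sets make membership testing cheap. A small sketch of the <G, S> membership check, reusing more_general_or_equal from the earlier sketch; the boundary values used in the example are the ones the Candidate-Elimination trace below arrives at.

```python
def in_version_space(h, G, S):
    """h lies in the version space iff some g in G is >=g h and h is >=g some s in S."""
    return (any(more_general_or_equal(g, h) for g in G) and
            any(more_general_or_equal(h, s) for s in S))

G4 = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
S4 = [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(in_version_space(('Sunny', 'Warm', '?', '?', '?', '?'), G4, S4))   # True
print(in_version_space(('?', '?', '?', 'Strong', '?', '?'), G4, S4))     # False: more general than G
```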

Candidate-Elimination Algorithm Trace on the training examples 1-4 above:
S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}, G0 = {<?, ?, ?, ?, ?, ?>}
S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}, G1 = {<?, ?, ?, ?, ?, ?>}
S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}, G2 = {<?, ?, ?, ?, ?, ?>}
S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}, G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
S4 = {<Sunny, Warm, ?, Strong, ?, ?>}, G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Candidate-Elimination Algorithm The final version space, from specific to general: S4 = {<Sunny, Warm, ?, Strong, ?, ?>}; the intermediate hypotheses <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>; G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}.

Candidate-Elimination Algorithm Initialize G to the set of maximally general hypotheses in H Initialize S to the set of maximally specific hypotheses in H

Candidate-Elimination Algorithm
For each positive example d:
  Remove from G any hypothesis inconsistent with d.
  For each s in S that is inconsistent with d:
    Remove s from S.
    Add to S all least generalizations h of s such that h is consistent with d and some hypothesis in G is more general than h.
  Remove from S any hypothesis that is more general than another hypothesis in S.

Candidate-Elimination Algorithm
For each negative example d:
  Remove from S any hypothesis inconsistent with d.
  For each g in G that is inconsistent with d:
    Remove g from G.
    Add to G all least specializations h of g such that h is consistent with d and some hypothesis in S is more specific than h.
  Remove from G any hypothesis that is more specific than another hypothesis in G.
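
A compact sketch of these two update steps for the conjunctive representation used here. The minimal generalization and specialization routines are the standard ones for this representation; DOMAINS, satisfies, more_general_or_equal, and train are reused from the earlier sketches.

```python
def min_generalize(s, x):
    """Least generalization of hypothesis s that covers the positive instance x."""
    return tuple(v if c is None else (c if c == v else '?') for c, v in zip(s, x))

def min_specializations(g, x):
    """Least specializations of hypothesis g that exclude the negative instance x."""
    out = []
    for i, (c, v) in enumerate(zip(g, x)):
        if c == '?':
            for val in DOMAINS[i]:
                if val != v:
                    out.append(g[:i] + (val,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    S = [(None,) * 6]        # most specific boundary: the all-empty hypothesis
    G = [('?',) * 6]         # most general boundary: the all-'?' hypothesis
    for x, label in examples:
        if label == 'Yes':                                   # positive example
            G = [g for g in G if satisfies(x, g)]
            new_S = []
            for s in S:
                h = s if satisfies(x, s) else min_generalize(s, x)
                if satisfies(x, h) and any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            S = [s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)]
        else:                                                # negative example
            S = [s for s in S if not satisfies(x, s)]
            new_G = []
            for g in G:
                if not satisfies(x, g):
                    new_G.append(g)
                else:
                    for h in min_specializations(g, x):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            G = [g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
    return S, G

S_final, G_final = candidate_elimination(train)
print(S_final)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G_final)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

S is touched only by minimal generalizations against positives and G only by minimal specializations against negatives, which is what keeps the <G, S> pair a compact stand-in for the full version space.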

Candidate-Elimination Algorithm The version space will converge toward the correct target concept if: H contains the target concept, and there are no errors in the training examples. A training instance to be requested next should discriminate among the alternative hypotheses in the current version space.

Candidate-Elimination Algorithm A partially learned concept can be used to classify new instances using the majority rule. For the new instance <Rainy, Warm, High, Strong, Cool, Same>, the six hypotheses of the version space above disagree: <?, Warm, ?, Strong, ?, ?> and <?, Warm, ?, ?, ?, ?> classify it as positive, while the other four classify it as negative, so the majority vote is negative.
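
A sketch of this majority-rule classification, reusing satisfies and the enumerated version space vs from the List-then-Eliminate sketch above (the tie-breaking choice is mine):

```python
def vote(version_space, x):
    """Classify x by a majority vote of the version-space hypotheses."""
    pos = sum(1 for h in version_space if satisfies(x, h))
    neg = len(version_space) - pos
    return ('Yes' if pos > neg else 'No'), pos, neg   # ties fall back to 'No' here

new_instance = ('Rainy', 'Warm', 'High', 'Strong', 'Cool', 'Same')
print(vote(vs, new_instance))   # ('No', 2, 4): two hypotheses vote Yes, four vote No
```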

Inductive Bias Size of the instance space: |X| = 3·2·2·2·2·2 = 96 Number of possible concepts = 2^|X| = 2^96 Size of H = (4·3·3·3·3·3) + 1 = 973 << 2^96

Inductive Bias Size of the instance space: |X| = 3·2·2·2·2·2 = 96 Number of possible concepts = 2^|X| = 2^96 Size of H = (4·3·3·3·3·3) + 1 = 973 << 2^96 ⇒ a biased hypothesis space

Inductive Bias An unbiased hypothesis space H' that can represent every subset of the instance space X: propositional logic sentences over the instances. With positive examples x1, x2, x3 and negative examples x4, x5: h(x) ≡ (x = x1) ∨ (x = x2) ∨ (x = x3), written x1 ∨ x2 ∨ x3, and h'(x) ≡ (x ≠ x4) ∧ (x ≠ x5), written ¬x4 ∧ ¬x5.

Inductive Bias x1 x2  x3 x1 x2  x3  x6 x4  x5 Any new instance x is classified positive by half of the version space, and negative by the other half  not classifiable

Inductive Bias Another example (EasyTicket): instances described by the attributes Day (Mon, ..., Sun), Actor (Famous, ..., Infamous), and Price (Expensive, ..., Cheap), with labeled training examples such as <Mon, Famous, Expensive> → Yes and further instances whose EasyTicket label is to be predicted.

Inductive Bias Example:

Example  Quality  Price  Buy
1        Good     Low    Yes
2        Bad      High   No
3        Good     High   ?
4        Bad      Low

Inductive Bias A learner that makes no prior assumptions regarding the identity of the target concept cannot classify any unseen instances.

Homework Exercises 2.1 to 2.5 (Chapter 2, ML textbook)