Machine Learning. Chen Yu (陈昱), Institute of Computer Science and Technology, Peking University; Information Security Engineering Research Center

Course Information  Instructor: Chen Yu (陈昱), Tel: 82529680  Teaching Assistant: Cheng Zaixing (程再兴), Tel: 62763742  Course webpage: qxx2011.mht

Ch. 2 Concept Learning & General-to-Specific Ordering  Introduction to concept learning  Concept learning as search  FIND-S algorithm  Version space and CANDIDATE-ELIMINATION algorithm  Inductive bias

Types of learning  Based on the type of feedback: Supervised learning: the correct answer is given for each training example (labeled examples). Unsupervised learning: answers are not given (unlabeled examples). Semi-supervised learning: a mixture of labeled and unlabeled examples. Reinforcement learning: the teacher provides a reward or penalty.

Concept Learning & General-to-Specific Ordering  Introduction to concept learning  Concept learning as search  FIND-S algorithm  Version space and CANDIDATE-ELIMINATION algorithm  Inductive bias

Definition & Example  Def. Concept learning is the task of inferring a boolean-valued function from labeled training examples  Example: learning the concept "days on which my friend Aldo enjoys his favorite water sport" from a set of four labeled training examples (Mitchell, Table 2.1).

Example (contd) Representing hypotheses  One way is to represent a hypothesis as a conjunction of constraints on the attributes. Each constraint can be: a specific value (e.g. Water=Warm); don't care (e.g. Water=?); no value allowed (e.g. Water=Ø).  An example of a hypothesis in EnjoySport is a 6-tuple of such constraints, e.g. ⟨?, Cold, High, ?, ?, ?⟩ ("cold days with high humidity").
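To make the representation concrete, here is a minimal Python sketch (my own illustration, not from the slides) that encodes a hypothesis as a 6-tuple, using '?' for "don't care" and the string '0' to stand in for the empty constraint Ø:

```python
# Sketch (not from the lecture): one way to encode EnjoySport hypotheses in Python.
# A hypothesis is a 6-tuple over (Sky, AirTemp, Humidity, Wind, Water, Forecast);
# '?' means "don't care" and '0' (standing in for Ø) means "no value allowed".

ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

def matches(h, x):
    """Return True iff instance x satisfies every constraint of hypothesis h."""
    return all(hc == "?" or hc == xc for hc, xc in zip(h, x))

MOST_GENERAL  = ("?",) * 6   # classifies every day as positive
MOST_SPECIFIC = ("0",) * 6   # classifies every day as negative

if __name__ == "__main__":
    h = ("?", "Cold", "High", "?", "?", "?")   # "cold days with high humidity"
    x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
    print(matches(h, x))               # True
    print(matches(MOST_SPECIFIC, x))   # False: 'Ø' never matches any value
```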

Example (contd)  Most general hypo—every day is a positive example—is represented by  Most specific hypo—every day is a negative example—is represented by 8

Prototypical Concept Learning Task  Given: Instance space X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. Sky (Sunny, Cloudy, Rainy); AirTemp (Warm, Cold); Humidity (Normal, High); Wind (Strong, Weak); Water (Warm, Cool); Forecast (Same, Change). Target function EnjoySport, c: X → {0,1}. Hypothesis space H: conjunctions of constraints (literals) on the attributes. Set D of training examples: positive or negative examples of the target function.  Determine: a hypothesis h in H such that h(x)=c(x) for all x in D (a kind of inductive learning)

Inductive Learning: A Brief Overview  Simplest form: learn a function from examples. Let f be the target function; then an example is a pair (x, f(x))  Statement of an inductive-learning problem: given a collection of examples of f, return a function h that approximates f (h is called a hypothesis).  The fundamental problem of induction is the predictive power of the learned h

Philosophical Foundation  One motivation behind inductive learning is the attempt to identify the source of knowledge. Aristotle (384-322 B.C.) was the first to formulate a precise set of laws governing the rational part of the mind  The empiricism movement, starting with Francis Bacon's (1561-1626) Novum Organum (Latin for "new instrument"), is characterized by a dictum of John Locke (1632-1704): "Nothing is in the understanding, which was not first in the senses".

An Example: Curve Fitting  (a) Examples (x, f(x)) and a consistent linear hypothesis; (b) a consistent degree-7 polynomial for the same data set; (c) a different data set that admits an exact degree-6 polynomial fit or an approximate linear fit; (d) a simple, exact sinusoidal fit to the data set in (c).  A learning problem is realizable if the hypothesis space contains the true function.
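As a side illustration (not part of the slides), a small numpy sketch of panels (a) and (b): a degree-1 and a degree-7 polynomial can both be consistent with the same eight points, yet extrapolate very differently; the data values below are invented for the demo:

```python
# Sketch: fitting hypotheses of different complexity to the same data (cf. panels a-b).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9, 6.2, 6.8])   # roughly linear with noise

linear = np.polyfit(x, y, deg=1)   # degree-1 hypothesis (2 parameters)
wiggly = np.polyfit(x, y, deg=7)   # degree-7 hypothesis (8 parameters, fits exactly)

x_new = 8.0                          # an unseen instance
print(np.polyval(linear, x_new))     # extrapolates sensibly (about 8)
print(np.polyval(wiggly, x_new))     # can swing wildly: consistent != predictive
```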

Ockham's razor  Q: How do we choose among multiple consistent hypotheses?  Ockham's razor: prefer the simplest hypothesis consistent with the data: "Entities are not to be multiplied beyond necessity."  William of Ockham (c. 1287-1347), the most influential philosopher of his century.

Inductive Learning Hypothesis  There is a fundamental assumption underlying any learned hypothesis, the so-called inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

Concept Learning & General-to-Specific Ordering  Introduction to concept learning  Concept learning as search  FIND-S algorithm  Version space and CANDIDATE-ELIMINATION algorithm  Inductive bias

An Example: EnjoySport  EnjoySport: Instance space X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. Sky (Sunny, Cloudy, Rainy); AirTemp (Warm, Cold); Humidity (Normal, High); Wind (Strong, Weak); Water (Warm, Cool); Forecast (Same, Change). Target function EnjoySport, c: X → {0,1}. Hypothesis space H: conjunctions of literals.  Size of its instance space: 3×2×2×2×2×2=96  Size of its hypothesis space: 4×3×3×3×3×3+1=973 (the "+1" counts the single semantically distinct hypothesis containing Ø, which classifies everything as negative)  Q: is there a systematic way to search the hypothesis space?
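A quick sketch of this counting, assuming the attribute domains listed above (my own illustration):

```python
# Sketch: counting EnjoySport instances and semantically distinct hypotheses.
from math import prod

DOMAINS = {
    "Sky":      ["Sunny", "Cloudy", "Rainy"],
    "AirTemp":  ["Warm", "Cold"],
    "Humidity": ["Normal", "High"],
    "Wind":     ["Strong", "Weak"],
    "Water":    ["Warm", "Cool"],
    "Forecast": ["Same", "Change"],
}

n_instances = prod(len(v) for v in DOMAINS.values())            # 3*2*2*2*2*2 = 96
# For each attribute a hypothesis may use any domain value or '?'; every hypothesis
# containing Ø classifies everything negative, so all of those collapse into one.
n_hypotheses = prod(len(v) + 1 for v in DOMAINS.values()) + 1    # 4*3*3*3*3*3 + 1 = 973

print(n_instances, n_hypotheses)   # 96 973
```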

General-to-Specific Ordering of Hypotheses  An illustration (figure omitted).

"More General Than" Relationship  Def. Let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) iff for every x in X, h_k(x) = 1 implies h_j(x) = 1.  Note: "≥_g" is independent of the target concept  Property: "≥_g" is a partial order.
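A sketch of how ≥_g can be checked for the conjunctive representation used here (same '?'/'0' encoding as before; my own illustration):

```python
# Sketch: the more_general_than_or_equal_to relation for conjunctive hypotheses.
# Representation: 6-tuples with '?' = don't care, '0' = the empty constraint Ø.

def constraint_geq(c_general, c_specific):
    """True iff constraint c_general is satisfied whenever c_specific is."""
    return c_general == "?" or c_general == c_specific

def more_general_or_equal(h_general, h_specific):
    """h_general >=_g h_specific: every instance matched by h_specific
    is also matched by h_general."""
    if "0" in h_specific:            # an Ø hypothesis matches no instance at all
        return True
    return all(constraint_geq(g, s) for g, s in zip(h_general, h_specific))

if __name__ == "__main__":
    h1 = ("Sunny", "?", "?", "Strong", "?", "?")
    h2 = ("Sunny", "?", "?", "?", "?", "?")
    print(more_general_or_equal(h2, h1))   # True:  h2 >=_g h1
    print(more_general_or_equal(h1, h2))   # False
```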

Concept Learning & General-to-Specific Ordering  Introduction to concept learning  Concept learning as search  FIND-S algorithm  Version space and CANDIDATE-ELIMINATION algorithm  Inductive bias

FIND-S Algorithm  FIND-S: find a maximally specific hypothesis. 1. Initialize h to the most specific hypothesis in H. 2. For each positive training example x: for each attribute constraint a_i in h, if it is satisfied by x, do nothing; otherwise replace a_i by the next more general constraint that is satisfied by x. 3. Output hypothesis h.
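A runnable sketch of FIND-S under the '?'/'0' encoding introduced earlier (my own code, not the lecture's); the four training examples are the standard EnjoySport set from Mitchell's Table 2.1:

```python
# Sketch of FIND-S for the EnjoySport representation ('?' = don't care, '0' = Ø).
# Training data: the standard four-example set from Mitchell's Table 2.1.

TRAINING = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def find_s(examples, n_attrs=6):
    h = ["0"] * n_attrs                      # most specific hypothesis
    for x, positive in examples:
        if not positive:                     # FIND-S simply ignores negatives
            continue
        for i, (hc, xc) in enumerate(zip(h, x)):
            if hc == "0":
                h[i] = xc                    # first positive example: copy its values
            elif hc != xc:
                h[i] = "?"                   # generalize minimally to cover x
    return tuple(h)

print(find_s(TRAINING))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```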

An Illustration of FIND-S (figure omitted)  Note: if we assume the target concept c is in H and the training examples are noise-free, then the h found via FIND-S is also consistent with c on the negative training examples.

Complaints about FIND-S  Has the learned h converged to the true target concept? Not sure!  Why prefer the most specific hypothesis?  Are the training examples consistent? We would prefer an algorithm that can detect when training examples are inconsistent, or better still, correct the error.  What if there are several maximally specific consistent hypotheses?

Concept Learning & General-to-Specific Ordering  Introduction to concept learning  Concept learning as search  FIND-S algorithm  Version space and CANDIDATE-ELIMINATION algorithm  Inductive bias

Version Space  The version space is the set of hypotheses consistent with the training data: VS_{H,D} = {h ∈ H | h(x) = c(x) for every ⟨x, c(x)⟩ ∈ D}.

List-Then-Eliminate Algorithm  A "brute force" way of computing the version space: the LIST-THEN-ELIMINATE algorithm. 1. Initialize VS to H. 2. For each training example ⟨x, c(x)⟩, eliminate from VS any h with h(x) ≠ c(x). 3. Output the resulting VS.
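Since the EnjoySport hypothesis space has only 973 semantically distinct members, LIST-THEN-ELIMINATE is actually feasible here; a brute-force sketch (my own, again using the standard Table 2.1 examples):

```python
# Sketch of LIST-THEN-ELIMINATE for EnjoySport: enumerate every (semantically
# distinct) hypothesis, then keep those consistent with the training data.
from itertools import product

DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]

TRAINING = [   # standard four examples from Mitchell's Table 2.1
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def matches(h, x):
    return all(hc == "?" or hc == xc for hc, xc in zip(h, x))

# 972 conjunctive hypotheses plus the single all-Ø hypothesis = 973.
H = list(product(*[d + ["?"] for d in DOMAINS])) + [("0",) * 6]

version_space = [h for h in H
                 if all(matches(h, x) == positive for x, positive in TRAINING)]

print(len(version_space))   # 6 hypotheses remain
for h in version_space:
    print(h)
```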

Version Space with Boundary Sets  We need a more compact representation of the VS in order to compute it efficiently. One approach: delimit the VS by its general and specific boundary sets, together with the partial order between hypotheses. Example: the VS of EnjoySport has six elements, which can be arranged between the two boundaries (figure omitted).

VS Representation Theorem  Def. The general boundary G, w.r.t. hypothesis space H and training data D, is the set of maximally general members of H consistent with D.  Def. The specific boundary S, w.r.t. hypothesis space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D.

VS Representation Theorem (2)  Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c be an arbitrary boolean-valued target concept over X, and let D be a training set of ⟨x, c(x)⟩ pairs. For all X, H, c, and D such that S and G are well-defined: VS_{H,D} = {h ∈ H | (∃ s ∈ S)(∃ g ∈ G) g ≥_g h ≥_g s}.

CANDIDATE-ELIMINATION Algorithm  Initialize G to the set of maximally general hypotheses in H  Initialize S to the set of maximally specific hypotheses in H  For each training example d, do: If d is a positive example: remove from G any hypothesis inconsistent with d; for each hypothesis s in S that is inconsistent with d, remove s from S and add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h; remove from S any hypothesis that is more general than another hypothesis in S

Contd  If d is a negative example: remove from S any hypothesis inconsistent with d; for each hypothesis g in G that is inconsistent with d, remove g from G and add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h; remove from G any hypothesis that is more specific than another hypothesis in G
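One possible Python rendering of the two cases above, for the conjunctive EnjoySport representation (a sketch under the same '?'/'0' encoding, not the lecture's code); run on the standard four training examples it reproduces the boundary sets reported in Mitchell's book:

```python
# Sketch of CANDIDATE-ELIMINATION for the conjunctive EnjoySport representation.
# '?' = don't care, '0' = the empty constraint Ø.

DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]

def matches(h, x):
    return all(hc == "?" or hc == xc for hc, xc in zip(h, x))

def more_general_or_equal(hg, hs):
    if "0" in hs:
        return True
    return all(g == "?" or g == s for g, s in zip(hg, hs))

def min_generalizations(s, x):
    """Minimal generalizations of s covering positive x (exactly one for conjunctions)."""
    s = list(s)
    for i, (sc, xc) in enumerate(zip(s, x)):
        if sc == "0":
            s[i] = xc
        elif sc != xc:
            s[i] = "?"
    return [tuple(s)]

def min_specializations(g, x):
    """Minimal specializations of g that exclude negative instance x."""
    results = []
    for i, gc in enumerate(g):
        if gc == "?":
            for value in DOMAINS[i]:
                if value != x[i]:                    # any value ruling out x
                    results.append(g[:i] + (value,) + g[i + 1:])
    return results

def candidate_elimination(examples):
    G = [("?",) * 6]
    S = [("0",) * 6]
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                for h in min_generalizations(s, x):
                    if matches(h, x) and any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            # drop any member of S more general than another member of S
            S = [h for h in new_S
                 if not any(h != o and more_general_or_equal(h, o) for o in new_S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                for h in min_specializations(g, x):
                    if not matches(h, x) and any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            # drop any member of G more specific than another member of G
            G = [h for h in new_G
                 if not any(h != o and more_general_or_equal(o, h) for o in new_G)]
    return S, G

if __name__ == "__main__":
    TRAINING = [   # Mitchell's Table 2.1
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]
    S, G = candidate_elimination(TRAINING)
    print("S:", S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
    print("G:", G)   # [('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')]
```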

An Illustrative Example  Find the VS of EnjoySport via the CANDIDATE-ELIMINATION algorithm.

An Illustrative Example (2)-(4)  Intermediate steps of the trace (figures omitted).

An Illustrative Example (5)  Final VS learned from those 4 examples: six hypotheses, bounded by S = {⟨Sunny, Warm, ?, Strong, ?, ?⟩} and G = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}.

Remarks  CANDIDATE-ELIMINATION works when the conditions in “version space representation theorem” holds, however, in case that every instance can be represented as a fixed-length attribute vector with each attribute taking a finite number of possible values, and the hypo space is restricted to conjunctions of constraints on attributes as defined early,  then operations on S in the algorithm can be simplified to FIND-S (during the process S always be a single-element set) 36

Remarks (2)  Will the algorithm converge to the correct hypothesis? It converges if the training examples contain no errors and the true target concept is in H.  What if some training example contains a wrong target value? Then the true target concept will not be in the VS.  What if the true target concept is not in H? Then the VS may become empty.

Remarks (3)  What training examples should the learner request next? Consider the case where the learner proposes the next instance and obtains its label from the teacher. E.g., given the VS above, what query should be presented next?  One useful instance is, e.g., ⟨Sunny, Warm, Normal, Weak, Warm, Same⟩, which exactly three of the six hypotheses in the VS classify as positive. In general, try to generate queries that are satisfied by exactly half of the hypotheses (see the sketch below).
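A sketch of this query-selection idea, assuming the six-hypothesis version space computed above (my own illustration):

```python
# Sketch: choosing a query instance that splits the current version space.
from itertools import product

VS = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "?",      "?", "?"),
]

DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]

def matches(h, x):
    return all(hc == "?" or hc == xc for hc, xc in zip(h, x))

def positive_votes(x):
    return sum(matches(h, x) for h in VS)

# Pick the instance whose positive/negative split is closest to half of |VS|:
# whatever the teacher answers, roughly half of the hypotheses get eliminated.
best = min(product(*DOMAINS), key=lambda x: abs(positive_votes(x) - len(VS) / 2))
print(best, positive_votes(best))   # e.g. a day with Wind=Weak: 3 of 6 votes
```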

Remarks (4)  How can a partially learned concept be used? Consider the VS learned above. Suppose there are no more training examples and the learner must classify a new instance not observed during training. Consider four new instances (table omitted):  Assuming the target concept is in the VS, their labels follow from the partial order: instance 1 is "+" (it satisfies every member of S, hence every hypothesis in the VS); instance 2 is "-" (it satisfies no member of G, hence no hypothesis in the VS); instances 3 and 4 are ambiguous, and might be assigned a label by majority vote among the VS members.
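A sketch of classification by voting over the version space (same six hypotheses as above; the test instances below are my own illustrations):

```python
# Sketch: classifying new instances with a partially learned version space.
# Unanimous votes give a definite label; otherwise report the vote split.
VS = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "?",      "?", "?"),
]

def matches(h, x):
    return all(hc == "?" or hc == xc for hc, xc in zip(h, x))

def classify(x):
    pos = sum(matches(h, x) for h in VS)
    if pos == len(VS):
        return "+"
    if pos == 0:
        return "-"
    return f"ambiguous ({pos}/{len(VS)} vote +)"

print(classify(("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # +
print(classify(("Rainy", "Cold", "High",   "Strong", "Warm", "Same")))    # -
print(classify(("Sunny", "Cold", "Normal", "Strong", "Warm", "Same")))    # ambiguous
```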

Concept Learning & General-to-Specific Ordering  Introduction to concept learning  Concept learning as search  FIND-S algorithm  Version space and CANDIDATE-ELIMINATION algorithm  Inductive bias

A Biased Hypothesis Space  Consider EnjoySport: if we restrict H to conjunctions of attribute constraints, it cannot represent even a simple disjunctive concept such as "Sky=Sunny or Sky=Cloudy". E.g., given three training examples where the first two (a Sunny day and a Cloudy day) are positive and the third (a Rainy day, otherwise identical) is negative, the CANDIDATE-ELIMINATION algorithm (in fact any algorithm searching this H) will output an empty VS: the only conjunction covering both positives must generalize Sky to "?", and then it also covers the negative example.

An Unbiased Learner  One obvious route to an unbiased learner is to propose instead a hypothesis space H' capable of representing every teachable concept over X, i.e. the power set of X. Compare the numbers for EnjoySport: |X| = 96, so H' contains 2^96 ≈ 7.9×10^28 concepts, versus only 973 conjunctive hypotheses.  Apply the CANDIDATE-ELIMINATION algorithm to H' and training set D, and the learner completely loses its generalization power: every instance not seen in D will be classified ambiguously (S becomes simply the disjunction of the observed positive examples, G the negation of the disjunction of the negatives, and every unseen instance receives exactly half of the VS votes)!
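The arithmetic behind these numbers, as a one-off sketch:

```python
# Sketch: the size of an unbiased hypothesis space for EnjoySport.
n_instances = 3 * 2 * 2 * 2 * 2 * 2          # |X| = 96 distinct days
n_conjunctive = 4 * 3 * 3 * 3 * 3 * 3 + 1    # 973 semantically distinct conjunctions
n_unbiased = 2 ** n_instances                 # every subset of X is a concept
print(n_instances, n_conjunctive)             # 96 973
print(n_unbiased)                             # 79228162514264337593543950336 (~7.9e28)
```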

Futility of Bias-Free Learning  Fundamental property of inductive inference: a learner that makes no a priori assumption (i.e. has no inductive bias) regarding the identity of the target concept has no rational basis for classifying unseen instances.  An interesting idea: characterize various learning approaches by the inductive bias they employ. To do so, we first need to define inductive bias more precisely.

Inductive Bias  Let L(x i, D c ) denote the classification L assigned to x i after learning from training set D c. We describe inductive inference step performed by L as follows:  What additional assumptions could be added to D c ∧ x i s.t. L(x i, D c ) would follow deductively? Thus we define inductive bias of L as this set of additional assumptions. 44

Inductive Bias (2)  Def. The inductive bias of L is any minimal set of assertions B such that for any target concept c and training data D_c we have (∀ x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)], where "y ⊢ z" indicates that z follows deductively from y.  If we define L(x_i, D_c) as the unanimous vote of the elements of the VS found (undetermined if the vote is not unanimous), then the inductive bias of the CANDIDATE-ELIMINATION algorithm is "the target concept c is contained in H".

Inductive Bias of Various Learners  Rote learner: learns by simply storing the training examples in memory. No inductive bias.  CANDIDATE-ELIMINATION: new instances are classified only when all members of the VS agree. Inductive bias: the target concept is contained in H.  FIND-S: has an even stronger inductive bias than CANDIDATE-ELIMINATION (the target concept is in H, plus every instance is negative unless the training data entail otherwise).

Inductive → Deductive (figure omitted: modeling an inductive system by an equivalent deductive system).

Summary  Concept learning as search through H  General-to-specific ordering over H  CANDIDATE-ELIMINATION algorithm  The learner can generate useful queries  Inductive leaps are possible only if the learner is biased

More on Concept Learning  Bruner et al. (1956) carried out a pioneering study of concept learning in human beings. Concept learning, also known as category learning or concept attainment, was defined in their book as "the search for and listing of attributes that can be used to distinguish exemplars from non-exemplars of various categories".  Simply put, concepts are the mental categories that help us classify objects, events, or ideas, and each object, event, or idea has a set of common relevant features. (Wikipedia)

On Bruner et al.'s book  Editorial Reviews (1986 ed.): "A Study of Thinking" is a pioneering account of how human beings achieve a measure of rationality in spite of the constraints imposed by bias, limited attention and memory, and the risks of error imposed by pressures of time and ignorance. First published in 1956 and hailed at its appearance as a groundbreaking study, it is still read three decades later as a major contribution to our understanding of the mind. In their insightful new introduction, the authors relate the book to the cognitive revolution and its handmaiden, artificial intelligence.

Concept Learning (contd)  Modern psychological theories regard it as a process of abstraction, data compression, simplification, and summarization: Rule-based theories Prototype theory Exemplar Theories Multiple-Prototype Theories Explanation-Based Theories Bayesian theories Component display theory 51

Concept Learning (contd)  Two leading machine learning approaches on it: Instance-based learning K-nearest neighborhood learning, locally weighted regression… Rule induction CANDIDATE-ELIMINAITON Read 2 nd paragraph of p. 47 of Mitchell’s book for extensions of CANDIDATE-ELIMINAITON Decision tree learning Genetic algorithm Sequential covering algorithm …  Further reading: “A United Approach to Concept Learning”, PhD dissertation by P. M. D. Domingos (1997)A United Approach to Concept Learning 52

HW  2.2, 2.4 & 2.7 in Mitchell’s book, 10pt each, due on Wednesday,