Machine Learning. Chen Yu (陈昱), Institute of Computer Science and Technology, Peking University, Research Center for Information Security Engineering
Course Information. Lecturer: Chen Yu (陈昱), Tel: ; TA: Cheng Zaixing (程再兴), Tel: ; Course page: qxx2011.mht
Ch2 Concept Learning & General-to-Specific Ordering
- Introduction to concept learning
- Concept learning as search
- FIND-S algorithm
- Version space and CANDIDATE-ELIMINATION algorithm
- Inductive bias
Types of Learning
Based on the type of feedback:
- Supervised learning: the correct answer is given for each training example (labeled examples)
- Unsupervised learning: no answers are given (unlabeled examples)
- Semi-supervised learning: a mixture of labeled and unlabeled examples
- Reinforcement learning: the teacher provides rewards or penalties
Introduction to Concept Learning
Definition & Example
Def. Concept learning is the task of inferring a boolean-valued function from labeled training examples.
Example: learning the concept "days on which my friend Aldo enjoys his favorite water sport" from a set of training examples (Mitchell's Table 2.1):

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
Example (contd)
Representing hypotheses: one way is to represent a hypo as a conjunction of constraints on the attributes. Each constraint can be
- a specific value (e.g. Water=Warm)
- don't care (e.g. Water=?)
- no value allowed (e.g. Water=Ø)
An example of a hypo in EnjoySport: ⟨?, Cold, High, ?, ?, ?⟩ ("Aldo enjoys his sport only on cold days with high humidity").
Example (contd)
The most general hypo (every day is a positive example) is represented by ⟨?, ?, ?, ?, ?, ?⟩.
The most specific hypo (every day is a negative example) is represented by ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩.
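These representations translate directly into code. Below is a minimal sketch in Python (helper names such as matches are mine, not Mitchell's): a hypo is a tuple of per-attribute constraints, with "?" for don't-care and None standing in for Ø.

# Each hypothesis is a tuple of per-attribute constraints:
# a concrete value, "?" (any value allowed), or None (no value allowed, i.e. Ø).
MOST_GENERAL = ("?",) * 6
MOST_SPECIFIC = (None,) * 6

def matches(h, x):
    """Return True iff hypothesis h classifies instance x as positive."""
    return all(c is not None and (c == "?" or c == v)
               for c, v in zip(h, x))

x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(matches(("?", "Cold", "High", "?", "?", "?"), x))  # False
print(matches(MOST_GENERAL, x))                          # True
print(matches(MOST_SPECIFIC, x))                         # False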
Prototypical Concept Learning Task
Given:
- Instance space X: possible days, each described by the attributes Sky (Sunny, Cloudy, Rainy), AirTemp (Warm, Cold), Humidity (Normal, High), Wind (Strong, Weak), Water (Warm, Cool), and Forecast (Same, Change)
- Target function EnjoySport, c: X → {0,1}
- Hypo space H: conjunctions of literals
- Set D of training examples: positive and negative examples of the target function
Determine: a hypo h in H s.t. h(x) = c(x) for all x in D (a kind of inductive learning)
Inductive Learning: A Brief Overview
Simplest form: learn a function from examples. Let f be the target function; an example is then a pair (x, f(x)).
Statement of an inductive-learning problem: given a collection of examples of f, return a function h that approximates f (h is called a hypothesis).
The fundamental problem of induction is the predictive power of the learned h.
Philosophical Foundation
One motivation behind inductive learning is the attempt to establish the source of knowledge.
Aristotle (384-322 B.C.) was the first to formulate a precise set of laws governing the rational part of the mind.
The empiricism movement, starting with Francis Bacon's (1561-1626) Novum Organum ("new instrument"), is characterized by a dictum of John Locke (1632-1704): "Nothing is in the understanding, which was not first in the senses".
An Example: Curve Fitting
a) Examples (x, f(x)) and a consistent linear hypothesis
b) A consistent degree-7 polynomial for the same data set
c) A different data set that admits an exact degree-6 polynomial fit or an approximate linear fit
d) A simple, exact sinusoidal fit to the same data set as in c)
A learning problem is realizable if the hypothesis space contains the true function.
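The contrast between panels a) and b) is easy to reproduce numerically. A small sketch, assuming numpy is available (the data here is made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * x + 0.5 + rng.normal(scale=0.05, size=x.size)   # noisy linear data

line = np.polyfit(x, y, deg=1)   # 2-parameter hypothesis
poly = np.polyfit(x, y, deg=7)   # 8-parameter hypothesis: interpolates exactly

x_new = 1.5                      # outside the observed range
print(np.polyval(line, x_new))   # close to the true value 3.5
print(np.polyval(poly, x_new))   # typically far off: the complex fit overshoots

Both hypotheses are consistent with the training data; they differ sharply on unseen inputs, which is exactly the prediction problem of induction.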
Ockham's Razor
Q: How do we choose among multiple consistent hypotheses?
Ockham's razor: prefer the simplest hypothesis consistent with the data: "Entities are not to be multiplied beyond necessity."
William of Ockham (1285-1349), the most influential philosopher of his century.
Inductive Learning Hypothesis
There is a fundamental assumption underlying any learned hypo, the so-called inductive learning hypothesis:
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.
Concept Learning as Search
An Example: EnjoySport
- Instance space X: possible days, each described by the attributes Sky (Sunny, Cloudy, Rainy), AirTemp (Warm, Cold), Humidity (Normal, High), Wind (Strong, Weak), Water (Warm, Cool), and Forecast (Same, Change)
- Target function EnjoySport, c: X → {0,1}
- Hypo space H: conjunctions of literals
- Size of the instance space: 3×2×2×2×2×2 = 96
- Size of the hypo space: 4×3×3×3×3×3 + 1 = 973 semantically distinct hypos (each attribute may also take "?", plus the single all-Ø hypo)
Q: does there exist a way to search the hypo space?
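Both counts can be checked in a couple of lines; a minimal sketch:

import math

domains = [3, 2, 2, 2, 2, 2]   # number of values per attribute

print(math.prod(domains))                     # |X| = 3*2*2*2*2*2 = 96
# per attribute: its values plus "?", and one extra all-None (all-Ø) hypo:
print(math.prod(d + 1 for d in domains) + 1)  # |H| = 4*3*3*3*3*3 + 1 = 973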
General-to-Specific Ordering of Hypos
An illustration (Mitchell's Figure 2.1): h2 = ⟨Sunny, ?, ?, ?, ?, ?⟩ is more general than both h1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩ and h3 = ⟨Sunny, ?, ?, ?, Cool, ?⟩; the set of instances covered by h1 is a subset of those covered by h2.
"More General Than" Relationship
Def. Let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) iff (∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)].
Note: "≥_g" is independent of the target concept.
Property: "≥_g" is a partial order.
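For conjunctive hypos the semantic condition above reduces to a per-attribute syntactic test. A minimal sketch (the helper name is mine), using the representation from the earlier snippet:

def more_general_or_equal(hj, hk):
    """True iff hj >=_g hk for conjunctive hypotheses.

    Constraints are attribute values, "?" (any), or None (Ø)."""
    if None in hk:   # hk matches no instance, so any hj covers it
        return True
    if None in hj:   # hj matches no instance, but hk matches some
        return False
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h1, h2))  # False: the relation is a partial order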
FIND-S Algorithm
FIND-S Algorithm
FIND-S: find a maximally specific hypothesis
1. Initialize h to the most specific hypothesis in H
2. For each positive training example x: for each attribute constraint a_i in h, if a_i is satisfied by x, do nothing; otherwise replace a_i by the next more general constraint that is satisfied by x
3. Output hypo h
An Illustration of FIND-S
On the four EnjoySport examples, h evolves as follows (a runnable sketch follows below): ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩ → ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩ (after x1) → ⟨Sunny, Warm, ?, Strong, Warm, Same⟩ (after x2) → unchanged (x3 is negative) → ⟨Sunny, Warm, ?, Strong, ?, ?⟩ (after x4).
Note: if we assume the target concept c is in H and the training examples are noise-free, then the h found via FIND-S must also be consistent with c on the negative training examples.
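A minimal runnable sketch of FIND-S (representation and helper names as in the earlier snippets; the training data is the 4-example table above):

def matches(h, x):
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def find_s(examples, n_attrs):
    h = (None,) * n_attrs                      # most specific hypothesis
    for x, positive in examples:
        if positive and not matches(h, x):
            # minimally generalize each constraint violated by x
            h = tuple(v if c is None else (c if c == v else "?")
                      for c, v in zip(h, x))
    return h

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(train, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')

Note that negative examples are simply ignored, which is exactly why the noise-free and c-in-H assumptions are needed.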
Complaints about FIND-S
- Has the learned h converged to the true target concept? Not sure!
- Why prefer the most specific hypothesis?
- Are the training examples consistent? We would prefer an algorithm that can detect when training examples are inconsistent, or better yet, correct the error.
- What if there are several maximally specific consistent hypotheses?
Version Space and CANDIDATE-ELIMINATION Algorithm
Version Space
The version space is the set of hypotheses consistent with the training data, i.e.
VS_{H,D} = {h ∈ H | Consistent(h, D)}, where Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x).
List-Then-Eliminate Algorithm
A "brute force" way of computing the version space:
1. Initialize VS to contain every hypothesis in H
2. For each training example ⟨x, c(x)⟩, eliminate from VS any h with h(x) ≠ c(x)
3. Output the resulting VS
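Since |H| = 973 for EnjoySport, LIST-THEN-ELIMINATE is actually feasible here. A sketch that enumerates the whole hypo space and filters it:

from itertools import product

VALUES = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
          ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def all_hypotheses():
    # 4*3*3*3*3*3 = 972 conjunctions, plus the single all-None (all-Ø) hypo
    yield from product(*[vals + ("?",) for vals in VALUES])
    yield (None,) * len(VALUES)

def list_then_eliminate(examples):
    vs = list(all_hypotheses())                   # 1. VS <- every h in H
    for x, positive in examples:                  # 2. drop inconsistent hypos
        vs = [h for h in vs if matches(h, x) == positive]
    return vs                                     # 3. output VS

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(len(list_then_eliminate(train)))  # 6 hypotheses remain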
Version Space with Boundary Sets
We need a more compact representation of VS to compute the version space efficiently. One approach: delimit VS by its general and specific boundary sets, together with the partial order between the hypotheses.
Example: the VS of EnjoySport (after the four examples above) has six elements, ordered as follows:
S: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
   ⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
VS Representation Theorem
Def. The general boundary G, w.r.t. hypo space H and training data D, is the set of maximally general members of H consistent with D.
Def. The specific boundary S, w.r.t. hypo space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D.
VS Representation Theorem (2)
Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c be an arbitrary boolean-valued target concept over X, and let D be an arbitrary set of training examples ⟨x, c(x)⟩. For all X, H, c, and D s.t. S and G are well-defined:
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥_g h ≥_g s)}
CANDIDATE-ELIMINATION Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d:
  If d is a positive example:
    Remove from G any hypo inconsistent with d
    For each hypo s in S that is inconsistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s s.t. h is consistent with d and some member of G is more general than h
      Remove from S any hypo that is more general than another hypo in S
(contd)
  If d is a negative example:
    Remove from S any hypo inconsistent with d
    For each hypo g in G that is inconsistent with d:
      Remove g from G
      Add to G all minimal specializations h of g s.t. h is consistent with d and some member of S is more specific than h
      Remove from G any hypo that is more specific than another hypo in G
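A compact sketch of the algorithm for conjunctive hypos (helper names are mine; minimal specialization substitutes, for each "?" attribute, every value that excludes the negative example):

def matches(h, x):
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def more_general(hj, hk):                    # hj >=_g hk, as defined earlier
    if None in hk:
        return True
    if None in hj:
        return False
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

def generalize(s, x):
    """Minimal generalization of s that covers the positive example x."""
    return tuple(v if c is None else (c if c == v else "?")
                 for c, v in zip(s, x))

def specialize(g, x, values):
    """Minimal specializations of g that exclude the negative example x."""
    for i, c in enumerate(g):
        if c == "?":
            for v in values[i]:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

def candidate_elimination(examples, values):
    n = len(values)
    S, G = [(None,) * n], [("?",) * n]
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            for s in [s for s in S if not matches(s, x)]:
                S.remove(s)
                h = generalize(s, x)
                if any(more_general(g, h) for g in G):
                    S.append(h)
            S = [a for a in S
                 if not any(a != b and more_general(a, b) for b in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            for g in [g for g in G if matches(g, x)]:
                G.remove(g)
                for h in specialize(g, x, values):
                    if any(more_general(h, s) for s in S):
                        G.append(h)
            G = [a for a in G
                 if not any(a != b and more_general(b, a) for b in G)]
    return S, G

VALUES = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
          ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(train, VALUES)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]

Running it on the 4 EnjoySport examples reproduces the S and G boundaries traced on the next slides.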
An Illustrative Example
Finding the VS of EnjoySport via the CANDIDATE-ELIMINATION algorithm, traced on the four training examples:
S0: {⟨Ø, Ø, Ø, Ø, Ø, Ø⟩}, G0: {⟨?, ?, ?, ?, ?, ?⟩}
After x1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, +: S1: {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}, G1 = G0
After x2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, +: S2: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}, G2 = G0
After x3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, −: S3 = S2, G3: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
After x4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, +: S4: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}, G4: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩} (⟨?, ?, ?, ?, ?, Same⟩ is removed: it is inconsistent with x4)
An Illustrative Example (contd)
Final VS learned from those 4 examples (six hypotheses, delimited by S4 and G4):
S: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
   ⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
Remarks
CANDIDATE-ELIMINATION works whenever the conditions of the version space representation theorem hold. In the special case where every instance is a fixed-length attribute vector, each attribute takes finitely many values, and the hypo space is restricted to conjunctions of attribute constraints as defined earlier, the operations on S simplify to FIND-S: throughout the run, S is always a single-element set.
Remarks (2)
- Will the algorithm converge to the correct hypo? Yes, provided the training examples contain no errors and the true target concept is in H.
- What if some training example contains a wrong target value? The true target concept will be eliminated from the VS.
- What if the true target concept is not in H? The VS may end up empty.
Remarks (3)
What training example should the learner request next? Consider the case where the learner proposes the next instance and obtains its label from the teacher. Given the VS above, what query should be presented next? One good choice is ⟨Sunny, Warm, Normal, Weak, Warm, Same⟩, which satisfies exactly 3 of the 6 hypotheses. In general, try to generate queries that satisfy exactly half of the hypotheses: each answer then halves the VS, so the target concept can be pinned down with about ⌈log2 |VS|⌉ queries.
Remarks (4)
How can a partially learned concept be used? Consider the VS learned above. Suppose there are no more training examples, and the learner must classify new instances not observed during training. Look at the following 4 instances (after Mitchell's Table 2.6):
A: ⟨Sunny, Warm, Normal, Strong, Cool, Change⟩
B: ⟨Rainy, Cold, Normal, Weak, Warm, Same⟩
C: ⟨Sunny, Warm, Normal, Weak, Warm, Same⟩
D: ⟨Sunny, Cold, Normal, Strong, Warm, Same⟩
Assuming the target concept is in VS, the labels (utilizing the partial order) are: A is "+" (all 6 hypos accept it); B is "−" (all 6 reject it); C and D are ambiguous (3/6 and 2/6 accept, respectively) and might be assigned a label by majority vote.
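Both remarks are mechanical once the VS is in hand. A minimal sketch of query selection and vote-based classification (names are mine; VS is the six-element version space above):

from itertools import product

VS = [("Sunny", "Warm", "?", "Strong", "?", "?"),   # S boundary
      ("Sunny", "?", "?", "Strong", "?", "?"),
      ("Sunny", "Warm", "?", "?", "?", "?"),
      ("?", "Warm", "?", "Strong", "?", "?"),
      ("Sunny", "?", "?", "?", "?", "?"),           # G boundary
      ("?", "Warm", "?", "?", "?", "?")]            # G boundary

VALUES = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
          ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def best_query(vs, values):
    """Instance whose +/- split of the VS is closest to half (Remarks 3)."""
    return min(product(*values),
               key=lambda x: abs(sum(matches(h, x) for h in vs) - len(vs) / 2))

def classify(vs, x):
    """Unanimous vote gives a definite label; otherwise ambiguous (Remarks 4)."""
    pos = sum(matches(h, x) for h in vs)
    if pos == len(vs):
        return "+"
    if pos == 0:
        return "-"
    return "ambiguous (%d/%d vote +)" % (pos, len(vs))

print(best_query(VS, VALUES))  # a 3-of-6 split, e.g.
                               # ('Sunny', 'Warm', 'Normal', 'Weak', 'Warm', 'Same')
print(classify(VS, ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # +
print(classify(VS, ("Rainy", "Cold", "Normal", "Weak", "Warm", "Same")))      # -
print(classify(VS, ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same")))    # ambiguous (2/6 vote +)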
Inductive Bias
A Biased Hypo Space
Consider EnjoySport: if we restrict H to conjunctions of attribute constraints, it cannot represent even a simple disjunctive concept such as "Sky=Sunny ∨ Sky=Cloudy". E.g., given the following three training examples:
⟨Sunny, Warm, Normal, Strong, Cool, Change⟩, +
⟨Cloudy, Warm, Normal, Strong, Cool, Change⟩, +
⟨Rainy, Warm, Normal, Strong, Cool, Change⟩, −
the CANDIDATE-ELIMINATION algorithm (indeed, any algorithm over this H) outputs an empty VS: the first two positives force S to ⟨?, Warm, Normal, Strong, Cool, Change⟩, which wrongly covers the third (negative) example.
An Unbiased Learner
One obvious approach to an unbiased hypo space is to propose a hypo space H' capable of representing every teachable concept over X, i.e. the power set of X.
Compare a couple of numbers for EnjoySport: |X| = 96, and the number of conjunctive hypotheses is 973 (vs. 2^96 ≈ 10^28 concepts in the power set).
Apply the CANDIDATE-ELIMINATION algorithm to H' and training set D, and the learner completely loses its generalization power: every instance unseen in D is classified ambiguously, since exactly half of the consistent concepts accept it.
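This collapse is easy to demonstrate on a toy domain small enough to enumerate its power set; a sketch (domain and labels made up for illustration):

from itertools import product

# Toy instance space: 2 binary attributes -> |X| = 4, |H'| = 2^4 = 16 concepts
X = list(product((0, 1), repeat=2))
H_prime = [frozenset(x for x, keep in zip(X, mask) if keep)
           for mask in product((0, 1), repeat=len(X))]

train = [((0, 0), True), ((1, 1), False)]           # two labeled instances
vs = [c for c in H_prime
      if all((x in c) == label for x, label in train)]

for x in X:
    votes = sum(x in c for c in vs)
    print(x, "%d/%d vote +" % (votes, len(vs)))
# (0,0): 4/4, (1,1): 0/4, and each unseen instance: exactly 2/4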
Futility of Bias-Free Learning
Fundamental property of inductive inference: a learner that makes no a priori assumptions (i.e. has no inductive bias) regarding the identity of the target concept has no rational basis for classifying any unseen instance.
An interesting idea: characterize various learning approaches by the inductive bias they employ. To do so, we first need to define inductive bias more precisely.
Inductive Bias
Let L(x_i, D_c) denote the classification that learner L assigns to instance x_i after learning from training data D_c. The inductive inference step performed by L can be written as
(D_c ∧ x_i) ≻ L(x_i, D_c)   (where y ≻ z means z is inductively inferred from y)
What additional assumptions could be added to D_c ∧ x_i s.t. L(x_i, D_c) would follow deductively? We define the inductive bias of L as this set of additional assumptions.
Inductive Bias (2)
Def. The inductive bias of L is any minimal set of assertions B s.t. for any target concept c and training examples D_c:
(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
where "y ⊢ z" indicates that z follows deductively from y.
If we define L(x_i, D_c) as the unanimous vote of the elements of the VS found (undefined if the vote is not unanimous), then the inductive bias of the CANDIDATE-ELIMINATION algorithm is "the target concept c is contained in H".
Inductive Bias of Various Learners
- Rote learner: learning by simply storing training examples in memory. No inductive bias.
- CANDIDATE-ELIMINATION: new instances are classified only when all members of the VS agree. Inductive bias: the target concept is contained in H.
- FIND-S: has an even stronger inductive bias than CANDIDATE-ELIMINATION (it additionally treats every instance as negative unless the training data entails otherwise).
Inductive → Deductive
(Figure: modeling inductive systems by their equivalent deductive systems.)
Summary
- Concept learning as search through H
- General-to-specific ordering over H
- CANDIDATE-ELIMINATION algorithm
- The learner can make useful queries
- Inductive leaps are possible only if the learner is biased
More on Concept Learning
Bruner et al. (1956) conducted a pioneering study of concept learning in humans. Concept learning, also known as category learning or concept attainment, was defined in their book as "the search for and listing of attributes that can be used to distinguish exemplars from non-exemplars of various categories". Simply put, concepts are the mental categories that help us classify objects, events, or ideas, and each object, event, or idea has a set of common relevant features. (Wikipedia)
On Bruner et al.'s Book
Editorial review (1986 ed.): "A Study of Thinking is a pioneering account of how human beings achieve a measure of rationality in spite of the constraints imposed by bias, limited attention and memory, and the risks of error imposed by pressures of time and ignorance. First published in 1956 and hailed at its appearance as a groundbreaking study, it is still read three decades later as a major contribution to our understanding of the mind. In their insightful new introduction, the authors relate the book to the cognitive revolution and its handmaiden, artificial intelligence."
Concept Learning (contd)
Modern psychological theories regard concept learning as a process of abstraction, data compression, simplification, and summarization:
- Rule-based theories
- Prototype theory
- Exemplar theories
- Multiple-prototype theories
- Explanation-based theories
- Bayesian theories
- Component display theory
Concept Learning (contd)
Two leading machine learning approaches:
- Instance-based learning: k-nearest neighbor learning, locally weighted regression, …
- Rule induction: CANDIDATE-ELIMINATION (read the 2nd paragraph of p. 47 of Mitchell's book for extensions of CANDIDATE-ELIMINATION), decision tree learning, genetic algorithms, sequential covering algorithms, …
Further reading: "A Unified Approach to Concept Learning", PhD dissertation by P. M. D. Domingos (1997)
HW 2.2, 2.4 & 2.7 in Mitchell’s book, 10pt each, due on Wednesday,