Presentation is loading. Please wait.

Presentation is loading. Please wait.

I NTRODUCTION TO M ACHINE L EARNING. L EARNING Agent has made observations (data) Now must make sense of it (hypotheses) Hypotheses alone may be important.

Similar presentations


Presentation on theme: "I NTRODUCTION TO M ACHINE L EARNING. L EARNING Agent has made observations (data) Now must make sense of it (hypotheses) Hypotheses alone may be important."— Presentation transcript:

1 I NTRODUCTION TO M ACHINE L EARNING

2 L EARNING Agent has made observations (data) Now must make sense of it (hypotheses) Hypotheses alone may be important (e.g., in basic science) For inference (e.g., forecasting) To take sensible actions (decision making) A basic component of economics, social and hard sciences, engineering, …

3 L AST TIME Learning using statistical models relating unknown hypothesis to observed data 3 techniques Maximum likelihood Maximum a posteriori Bayesian inference

4 A R EVIEW HW4 asks you to learn parameters of a Bayes Net (Naïve Bayes) Suppose we wish to estimate P(C|A,B) The unknown parameters in the conditional probability distribution are “coin flips” P(C|AB) A,B 11 A,  B 22  A,B 33  A,  B 44

5 A R EVIEW Example data ML estimates ABC# obs 1113 1105 1011 10010 0114 0107 0016 0007 P(C|AB) A,B  1 =3/8 A,  B  2 =1/11  A,B  3 =4/11  A,  B  4 =6/13

6 M AXIMUM A P OSTERIORI Example data MAP Estimates assuming virtual counts a=b=2 virtual counts 2H,2T ABC# obs 1113 1105 1011 10010 0114 0107 0016 0007 P(C|AB) A,B  1 =(3+2)/(8+4) A,  B  2 =(1+2)/(11+4)  A,B  3 =(4+2)/(11+4)  A,  B  4 =(6+2)/(13+4)

7 T OPICS IN M ACHINE L EARNING Applications Document retrieval Document classification Data mining Computer vision Scientific discovery Robotics … Tasks & settings Classification Ranking Clustering Regression Decision-making Supervised Unsupervised Semi-supervised Active Reinforcement learning Techniques Bayesian learning Decision trees Neural networks Support vector machines Boosting Case-based reasoning Dimensionality reduction …

8 W HAT IS L EARNING ?  Mostly generalization from experience: “Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future” M.R. Genesereth and N.J. Nilsson, in Logical Foundations of AI, 1987   Concepts, heuristics, policies  Supervised vs. un-supervised learning

9 S UPERVISED L EARNING Agent is given a training set of input/output pairs ( x, y ), with y =f( x ) Task: build a model that will allow it to predict f( x ) for a new x

10 U NSUPERVISED L EARNING Agent is given a training set of data points x Task: learn “patterns” in the data (e.g., clusters)

11 R EINFORCEMENT L EARNING Agent acts sequentially in the real world, chooses actions a 1,…,a n, receives reward R Must decide which actions were most responsible for R

12 D EDUCTIVE V S. I NDUCTIVE R EASONING Deductive reasoning: General rules (e.g., logic) to specific examples Inductive reasoning: Specific examples to general rules

13 I NDUCTIVE L EARNING Basic form: learn a function from examples f is the unknown target function An example is a pair ( x, f(x) ) Problem: find a hypothesis h such that h ≈ f given a training set of examples D Instance of supervised learning Classification task: f  {0,1,…,C} (usually C=1) Regression task: f  reals

14 I NDUCTIVE LEARNING METHOD Construct/adjust h to agree with f on training set ( h is consistent if it agrees with f on all examples) E.g., curve fitting:

15 I NDUCTIVE LEARNING METHOD Construct/adjust h to agree with f on training set ( h is consistent if it agrees with f on all examples) E.g., curve fitting:

16 I NDUCTIVE LEARNING METHOD Construct/adjust h to agree with f on training set ( h is consistent if it agrees with f on all examples) E.g., curve fitting:

17 I NDUCTIVE LEARNING METHOD Construct/adjust h to agree with f on training set ( h is consistent if it agrees with f on all examples) E.g., curve fitting:

18 I NDUCTIVE LEARNING METHOD Construct/adjust h to agree with f on training set ( h is consistent if it agrees with f on all examples) E.g., curve fitting:

19 I NDUCTIVE LEARNING METHOD Construct/adjust h to agree with f on training set ( h is consistent if it agrees with f on all examples) E.g., curve fitting: h=D is a trivial, but perhaps uninteresting solution (caching)

20 O VERFITTING An issue that can arise in all learning tasks where the hypothesis too closely matches the known data Essentially we incorporate the “noise” of the data into our prediction strategy for unseen data — this is BAD! Extreme cases shown on previous two slides

21 C LASSIFICATION T ASK  The target function f(x) takes on values True and False  A example is positive if f is True, else it is negative  The set X of all examples is the example set  The training set is a subset of X a small one!

22 L OGIC -B ASED I NDUCTIVE L EARNING Here, examples (x, f(x)) take on discrete values

23 L OGIC -B ASED I NDUCTIVE L EARNING Here, examples (x, f(x)) take on discrete values Concept Note that the training set does not say whether an observable predicate is pertinent or not

24 R EWARDED C ARD E XAMPLE  Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”  Background knowledge KB: ((r=1) v … v (r=10))  NUM(r) ((r=J) v (r=Q) v (r=K))  FACE(r) ((s=S) v (s=C))  BLACK(s) ((s=D) v (s=H))  RED(s)  Training set D: REWARD([4,C])  REWARD([7,C])  REWARD([2,S])   REWARD([5,H])   REWARD([J,S])

25 R EWARDED C ARD E XAMPLE  Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”  Background knowledge KB: ((r=1) v … v (r=10))  NUM(r) ((r=J) v (r=Q) v (r=K))  FACE(r) ((s=S) v (s=C))  BLACK(s) ((s=D) v (s=H))  RED(s)  Training set D: REWARD([4,C])  REWARD([7,C])  REWARD([2,S])   REWARD([5,H])   REWARD([J,S])  Possible inductive hypothesis: h  (NUM(r)  BLACK(s)  REWARD([r,s])) There are several possible inductive hypotheses

26 L EARNING A L OGICAL P REDICATE (C ONCEPT C LASSIFIER )  Set E of objects (e.g., cards)  Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD) Example: CONCEPT describes the precondition of an action, e.g., Unstack(C,A) E is the set of states CONCEPT(x)  HANDEMPTY  x, BLOCK(C)  x, BLOCK(A)  x, CLEAR(C)  x, ON(C,A)  x Learning CONCEPT is a step toward learning an action description

27 L EARNING A L OGICAL P REDICATE (C ONCEPT C LASSIFIER )  Set E of objects (e.g., cards)  Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)  Observable predicates A(x), B(X), … (e.g., NUM, RED)  Training set: values of CONCEPT for some combinations of values of the observable predicates

28 L EARNING A L OGICAL P REDICATE (C ONCEPT C LASSIFIER )  Set E of objects (e.g., cards)  Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)  Observable predicates A(x), B(X), … (e.g., NUM, RED)  Training set: values of CONCEPT for some combinations of values of the observable predicates  Find a representation of CONCEPT in the form: CONCEPT(x)  S(A,B, …) where S(A,B,…) is a sentence built with the observable predicates, e.g.: CONCEPT(x)  A(x)  (  B(x) v C(x))

29 H YPOTHESIS S PACE  An hypothesis is any sentence of the form: CONCEPT(x)  S(A,B, …) where S(A,B,…) is a sentence built using the observable predicates  The set of all hypotheses is called the hypothesis space H  An hypothesis h agrees with an example if it gives the correct value of CONCEPT

30 + + + + + + + + + + + + - - - - - - - - - - - - Example set X {[A, B, …, CONCEPT]} I NDUCTIVE L EARNING S CHEME Hypothesis space H {[CONCEPT(x)  S(A,B, …)]} Training set D Inductive hypothesis h

31 S IZE OF H YPOTHESIS S PACE n observable predicates 2 n entries in truth table defining CONCEPT and each entry can be filled with True or False In the absence of any restriction (bias), there are hypotheses to choose from n = 6  2x10 19 hypotheses! 2 2n2n

32 h 1  NUM(r)  BLACK(s)  REWARD([r,s]) h 2  BLACK(s)   (r=J)  REWARD([r,s]) h 3  ([r,s]=[4,C])  ([r,s]=[7,C])  [r,s]=[2,S])  REWARD([r,s]) h 4   ([r,s]=[5,H])   ([r,s]=[J,S])  REWARD([r,s]) agree with all the examples in the training set M ULTIPLE I NDUCTIVE H YPOTHESES

33 h 1  NUM(r)  BLACK(s)  REWARD([r,s]) h 2  BLACK(s)   (r=J)  REWARD([r,s]) h 3  ([r,s]=[4,C])  ([r,s]=[7,C])  [r,s]=[2,S])  REWARD([r,s]) h 4   ([r,s]=[5,H])   ([r,s]=[J,S])  REWARD([r,s]) agree with all the examples in the training set M ULTIPLE I NDUCTIVE H YPOTHESES Need for a system of preferences – called an inductive bias – to compare possible hypotheses

34 N OTION OF C APACITY  It refers to the ability of a machine to learn any training set without error  A machine with too much capacity is like a botanist with photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he has seen before  A machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree  Good generalization can only be achieved when the right balance is struck between the accuracy attained on the training set and the capacity of the machine

35  K EEP -I T -S IMPLE (KIS) B IAS  Examples Use much fewer observable predicates than the training set Constrain the learnt predicate, e.g., to use only “high- level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax  Motivation If an hypothesis is too complex it is not worth learning it (data caching does the job as well) There are much fewer simple hypotheses than complex ones, hence the hypothesis space is smaller

36  K EEP -I T -S IMPLE (KIS) B IAS  Examples Use much fewer observable predicates than the training set Constrain the learnt predicate, e.g., to use only “high- level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax  Motivation If an hypothesis is too complex it is not worth learning it (data caching does the job as well) There are much fewer simple hypotheses than complex ones, hence the hypothesis space is smaller Einstein: “A theory must be as simple as possible, but not simpler than this”

37  K EEP -I T -S IMPLE (KIS) B IAS  Examples Use much fewer observable predicates than the training set Constrain the learnt predicate, e.g., to use only “high- level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax  Motivation If an hypothesis is too complex it is not worth learning it (data caching does the job as well) There are much fewer simple hypotheses than complex ones, hence the hypothesis space is smaller If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is O(n k )

38 S UPERVISED L EARNING F LOW C HART Training set Target function Datapoints Inductive Hypothesis Prediction Learner Hypothesis space Choice of learning algorithm Unknown concept we want to approximate Observations we have seen Test set Observations we will see in the future Better quantities to assess performance

39 C APACITY IS N OT THE O NLY C RITERION Accuracy on training set isn’t the best measure of performance + + + + + + + + + + + + - - - - - - - - - - - - Learn Test Example set XHypothesis space H Training set D

40 G ENERALIZATION E RROR A hypothesis h is said to generalize well if it achieves low error on all examples in X + + + + + + + + + + + + - - - - - - - - - - - - Learn Test Example set XHypothesis space H

41 O VERFITTING (R ESTATED ) If we have a situation where our training error continues to decrease While our generalization error on the other hand begins to increase We can safely say our model has fallen victim to overfitting What does this have to do with capacity?

42 A SSESSING P ERFORMANCE OF A L EARNING A LGORITHM Samples from X are typically unavailable Take out some of the training set Train on the remaining training set Test on the excluded instances Cross-validation

43 C ROSS -V ALIDATION Split original set of examples, train + + + + + + + - - - - - - + + + + + - - - - - - Hypothesis space H Train Examples D

44 C ROSS -V ALIDATION Evaluate hypothesis on testing set + + + + + + + - - - - - - Hypothesis space H Testing set

45 C ROSS -V ALIDATION Evaluate hypothesis on testing set Hypothesis space H Testing set + + + + + - - - - - - + + Test

46 C ROSS -V ALIDATION Compare true concept against prediction + + + + + + + - - - - - - Hypothesis space H Testing set + + + + + - - - - - - + + 9/13 correct

47 T ENNIS E XAMPLE Evaluate learning algorithm PlayTennis = S(Temperature,Wind)

48 T ENNIS E XAMPLE Evaluate learning algorithm PlayTennis = S(Temperature,Wind) Trained hypothesis PlayTennis = (T=Mild or Cool)  (W=Weak) Training errors = 3/10 Testing errors = 4/4

49 T ENNIS E XAMPLE Evaluate learning algorithm PlayTennis = S(Temperature,Wind) Trained hypothesis PlayTennis = (T=Mild or Cool) Training errors = 3/10 Testing errors = 1/4

50 T ENNIS E XAMPLE Evaluate learning algorithm PlayTennis = S(Temperature,Wind) Trained hypothesis PlayTennis = (T=Mild or Cool) Training errors = 3/10 Testing errors = 2/4

51 H OW TO CONSTRUCT A BETTER LEARNER ? Ideas?

52 T EN ( ISH ) C OMMANDMENTS OF MACHINE LEARNING Thou shalt not: Train on examples in the testing set Form assumptions by “peeking” at the testing set, then formulating inductive bias Overfit the training data

53 R EINFORCEMENT L EARNING Enough talk, let’s see some machine learning in action!

54 Klondike Solitaire

55 ● Difficulties: size of state space; large number of stochastic elements; variability of utility of actions ● Strategy: Reinforcement Learning – reward – long-term learning

56 Representation ● State: want to find a way to effectively reduce/collapse state space ● Action: need to find way of incorporating key components of an action and its potential results

57 Algorithm ● SARSA: State 1 -Action 1 -Reward-State 2 -Action 2 ● Different learning policies vary in: – State representation – Action representation – Reward function

58 Results

59 Clear Winner Emerges - Slow, but steady progress

60 Keep Playing... - Seems to converge, BUT: average win rates for last 10,000 games were: 3.13, 3.08, 3.4, 3, 3.2 ≥ 3 !!!

61 N EXT T IME Decision tree learning Read R&N 18.1-3


Download ppt "I NTRODUCTION TO M ACHINE L EARNING. L EARNING Agent has made observations (data) Now must make sense of it (hypotheses) Hypotheses alone may be important."

Similar presentations


Ads by Google