1
Learning From Observations
Marco Loog
2
Learning from Observations
Idea is that percepts should be used for improving the agent's ability to act in the future, not only for acting per se
3
Outline Learning agents Inductive learning Decision tree learning
4
Learning Learning is essential for unknown environments, i.e., when designer lacks omniscience Learning is useful as a system construction method, i.e., expose the agent to reality rather than trying to write it down Learning modifies the agent’s decision mechanisms to improve performance
5
Learning Agent [Revisited]
Four conceptual components Learning element : responsible for making improvements Performance element : takes percepts and decides on actions Critic : provides feedback on how agent is doing and determines how performance element should be modified Problem generator : responsible for suggesting actions leading to new and informative experience
6
Figure 2.15 [Revisited]
7
Learning Element Design of learning element is affected by
Which components of the performance element are to be learned What feedback is available to learn these components What representation is used for the components
8
Agent’s Components Direct mapping from conditions on current state to actions [instructor : brake!] Means to infer relevant properties about world from percept sequence [learning from images] Info about evolution of the world and results of possible actions [braking on wet road] Utility indicating desirability of world state [no tip / component of utility function] ... Each component can be learned from appropriate feedback
9
Types of Feedback Supervised learning : correct answers for each example Unsupervised learning : correct answers not given Reinforcement learning : occasional rewards
10
Inductive Learning Simplest form : learn a function from examples
I.e. learn the target function f Examples : input / output pairs (x, f(x))
11
Inductive Learning Problem
Find a hypothesis h, such that h ≈ f, based on a given training set of examples Highly simplified model of real learning : ignores prior knowledge, assumes examples are given
12
Hypothesis A good hypothesis will generalize well, i.e., be able to predict based on unseen examples
13
Inductive Learning Method
E.g. function fitting Goal is to estimate the real underlying functional relationship from example observations
14
Inductive Learning Method
Construct h to agree with f on training set
15
Inductive Learning Method
Construct h to agree with f on training set
16
Inductive Learning Method
Construct h to agree with f on training set
17
Inductive Learning Method
Construct h to agree with f on training set h is consistent if it agrees with f on all examples
18
Inductive Learning Method
Construct h to agree with f on training set h is consistent if it agrees with f on all examples
19
So, which ‘Fit’ is Best?
20
So, which ‘Fit’ is Best? Ockham’s razor : prefer simplest hypothesis consistent with the data
21
So, which ‘Fit’ is Best? Ockham’s razor : prefer simplest hypothesis consistent with the data What’s consistent? What’s simple?
22
Hypothesis A good hypothesis will generalize well, i.e., be able to predict based on unseen examples Not-exactly-consistent may be preferable to exactly consistent [nondeterministic behavior] Consistency is not even always possible [nondeterministic functions] Trade-off : complexity of hypothesis vs. degree of fit
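As a rough illustration of this trade-off, here is a small Python sketch (the sine curve, the noise level, and the variable names are illustration choices, not from the slides): a degree-9 polynomial passes essentially through all ten training points, while a degree-3 polynomial is simpler but only approximately consistent.

import numpy as np

# Noisy example pairs (x, f(x)) of a hypothetical underlying function f
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

# Simple hypothesis: low-degree polynomial, not exactly consistent with the data
h_simple = np.polynomial.Polynomial.fit(x, y, deg=3)

# Degree-9 polynomial through 10 points: (essentially) consistent with every example, but complex and wiggly
h_consistent = np.polynomial.Polynomial.fit(x, y, deg=9)

# Ockham's razor suggests preferring h_simple unless the extra complexity is justified
print(h_simple(0.25), h_consistent(0.25))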
23
Decision Trees ‘Decision tree induction is one of the simplest, and yet most successful forms of learning algorithm’ Good intro to the area of inductive learning
24
Decision Tree Input : object or situation described by set of attributes / features Output [discrete or continuous] : decision / prediction Continuous -> regression Discrete -> classification Boolean classification : output is binary / ‘true’ or ‘false’
25
Decision Tree Performs a sequence of tests in order to reach a decision Tree [as in : graph without closed loops] Internal node : test of the value of single property Branches labeled with possible test outcomes Leaf node : specifies output value Resembles a ‘how to’ manual
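A minimal sketch of that sequence of tests in Python, assuming a hypothetical nested-dict representation {attribute: {branch value: subtree or leaf}}; the representation and the toy tree are illustrative only.

def classify(tree, example):
    # Walk from the root, testing one attribute per internal node and
    # following the branch labeled with the example's value, until a leaf is reached
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree  # leaf node: the output value

# Toy Boolean-classification tree over two made-up attributes A and B
toy_tree = {'A': {'yes': True, 'no': {'B': {'yes': False, 'no': True}}}}
print(classify(toy_tree, {'A': 'no', 'B': 'yes'}))  # False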
26
Decide whether to wait for a Table at a Restaurant
Based on the following attributes Alternate : is there an alternative restaurant nearby? Bar : is there a comfortable bar area to wait in? Fri/Sat : is today Friday or Saturday? Hungry : are we hungry? Patrons : number of people in the restaurant [None, Some, Full] Price : price range [$, $$, $$$] Raining : is it raining outside? Reservation : have we made a reservation? Type : kind of restaurant [French, Italian, Thai, Burger] WaitEstimate : estimated waiting time [0-10, 10-30, 30-60, >60]
27
Attribute-Based Representations
Examples of decisions
28
Decision Tree Possible representation for hypotheses
Below is the ‘true’ tree [note Type? plays no role]
29
Expressiveness Decision trees can express any function of the input attributes E.g., for Boolean functions, truth table row → path to leaf
30
Expressiveness There is a consistent decision tree for any training set with one path to leaf for each example [unless f nondeterministic in x] but it probably won’t generalize to new examples Prefer to find more compact decision trees [This Ockham again...]
31
Attribute-Based Representations
The table of examples is simply a lookup table Cannot generalize to unseen examples
32
Decision Tree Applying Ockham’s razor : smallest tree consistent with examples
33
Decision Tree Applying Ockham’s razor : smallest tree consistent with examples Able to generalize to unseen examples No need to program everything out / specify everything in detail ‘true’ tree = smallest tree?
34
Decision Tree Learning
Unfortunately, finding the ‘smallest’ tree is intractable in general New aim : find a ‘smallish’ tree consistent with the training examples Idea : [recursively] choose the ‘most significant’ attribute as root of the [sub]tree ‘Most significant’ : making the most difference to the classification
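A sketch of that recursive idea in Python; the representation of examples as (attribute-dict, label) pairs and the helper names are assumptions, not the book's pseudocode.

from collections import Counter

def majority_label(examples):
    # Most common classification among the examples
    return Counter(label for _, label in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, choose_attribute, parent_examples=None):
    if not examples:                      # no examples left: fall back to the parent's majority
        return majority_label(parent_examples)
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:             # all examples agree: return that classification
        return labels[0]
    if not attributes:                    # attributes exhausted: majority vote
        return majority_label(examples)
    best = choose_attribute(examples, attributes)   # the 'most significant' attribute
    tree = {best: {}}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, lab) for ex, lab in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = decision_tree_learning(subset, rest, choose_attribute, examples)
    return tree

The next slides make ‘most significant’ precise via information gain, which can then be plugged in as choose_attribute.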
35
Choosing an Attribute Test
Idea : a good attribute splits the examples into subsets that are [ideally] ‘all positive’ or ‘all negative’ Patrons? is a better choice
36
Using Information Theory
Information content [entropy] : I(P(v1), … , P(vn)) = Σi=1..n -P(vi) log2 P(vi) For a training set containing p positive examples and n negative examples : I(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)) Specifies the minimum number of bits of information needed to encode the classification of an arbitrary member
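In Python this is just the entropy of the label distribution; a small sketch (the function names are mine):

import math

def information_content(probabilities):
    # I(P(v1), ..., P(vn)) = sum_i -P(vi) * log2 P(vi); terms with P(vi) = 0 contribute 0
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def boolean_entropy(p, n):
    # Entropy of a set with p positive and n negative examples: I(p/(p+n), n/(p+n))
    total = p + n
    return information_content([p / total, n / total])

print(boolean_entropy(6, 6))   # 1.0 bit for an evenly split training set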
37
Information Gain Chosen attribute A divides training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values : remainder(A) = Σi=1..v (pi + ni)/(p + n) I(pi/(pi + ni), ni/(pi + ni)) Information gain [IG] : expected reduction in entropy caused by partitioning the examples, IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)
38
Information Gain Information gain [IG] : expected reduction in entropy caused by partitioning the examples Choose the attribute with the largest IG [Wanna know more : Google it...]
39
Information Gain [E.g.] For the training set : p = n = 6, I(6/12, 6/12) = 1 bit Consider Patrons? and Type? [and others] Patrons has the highest IG of all attributes and so is chosen as the root Why is IG of Type? equal to zero?
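A sketch of that computation, assuming the usual split counts of the 12-example restaurant data set (Patrons?: None = 2 negative, Some = 4 positive, Full = 2 positive / 4 negative; Type?: every value split 50/50):

import math

def I(*probs):
    # Information content of a distribution, as defined two slides back
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gain(p, n, splits):
    # IG(A) = I(p/(p+n), n/(p+n)) - sum_i (pi+ni)/(p+n) * I(pi/(pi+ni), ni/(pi+ni))
    remainder = sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni)) for pi, ni in splits)
    return I(p / (p + n), n / (p + n)) - remainder

print(gain(6, 6, [(0, 2), (4, 0), (2, 4)]))          # Patrons?  ~0.541 bits
print(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))  # Type?     0.0 : each subset keeps the 50/50 ratio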
40
Decision Tree Learning
Plenty of other measures for ‘best’ attributes possible...
41
Back to The Example... ‘Training data’
42
Decision Tree Learned Based on the 12 examples; substantially simpler solution than the ‘true’ tree A more complex hypothesis isn’t justified by the small amount of data
43
Performance Measurement
How do we know that h ≈ f? Or : how the h*ll do we know that our decision tree performs well? Most often we don’t know... for sure
44
Performance Measurement
However, prediction quality can be estimated using theory from computational / statistical learning theory / PAC-learning Or we could, for example, simply try h on a new test set of examples The crux being, of course, that there should actually be a new test set... If no test set is available, several possibilities exist for creating ‘training’ and ‘test’ sets from the available data
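One such possibility is simply holding data out; another is k-fold cross-validation. A sketch under those assumptions (the function names and split scheme are illustrative, not prescribed by the slides):

import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    # Hold out a fraction of the available data as the 'new' test set
    data = list(examples)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

def k_fold_splits(examples, k=5, seed=0):
    # k-fold cross-validation: every example is used for testing exactly once
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]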
45
Performance Measurement
Learning curve : ‘%’ correct on test set as a function of training set size
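A sketch of how such a curve can be estimated, where learn is any learning procedure that returns a hypothesis h; both names are placeholders.

def accuracy(h, test_set):
    # Fraction of test examples (x, y) that the hypothesis classifies correctly
    return sum(h(x) == y for x, y in test_set) / len(test_set)

def learning_curve(learn, train_set, test_set, sizes):
    # '%' correct on the fixed test set as a function of training-set size
    return [(m, accuracy(learn(train_set[:m]), test_set)) for m in sizes]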
46
Bad Conduct in AI Training on the test set!
May happen before you know it Often very hard to justify... if at all possible All I can say is : try to avoid it
47
Ensemble-Learning-in-1-Slide
Idea : a collection [ensemble] of hypotheses is used / their predictions are combined Motivation : the hope that an ensemble is much less likely to misclassify [obviously!] E.g. independence can be exploited Examples : majority voting / boosting Ensemble learning simply creates a new, more expressive hypothesis space
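A minimal sketch of the majority-voting case (just one of the combination schemes mentioned above):

from collections import Counter

def majority_vote(hypotheses):
    # Combine an ensemble of hypotheses into a single, more expressive one
    def combined(x):
        votes = Counter(h(x) for h in hypotheses)
        return votes.most_common(1)[0][0]   # the prediction made by most ensemble members
    return combined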
48
Summary In general : learning needed for unknown environments or lazy designers Learning agent = performance element + learning element [Chapter 2] Supervised learning : the aim is to find a simple hypothesis [approximately] consistent with training examples Decision tree learning using IG Difficult to measure learning performance Learning curve
49
Next Week More...