Learning from Observations (Chapter 18)

Learning agents
Learning agents improve their behavior through diligent study of their own experiences.
Again modeled on humans: Acting -> Experience -> Better Acting
We'll study how to build an agent that learns, what is needed for learning, and some representative methods of learning from observations.

Learning agents

A general model
What are the components of a learning agent (Fig 2.15)?
Learning element - learns and improves the agent
Performance element - the agent proper, which perceives and acts
Problem generator - suggests exploratory actions
Critic - provides feedback on how the agent is doing
The design of a learning agent is affected by four issues: prior information, feedback, representation, and performance.

What do we need?
Components of the performance element - each component should be learnable given feedback
Representation of the components - propositional logic, FOL, or others
Available feedback (teacher, reward, none) - supervised, reinforcement, or unsupervised learning
Prior knowledge - none, some (why not all?)
Putting it all together: learning the components amounts to learning functions.

Inductive learning
Data are described by examples; an example is a pair (x, f(x)), where x is a vector of attribute values.
Induction: given a collection of examples of f, return a function h that approximates f.
Data in the next slide (Fig 18.3)
Concepts about learning (in the next few slides): hypotheses, bias, learning incrementally or in batch

Attribute-based representations
Examples are described by attribute values (Boolean, discrete, or continuous).
E.g., situations where I will/won't wait for a table (Fig 18.3; a small illustrative sample is sketched below).
The classification of each example is positive (T) or negative (F).
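
As a concrete illustration, attribute-based examples can be stored as (attributes, label) pairs. The two rows below are written from memory of the textbook's restaurant domain and may not match Fig 18.3 exactly; the short attribute names are my own.

```python
# Illustrative attribute-based examples in the spirit of Fig 18.3 (restaurant domain).
# Each example is a pair (x, f(x)): x maps attribute names to values, f(x) is WillWait.
examples = [
    ({"Alt": "Yes", "Bar": "No", "Fri": "No", "Hun": "Yes", "Pat": "Some",
      "Price": "$$$", "Rain": "No", "Res": "Yes", "Type": "French", "Est": "0-10"}, True),
    ({"Alt": "Yes", "Bar": "No", "Fri": "No", "Hun": "Yes", "Pat": "Full",
      "Price": "$", "Rain": "No", "Res": "No", "Type": "Thai", "Est": "30-60"}, False),
]

for x, will_wait in examples:
    print(x["Pat"], x["Est"], "->", "T" if will_wait else "F")
```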

Inductive learning method
Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples).
E.g., curve fitting (the original slides step through a sequence of figures showing different hypotheses fit to the same data points).
Ockham's razor: prefer the simplest hypothesis consistent with the data.
A minimal curve-fitting sketch follows.
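
The following sketch is an illustration only: the data points are made up and NumPy's polyfit is used as a stand-in for "constructing h". It fits polynomial hypotheses of increasing degree to the same examples; the high-degree hypothesis can agree with every training point, yet Ockham's razor prefers the simplest hypothesis that fits well enough.

```python
import numpy as np

# Made-up training examples (x, f(x)) for illustration only.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9])

for degree in (1, 3, 5):
    # Each hypothesis h is a polynomial of the given degree fit to the data.
    coeffs = np.polyfit(xs, ys, degree)
    h = np.poly1d(coeffs)
    training_error = np.max(np.abs(h(xs) - ys))
    print(f"degree {degree}: max training error = {training_error:.3f}")
# The degree-5 polynomial passes (essentially) through all six points, but the
# degree-1 hypothesis is simpler; Ockham's razor prefers it if it fits well enough.
```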

Some questions about inductive learning
Are there many forms of inductive learning? What forms are you aware of? We'll study some of them.
Can we achieve both expressiveness and efficiency?
How can one possibly know that one's learning algorithm has produced a theory that will correctly predict the future? If one does not, how can one say that the algorithm is any good?

Learning decision trees
A decision tree takes as input an object described by a set of properties and outputs a yes/no "decision".
One of the simplest and yet most successful forms of learning.
To make the decision "wait" or "not wait", we need information such as the 10 attributes of the data set in Fig 18.3 (page 654), e.g.:
Patrons(Full) ^ WaitEstimate(0-10) ^ Hungry(N) => WillWait
A tiny sketch of a tree encoding this kind of rule appears below.
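
As a sketch only (the attribute names mirror the slide's rule and the earlier data sample; this is not the tree of Fig 18.2), a decision tree can be written as nested tests on attribute values:

```python
# A hand-written decision tree for the WillWait decision, shown as nested tests.
# It encodes rules in the spirit of Patrons(Full) ^ WaitEstimate(0-10) ^ Hungry(N) => WillWait.
def will_wait(example):
    patrons = example["Pat"]
    if patrons == "None":
        return False
    if patrons == "Some":
        return True
    # Patrons == "Full": look at the wait estimate and hunger next.
    if example["Est"] == "0-10" and example["Hun"] == "No":
        return True
    return False

print(will_wait({"Pat": "Full", "Est": "0-10", "Hun": "No"}))    # True
print(will_wait({"Pat": "Full", "Est": "30-60", "Hun": "Yes"}))  # False
```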

Let's make a decision
Where to start?
Use the data itself to make decisions; when a queried combination of attribute values has no matching rows, fall back on the class distribution.
Or use a decision tree - but then, which attribute should we test first?

Expressiveness of a DT
A possible DT (e.g., Fig 18.2)
The decision tree language is essentially propositional, with each attribute test being a proposition.
Any Boolean function can be written as a decision tree (truth tables correspond to DTs).
DTs can represent many functions with much smaller trees, but not all Boolean functions have compact trees (e.g., parity, majority).

How many different functions are in the set of all Boolean functions on n Boolean attributes? (A quick count is sketched below.)
How do we find consistent hypotheses in the space of all possible ones? And which one is most likely the best?
What is the simplest way to induce? Some choices are … (your suggestions, please)
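
On the first question, the standard count is 2^(2^n): a truth table on n attributes has 2^n rows, and each row can be labeled in 2 ways. A quick check:

```python
# Number of distinct Boolean functions on n Boolean attributes:
# a truth table has 2**n rows, and each row may be labeled True or False,
# giving 2**(2**n) functions in total.
for n in range(1, 7):
    print(n, 2 ** (2 ** n))
# n = 6 already gives 2**64, about 1.8e19 candidate hypotheses.
```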

Inducing DTs from examples
Extracting a pattern (a DT) means describing a large number of cases in a concise way - we need a consistent and concise tree.
Applying Ockham's razor: the most likely hypothesis is the simplest one that is consistent with all observations.
How do we find a small DT? Test the most important attribute first (Fig 18.4).
Algorithm (Fig 18.5, page 658)
Another DT (Fig 18.6)
A recursive sketch of the algorithm follows.
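
A minimal recursive sketch in the spirit of the decision-tree learning algorithm of Fig 18.5; the helper names, the information-gain importance measure, and the simplified branch handling (only attribute values seen in the data get branches) are mine, not the book's exact pseudocode.

```python
import math
from collections import Counter

def entropy(labels):
    """Information content of a collection of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attr):
    """Entropy reduction from splitting (x, label) examples on attribute attr."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [label for x, label in examples if x[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def plurality(examples):
    """Most common class label among the examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, parent_examples=()):
    """Returns a class label, or a tree of the form (attribute, {value: subtree})."""
    if not examples:
        return plurality(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()
    if not attributes:
        return plurality(examples)
    # Test the "most important" attribute (highest information gain) first.
    best = max(attributes, key=lambda a: information_gain(examples, a))
    branches = {}
    for value in {x[best] for x, _ in examples}:  # only values seen in the data
        subset = [(x, label) for x, label in examples if x[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = decision_tree_learning(subset, rest, examples)
    return (best, branches)

toy = [({"Pat": "Some", "Hun": "Yes"}, True),
       ({"Pat": "Full", "Hun": "No"}, False),
       ({"Pat": "None", "Hun": "No"}, False)]
print(decision_tree_learning(toy, ["Pat", "Hun"]))
```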

Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Patrons? is a better choice (see Fig 18.4).

Choosing the best attribute
A computational method: information theory.
Information - informally, the more surprise you have, the more information you have; mathematically,
I(P(v1), …, P(vn)) = sum_i [-P(vi) log2 P(vi)]
I(1/2, 1/2) = 1; I(0, 1) = I(1, 0) = 0
Information by itself doesn't help much to answer "what is the correct classification?".

Information gain - the difference between the original information requirement and the new requirement after testing attribute A:
Remainder(A) = p1*I(branch 1) + … + pn*I(branch n), where pi is the fraction of examples that go down branch i (so p1 + … + pn = 1) and I(branch i) is the information of the class distribution in that branch
Gain(A) = I(class distribution before the split) - Remainder(A)
Choose the attribute with the largest gain. A small worked sketch follows.
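
A small sketch of these formulas; the function names and the toy splits are mine and the numbers are only illustrative, not the Fig 18.3 computation.

```python
import math

def information(probs):
    """I(p1, ..., pn) = sum_i -pi * log2(pi), ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def remainder(branches):
    """branches: list of (positives, negatives) counts, one per value of the attribute."""
    total = sum(p + n for p, n in branches)
    rem = 0.0
    for p, n in branches:
        weight = (p + n) / total
        rem += weight * information([p / (p + n), n / (p + n)])
    return rem

def gain(branches):
    """Gain(A) = I(class distribution before the split) - Remainder(A)."""
    pos = sum(p for p, _ in branches)
    neg = sum(n for _, n in branches)
    before = information([pos / (pos + neg), neg / (pos + neg)])
    return before - remainder(branches)

print(information([0.5, 0.5]))  # 1.0
print(information([0.0, 1.0]))  # 0.0
# Toy split of 12 examples (6 positive, 6 negative) into three branches:
print(gain([(2, 0), (4, 0), (0, 6)]))  # a very informative attribute -> gain 1.0
print(gain([(3, 3), (3, 3)]))          # an uninformative attribute -> gain 0.0
```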

Which attribute?
Revisit the "wait" or "not wait" example using your favorite two attributes.

Assessing the performance
A fair assessment uses data the learner has not seen.
Errors and accuracy rate
Training and test sets: divide the data into two sets, learn on the training set, test on the test set; if necessary, shuffle the data and repeat.
Cross-validation. Example: 3 x 5-fold CV (a sketch of the train/test protocol follows).
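
A minimal sketch of the assessment protocol in plain Python; the helper names are assumptions, and the majority-class stub stands in for a real decision-tree learner, so the printed accuracies are only illustrative.

```python
import random

def accuracy(hypothesis, examples):
    """Fraction of (x, label) examples the hypothesis classifies correctly."""
    return sum(hypothesis(x) == label for x, label in examples) / len(examples)

def train_test_assessment(examples, learn, test_fraction=0.25, seed=0):
    """Shuffle, hold out a test set, learn on the rest, report test accuracy."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    return accuracy(learn(train), test)

def k_fold_cv(examples, learn, k=5, seed=0):
    """Average test accuracy over k folds; every example is tested exactly once."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(learn(train), test))
    return sum(scores) / k

# Stub learner: always predict the majority class of the training labels.
def majority_learner(train):
    labels = [label for _, label in train]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

toy = [({"x": i}, i % 3 != 0) for i in range(30)]
print(train_test_assessment(toy, majority_learner))
print(k_fold_cv(toy, majority_learner, k=5))
```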

Learning curve - the "happy graph": prediction accuracy on unseen test data, plotted as a function of training-set size, typically rises as the learner sees more examples.

Ensemble learning
Select a collection of hypotheses (an ensemble) from the hypothesis space and combine their predictions.
The hope is to reduce the misclassifications made by any single hypothesis (Fig 18.8).
Key issues: how many hypotheses to produce, and how to combine their predictions (majority vote, weighted sum).
Some ways to generate many hypotheses: random forests; boosting (Fig 18.9); bagging (bootstrap aggregating, see the complementary slides W8ensemble.ppt).
A minimal majority-vote sketch follows.
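
A minimal sketch of combining hypotheses by majority vote over bootstrap samples (bagging); the learner here is a stub "decision stump", and all names and data are illustrative assumptions, not the boosting algorithm of Fig 18.9.

```python
import random
from collections import Counter

def bootstrap_sample(examples, rng):
    """Sample len(examples) items with replacement (a bootstrap replicate)."""
    return [rng.choice(examples) for _ in examples]

def bagged_ensemble(examples, learn, n_hypotheses=5, seed=0):
    """Learn one hypothesis per bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    hypotheses = [learn(bootstrap_sample(examples, rng)) for _ in range(n_hypotheses)]

    def predict(x):
        votes = Counter(h(x) for h in hypotheses)
        return votes.most_common(1)[0][0]

    return predict

# Stub learner: a decision stump that thresholds a single numeric feature.
def stump_learner(train):
    threshold = sum(x["a"] for x, _ in train) / len(train)
    return lambda x: x["a"] > threshold

data = [({"a": float(i)}, i >= 10) for i in range(20)]
ensemble = bagged_ensemble(data, stump_learner, n_hypotheses=7)
print(ensemble({"a": 3.0}), ensemble({"a": 15.0}))  # expected: False True
```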

Practical use of DT learning
BP's use of GASOIL
Learning to fly on a flight simulator
An industrial-strength system: Quinlan's C4.5
Who's the next hero?

Some issues in DT applications
Missing values
Multivalued attributes
Continuous-valued attributes (see the threshold-split sketch below)
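
For continuous-valued attributes, a common approach (sketched here under my own naming and toy data, not a prescription from the slides) is to try thresholds between consecutive sorted values and keep the one with the highest information gain:

```python
import math

def entropy(labels):
    total = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / total
        result -= p * math.log2(p)
    return result

def best_threshold(values, labels):
    """Try midpoints between consecutive sorted values; return (threshold, gain)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        rem = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - rem > best[1]:
            best = (t, base - rem)
    return best

# Toy data: wait-estimate minutes vs. whether the diner waited.
minutes = [5, 8, 12, 35, 40, 70]
waited = [True, True, True, False, False, False]
print(best_threshold(minutes, waited))  # threshold 23.5 with gain 1.0
```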

Why does learning work?
How can one possibly know that one's learning algorithm will correctly predict the future?
Stationarity assumption - the training and test data sets have the same probability distribution
I.I.D. assumption - data are independent and identically distributed
How do we know that h is close enough to f without knowing f? Any suggestions?

PAC learning
Computational learning theory has an answer. The basic idea: any seriously wrong h will make an incorrect prediction, so it will be found out with high probability after a small number of examples. Hence, if h is consistent with a sufficiently large set of examples, it is unlikely to be seriously wrong - it is probably approximately correct (PAC).
How many examples are needed for learning?
error(h) = P(h(x) ≠ f(x) | x drawn from D)
Define "seriously wrong" as error(h_b) > ε, and let H_bad be the set of such hypotheses.
P(h_b agrees with N examples) ≤ (1 - ε)^N
P(H_bad contains a consistent h) ≤ |H_bad| (1 - ε)^N ≤ |H| (1 - ε)^N
Requiring this to be at most δ gives N ≥ (1/ε)(ln(1/δ) + ln |H|), using 1 - ε ≤ e^(-ε).
(The slide's figure shows the hypothesis space H, the bad region H_bad, and the target f.)
A numeric sketch of this sample bound follows.
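
A quick numeric sketch of the sample-complexity bound N ≥ (1/ε)(ln(1/δ) + ln |H|); the particular ε, δ, and choice of hypothesis space below are made up for illustration.

```python
import math

def pac_sample_bound(epsilon, delta, ln_hypothesis_space):
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln |H|)."""
    return math.ceil((1 / epsilon) * (math.log(1 / delta) + ln_hypothesis_space))

# For all Boolean functions on n attributes, |H| = 2**(2**n), so ln |H| = 2**n * ln 2.
for n in (5, 10):
    ln_H = (2 ** n) * math.log(2)
    print(n, pac_sample_bound(epsilon=0.1, delta=0.05, ln_hypothesis_space=ln_H))
```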

Summary
Learning is essential for intelligent agents: it lets them deal with the unknown and improve their capability over time.
All types of learning can be considered as learning an accurate representation h of f.
Inductive learning: from data about f to a hypothesis h
Decision trees - deterministic Boolean functions
Ensemble learning
PAC learning