1
Instance-Based Learning
2
Content
Motivation: Eager Learning, Lazy Learning, Instance-Based Learning
k-Nearest Neighbour Learning (kNN), Distance-Weighted k-NN
Locally Weighted Regression (LWR)
Case-Based Reasoning (CBR)
Summary
3
Motivation: Eager Learning
THE LEARNING TASK: Approximate a target function $f$ by a hypothesis $h$ on the basis of training examples.
EAGER LEARNING: As soon as the training examples and the hypothesis space are received, the search for a hypothesis begins.
Training phase: given the training examples $D = \{\langle x_i, f(x_i) \rangle\}$ and the hypothesis space $H$, search for the best hypothesis $h \in H$.
Processing phase: for every new instance $x_q$, return $h(x_q)$.
4
Motivation: Lazy Algorithms
Training examples are simply stored and left dormant.
Generalisation beyond these examples is postponed until new instances must be classified.
Every time a new query instance is encountered, its relationship to the previously stored examples is examined in order to compute the value of the target function for this new instance.
5
Motivation: Instance-Based Learning
Instance-based algorithms can establish a new local approximation for every new instance.
Training phase: given the training sample $D$, store it.
Processing phase: given an instance $x_q$, search for the best local hypothesis $h_q$ and return $h_q(x_q)$.
Examples: Nearest Neighbour Algorithm, Distance-Weighted Nearest Neighbour, Locally Weighted Regression, ...
6
Motivation: Instance-Based Learning 2
How are the instances represented?
How can we measure the similarity of the instances?
How can the estimated target value $\hat{f}(x_q)$ for a query instance $x_q$ be computed?
7
Nearest Neighbour Algorithm
IDEA: All instances correspond to points in the n-dimensional space $\mathbb{R}^n$. Assign the value of the nearest neighbouring instance to the new instance.
REPRESENTATION: Let $x = \langle a_1(x), a_2(x), \dots, a_n(x) \rangle$ be an instance, where $a_r(x)$ denotes the value of the r-th attribute of instance $x$.
TARGET FUNCTION: Discrete-valued or real-valued.
8
Nearest Neighbour Algorithm 2
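HOW IS THE NEAREST NEIGHBOUR DEFINED? A metric serves as the similarity measure.
Minkowski norm: $d_p(x_i, x_j) = \left( \sum_{r=1}^{n} |a_r(x_i) - a_r(x_j)|^p \right)^{1/p}$, where $p \ge 1$.
Euclidean distance ($p = 2$): $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2}$.
This algorithm never forms an explicit general hypothesis regarding the target function $f$.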
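The distance measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming that instances are plain sequences of numeric attribute values; the function names `minkowski` and `euclidean` are chosen here for illustration and do not come from the slides.

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two attribute vectors."""
    if len(x) != len(y):
        raise ValueError("instances must have the same number of attributes")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    # Sum the p-th powers of the per-attribute differences, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance, the special case p = 2."""
    return minkowski(x, y, p=2.0)


print(euclidean([0.0, 0.0], [3.0, 4.0]))         # 5.0
print(minkowski([0.0, 0.0], [3.0, 4.0], p=1.0))  # 7.0 (city-block distance)
```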
9
Nearest Neighbour Algorithm 3
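HOW IS $\hat{f}(x_q)$ FORMED?
Discrete target function: $f : \mathbb{R}^n \to V$, where $V = \{v_1, \dots, v_s\}$ is the set of $s$ classes.
Continuous target function: $f : \mathbb{R}^n \to \mathbb{R}$.
Let $x_n$ be the nearest neighbour of $x_q$ ==> $\hat{f}(x_q) = f(x_n)$.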
10
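k-Nearest Neighbour
IDEA: If we choose k = 1, the algorithm assigns to $\hat{f}(x_q)$ the value $f(x_i)$, where $x_i$ is the nearest training instance to $x_q$. For larger values of k, the algorithm assigns the most common value among the k nearest training examples.
HOW CAN $\hat{f}(x_q)$ BE ESTABLISHED? $\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$, where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise.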
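The majority vote among the k nearest training examples can be sketched as follows. This is an illustrative sketch, not the slides' own code: it assumes Euclidean distance, training examples given as `(attribute_vector, label)` pairs, and a tie-breaking rule (the label seen first among the nearest neighbours wins) that the slides do not specify. The name `knn_classify` and the toy data are made up for the example.

```python
import math
from collections import Counter


def knn_classify(training, x_q, k=1):
    """Return the most common label among the k training examples nearest to x_q."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training examples")
    # Lazy learning: all the work happens at query time.
    neighbours = sorted(training, key=lambda ex: math.dist(ex[0], x_q))[:k]
    votes = Counter(label for _, label in neighbours)
    # Assumption: ties go to the label encountered first among the neighbours.
    return votes.most_common(1)[0][0]


training = [
    ((2.1, 2.0), "+"),
    ((1.0, 2.0), "-"),
    ((3.0, 2.0), "-"),
    ((2.0, 3.2), "-"),
    ((2.0, 0.5), "+"),
    ((5.0, 5.0), "+"),
]
x_q = (2.0, 2.0)
print(knn_classify(training, x_q, k=1))  # "+": the single nearest example is positive
print(knn_classify(training, x_q, k=5))  # "-": three of the five nearest are negative
```

The toy data is arranged so that 1-NN and 5-NN disagree on the same query point, which is the situation the slide's example contrasts.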
11
k-Nearest Neighbour 2
Example (figure): classification of a query point by 1-NN and by 5-NN; Voronoi diagram.
Voronoi diagram: the decision surface induced by the 1-Nearest Neighbour algorithm for a typical set of training examples. The convex polygon surrounding each training example indicates the region of query points whose classification is completely determined by that training example.
12
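k-Nearest Neighbour 3
REFINEMENT: The neighbours are weighted according to their distance to the query point. The farther away a neighbour is, the smaller its influence:
$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$, where $w_i = \frac{1}{d(x_q, x_i)^2}$.
To accommodate the case where the query point $x_q$ exactly matches one of the training instances $x_i$ and the denominator $d(x_q, x_i)^2$ is therefore zero, we assign $\hat{f}(x_q)$ to be $f(x_i)$ in this case.
Distance weighting for a real-valued target function: $\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$.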
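Both distance-weighted variants (voting for discrete targets, weighted averaging for real-valued targets) can be sketched as follows. The sketch assumes Euclidean distance and inverse-squared-distance weights; the function names are illustrative. One simplification is labelled in the code: if several stored instances coincide with the query, the first one found is returned.

```python
import math
from collections import defaultdict


def _nearest(training, x_q, k):
    """Return (distance, target value) pairs for the k nearest training examples."""
    pairs = ((math.dist(x, x_q), fx) for x, fx in training)
    return sorted(pairs, key=lambda t: t[0])[:k]


def weighted_knn_classify(training, x_q, k):
    """Distance-weighted vote with weights w_i = 1 / d(x_q, x_i)^2."""
    votes = defaultdict(float)
    for d, label in _nearest(training, x_q, k):
        if d == 0.0:
            # Exact match: avoid the zero denominator (simplification: first match wins).
            return label
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)


def weighted_knn_regress(training, x_q, k):
    """Distance-weighted average of the k nearest real-valued targets."""
    numerator = denominator = 0.0
    for d, fx in _nearest(training, x_q, k):
        if d == 0.0:
            return fx  # exact match: return the stored target value
        w = 1.0 / d ** 2
        numerator += w * fx
        denominator += w
    return numerator / denominator


points = [((0.0,), 0.0), ((1.0,), 10.0), ((3.0,), 30.0)]
print(weighted_knn_regress(points, (2.0,), k=2))  # 20.0: both neighbours are equally far
```

With weighting, a single very close neighbour can outvote several distant ones, so the choice of k becomes less critical than in the unweighted vote.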
13
Remarks on k-Nearest Neighbour Algorithm
PROBLEM: The distance between two instances is computed over every attribute, so even irrelevant attributes can influence the approximation.
EXAMPLE: n = 20, but only 2 attributes are relevant.
SOLUTION: Weight each attribute differently when calculating the distance between two instances. This corresponds to stretching the axes in Euclidean space: shortening the axes that correspond to less relevant attributes and lengthening the axes that correspond to more relevant attributes.
PROBLEM: How can the weight of each attribute be determined automatically? Cross-validation, leave-one-out.
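One way to picture leave-one-out selection of attribute weights is the sketch below. It is only an illustration under several assumptions not stated on the slide: a 1-NN classifier is used to score each weighting, the weights `z` scale the squared attribute differences, and the search is restricted to a small hand-picked list of candidate weight vectors. All names and the toy data are hypothetical.

```python
import math


def weighted_distance(x, y, z):
    """Euclidean distance with each axis r stretched by the weight z[r]."""
    return math.sqrt(sum(zr * (a - b) ** 2 for zr, a, b in zip(z, x, y)))


def loo_error(data, z):
    """Leave-one-out error rate of 1-NN under the attribute weights z."""
    errors = 0
    for i, (x_i, label_i) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out example i
        _, predicted = min(
            ((weighted_distance(x_i, x_j, z), label_j) for x_j, label_j in rest),
            key=lambda t: t[0],
        )
        errors += predicted != label_i
    return errors / len(data)


def select_weights(data, candidates):
    """Pick the candidate weight vector with the lowest leave-one-out error."""
    return min(candidates, key=lambda z: loo_error(data, z))


# Toy data: the first attribute determines the class, the second is irrelevant.
data = [
    ((0.0, 9.0), "a"), ((0.1, 1.0), "a"), ((0.2, 5.0), "a"),
    ((1.0, 8.8), "b"), ((1.1, 1.2), "b"), ((1.2, 5.1), "b"),
]
candidates = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
print(select_weights(data, candidates))  # (1.0, 0.0): the irrelevant axis is removed
```

Setting a weight to zero eliminates an attribute entirely, which is the extreme case of shortening an axis.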
14
Remarks on k-Nearest Neighbour Algorithm 2
ADVANTAGE:
The training phase is very fast.
Can learn complex target functions.
Robust to noisy training data.
Quite effective when a sufficiently large set of training data is provided.
Under very general conditions it holds that $P_{Bayes} \le P_{1NN} \le 2 P_{Bayes}$ as the number of training examples grows, where $P$ is the probability of error.
DISADVANTAGE:
The algorithm delays all processing until a new query is received => significant computation can be required to process each query; efficient memory indexing is needed.
Processing is slow.
Sensitive to the curse of dimensionality.
BIAS: The inductive bias corresponds to the assumption that the classification of an instance will be most similar to the classification of other instances that are nearby in Euclidean distance.
15
Locally Weighted Regression
IDEA: Generalization of the Nearest Neighbour algorithm. It constructs an explicit approximation to $f$ over a local region surrounding the query point $x_q$. It uses nearby or distance-weighted training examples to form the local approximation to $f$.
Local: The function is approximated based solely on the training data near the query point.
Weighted: The contribution of each training example is weighted by its distance from the query point.
Regression: Means approximating a real-valued target function.
16
Locally Weighted Regression
PROCEDURE: Given a new query $x_q$, construct an approximation $\hat{f}$ that fits the training examples in the neighbourhood surrounding $x_q$.
This approximation is used to calculate $\hat{f}(x_q)$, which is output as the estimated target value for the query instance.
The description of $\hat{f}$ may change from query to query, because a different local approximation will be calculated for each instance.
17
Locally Weighted Regression 2
PROCEDURE: Given a new query $x_q$, construct an approximation $\hat{f}$ that fits the training examples in the surrounding neighbourhood.
How can $\hat{f}$ be calculated? As a linear function, a quadratic function, a multilayer neural network, ...
This approximation is used to calculate $\hat{f}(x_q)$, which is output as the estimated target value for the query instance.
The description of $\hat{f}$ may then be deleted, because a different local approximation will be calculated for every distinct query instance.
18
Locally Weighted Linear Regression
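Special case of LWR with simple computation.
LINEAR HYPOTHESIS SPACE: $\hat{f}(x) = w_0 + w_1 a_1(x) + \dots + w_n a_n(x)$, where $a_r(x)$ is the r-th attribute of the instance $x$.
Define the error criterion $E$ so as to emphasize fitting the local training examples:
1. Minimise the squared error over just the k nearest neighbours: $E_1(x_q) = \frac{1}{2} \sum_{x \in kNN(x_q)} (f(x) - \hat{f}(x))^2$, where $kNN(x_q)$ denotes the k nearest neighbours of $x_q$.
2. Minimise the squared error over the entire set $D$, using some kernel function $K$ to decrease the weight of each error with the distance from $x_q$: $E_2(x_q) = \frac{1}{2} \sum_{x \in D} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$.
3. Combine 1 and 2: $E_3(x_q) = \frac{1}{2} \sum_{x \in kNN(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$.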
19
Locally Weighted Linear Regression 2
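The third error criterion $E_3$ (kernel-weighted squared error over the k nearest neighbours of $x_q$) is a good approximation to the second one, and it has the advantage that the computational cost is independent of the total number of training examples.
If $E_3$ is chosen and the gradient descent rule is rederived (see neural networks), the following training rule is obtained:
$\Delta w_r = \eta \sum_{x \in kNN(x_q)} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, a_r(x)$, where $\eta$ is the learning rate and $kNN(x_q)$ denotes the k nearest neighbours of $x_q$.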
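The training rule for the third criterion can be tried out with a short sketch: for each query, a linear function is fitted by batch gradient descent to the k nearest neighbours, with every error term weighted by a kernel of the distance to the query. The Gaussian kernel, the default values for `eta`, `epochs` and `sigma`, and all function names are assumptions made for this illustration; the learning rate must be small enough for the data at hand, otherwise the descent diverges.

```python
import math


def gaussian_kernel(d, sigma=1.0):
    """Kernel K(d) that decreases with distance (assumed Gaussian here)."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def lwlr_predict(training, x_q, k=3, eta=0.05, epochs=2000, sigma=1.0):
    """Fit a local linear model around x_q by gradient descent and evaluate it at x_q."""
    # Only the k nearest neighbours of the query enter the error criterion.
    nbrs = sorted(training, key=lambda ex: math.dist(ex[0], x_q))[:k]
    kernel = [gaussian_kernel(math.dist(x, x_q), sigma) for x, _ in nbrs]
    n = len(x_q)
    w = [0.0] * (n + 1)  # w[0] is the constant term
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for (x, fx), kw in zip(nbrs, kernel):
            a = (1.0,) + tuple(x)  # a_0(x) = 1 pairs with w[0]
            err = fx - sum(wr * ar for wr, ar in zip(w, a))
            for r in range(n + 1):
                grad[r] += kw * err * a[r]
        # Delta w_r = eta * sum over neighbours of K(d) * (f(x) - f_hat(x)) * a_r(x)
        w = [wr + eta * g for wr, g in zip(w, grad)]
    # The local model is discarded after answering this one query.
    return w[0] + sum(wr * ar for wr, ar in zip(w[1:], x_q))


training = [((float(i),), 2.0 * i + 1.0) for i in range(5)]  # samples of f(x) = 2x + 1
print(lwlr_predict(training, (2.5,), k=3))  # close to 6.0, since the data is exactly linear
```

Because only k examples are touched per query, the cost of one prediction does not grow with the size of the training set once the neighbours have been found.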
20
Evaluation of Locally Weighted Regression
ADVANTAGE:
Pointwise approximation of a complex target function.
Earlier data has no influence on the new ones.
DISADVANTAGE:
The quality of the result depends on the choice of the function, the choice of the kernel function K, and the choice of the hypothesis space H.
Sensitivity to relevant and irrelevant attributes.
21
Case-Based Reasoning (CBR)
Instance-based methods such as k-Nearest Neighbour and locally weighted regression share three properties:
They are lazy learning methods.
They classify new query instances by analysing similar instances and ignoring very different ones.
They represent instances as real-valued points in an n-dimensional Euclidean space.
CBR: keeps the first two principles, but instances are represented using richer symbolic descriptions, and the methods used to retrieve similar instances are correspondingly more elaborate.
22
Case-Based Reasoning 2
Given: a new case (instance)
Search for relevant cases in the Case-Library.
Select the best one among them.
Derive a solution.
Evaluate the solution found.
Add the solved case to the Case-Library.
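The five steps listed above can be arranged into a small skeleton. Everything in it is hypothetical: cases are dictionaries with a `problem` and a `solution`, and the `similarity`, `adapt` and `evaluate` callbacks stand in for the domain-specific knowledge a real CBR system would need. The toy usage compares cases by attribute overlap (Jaccard similarity), which is just one simple choice.

```python
def solve_with_cbr(library, problem, similarity, adapt, evaluate):
    """One pass through the CBR cycle; returns the accepted solution or None."""
    # 1. Search for relevant cases (here: any case with non-zero similarity).
    relevant = [c for c in library if similarity(c["problem"], problem) > 0.0]
    if not relevant:
        return None
    # 2. Select the best one among them.
    best = max(relevant, key=lambda c: similarity(c["problem"], problem))
    # 3. Derive a solution by adapting the retrieved case to the new problem.
    solution = adapt(best, problem)
    # 4. Evaluate the solution found.
    if not evaluate(problem, solution):
        return None
    # 5. Add the solved case to the library so later queries can reuse it.
    library.append({"problem": problem, "solution": solution})
    return solution


def jaccard(a, b):
    """Overlap of two attribute sets (illustrative similarity measure)."""
    return len(a & b) / len(a | b)


library = [
    {"problem": frozenset({"error53", "shutdown", "windows"}), "solution": "reinstall driver"},
    {"problem": frozenset({"no-sound", "linux"}), "solution": "unmute mixer"},
]
new_problem = frozenset({"error53", "shutdown", "linux"})
result = solve_with_cbr(
    library,
    new_problem,
    similarity=jaccard,
    adapt=lambda case, problem: case["solution"],  # reuse the old solution unchanged
    evaluate=lambda problem, solution: True,       # accept every proposal in this toy run
)
print(result)        # reinstall driver
print(len(library))  # 3: the solved case has been retained
```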
23
Case-Based Reasoning 3
HOW ARE THE INSTANCES REPRESENTED? By complex logical or relational descriptions.
Example:
((user-complaint error53 on shutdown)
 (CPU-model Power PC)
 (operating-system Windows)
 (network connection PCIA)
 (memory 48meg)
 (installed-application Excel Netscape)
 (disk 1gig)
 (likely-causes ???))
HOW CAN THE SIMILARITY BE MEASURED? See the CADET example.
24
CADET
Prototypical example of a case-based reasoning system.
Assists in the conceptual design of simple mechanical devices, such as water faucets.
It uses a library containing approximately 75 previous designs and design fragments to suggest a conceptual design that meets the specifications of the new design.
Each instance is stored as a pair <qualitative function, mechanical structure>.
New design problem: the desired function is specified; sought: the corresponding structure.
25
CADET Example
26
CADET Example 2
CADET searches for subgraph isomorphisms between the two function graphs, so that parts of a case can be found to match parts of the design specification.
The system elaborates the original function specification graph in order to create functionally equivalent graphs that match still more cases.
It uses general knowledge about physical influences to create these elaborated function graphs. Rewrite rule: $A \xrightarrow{+} B$ can be rewritten as $A \xrightarrow{+} x \xrightarrow{+} B$, where $x$ is a universally quantified variable.
Combination of retrieved cases into a new solution: based on knowledge-based reasoning.
27
Evaluation of CBR
ADVANTAGE: Formation of autonomous thinking systems ???
DISADVANTAGE:
Hierarchical system
Memory indexing
Syntactic similarity measurement
Possible incompatibility between two neighbouring cases -> combination impossible
Evaluation of the retrieved solution
28
Evaluation of Lazy Algorithms
DIFFERENCE TO EAGER LEARNING:
Computation time: less during the training phase, longer during classification.
Classification: the training samples are always retained; an instance-specific approximation is computed.
Generalization accuracy: local approximations are computed.
Bias: the query instance is considered when deciding how to generalize beyond the training data.
PROBLEMS:
Efficiently labelling new instances
Determining an appropriate distance measure
Influence of irrelevant attributes
29
Summary
Lazy learning: Processing of the training examples is delayed until a new query instance must be labelled. The result is several local approximations.
k-Nearest Neighbour: An instance is a point in the n-dimensional Euclidean space. The target function value for a new query is estimated from the known values of the k nearest training examples.
Locally weighted regression: An explicit local approximation to the target function is constructed for each query instance (form: constant, linear, ...).
Case-based reasoning: Instances are represented by complex logical descriptions. A rich variety of methods has been proposed for mapping from the training examples to the target function values for new instances.