1
KNN Classifier
2
Handed an instance you wish to classify, look around the nearby region to see what other classes are around. Whichever class is most common, make that the prediction.
3
Assign the most common class among the K nearest neighbors (like a vote).
5
Train: load the training data.
Classify: read in an instance, find its K nearest neighbors in the training data, and assign the most common class among those neighbors (like a vote).
Euclidean distance, where a ranges over the attributes (dimensions): $d(x_i, x_j) = \sqrt{\sum_a (x_{i,a} - x_{j,a})^2}$
6
Naïve approach: exhaustive search. For the instance to be classified:
Visit every training sample and calculate its Euclidean distance
Sort by distance
Take the first K in the list
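A minimal sketch of this exhaustive search in Python (the function names and tiny data set are illustrative, not from the slides; NumPy is assumed):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Classify one query point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training sample.
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up example with two attributes per instance.
train_X = np.array([[4.8, 2.3], [2.0, 4.9], [2.8, 3.4], [5.7, 1.7]])
train_y = np.array(["apple", "lemon", "orange", "apple"])
print(knn_classify(train_X, train_y, np.array([5.0, 2.0]), k=3))  # apple
```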
7
The work that must be performed: visit every training sample and calculate a distance, then sort. That is a lot of floating-point calculation; for n training samples with d attributes, each query costs roughly n·d operations for the distances plus a sort. The classifier puts off all of this work until it is time to classify.
8
Where the work happens: this is known as a “lazy” learning method. If most of the work is done during the training stage, the method is known as “eager.” Our next classifier, Naïve Bayes, will be eager: training takes a while, but it can classify fast. Which do you think is better?
9
From Wikipedia: a kd-tree is a space-partitioning data structure for organizing points in a k-dimensional space. kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest-neighbor searches). kd-trees are a special case of BSP trees.
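To see how a kd-tree front-loads the work, here is one possible sketch using SciPy's cKDTree (SciPy is assumed to be available; the random data and class labels are purely illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
train_X = rng.random((10_000, 3))           # 10,000 training points in 3-D
train_y = rng.integers(0, 2, size=10_000)   # two classes, labels 0/1

tree = cKDTree(train_X)                     # "training": build the tree once
dists, idx = tree.query(rng.random(3), k=5) # 5 nearest neighbors of one query
prediction = np.bincount(train_y[idx]).argmax()  # majority vote
print(prediction)
```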
10
A kd-tree speeds up classification, but probably slows “training,” since the tree must be built first.
11
Choosing K can be a bit of an art. What if you could include all data points (K = n)? How might you do such a thing? One idea: weight the vote of each training sample by its distance from the point being classified.
12
A common weighting is 1 over the distance squared. You could get less fancy and let the weight fall off only linearly with distance, but then training data that is very far away would still have a strong influence.
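A sketch of the all-points vote with 1/d² weights (the function name, epsilon guard, and data are illustrative choices):

```python
import numpy as np

def weighted_knn_classify(train_X, train_y, query, eps=1e-12):
    """Let every training point vote, weighted by 1 / distance**2."""
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    weights = 1.0 / (dists ** 2 + eps)   # eps guards against division by zero
    classes = np.unique(train_y)
    # Sum the weight going to each class and pick the heaviest.
    totals = [weights[train_y == c].sum() for c in classes]
    return classes[int(np.argmax(totals))]

train_X = np.array([[1.0, 1.0], [1.1, 0.9], [8.0, 8.0]])
train_y = np.array(["red", "red", "blue"])
print(weighted_knn_classify(train_X, train_y, np.array([2.0, 2.0])))  # red
```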
13
Other radial basis functions can be used as weights; a radial basis function is sometimes known as a kernel function. The slide shows one of the more common choices.
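The kernel formula itself did not survive extraction; assuming the slide showed the usual choice, the Gaussian kernel weights a training point at distance d by

$$K(d) = \exp\!\left(-\frac{d^2}{2\sigma^2}\right)$$

where $\sigma$ controls how quickly a point's influence falls off with distance.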
14
Other issues? The work is back-loaded, and the problem gets worse the bigger the training data; this can be alleviated with data structures such as kd-trees. What else? What if only some dimensions contribute to the ability to classify? Differences in the other, irrelevant dimensions would still put distance between a point and the target.
15
The book calls this the curse of dimensionality: more is not always better. Two instances might be identical in the important dimensions but distant in the others. From Wikipedia: in applied mathematics, the curse of dimensionality (a term coined by Richard E. Bellman), also known as the Hughes effect or Hughes phenomenon (named after Gordon F. Hughes), refers to the problem caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 100 evenly-spaced sample points suffice to sample a unit interval with no more than 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice spacing of 0.01 between adjacent points would require 10^20 sample points: thus, in some sense, the 10-dimensional hypercube can be said to be a factor of 10^18 “larger” than the unit interval. (Adapted from an example by R. E. Bellman.)
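A quick arithmetic check of the Wikipedia example:

```python
# Spacing 0.01 on a unit interval needs 100 points per axis, so a
# 10-dimensional lattice needs 100**10 points.
points_per_axis = 100
dims = 10
print(points_per_axis ** dims)                     # 10**20 sample points
print(points_per_axis ** dims // points_per_axis)  # 10**18 times the 1-D case
```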
16
Gene-expression example: thousands of genes, relatively few patients. Is there a curse?

patient | g1    | g2    | g3    | …   | gn    | disease
p1      | x1,1  | x1,2  | x1,3  | …   | x1,n  | Y
p2      | x2,1  | x2,2  | x2,3  | …   | x2,n  | N
…       | …     | …     | …     | …   | …     | …
pm      | xm,1  | xm,2  | xm,3  | …   | xm,n  | ?
17
What about discrete data? A Bayesian classifier could handle it; think of discrete data as being pre-binned. Remember the RNA classification example, where the data in each dimension was A, C, U, or G. How do you measure distance? A might be closer to G than to C or U (A and G are both purines, while C and U are pyrimidines), so the per-dimension distance becomes domain specific. Representation becomes all-important: if the data can be arranged appropriately, techniques like Hamming distance can be used.
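A sketch of both ideas, plain Hamming distance and a purine/pyrimidine-aware variant (the 0.5 within-family mismatch cost is an illustrative choice, not from the slides):

```python
def hamming_distance(seq_a, seq_b):
    """Count positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))

# Domain-specific variant: mismatches within the same chemical family
# (purines A/G, pyrimidines C/U) cost less than cross-family mismatches.
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "U"}

def rna_distance(seq_a, seq_b):
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        same_family = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        total += 0.5 if same_family else 1.0
    return total

print(hamming_distance("ACGU", "GCGU"))  # 1
print(rna_distance("ACGU", "GCGU"))      # 0.5, since A and G are both purines
```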
18
First few records in the training data:

Redness  | Yellowness | Mass     | Volume   | Class
4.816472 | 2.347954   | 125.5082 | 25.01441 | apple
2.036318 | 4.879481   | 125.8775 | 18.2101  | lemon
2.767383 | 3.353061   | 109.9687 | 33.53737 | orange
4.327248 | 3.322961   | 118.4266 | 19.07535 | peach
2.96197  | 4.124945   | 159.2573 | 29.00904 | orange
5.655719 | 1.706671   | 147.0695 | 39.30565 | apple

See any issues? Hint: think of how Euclidean distance is calculated. Mass and volume are on much larger scales than redness and yellowness, so they would dominate the distance; the data should really be normalized.
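One common fix, sketched here as min-max scaling (z-score standardization would work as well; the helper name is illustrative):

```python
import numpy as np

def min_max_normalize(X):
    """Rescale every attribute (column) to the range [0, 1]."""
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0          # avoid dividing by zero on constant columns
    return (X - mins) / ranges

# Columns: redness, yellowness, mass, volume (values from the table above).
X = np.array([
    [4.816472, 2.347954, 125.5082, 25.01441],
    [2.036318, 4.879481, 125.8775, 18.2101],
    [2.767383, 3.353061, 109.9687, 33.53737],
])
print(min_max_normalize(X))
```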
19
Function approximation: for a real-valued prediction, take the average of the nearest k neighbors. If you don't know the function, or it is too complex to “learn,” just plug in a new value; the KNN predictor can “learn” the predicted value on the fly by averaging the nearest neighbors. Why average?
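A sketch of KNN used for real-valued prediction (the names and the noisy sample data are illustrative):

```python
import numpy as np

def knn_regress(train_X, train_y, query, k=3):
    """Predict a real value as the mean of the k nearest neighbors' targets."""
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return train_y[nearest].mean()

# Noisy samples of y = x**2 on [0, 1]; the function itself is never "learned".
rng = np.random.default_rng(1)
train_X = rng.random((200, 1))
train_y = train_X[:, 0] ** 2 + rng.normal(0, 0.01, 200)
print(knn_regress(train_X, train_y, np.array([0.5]), k=5))  # roughly 0.25
```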
20
Fitting a line y = m·x + b: choose an m and b that minimize the squared error. But again, how do you do that computationally?
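Written out, the objective being minimized is the standard sum of squared errors over the training pairs $(x_i, y_i)$:

$$E(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2$$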
21
If you want to learn an instantaneous slope, you can do local regression: get the slope of a line that fits just the local data.
22
For each training datum we know what Y should be. If we have a randomly generated m and b, these, along with X, give us a predicted Y, so we know whether the current m and b yield too large or too small a prediction. We can then nudge m and b in an appropriate direction (+ or −) and sum these proposed nudges across all training data. (Figure labels: “Target Y,” “too low,” “Line represents output or predicted Y.”) A sketch of this procedure follows.
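A sketch of that nudging procedure as gradient descent on m and b (the learning rate, step count, and sample data are illustrative choices):

```python
import numpy as np

def fit_line_gradient_descent(x, y, lr=0.01, steps=5000):
    """Fit y ≈ m*x + b by repeatedly nudging m and b against the error gradient."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        pred = m * x + b
        err = pred - y                           # positive where the prediction is too high
        m -= lr * (2.0 / n) * (err * x).sum()    # summed nudge for the slope
        b -= lr * (2.0 / n) * err.sum()          # summed nudge for the intercept
    return m, b

# Noisy data from a known line, just to exercise the sketch.
rng = np.random.default_rng(2)
x = rng.random(100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.05, 100)
print(fit_line_gradient_descent(x, y))  # approximately (3.0, 1.0)
```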
23
Which way should m go to reduce the error? (Figure labels: “y actual,” “Rise,” “b.”)
24
Locally weighted linear regression: you would still perform gradient descent, but with each training point weighted by its distance from the query; it becomes a global function approximation.
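One possible sketch, staying with gradient descent as the slide suggests; the Gaussian weighting and the tau parameter are illustrative assumptions, not from the slides:

```python
import numpy as np

def locally_weighted_fit(x, y, x_query, tau=0.2, lr=0.05, steps=5000):
    """Fit m, b for one query point, weighting each training sample by a
    Gaussian kernel on its distance to the query (tau controls the width)."""
    w = np.exp(-((x - x_query) ** 2) / (2 * tau ** 2))
    m, b = 0.0, 0.0
    n = w.sum()
    for _ in range(steps):
        err = (m * x + b) - y
        m -= lr * (2.0 / n) * (w * err * x).sum()
        b -= lr * (2.0 / n) * (w * err).sum()
    return m, b  # the local slope m approximates the function's slope near x_query

# Example: the local slope of y = x**2 near x = 0.5 should be close to 1.0.
rng = np.random.default_rng(3)
x = rng.random(300)
y = x ** 2 + rng.normal(0, 0.01, 300)
m, b = locally_weighted_fit(x, y, x_query=0.5)
print(m)
```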
25
Summary: KNN is highly effective for many practical problems given sufficient training data, and it is robust to noisy training data. However, the work is back-loaded, and it is susceptible to the curse of dimensionality.