Nearest Neighbour Classifiers
Machine Learning Group, University College Dublin
Lazy vs Eager, k-NN, Condensed NN
k-NN 2: Classification problems
An exemplar is characterised by a set of features; decide the class to which the exemplar belongs.
Compare regression problems: an exemplar is characterised by a set of features; decide the value of a continuous output (dependent) variable.
k-NN 3: Lazy vs Eager
Decision trees are an example of an eager ML algorithm: the tree is built in advance, off-line, so there is less work to do at run time.
k-NN is a lazy approach: little work is done off-line; keep the training examples and find the k nearest at run time.
k-NN 4: Classifying Apples & Pears
To what class does this belong? (Figure: apples and pears classification example.)
k-NN 5: Loan Approval System
Nearest neighbour based on similarity: what does similar mean?
(Diagram: loan cases described by the features Salary, Amount, Age, Job Category and Gender.)
k-NN 6: Imagine just 2 features
(Scatter plots over the two features Amount and Monthly_Sal, with examples of the two classes marked x and o.)
k-NN 7: Voronoi Diagrams
Voronoi cells indicate the areas in which the prediction is influenced by the same set of examples. (Diagram: a query point q and its nearest neighbour x.)
k-NN 8: 3-Nearest Neighbours
(Diagram: the 3 nearest neighbours of query point q are 2 x's and 1 o, so q is classified as x.)
k-NN 9: 7-Nearest Neighbours
(Diagram: the 7 nearest neighbours of the same query point q are 3 x's and 4 o's, so q is classified as o; the decision flips as k changes.)
k-NN 10: k-NN and Noise
1-NN is easy to implement but susceptible to noise: a misclassification occurs every time a noisy pattern is retrieved. k-NN with k ≥ 3 will overcome this.
k-NN 11: k-Nearest Neighbour
Let D be the set of training samples. Find the k nearest neighbours to the query q according to a difference criterion, such as the Euclidean distance computed for each $x_i \in D$: $\mathrm{dist}(q, x_i) = \sqrt{\sum_{f}(q_f - x_{i,f})^2}$. The category of q is decided by its k nearest neighbours.
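As a minimal sketch of this procedure in Python (the names distance and knn_classify are our own, and Euclidean distance is assumed as the difference criterion):

    import math
    from collections import Counter

    def distance(q, x):
        # Euclidean distance between two feature vectors (p = 2)
        return math.sqrt(sum((qf - xf) ** 2 for qf, xf in zip(q, x)))

    def knn_classify(q, D, k=3):
        # D is a list of (feature_vector, category) training samples;
        # take the k samples nearest to q and let them vote on the category
        neighbours = sorted(D, key=lambda s: distance(q, s[0]))[:k]
        votes = Counter(category for _, category in neighbours)
        return votes.most_common(1)[0][0]

On data like the x/o diagrams above, knn_classify(q, D, k=3) can return x while knn_classify(q, D, k=7) returns o, reproducing the flip seen on slides 8 and 9.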
k-NN 12: Minkowski Distances
A generalisation of the Euclidean (p = 2) and Manhattan (p = 1) distances: $d_p(x, y) = \big(\sum_i |x_i - y_i|^p\big)^{1/p}$.
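The standard Minkowski form translates to a one-line sketch in Python:

    def minkowski(x, y, p=2):
        # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)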
k-NN 13: Appropriate distance functions
To what class does this belong? (Apples and pears example revisited.)
k-NN 14: e.g. MVT (now part of Agilent)
Machine vision for inspection of PCBs: are components present or absent, and are solder joints good or bad?
k-NN 15: Components present?
(Example images of a component absent vs present.)
k-NN 16: Characterise the image as a set of features
k-NN 17: Dimension reduction in k-NN
Not all features are required; noisy features are a hindrance. Some examples are redundant; retrieval time depends on the number of examples.
Feature Selection: reduce the p features to the q best features (see the sketch below).
Case Selection: reduce the m examples to n covering examples.
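The slides do not fix a particular selection method; as one illustrative, hypothetical choice, features can be scored by class separability and only the q best kept:

    import numpy as np

    def select_q_best(X, y, q):
        # Score each feature by the gap between the two class means,
        # relative to the feature's overall spread (a crude separability score).
        X0, X1 = X[y == 0], X[y == 1]
        scores = np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / (X.std(axis=0) + 1e-9)
        keep = np.argsort(scores)[-q:]        # indices of the q best features
        return X[:, keep], keep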
k-NN 18: Condensed NN
Let D be the set of training samples. Find E, where E ⊆ D, such that the NN rule used with E is as good as with D:

    choose x ∈ D randomly; D ← D \ {x}; E ← {x}
    DO
        learning? ← FALSE
        FOR EACH x ∈ D:
            classify x by NN using E
            IF the classification is incorrect THEN
                E ← E ∪ {x}; D ← D \ {x}; learning? ← TRUE
    WHILE (learning? ≠ FALSE)
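A direct Python translation of the loop above; a sketch that reuses the knn_classify helper from slide 11 (with k = 1) as the NN rule:

    import random

    def condensed_nn(D):
        # D: list of (feature_vector, category) samples; returns E, the condensed set
        D = list(D)
        E = [D.pop(random.randrange(len(D)))]    # seed E with one random sample
        learning = True
        while learning:
            learning = False
            for x in list(D):
                # classify x by its single nearest neighbour in E
                if knn_classify(x[0], E, k=1) != x[1]:
                    E.append(x)                   # absorb misclassified samples
                    D.remove(x)
                    learning = True
        return E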
k-NN 19: Condensed NN
(Figure: 100 examples in 2 categories, and the different CNN solutions obtained.)
k-NN 20: Improving Condensed NN
CNN gives different outcomes depending on the order of the data; that's a bad thing in an algorithm. Instead, sort the data based on distance to the nearest unlike neighbour (NUN), which identifies exemplars near the decision surface; in the diagram, B is more useful than A. (A sketch follows.)
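A sketch of the NUN ordering (the helper names are ours; distance is the Euclidean helper defined earlier):

    def nun_distance(x, D):
        # distance from sample x to its Nearest Unlike Neighbour:
        # the closest sample in D that carries a different category
        features, category = x
        return min(distance(features, f) for f, c in D if c != category)

    def sort_by_nun(D):
        # samples near the decision surface (small NUN distance) come first,
        # so CNN sees useful exemplars like B before interior ones like A
        return sorted(D, key=lambda x: nun_distance(x, D))

Running condensed_nn over sort_by_nun(D), taking the first sample rather than a random one, makes the outcome independent of the original data order.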
k-NN 21: Condensed NN
(Figure: the same 100 examples in 2 categories; the different CNN solutions compared with CNN using NUN ordering.)
k-NN 22: Aside: Real Data is not Uniform
(Figure: the Iris data in two dimensions.)
k-NN 23: k-NN for spam filtering
A lazy learning system will be able to adapt to the changing nature of spam. A local learning system is good for diverse, disjunctive concepts like spam (porn, mortgages, religion, cheap drugs…) versus legitimate mail (work, family, play…). Case-base editing techniques exist to improve the competence of a case base.
k-NN 24: Spam Filtering
E-mail messages C1, C2, C3, C4, …, Cn pass through feature extraction to become cases with features F1…Fn and a spam label S, e.g. using the bag-of-words model from text classification and information retrieval:

    Case | F1 | F2 | F3 | … | Fn | S
    C1   | ~  | ~  | ~  | … | ~  | S1
    C2   | ~  | ~  | ~  | … | ~  | S2
    C3   | ~  | ~  | ~  | … | ~  | S3
    …
    Cn   | ~  | ~  | ~  | … | ~  | Sn
k-NN 25: Texts as Bag-of-Words
1. The easiest online school on earth.
2. Here is the information from Computer Science for the next Graduate School Board meeting.
3. Please find attached the agenda for the Graduate School Board meeting.

    No. | Easiest | Online | School | Earth | Info. | Computer | Science | Graduate | Board | Meeting | Please | Find | Attached | Agenda
    1   | x       | x      | x      | x     |       |          |         |          |       |         |        |      |          |
    2   |         |        | x      |       | x     | x        | x       | x        | x     | x       |        |      |          |
    3   |         |        | x      |       |       |          |         | x        | x     | x       | x      | x    | x        | x
k-NN 26: Texts as Bag-of-Words
Similarity can be measured by the dot product between these vectors (table as on the previous slide). Information is lost, e.g. sequence information.
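A sketch of the bag-of-words representation and dot-product similarity (ignoring the stop-word removal the slide's table implies):

    def bag_of_words(text, vocabulary):
        # binary vector: 1 if the term occurs in the text, 0 otherwise
        words = set(text.lower().split())
        return [1 if term in words else 0 for term in vocabulary]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

For the three texts above, documents 2 and 3 share School, Graduate, Board and Meeting, so their dot product is 4, while each overlaps document 1 only on School (dot product 1).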
k-NN 27: Runtime System
ECUE: Email Classification Using Examples.
(Pipeline: Email → Feature Extraction → Target Case → Classification against the casebase, which is built by Feature Selection and Case Selection → spam!)
k-NN 28: Classification
A k-NN classifier with k = 3, using unanimous voting to bias away from false positives. A Case Retrieval Net [Lenz et al. 98] improves the performance of the k-NN search.
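Unanimous voting is a one-line change to the usual k-NN vote; a sketch reusing the distance helper from slide 11 (label names are illustrative):

    def classify_unanimous(q, D, k=3):
        # flag as spam only if ALL k nearest neighbours are spam; any
        # disagreement defaults to non-spam, biasing away from false positives
        neighbours = sorted(D, key=lambda s: distance(q, s[0]))[:k]
        labels = {category for _, category in neighbours}
        return 'spam' if labels == {'spam'} else 'nonspam'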
k-NN 29: Summary
ML avoids some knowledge engineering (KE) effort
Lazy vs Eager
How k-NN works
Dimension reduction: Condensed NN, Feature Selection
Spam filtering application