Nearest Neighbour Classifiers
Machine Learning Group, University College Dublin
Lazy vs Eager, k-NN, Condensed NN
k-NN 2: Classification problems
An exemplar is characterised by a set of features; decide the class to which the exemplar belongs.
Compare regression problems: an exemplar is characterised by a set of features; decide the value of a continuous output (dependent) variable.
k-NN 3: Lazy vs Eager
Decision trees are an example of an eager ML algorithm: the tree is built in advance, off-line, so there is less work to do at run time.
k-NN is a lazy approach: little work is done off-line; keep the training examples and find the k nearest at run time.
k-NN 4: Classifying Apples & Pears
To what class does this belong? (Figure: apples and pears classification example.)
k-NN 5: Loan Approval System
Nearest neighbour based on similarity: what does similar mean?
(Diagram: loan cases described by the features Salary, Amount, Age, Job Category and Gender.)
k-NN 6: Imagine just 2 features
(Scatter plots over the two features Amount and Monthly_Sal, with examples of the two classes marked x and o.)
k-NN 7: Voronoi Diagrams
Voronoi cells indicate the areas in which the prediction is influenced by the same set of examples. (Diagram: a query point q and its nearest neighbour x.)
k-NN 8: 3-Nearest Neighbours
(Diagram: the 3 nearest neighbours of query point q are 2 x's and 1 o, so q is classified as x.)
k-NN 9: 7-Nearest Neighbours
(Diagram: the 7 nearest neighbours of the same query point q are 3 x's and 4 o's, so q is classified as o; the decision flips as k changes.)
k-NN 10: k-NN and Noise
1-NN is easy to implement but susceptible to noise: a misclassification occurs every time a noisy pattern is retrieved. k-NN with k ≥ 3 will overcome this.
k-NN 11: k-Nearest Neighbour
Let D be the set of training samples. Find the k nearest neighbours to the query q according to a difference criterion, such as the Euclidean distance computed for each $x_i \in D$: $\mathrm{dist}(q, x_i) = \sqrt{\sum_{f}(q_f - x_{i,f})^2}$. The category of q is decided by its k nearest neighbours.
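As a minimal sketch of this procedure in Python (the names distance and knn_classify are our own, and Euclidean distance is assumed as the difference criterion):

    import math
    from collections import Counter

    def distance(q, x):
        # Euclidean distance between two feature vectors (p = 2)
        return math.sqrt(sum((qf - xf) ** 2 for qf, xf in zip(q, x)))

    def knn_classify(q, D, k=3):
        # D is a list of (feature_vector, category) training samples;
        # take the k samples nearest to q and let them vote on the category
        neighbours = sorted(D, key=lambda s: distance(q, s[0]))[:k]
        votes = Counter(category for _, category in neighbours)
        return votes.most_common(1)[0][0]

On data like the x/o diagrams above, knn_classify(q, D, k=3) can return x while knn_classify(q, D, k=7) returns o, reproducing the flip seen on slides 8 and 9.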
k-NN 12: Minkowski Distances
A generalisation of the Euclidean (p = 2) and Manhattan (p = 1) distances: $d_p(x, y) = \big(\sum_i |x_i - y_i|^p\big)^{1/p}$.
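The standard Minkowski form translates to a one-line sketch in Python:

    def minkowski(x, y, p=2):
        # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)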
k-NN 13: Appropriate distance functions
To what class does this belong? (Apples and pears example revisited.)
k-NN 14: e.g. MVT (now part of Agilent)
Machine vision for inspection of PCBs: are components present or absent, and are solder joints good or bad?
k-NN 15: Components present?
(Example images of a component absent vs present.)
k-NN 16: Characterise the image as a set of features
k-NN 17: Dimension reduction in k-NN
Not all features are required; noisy features are a hindrance. Some examples are redundant; retrieval time depends on the number of examples.
Feature Selection: reduce the p features to the q best features (see the sketch below).
Case Selection: reduce the m examples to n covering examples.
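The slides do not fix a particular selection method; as one illustrative, hypothetical choice, features can be scored by class separability and only the q best kept:

    import numpy as np

    def select_q_best(X, y, q):
        # Score each feature by the gap between the two class means,
        # relative to the feature's overall spread (a crude separability score).
        X0, X1 = X[y == 0], X[y == 1]
        scores = np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / (X.std(axis=0) + 1e-9)
        keep = np.argsort(scores)[-q:]        # indices of the q best features
        return X[:, keep], keep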
k-NN 18: Condensed NN
Let D be the set of training samples. Find E, where E ⊆ D, such that the NN rule used with E is as good as with D:

    choose x ∈ D randomly; D ← D \ {x}; E ← {x}
    DO
        learning? ← FALSE
        FOR EACH x ∈ D:
            classify x by NN using E
            IF the classification is incorrect THEN
                E ← E ∪ {x}; D ← D \ {x}; learning? ← TRUE
    WHILE (learning? ≠ FALSE)
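A direct Python translation of the loop above; a sketch that reuses the knn_classify helper from slide 11 (with k = 1) as the NN rule:

    import random

    def condensed_nn(D):
        # D: list of (feature_vector, category) samples; returns E, the condensed set
        D = list(D)
        E = [D.pop(random.randrange(len(D)))]    # seed E with one random sample
        learning = True
        while learning:
            learning = False
            for x in list(D):
                # classify x by its single nearest neighbour in E
                if knn_classify(x[0], E, k=1) != x[1]:
                    E.append(x)                   # absorb misclassified samples
                    D.remove(x)
                    learning = True
        return E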
k-NN 19: Condensed NN
(Figure: 100 examples in 2 categories, and the different CNN solutions obtained.)
k-NN 20: Improving Condensed NN
CNN gives different outcomes depending on the order of the data; that's a bad thing in an algorithm. Instead, sort the data based on distance to the nearest unlike neighbour (NUN), which identifies exemplars near the decision surface; in the diagram, B is more useful than A. (A sketch follows.)
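A sketch of the NUN ordering (the helper names are ours; distance is the Euclidean helper defined earlier):

    def nun_distance(x, D):
        # distance from sample x to its Nearest Unlike Neighbour:
        # the closest sample in D that carries a different category
        features, category = x
        return min(distance(features, f) for f, c in D if c != category)

    def sort_by_nun(D):
        # samples near the decision surface (small NUN distance) come first,
        # so CNN sees useful exemplars like B before interior ones like A
        return sorted(D, key=lambda x: nun_distance(x, D))

Running condensed_nn over sort_by_nun(D), taking the first sample rather than a random one, makes the outcome independent of the original data order.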
k-NN 21: Condensed NN
(Figure: the same 100 examples in 2 categories; the different CNN solutions compared with CNN using NUN ordering.)
k-NN 22: Aside: Real Data is not Uniform
(Figure: the Iris data in two dimensions.)
k-NN 23: k-NN for spam filtering
A lazy learning system will be able to adapt to the changing nature of spam. A local learning system is good for diverse, disjunctive concepts like spam (porn, mortgages, religion, cheap drugs…) versus legitimate mail (work, family, play…). Case-base editing techniques exist to improve the competence of a case base.
k-NN 24: Spam Filtering
E-mail messages C1, C2, C3, C4, …, Cn pass through feature extraction to become cases with features F1…Fn and a spam label S, e.g. using the bag-of-words model from text classification and information retrieval:

    Case | F1 | F2 | F3 | … | Fn | S
    C1   | ~  | ~  | ~  | … | ~  | S1
    C2   | ~  | ~  | ~  | … | ~  | S2
    C3   | ~  | ~  | ~  | … | ~  | S3
    …
    Cn   | ~  | ~  | ~  | … | ~  | Sn
k-NN 25: Texts as Bag-of-Words
1. The easiest online school on earth.
2. Here is the information from Computer Science for the next Graduate School Board meeting.
3. Please find attached the agenda for the Graduate School Board meeting.

    No. | Easiest | Online | School | Earth | Info. | Computer | Science | Graduate | Board | Meeting | Please | Find | Attached | Agenda
    1   | x       | x      | x      | x     |       |          |         |          |       |         |        |      |          |
    2   |         |        | x      |       | x     | x        | x       | x        | x     | x       |        |      |          |
    3   |         |        | x      |       |       |          |         | x        | x     | x       | x      | x    | x        | x
k-NN 26: Texts as Bag-of-Words
Similarity can be measured by the dot product between these vectors (table as on the previous slide). Information is lost, e.g. sequence information.
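A sketch of the bag-of-words representation and dot-product similarity (ignoring the stop-word removal the slide's table implies):

    def bag_of_words(text, vocabulary):
        # binary vector: 1 if the term occurs in the text, 0 otherwise
        words = set(text.lower().split())
        return [1 if term in words else 0 for term in vocabulary]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

For the three texts above, documents 2 and 3 share School, Graduate, Board and Meeting, so their dot product is 4, while each overlaps document 1 only on School (dot product 1).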
k-NN 27: Runtime System
ECUE: Email Classification Using Examples.
(Pipeline: Email → Feature Extraction → Target Case → Classification against the casebase, which is built by Feature Selection and Case Selection → spam!)
k-NN 28: Classification
A k-NN classifier with k = 3, using unanimous voting to bias away from false positives. A Case Retrieval Net [Lenz et al. 98] improves the performance of the k-NN search.
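Unanimous voting is a one-line change to the usual k-NN vote; a sketch reusing the distance helper from slide 11 (label names are illustrative):

    def classify_unanimous(q, D, k=3):
        # flag as spam only if ALL k nearest neighbours are spam; any
        # disagreement defaults to non-spam, biasing away from false positives
        neighbours = sorted(D, key=lambda s: distance(q, s[0]))[:k]
        labels = {category for _, category in neighbours}
        return 'spam' if labels == {'spam'} else 'nonspam'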
k-NN 29: Summary
ML avoids some knowledge engineering (KE) effort
Lazy vs Eager
How k-NN works
Dimension reduction: Condensed NN, Feature Selection
Spam filtering application