Lazy Learning: k-Nearest Neighbour. Motivation: the availability of large amounts of processing power improves our ability to tune k-NN classifiers.
What is Lazy Learning? Compare ANNs with CBR or k-NN classifiers: –Artificial Neural Networks are eager learners: the training examples are compiled into a model at training time and are not available at runtime –CBR or k-Nearest Neighbour classifiers are lazy learners: little offline learning is done and the work is deferred to runtime. Compare the conventional use of lazy vs. eager evaluation in computer science.
Outline Classification problems Classification techniques k-Nearest Neighbour –Condense training Set –Feature selection –Feature weighting Ensemble techniques in ML
Classification problems: an exemplar is characterised by a set of features; decide the class to which the exemplar belongs. Compare regression problems: an exemplar is characterised by a set of features; decide the value of a continuous output (dependent) variable.
Classifying apples and pears To what class does this belong?
Distance/Similarity Function: for query q and training set X (described by features F), compute d(q, x) for each x ∈ X, where
d(q, x) = \sum_{f \in F} w_f \, \delta(q_f, x_f)
and where
\delta(q_f, x_f) = \begin{cases} |q_f - x_f| & \text{for numeric features} \\ 0 & \text{for symbolic features, } q_f = x_f \\ 1 & \text{for symbolic features, } q_f \neq x_f \end{cases}
The category of q is decided by its k nearest neighbours.
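A minimal Python sketch of the distance above and the k-nearest-neighbour vote. The fruit features, the uniform weights w_f = 1 and the helper names (delta, distance, knn_classify) are illustrative assumptions, not part of the slides.

```python
# Sketch of the weighted k-NN distance and majority vote described above.
from collections import Counter

def delta(qf, xf):
    """Per-feature difference: |difference| for numeric values, 0/1 overlap for symbolic."""
    if isinstance(qf, (int, float)) and isinstance(xf, (int, float)):
        return abs(qf - xf)
    return 0.0 if qf == xf else 1.0

def distance(q, x, weights):
    """d(q, x) = sum over features f of w_f * delta(q_f, x_f)."""
    return sum(weights[f] * delta(q[f], x[f]) for f in weights)

def knn_classify(q, training_set, weights, k=3):
    """Return the majority class among the k nearest neighbours of q."""
    neighbours = sorted(training_set, key=lambda xc: distance(q, xc[0], weights))[:k]
    votes = Counter(cls for _, cls in neighbours)
    return votes.most_common(1)[0][0]

# Example: training_set is a list of (features, class) pairs (illustrative data).
training_set = [({"width": 7.1, "height": 7.3}, "apple"),
                ({"width": 7.4, "height": 7.0}, "apple"),
                ({"width": 6.0, "height": 8.5}, "pear"),
                ({"width": 5.8, "height": 8.9}, "pear")]
weights = {"width": 1.0, "height": 1.0}
print(knn_classify({"width": 6.1, "height": 8.2}, training_set, weights, k=3))  # -> "pear"
```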
k-NN and Noise: 1-NN is easy to implement –but susceptible to noise: a misclassification every time a noisy pattern is retrieved. k-NN with k ≥ 3 (odd) will overcome this.
e.g. Pregnancy prediction
e.g. MVT Machine Vision for inspection of PCBs –components present or absent –solder joints good or bad
Components present? [Example board images: absent vs. present]
Characterise image as a set of features
Classification techniques: Artificial Neural Networks –also good for non-linear regression –black box: development is tricky and users do not know what is going on. Decision Trees –built using induction (information-theoretic analysis). k-Nearest Neighbour classifiers –keep the training examples and find the k nearest at run time.
Dimension reduction in k-NN: Not all features are required –noisy features are a hindrance. Some examples are redundant –retrieval time depends on the number of examples. [Diagram: reduce the training set from m examples × p features to n covering examples × q best features.]
Condensed NN: D is the set of training samples. Find E where E ⊆ D; the NN rule used with E should be as good as with D.
  choose x ∈ D randomly, D ← D \ {x}, E ← {x}
  DO
    learning? ← FALSE
    FOR EACH x ∈ D
      classify x by NN using E
      IF classification incorrect THEN
        E ← E ∪ {x}, D ← D \ {x}, learning? ← TRUE
  WHILE (learning? ≠ FALSE)
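A minimal runnable sketch of the CNN pass above, assuming each sample is a (feature vector, class) pair and a plain Euclidean 1-NN rule; the helper names are assumptions, not from the slides.

```python
# Condensed NN: keep only the examples that the current reduced set misclassifies.
import random

def nn_classify(q, E):
    """1-NN: return the class of the nearest sample in E (squared Euclidean distance)."""
    def dist(x):
        return sum((a - b) ** 2 for a, b in zip(q, x[0]))
    return min(E, key=dist)[1]

def condensed_nn(D):
    """Build a reduced set E <= D that still classifies the rest of D correctly with 1-NN."""
    D = list(D)
    x = D.pop(random.randrange(len(D)))   # choose x in D at random
    E = [x]                               # E <- {x}
    learning = True
    while learning:                       # repeat until a full pass adds nothing
        learning = False
        remaining = []
        for x in D:
            if nn_classify(x[0], E) != x[1]:
                E.append(x)               # misclassified: move x from D to E
                learning = True
            else:
                remaining.append(x)
        D = remaining
    return E
```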
Condensed NN: [figure: 100 examples, 2 categories; different CNN solutions]
Improving Condensed NN: sort the data based on distance to the nearest unlike neighbour (NUN) –identifies exemplars near the decision surface –in the diagram, B is more useful than A. Plain CNN gives different outcomes depending on data order –that's a bad thing in an algorithm.
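A minimal sketch of the NUN ordering, using the same (feature vector, class) representation as the CNN sketch above: compute each sample's distance to its nearest unlike neighbour and present the closest ones (those near the decision surface, like B) first, so CNN no longer depends on a random data order.

```python
# Order training data by distance to the nearest unlike neighbour (NUN).
def nun_distance(x, D):
    """Distance from x to its nearest neighbour of a different class."""
    return min(sum((a - b) ** 2 for a, b in zip(x[0], y[0]))
               for y in D if y[1] != x[1])

def order_by_nun(D):
    """Samples closest to an unlike neighbour (i.e. near the decision surface) first."""
    return sorted(D, key=lambda x: nun_distance(x, D))
```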
Condensed NN: [figure: 100 examples, 2 categories; different CNN solutions vs. CNN using NUN]
Feature selection: irrelevant features are noise –they make classification harder. Extra features also add to the computation cost.
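The slides do not prescribe a selection algorithm; one common sketch is a greedy forward-selection wrapper that picks the q best of the p features by leave-one-out k-NN accuracy. The function names and scoring scheme here are assumptions.

```python
# Greedy forward feature selection scored by leave-one-out k-NN accuracy.
from collections import Counter

def loo_accuracy(D, features, k=3):
    """Leave-one-out accuracy of k-NN using only the given feature indices."""
    correct = 0
    for i, (xi, ci) in enumerate(D):
        rest = D[:i] + D[i + 1:]
        def dist(y):
            return sum((xi[f] - y[0][f]) ** 2 for f in features)
        neighbours = sorted(rest, key=dist)[:k]
        votes = Counter(c for _, c in neighbours)
        correct += votes.most_common(1)[0][0] == ci
    return correct / len(D)

def forward_select(D, n_features, q, k=3):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    chosen = []
    while len(chosen) < q:
        best = max((f for f in range(n_features) if f not in chosen),
                   key=lambda f: loo_accuracy(D, chosen + [f], k))
        chosen.append(best)
    return chosen
```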
Ensemble techniques: for the user with more machine cycles than they know what to do with. [Diagram: classifiers → combiner → outcome.] Build several classifiers –different training subsets –different feature subsets. Aggregate the results –voting –weight votes based on generalisation error.
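A minimal sketch of such an ensemble: several k-NN members, each restricted to a random feature subset, combined by majority vote. The subset size, member count and the idea of weighting votes by estimated generalisation error are illustrative assumptions.

```python
# Ensemble of k-NN classifiers over random feature subsets, combined by voting.
import random
from collections import Counter

def build_ensemble(n_features, n_members=5, subset_size=2):
    """Each member is just a feature subset; k-NN is lazy, so there is nothing else to train."""
    return [random.sample(range(n_features), subset_size) for _ in range(n_members)]

def ensemble_classify(q, D, ensemble, k=3):
    """Each member votes with k-NN restricted to its own feature subset."""
    votes = Counter()
    for features in ensemble:
        def dist(y):
            return sum((q[f] - y[0][f]) ** 2 for f in features)
        neighbours = sorted(D, key=dist)[:k]
        member_vote = Counter(c for _, c in neighbours).most_common(1)[0][0]
        votes[member_vote] += 1   # could instead weight by estimated generalisation error
    return votes.most_common(1)[0][0]
```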
Conclusions: Finding a covering set of the training data –very good solutions exist. Compare with the results of ensemble techniques.