Learning: Nearest Neighbor Artificial Intelligence CMSC 25000 January 31, 2002.


Learning: Nearest Neighbor. Artificial Intelligence, CMSC 25000. January 31, 2002

Agenda
– Machine learning: introduction
– Nearest neighbor techniques
  – Applications: robotic motion, credit rating
– Efficient implementations: k-d trees, parallelism
– Extensions: k-nearest neighbor
– Limitations: distance, dimensions, and irrelevant attributes

Machine Learning
Learning: acquiring a function from inputs to values, based on past inputs and their observed values.
– Learn concepts, classifications, values
– Identify regularities in data

Machine Learning Examples
– Pronunciation: spelling of a word => sounds
– Speech recognition: acoustic signals => sentences
– Robot arm manipulation: target => torques
– Credit rating: financial data => loan qualification

Machine Learning Characterization
Distinction: are output values known for the training inputs?
– Supervised learning: training consists of inputs plus true output values
  – E.g. letters + pronunciation
– Unsupervised learning: training consists only of inputs
  – E.g. letters only
This course studies supervised methods.

Machine Learning Characterization
Distinction: are output values discrete or continuous?
– Discrete: "classification"
  – E.g. qualified/unqualified for a loan application
– Continuous: "regression"
  – E.g. torques for robot arm motion
This is a characteristic of the task.

Machine Learning Characterization
Distinction: what form of function is learned?
– Also called the "inductive bias"
– Graphically, the decision boundary
– E.g. a single linear separator, rectangular boundaries (ID trees), Voronoi regions, etc.

Machine Learning Functions
Problem: can the representation effectively model the class to be learned? This motivates the selection of the learning algorithm.
For the example function shown on the slide, a linear discriminant is a great fit, while rectangular boundaries (e.g. ID trees) are terrible. Pick the right representation!

Machine Learning Features
Inputs:
– E.g. words, acoustic measurements, financial data
– Represented as vectors of features
  – E.g. a word as letters: 'cat': L1 = c, L2 = a, L3 = t
  – Financial data: F1 = # late payments/yr (integer); F2 = ratio of income to expenses (real)

Machine Learning Features
Questions: which features should be used, and how should they relate to each other?
– Issue 1: how do we define distance in feature space if features have different scales?
  – Solution: scaling/normalization
– Issue 2: which features are important?
  – If instances differ only in an irrelevant feature, that difference should be ignored

Complexity & Generalization
Goal: predict values accurately on new inputs.
Problem:
– We train on sample data
– We can make an arbitrarily complex model to fit it
– BUT such a model will probably perform badly on NEW data
Strategy:
– Limit the complexity of the model (e.g. the degree of the equation)
– Split the data into training and validation sets; hold out data to check for overfitting (a minimal sketch follows)
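To make the hold-out idea concrete, here is a minimal Python sketch; the split fraction, the random seed, and representing the data as a simple list of examples are illustrative assumptions, not part of the original slides.

```python
import random

def train_validation_split(examples, validation_fraction=0.3, seed=0):
    """Shuffle labeled examples and hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]  # (training set, validation set)
```

Model complexity can then be chosen by whichever setting performs best on the held-out validation set rather than on the training data.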

Nearest Neighbor
Memory-based or case-based learning; a supervised method.
Training:
– Record labeled instances: feature-value vectors plus their labels
For each new, unlabeled instance:
– Identify the "nearest" labeled instance
– Assign it the same label
Consistency heuristic: assume that a property of a new case is the same as that of the nearest reference case (a code sketch follows).
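A rough sketch of the training and prediction steps above, assuming instances are stored as (feature_vector, label) pairs and plain Euclidean distance; the function names are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor_label(training, query):
    """Return the label of the stored instance closest to the query.

    'Training' is just recording the (feature_vector, label) pairs;
    all of the work happens at prediction time."""
    _features, label = min(training, key=lambda pair: euclidean(pair[0], query))
    return label
```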

Nearest Neighbor Example
Problem: robot arm motion is difficult to model analytically.
– Kinematic equations relate joint angles to manipulator positions
– Dynamics equations relate motor torques to joint angles
– It is difficult to achieve good results modeling robotic or human arms: many factors and measurements

Nearest Neighbor Example
Solution:
– Move the robot arm around and record parameters for each trajectory segment
– Table entries: torques, positions, velocities, squared velocities, velocity products, accelerations
– To follow a new path: break it into segments, find the closest segments in the table, and use those torques (interpolating as necessary)

Nearest Neighbor Example
Issue: big table, initially sparse
– The first time a new trajectory is attempted, the "closest" stored segment isn't close; the table has few entries
Solution: practice
– Each attempt at the trajectory fills in more of the table
– After a few attempts, the nearest entries are very close

Nearest Neighbor Example II
Credit rating:
– Classifier: Good / Poor
– Features: L = # late payments/yr; R = income/expenses

Name  L   R    G/P
A     0   1.2  G
B     25  0.4  P
C     5   0.7  G
D     ?   ?    P
E     ?   ?    P
F     ?   ?    G
G     ?   ?    G
H     ?   ?    P

Nearest Neighbor Example II
[Scatter plot of instances A–H in the (L, R) feature plane, alongside the table above.]

Nearest Neighbor Example II
[Scatter plot as above, with new query points I and J (labels unknown) to be classified.]
Distance measure (scaled, sketched in code below): dist = sqrt((L1 - L2)^2 + [sqrt(10) * (R1 - R2)]^2)
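A sketch of that scaled distance in Python, with the sqrt(10) weighting on R taken from the slide; representing each applicant as an (L, R) pair is an assumption.

```python
import math

def credit_distance(p1, p2):
    """Scaled distance between two (L, R) points; differences in R are
    weighted by sqrt(10) so both features contribute comparably."""
    (l1, r1), (l2, r2) = p1, p2
    return math.sqrt((l1 - l2) ** 2 + (math.sqrt(10) * (r1 - r2)) ** 2)

# Example: distance from applicant A = (0, 1.2) to applicant B = (25, 0.4)
print(credit_distance((0, 1.2), (25, 0.4)))
```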

Efficient Implementations
Classification cost: finding the nearest neighbor is O(n)
– Compute the distance between the unknown instance and every stored instance
– Compare the distances
– Problematic for large data sets
Alternative: use binary-search-like structures to reduce the cost to O(log n)

Efficient Implementation: K-D Trees
Divide instances into sets based on features (see the sketch below):
– Binary branching: e.g. is feature > value?
– A tree of depth d has 2^d leaves; with one instance per leaf, 2^d = n, so a split path has depth d = O(log n)
To split a set of cases:
– If there is one element in the set, stop
– Otherwise pick a feature to split on
  – Find the average position of the two middle objects on that dimension
  – Split the remaining objects based on that average position
  – Recursively split the subsets
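A minimal sketch of that splitting procedure; cycling through the features by depth is one common way to "pick a feature to split on" and is an assumption here, as is the tuple-based node representation.

```python
def build_kd_tree(points, labels, depth=0):
    """Recursively split (point, label) pairs into a k-d tree.

    Internal nodes are ("node", feature_index, threshold, left, right);
    leaves are ("leaf", point, label)."""
    if len(points) == 1:                       # one element: stop
        return ("leaf", points[0], labels[0])
    axis = depth % len(points[0])              # pick a feature (cycled by depth)
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2
    # Average position of the two middle objects on this dimension.
    threshold = (points[order[mid - 1]][axis] + points[order[mid]][axis]) / 2
    left_idx, right_idx = order[:mid], order[mid:]
    left = build_kd_tree([points[i] for i in left_idx],
                         [labels[i] for i in left_idx], depth + 1)
    right = build_kd_tree([points[i] for i in right_idx],
                          [labels[i] for i in right_idx], depth + 1)
    return ("node", axis, threshold, left, right)
```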

K-D Trees: Classification
[K-d tree diagram for the credit example: the root splits on R > 0.825; the next level splits on L > 17.5 and L > 9; lower nodes split on further R thresholds (0.6, 0.75, ...); leaves are labeled Good or Poor. A descent sketch follows.]
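A matching descent for classification, comparable to the Good/Poor tree described above; note that this simple version returns the label at the leaf it reaches and does not do the backtracking an exact nearest-neighbor search would require.

```python
def kd_tree_label(tree, query):
    """Walk the query down the k-d tree and return the reached leaf's label."""
    while tree[0] == "node":
        _tag, axis, threshold, left, right = tree
        tree = left if query[axis] <= threshold else right
    _tag, _point, label = tree
    return label

# Hypothetical usage with the credit features (L, R):
# tree = build_kd_tree(points, labels)
# print(kd_tree_label(tree, (3, 0.9)))
```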

Efficient Implementation: Parallel Hardware
Classification cost:
– # distance computations: constant time with O(n) processors
– Cost of finding the closest: compute pairwise minima successively, O(log n) time (see the reduction sketch below)
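The pairwise-minimum idea can be sketched sequentially: each pass of the loop below corresponds to one parallel round in which every processor combines one pair of candidates, so n distances collapse to a single minimum in O(log n) rounds. The sequential Python is only an illustration of the reduction pattern, not parallel code.

```python
def tournament_min(values):
    """Reduce a list to its minimum by repeated pairwise minima.

    Each while-iteration models one parallel round; with one processor
    per pair, the number of rounds is O(log n)."""
    while len(values) > 1:
        reduced = [min(values[i], values[i + 1])
                   for i in range(0, len(values) - 1, 2)]
        if len(values) % 2 == 1:       # an odd leftover carries to the next round
            reduced.append(values[-1])
        values = reduced
    return values[0]
```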

Nearest Neighbor: Issues
– Prediction can be expensive if there are many features
– Affected by classification (label) noise and feature noise: one bad entry can change a prediction
– Definition of the distance metric: how to combine features of different types and value ranges
– Sensitive to feature selection

Nearest Neighbor Analysis
Problem: ambiguous labeling, training noise
Solution: k-nearest neighbors
– Not just the single nearest instance: compare to the K nearest neighbors
– Label according to the majority of the K (sketched below)
– What should K be? Often 3; it can also be chosen by training on held-out data
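A sketch of the majority-vote extension (K = 3 as the slide suggests); storage as (feature_vector, label) pairs and arbitrary tie-breaking are assumptions.

```python
import math
from collections import Counter

def k_nearest_label(training, query, k=3):
    """Label the query by majority vote among its k closest stored instances."""
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    neighbors = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _features, label in neighbors)
    return votes.most_common(1)[0][0]   # ties broken arbitrarily
```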

Nearest Neighbor: Analysis
Issue: what is a good distance metric, and how should features be combined?
Strategy:
– (Typically weighted) Euclidean distance
– Feature scaling: normalization. A good starting point (sketched below): (feature - feature_mean) / feature_standard_deviation
  – Rescales all values so each feature is centered on 0 with standard deviation 1
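A sketch of that normalization applied column-wise to a list of feature vectors; using the population standard deviation, and guarding constant features with a fallback of 1.0, are assumptions.

```python
import statistics

def z_score_normalize(vectors):
    """Rescale every feature to mean 0 and standard deviation 1."""
    columns = list(zip(*vectors))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid divide-by-zero
    return [[(v - m) / s for v, m, s in zip(vec, means, stds)]
            for vec in vectors]

# Example with the credit features (L, R):
# z_score_normalize([(0, 1.2), (25, 0.4), (5, 0.7)])
```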

Nearest Neighbor: Analysis
Issue: which features should we use?
– E.g. credit rating: many possible features (tax bracket, debt burden, retirement savings, etc.)
– Nearest neighbor uses ALL of them
– Irrelevant features can mislead the distance computation
– This is a fundamental problem with nearest neighbor

Nearest Neighbor: Advantages
– Fast training: just record the feature vector and output value for each instance
– Can model a wide variety of functions: complex decision boundaries, weak inductive bias
– Very generally applicable

Summary
Machine learning:
– Acquire a function from input features to values, based on prior training instances
– Supervised vs unsupervised learning; classification and regression
– Inductive bias: the representation of the function to learn
– Complexity, generalization, and validation

Summary: Nearest Neighbor
– Training: record input vectors + output values
– Prediction: use the closest training instance to the new data point
– Efficient implementations: k-d trees, parallel hardware
– Pros: fast training, very general, little bias
– Cons: distance metric (scaling), sensitivity to noise and extraneous features