Learning: Nearest Neighbor
Artificial Intelligence, CMSC
January 31, 2002
Agenda
Machine learning: Introduction
Nearest neighbor techniques
–Applications: Robotic motion, credit rating
Efficient implementations:
–k-d trees, parallelism
Extensions: K-nearest neighbor
Limitations:
–Distance, dimensions, & irrelevant attributes
Machine Learning
Learning: acquiring, from past inputs and their values, a function that maps new inputs to values
Learn concepts, classifications, values
–Identify regularities in data
Machine Learning Examples
Pronunciation:
–Spelling of word => sounds
Speech recognition:
–Acoustic signals => sentences
Robot arm manipulation:
–Target => torques
Credit rating:
–Financial data => loan qualification
Machine Learning Characterization
Distinctions:
–Are output values known for any inputs?
Supervised vs unsupervised learning
–Supervised: training consists of inputs + true output values
»E.g. letters + pronunciation
–Unsupervised: training consists only of inputs
»E.g. letters only
Course studies supervised methods
Machine Learning Characterization
Distinctions:
–Are output values discrete or continuous?
Discrete: “Classification”
–E.g. Qualified/Unqualified for a loan application
Continuous: “Regression”
–E.g. Torques for robot arm motion
Characteristic of the task
Machine Learning Characterization
Distinctions:
–What form of function is learned?
Also called “inductive bias”
Graphically, the decision boundary
E.g. Single linear separator
–Rectangular boundaries - ID trees
–Voronoi spaces, etc.
Machine Learning Functions
Problem: Can the representation effectively model the class to be learned?
Motivates selection of the learning algorithm
For this function, a linear discriminant is GREAT!
Rectangular boundaries (e.g. ID trees) are TERRIBLE!
Pick the right representation!
Machine Learning Features
Inputs:
–E.g. words, acoustic measurements, financial data
–Vectors of features:
E.g. word: letters
–‘cat’: L1 = c; L2 = a; L3 = t
Financial data:
F1 = # late payments/yr : Integer
F2 = Ratio of income to expense : Real
Machine Learning Features
Question:
–Which features should be used?
–How should they relate to each other?
Issue 1: How do we define distance in feature space if features have different scales?
–Solution: Scaling/normalization
Issue 2: Which features are important?
–If instances differ only in an irrelevant feature, that difference should be ignored
Complexity & Generalization Goal: Predict values accurately on new inputs Problem: –Train on sample data –Can make arbitrarily complex model to fit –BUT, will probably perform badly on NEW data Strategy: –Limit complexity of model (e.g. degree of equ’n) –Split training and validation sets Hold out data to check for overfitting
Nearest Neighbor
Memory- or case-based learning
Supervised method: Training
–Record labeled instances and their feature-value vectors
For each new, unlabeled instance
–Identify the “nearest” labeled instance
–Assign the same label
Consistency heuristic: Assume that a property is the same as that of the nearest reference case.
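A minimal 1-nearest-neighbor sketch of the procedure above; the train/predict names and the feature values are illustrative (the numbers echo the credit-rating example later in the deck).

```python
import math

def train(instances):
    # instances: list of (feature_vector, label) pairs
    return list(instances)            # memory-based: just record them

def predict(memory, query):
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # consistency heuristic: copy the label of the closest stored instance
    _, nearest_label = min(memory, key=lambda item: dist(item[0], query))
    return nearest_label

memory = train([((0.0, 1.2), "Good"), ((25.0, 0.4), "Poor"), ((5.0, 0.7), "Good")])
print(predict(memory, (3.0, 0.9)))    # -> "Good" (closest to the third instance)
```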
Nearest Neighbor Example
Problem: Robot arm motion
–Difficult to model analytically
Kinematic equations
–Relate joint angles and manipulator positions
Dynamics equations
–Relate motor torques to joint angles
–Difficult to achieve good results modeling robotic or human arms
Many factors & measurements
Nearest Neighbor Example
Solution:
–Move the robot arm around
–Record parameters and trajectory segments
Table: torques, positions, velocities, squared velocities, velocity products, accelerations
–To follow a new path:
Break it into segments
Find the closest segments in the table
Get those torques (interpolating as necessary; see the sketch below)
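A rough sketch of the table-lookup idea under simplifying assumptions: the table here stores only (position, velocity) → torque entries with made-up numbers, and a new segment's torque is blended from the two closest entries; the real table described above has many more columns.

```python
import math

table = [
    # ((position, velocity), torque) -- illustrative numbers only
    ((0.10, 0.5), 1.8),
    ((0.30, 0.7), 2.4),
    ((0.55, 0.2), 0.9),
]

def torque_for(state):
    def dist(s):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, state)))
    # take the two closest recorded segments and interpolate their torques
    (s1, t1), (s2, t2) = sorted(table, key=lambda entry: dist(entry[0]))[:2]
    d1, d2 = dist(s1), dist(s2)
    if d1 + d2 == 0:
        return t1
    w = d2 / (d1 + d2)                # the closer entry gets the larger weight
    return w * t1 + (1 - w) * t2

print(torque_for((0.20, 0.6)))        # blended torque for a new segment
```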
Nearest Neighbor Example
Issue: Big table
–First time with a new trajectory, the “closest” isn’t close
–Table is sparse - few entries
Solution: Practice
–As the arm attempts the trajectory, fill in more of the table
–After a few attempts, very close
Nearest Neighbor Example II
Credit Rating:
–Classifier: Good / Poor
–Features:
L = # late payments/yr
R = Income/Expenses

Name  L    R    G/P
A     0    1.2  G
B     25   0.4  P
C     5    0.7  G
D     …    …    P
E     …    …    P
F     …    …    G
G     …    …    G
H     …    …    P
Nearest Neighbor Example II
[Plot: the instances A–H from the table above, plotted in (L, R) feature space]
Nearest Neighbor Example II
[Plot: instances A–H plus two new, unlabeled instances I and J in (L, R) space]
New instances to classify: I = ??, J = ??
Distance Measure: sqrt((L1 - L2)^2 + [sqrt(10) * (R1 - R2)]^2) - scaled distance (sketched below)
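A sketch of this scaled distance applied to the labeled instances A–C from the table; the query point and the helper names (scaled_distance, classify) are illustrative.

```python
import math

# R is measured on a much smaller scale than L, so R differences are
# weighted by sqrt(10) before combining, as in the slide's formula.
def scaled_distance(p, q):
    (l1, r1), (l2, r2) = p, q
    return math.sqrt((l1 - l2) ** 2 + (math.sqrt(10) * (r1 - r2)) ** 2)

labeled = {"A": ((0, 1.2), "Good"), "B": ((25, 0.4), "Poor"), "C": ((5, 0.7), "Good")}

def classify(query):
    name, (features, label) = min(labeled.items(),
                                  key=lambda kv: scaled_distance(kv[1][0], query))
    return label, name

print(classify((2, 1.0)))   # label of the nearest labeled neighbor, and which one it was
```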
Efficient Implementations
Classification cost:
–Find nearest neighbor: O(n)
Compute the distance between the unknown and all stored instances
Compare distances
–Problematic for large data sets
Alternative:
–Use binary search to reduce cost to O(log n)
Efficient Implementation: K-D Trees
Divide instances into sets based on features
–Binary branching: E.g. feature > value
–A path with d splits reaches one of 2^d leaves: setting 2^d = n gives d = O(log n)
–To split cases into sets:
If there is one element in the set, stop
Otherwise pick a feature to split on
–Find the average position of the two middle objects on that dimension
»Split the remaining objects based on that average position
»Recursively split the subsets
(A sketch of this construction follows.)
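A rough Python sketch of this construction and of classification by descending the tree. The names (KDNode, build, classify) and the four sample instances are illustrative; it assumes distinct feature values along each split dimension, and it does no backtracking, so the leaf reached is only an approximate nearest neighbor.

```python
class KDNode:
    def __init__(self, dim=None, threshold=None, left=None, right=None, instance=None):
        self.dim, self.threshold = dim, threshold
        self.left, self.right = left, right
        self.instance = instance                  # (features, label) stored at a leaf

def build(instances, dim=0, n_dims=2):
    if len(instances) == 1:
        return KDNode(instance=instances[0])
    # split at the average of the two middle values on this dimension
    values = sorted(f[dim] for f, _ in instances)
    mid = len(values) // 2
    threshold = (values[mid - 1] + values[mid]) / 2.0
    left = [inst for inst in instances if inst[0][dim] <= threshold]
    right = [inst for inst in instances if inst[0][dim] > threshold]
    next_dim = (dim + 1) % n_dims                 # alternate the split dimension
    return KDNode(dim, threshold, build(left, next_dim), build(right, next_dim))

def classify(node, query):
    # descend the tree: O(log n) comparisons instead of n distance computations
    while node.instance is None:
        node = node.right if query[node.dim] > node.threshold else node.left
    return node.instance[1]

tree = build([((0, 1.2), "Good"), ((25, 0.4), "Poor"),
              ((5, 0.7), "Good"), ((20, 0.3), "Poor")])
print(classify(tree, (4, 0.8)))                   # -> "Good"
```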
K-D Trees: Classification
[Tree diagram: the root tests R > 0.825; its children test L > 17.5 and L > 9; lower nodes test R again (e.g. R > 0.6, R > 0.75); each leaf is labeled Good or Poor. A query is classified by following the yes/no answers from the root down to a leaf.]
Efficient Implementation: Parallel Hardware
Classification cost:
–# of distance computations
Constant time with O(n) processors
–Cost of finding the closest
Compute pairwise minima, successively
O(log n) time (sketched below)
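A sketch of the pairwise-minimum reduction, simulated sequentially here; with one processor per pair, each pass would run in constant time, so the number of passes, O(log n), is what matters. The (distance, label) values are illustrative.

```python
def pairwise_min(pairs):
    # each pass halves the number of surviving (distance, label) candidates
    passes = 0
    while len(pairs) > 1:
        nxt = []
        for i in range(0, len(pairs) - 1, 2):
            nxt.append(min(pairs[i], pairs[i + 1]))   # one comparison per "processor"
        if len(pairs) % 2:
            nxt.append(pairs[-1])                     # odd element carries over
        pairs = nxt
        passes += 1
    return pairs[0], passes

distances = [(3.2, "Good"), (0.9, "Good"), (7.5, "Poor"), (2.1, "Poor")]
print(pairwise_min(distances))    # ((0.9, 'Good'), 2) -- 2 passes for 4 candidates
```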
Nearest Neighbor: Issues
Prediction can be expensive if there are many features
Affected by classification noise and feature noise
–One noisy entry can change the prediction
Definition of the distance metric
–How to combine different features
Different types and ranges of values
Sensitive to feature selection
Nearest Neighbor Analysis
Problem:
–Ambiguous labeling, training noise
Solution:
–K-nearest neighbors
Not just the single nearest instance
Compare to the K nearest neighbors
–Label according to the majority of the K
What should K be?
–Often 3; K can also be tuned on held-out data (sketched below)
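A sketch of K-nearest-neighbor voting with plain Euclidean distance; K = 3 and the stored instances are illustrative.

```python
import math
from collections import Counter

def knn_predict(memory, query, k=3):
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # take the K closest stored instances and vote on their labels
    neighbors = sorted(memory, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

memory = [((0, 1.2), "Good"), ((25, 0.4), "Poor"), ((5, 0.7), "Good"),
          ((22, 0.5), "Poor"), ((3, 1.0), "Good")]
print(knn_predict(memory, (18, 0.6), k=3))   # -> "Poor" (majority of the 3 closest)
```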
Nearest Neighbor: Analysis
Issue:
–What is a good distance metric?
–How should features be combined?
Strategy:
–(Typically weighted) Euclidean distance
–Feature scaling: Normalization
Good starting point:
–(Feature - Feature_mean) / Feature_standard_deviation
–Rescales all values - centered on 0 with std dev 1 (sketched below)
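A sketch of the normalization above: rescale each feature column to mean 0 and standard deviation 1 (the population standard deviation is used here) before computing distances. The sample rows are illustrative.

```python
import statistics

def zscore_columns(rows):
    # compute per-column mean and standard deviation, then rescale
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]    # assumes no constant column
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

raw = [(0, 1.2), (25, 0.4), (5, 0.7)]
print(zscore_columns(raw))   # L and R are now on comparable scales
```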
Nearest Neighbor: Analysis
Issue:
–What features should we use?
E.g. Credit rating: Many possible features
–Tax bracket, debt burden, retirement savings, etc.
–Nearest neighbor uses ALL of them
–Irrelevant feature(s) could mislead it
Fundamental problem with nearest neighbor
Nearest Neighbor: Advantages
Fast training:
–Just record the feature vector - output value pairs
Can model a wide variety of functions
–Complex decision boundaries
–Weak inductive bias
Very generally applicable
Summary
Machine learning:
–Acquire a function from input features to values
Based on prior training instances
–Supervised vs Unsupervised learning
Classification and Regression
–Inductive bias: Representation of the function to learn
Complexity, Generalization, & Validation
Summary: Nearest Neighbor
Nearest neighbor:
–Training: record input vectors + output values
–Prediction: copy the label of the training instance closest to the new data
Efficient implementations
Pros: fast training, very general, little bias
Cons: distance metric (scaling), sensitivity to noise & extraneous features