Nearest Neighbor Ling 572 Advanced Statistical Methods in NLP January 12, 2012.

Roadmap
- Instance-based learning: examples and motivation
- Basic algorithm: k-nearest neighbors
- Issues: selecting k, feature weighting, voting
- Summary
- HW #2

Instance-based Learning
aka "lazy learning":
- No explicit training procedure: just store / tabulate the training instances
- Many examples:
  - k-nearest neighbors (kNN)
  - Locally weighted regression
  - Radial basis functions
  - Case-based reasoning
- kNN: the most widely used variant

Nearest Neighbor Example I
Problem: robot arm motion
- Difficult to model analytically:
  - Kinematic equations relate joint angles to manipulator positions
  - Dynamics equations relate motor torques to joint angles
- Difficult to achieve good results modeling robotic or human arms: too many factors and measurements

Nearest Neighbor Example
Solution: move the robot arm around and record the parameters along each trajectory segment
- Table: torques, positions, velocities, squared velocities, velocity products, accelerations
- To follow a new path:
  - Break it into segments
  - Find the closest segments in the table
  - Apply those torques (interpolating as necessary)

Nearest Neighbor Example
Issue: big table
- The first time with a new trajectory, the "closest" entry isn't close: the table is sparse, with few entries
Solution: practice
- As the arm attempts the trajectory, fill in more of the table
- After a few attempts, the nearest entries are very close

Nearest Neighbor Example II
Topic tracking
- Task: given a sample of exemplar documents, find other documents on the same topic
- Approach:
  - Features: 'terms' (words or phrases), stemmed
  - Weights: tf-idf (term frequency x inverse document frequency), possibly modified
  - New document: assign the label of its nearest neighbors
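The tf-idf weighting named above can be sketched in a few lines. This is a minimal illustration of the basic scheme (raw term frequency times log inverse document frequency); the slide hints that the deck's actual weighting was a modified variant, so treat the exact formula here as an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute tf-idf weight vectors for a list of tokenized documents.

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors
```

A term that appears in every document (df = N) gets weight 0, which is exactly the behavior that makes tf-idf useful for topic tracking: ubiquitous terms stop contributing to similarity.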

Instance-based Learning III
Example-based machine translation (EBMT)
- Task: translation in the sales domain
- Approach:
  - Collect translation pairs: source language paired with target language
  - Decompose into subphrases via minimal pairs:
    - How much is that red umbrella? / Ano akai kasa wa ikura desu ka.
    - How much is that small camera? / Ano chiisai kamera wa ikura desu ka.
  - For a new sentence to translate:
    - Find the most similar source-language subunits
    - Compose the corresponding subunit translations

Nearest Neighbor
Memory- or case-based learning
- Consistency heuristic: assume that a property is the same as that of the nearest reference case
- Supervised method:
  - Training: record labeled instances as feature-value vectors
  - For each new, unlabeled instance:
    - Identify the "nearest" labeled instance(s)
    - Assign the same label

Nearest Neighbor Example
Credit rating
- Classifier output: Good (G) / Poor (P)
- Features: L = # late payments/yr; R = income/expenses

  Name   L     R    G/P
  A       0   1.2    G
  B      25   0.4    P
  C       5   0.7    G
  D       ?     ?    P
  E       ?     ?    P
  F       ?     ?    G
  G       ?     ?    G
  H       ?     ?    P

Nearest Neighbor Example
[Figure: instances A-H plotted in the (L, R) feature space, each labeled G or P.]

Nearest Neighbor Example
[Figure: new, unlabeled instances I, J, and K plotted in the same space and classified by their nearest labeled neighbors. One case is ambiguous (??), motivating a scaled distance measure.]

k-Nearest Neighbors
- Determine a value k
- Calculate the distance between the new instance and all training instances
- Sort the training instances by distance; select the k nearest
- Identify the labels of the k nearest neighbors
- Use a voting mechanism on the k labels to determine the classification
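The steps above can be sketched directly. This is a minimal Python illustration of the basic algorithm (Euclidean distance, majority vote); the function names are my own.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(query, training, k):
    """Classify `query` by majority vote among its k nearest training instances.

    `training` is a list of (feature_vector, label) pairs.
    """
    # Sort all training instances by distance to the query (O(n log n) here;
    # later slides discuss cheaper alternatives such as k-d trees).
    by_distance = sorted(training, key=lambda pair: euclidean(query, pair[0]))
    k_labels = [label for _, label in by_distance[:k]]
    # Majority vote over the k nearest labels.
    return Counter(k_labels).most_common(1)[0][0]
```

On the credit-rating example, a query near instance A = (L=0, R=1.2) comes back Good, while one near B = (25, 0.4) comes back Poor.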

Issues
- What's k?
- How do we weight/scale/select features?
- How do we combine instances by voting?

Selecting k
- Just pick a number?
- A percentage of the samples?
- Cross-validation:
  - Split the data into a training set, a development (validation) set, and a test set
  - Pick the k with the lowest error under n-fold cross-validation
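Picking k by n-fold cross-validation can be sketched as below. This snippet re-implements a small kNN classifier inline so it stands alone; the fold-splitting by striding and the candidate list are illustrative choices, not something the slides prescribe.

```python
import math
from collections import Counter

def _euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def _knn_label(query, training, k):
    nearest = sorted(training, key=lambda p: _euclidean(query, p[0]))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def choose_k(data, candidate_ks, n_folds=5):
    """Return the k with the lowest average n-fold cross-validation error.

    `data` is a list of (feature_vector, label) pairs.
    """
    folds = [data[i::n_folds] for i in range(n_folds)]
    best_k, best_err = None, float('inf')
    for k in candidate_ks:
        errors, total = 0, 0
        for i in range(n_folds):
            held_out = folds[i]
            train = [p for j, f in enumerate(folds) if j != i for p in f]
            for x, y in held_out:
                total += 1
                if _knn_label(x, train, k) != y:
                    errors += 1
        err = errors / total
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```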

Feature Values
Scale: remember the credit-rating example
- Features: income/expense ratio in [0.5, 2]; late payments in [0, 30]; AGI in [8K, 500K]
- What happens with Euclidean distance? Features with large magnitude dominate
- Approach: rescale, i.e. normalize each feature to [0, 1]
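The rescaling step is just min-max normalization per feature dimension; a minimal sketch:

```python
def rescale(vectors):
    """Min-max normalize each feature dimension to [0, 1], so that
    large-magnitude features (e.g. AGI in dollars) don't dominate
    small ones (e.g. an income/expense ratio) under Euclidean distance."""
    dims = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    scaled = []
    for v in vectors:
        scaled.append(tuple(
            # Guard against constant features, which would divide by zero.
            (v[d] - mins[d]) / (maxs[d] - mins[d]) if maxs[d] > mins[d] else 0.0
            for d in range(dims)))
    return scaled
```

Note that at test time a new instance must be rescaled with the *training* minima and maxima, not its own.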

Feature Selection
Consider a large feature set (say 100 features) with only a few relevant ones (say 2):
- What happens? Trouble! Differences in irrelevant features are likely to dominate
- kNN is sensitive to irrelevant attributes in a high-dimensional feature space
- kNN can still be useful, but feature selection is key

Feature Weighting
What if you want some features to be more important? Or less important?
- Reweighting dimension i by a weight w_i can increase or decrease that feature's influence
  - Setting the weight to 0 ignores the corresponding feature
- How can we set the weights? Cross-validation ;-)

Distance Measures
- Euclidean distance: d(x, y) = sqrt(sum_i (x_i - y_i)^2)
- Weighted Euclidean distance: d_w(x, y) = sqrt(sum_i w_i (x_i - y_i)^2)
- Cosine similarity: cos(x, y) = (sum_i x_i y_i) / (||x|| ||y||)
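The three measures translate directly into code; a self-contained sketch:

```python
import math

def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)"""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_euclidean(x, y, w):
    """d_w(x, y) = sqrt(sum_i w_i (x_i - y_i)^2); w_i = 0 ignores dimension i."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| ||y||); a similarity, not a distance."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)
```

Cosine similarity is the usual choice for tf-idf document vectors (as in the topic-tracking example), since it ignores document length.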

Voting in kNN
Suppose we have found the k nearest neighbors:
- Let f_i(x) be the class label of the i-th neighbor of x
- delta(c, f_i(x)) is 1 if f_i(x) = c, and 0 otherwise
- Then g(c) = sum_i delta(c, f_i(x)) is the number of neighbors with label c

Voting in kNN
Alternatives:
- Majority vote: c* = argmax_c g(c)
- Weighted voting: neighbors have different weights: c* = argmax_c sum_i w_i delta(c, f_i(x))
- Weighted voting allows the use of many training examples; we could even use all samples, weighted by inverse distance
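Inverse-distance weighting can be sketched as below; the epsilon guard against a zero distance is my addition, not something the slides specify.

```python
def weighted_vote(neighbors, epsilon=1e-9):
    """Distance-weighted voting: each (distance, label) neighbor votes
    with weight 1 / (distance + epsilon); return the winning label.
    epsilon avoids division by zero when a neighbor coincides with the query."""
    scores = {}
    for dist, label in neighbors:
        scores[label] = scores.get(label, 0.0) + 1.0 / (dist + epsilon)
    return max(scores, key=scores.get)
```

With neighbors [(0.1, 'G'), (1.0, 'P'), (1.2, 'P')], plain majority vote over k=3 would say 'P', but the single very close 'G' neighbor outweighs the two distant 'P' neighbors.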

kNN: Strengths
Advantages:
- Conceptual simplicity
- Efficiency in training: no work at all
- Handles multi-class classification
- Stability/robustness: averages over many neighbor votes
- Accuracy with large data sets

kNN: Weaknesses
Disadvantages:
- (In)efficiency at testing time: distance computations over all N instances at test time
- Lack of theoretical basis
- Sensitivity to irrelevant features
- Distance metrics are unclear for non-numerical/binary values

HW #2
Decision trees
- Text classification: same data and categories as Q5 (Ling570 HW #8)
- Features: words
- Build and evaluate decision trees with Mallet
- Build and evaluate decision trees with your own code

Building Your Own DT Learner
- Nodes test a single feature
- Features are binary: present / not present
- The DT is binary branching
- Selection criterion: information gain
- Early stopping heuristics:
  - Information gain above a threshold
  - Tree depth below a threshold
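The information-gain criterion for a binary present/absent word feature can be sketched as follows. The (word_set, label) representation of an instance is my own choice for the illustration.

```python
import math

def entropy(labels):
    """H(S) = -sum_c p_c log2 p_c over the class labels in S."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(instances, feature):
    """Gain from splitting on a binary word feature: present vs. absent.

    `instances` is a list of (word_set, label) pairs.
    Gain = H(S) - sum over branches of |S_branch|/|S| * H(S_branch).
    """
    labels = [y for _, y in instances]
    present = [y for words, y in instances if feature in words]
    absent = [y for words, y in instances if feature not in words]
    gain = entropy(labels)
    for subset in (present, absent):
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain
```

A feature that splits the training set into pure classes achieves the maximum gain, H(S); a feature distributed independently of the labels scores near 0, which is what the early-stopping threshold tests.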

Efficiency
Caution: there are LOTS of features
- You'll be repeatedly testing whether or not a feature is present for each class: (c, f) or (c, ~f)
- Think about ways to do this efficiently

Efficient Implementations
Classification cost: finding the nearest neighbor is O(n)
- Compute the distance between the unknown instance and all training instances, then compare distances
- Problematic for large data sets
Alternative: use a binary search structure (a k-d tree) to reduce this to O(log n)

Efficient Implementation: K-D Trees
Divide instances into sets based on features
- Binary branching, e.g. on feature > value
- 2^d leaves: with a split path of depth d and 2^d = n leaves, d = O(log n)
- To split cases into sets:
  - If there is one element in the set, stop
  - Otherwise, pick a feature to split on
  - Find the average position of the two middle objects on that dimension
  - Split the remaining objects based on that average position
  - Recursively split the subsets
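The construction above, plus the standard branch-and-bound nearest-neighbor search, can be sketched as follows. Cycling through dimensions by depth is a common simplification I've assumed; the slides leave the split-feature choice open. The dict-based node representation is illustrative.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split points into a binary tree, splitting at the
    average position of the two middle objects on the chosen dimension."""
    if not points:
        return None
    if len(points) == 1:
        return {'point': points[0]}
    dim = depth % len(points[0])              # cycle through dimensions
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    split = (pts[mid - 1][dim] + pts[mid][dim]) / 2.0
    return {'dim': dim, 'split': split,
            'left': build_kdtree(pts[:mid], depth + 1),
            'right': build_kdtree(pts[mid:], depth + 1)}

def nearest(tree, query, best=None):
    """Return (distance, point) for the nearest stored point to `query`."""
    if tree is None:
        return best
    if 'point' in tree:
        d = math.dist(query, tree['point'])
        if best is None or d < best[0]:
            best = (d, tree['point'])
        return best
    if query[tree['dim']] <= tree['split']:
        near, far = tree['left'], tree['right']
    else:
        near, far = tree['right'], tree['left']
    best = nearest(near, query, best)
    # Only descend the far side if the splitting plane is closer than
    # the best distance found so far.
    if best is None or abs(query[tree['dim']] - tree['split']) < best[0]:
        best = nearest(far, query, best)
    return best
```

On balanced data this makes each query O(log n) instead of the O(n) brute-force scan.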

K-D Trees: Classification
[Figure: a k-d tree over the credit-rating features, with internal nodes such as "R > 0.825?", "L > 17.5?", and "L > 9?", Yes/No branches, and Good/Poor leaves.]

Efficient Implementation: Parallel Hardware
Classification cost: the number of distance computations
- Constant time with O(n) processors
- Cost of finding the closest: compute pairwise minima successively, in O(log n) time