Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification.

Slides:



Advertisements
Similar presentations
Text Categorization.
Advertisements

Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009.
Linear Classifiers (perceptrons)
1 CS 391L: Machine Learning: Instance Based Learning Raymond J. Mooney University of Texas at Austin.
Hinrich Schütze and Christina Lioma
INTRODUCTION TO Machine Learning 3rd Edition
Text Categorization CSC 575 Intelligent Information Retrieval.
WEB BAR 2004 Advanced Retrieval and Web Mining Lecture 16.
Navneet Goyal. Instance Based Learning  Rote Classifier  K- nearest neighbors (K-NN)  Case Based Resoning (CBR)
Learning for Text Categorization
Large-scale matching CSE P 576 Larry Zitnick
K nearest neighbor and Rocchio algorithm
MACHINE LEARNING 9. Nonparametric Methods. Introduction Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 
Chapter 7 – K-Nearest-Neighbor
Lecture Notes for CMPUT 466/551 Nilanjan Ray
CS276A Text Retrieval and Mining Lecture 16 [Borrows slides from Ray Mooney and Barbara Rosario]
Nearest Neighbor Classifiers other names: –instance-based learning –case-based learning (CBL) –non-parametric learning –model-free learning.
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
1 Text Categorization. 2 Categorization Given: –A description of an instance, x  X, where X is the instance language or instance space. –A fixed set.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
Advanced Multimedia Text Classification Tamara Berg.

Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University.
Vector Space Text Classification
Content-Based Recommendation Systems Michael J. Pazzani and Daniel Billsus Rutgers University and FX Palo Alto Laboratory By Vishal Paliwal.

PrasadL11Classify1 Vector Space Classification Adapted from Lectures by Raymond Mooney and Barbara Rosario.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
K Nearest Neighborhood (KNNs)
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.
INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, Lecture.
Introduction to Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan.
Overview of Supervised Learning Overview of Supervised Learning2 Outline Linear Regression and Nearest Neighbors method Statistical Decision.
Machine Learning in Ad-hoc IR. Machine Learning for ad hoc IR We’ve looked at methods for ranking documents in IR using factors like –Cosine similarity,
PrasadL14VectorClassify1 Vector Space Text Classification Adapted from Lectures by Raymond Mooney and Barbara Rosario.
Vector Space Classification (modified from Stanford CS276 slides on Lecture 11: Text Classification; Vector space classification [Borrows slides from Ray.
1 CS276 Information Retrieval and Web Search Lecture 13: Classifiers: kNN, Rocchio, etc. [Borrows slides from Ray Mooney and Barbara Rosario]
Text Classification 2 David Kauchak cs459 Fall 2012 adapted from:
INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN Modified by Prof. Carolina Ruiz © The MIT Press, 2014 for CS539 Machine Learning at WPI
Introduction to Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Text Classification 1 Chris Manning,
Chapter 6 – Three Simple Classification Methods © Galit Shmueli and Peter Bruce 2008 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
USE RECIPE INGREDIENTS TO PREDICT THE CATEGORY OF CUISINE Group 7 – MEI, Yan & HUANG, Chenyu.
Chapter1: Introduction Chapter2: Overview of Supervised Learning
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
Introduction to Machine Learning Multivariate Methods 姓名 : 李政軒.
Information Retrieval and Organisation Chapter 14 Vector Space Classification Dell Zhang Birkbeck, University of London.
DATA MINING LECTURE 10b Classification k-nearest neighbor classifier
CS Machine Learning Instance Based Learning (Adapted from various sources)
Vector Space Classification 1.Vector space text classification 2.Rochhio Text Classification.
CSCI 5417 Information Retrieval Systems Jim Martin Lecture 12 10/4/2011.
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
KNN & Naïve Bayes Hongning Wang
بسم الله الرحمن الرحيم وقل ربي زدني علما.
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
Instance Based Learning
Information Retrieval
Classification Nearest Neighbor
Information Retrieval Christopher Manning and Prabhakar Raghavan
Instance Based Learning (Adapted from various sources)
K Nearest Neighbor Classification
Classification Nearest Neighbor
Text Categorization Assigning documents to a fixed set of categories
Instance Based Learning
Hankz Hankui Zhuo Text Categorization Hankz Hankui Zhuo
Data Mining Classification: Alternative Techniques
Multivariate Methods Berlin Chen
Memory-Based Learning Instance-Based Learning K-Nearest Neighbor
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Presentation transcript:

Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

k Nearest Neighbor Classification

KNN Important Points

5 kNN decision boundaries Government Science Arts Boundaries are in principle arbitrary surfaces – but usually polyhedra kNN gives locally defined decision boundaries between classes – far away points do not influence each classification decision (unlike in Naïve Bayes, Rocchio, etc.) Sec.14.3

6 Similarity Metrics and Complexity Nearest neighbor method depends on a similarity (or distance) metric. Simplest for continuous m-dimensional instance space is Euclidean distance. Simplest for m-dimensional binary instance space is Hamming distance (number of feature values that differ). For text, cosine similarity of tf.idf weighted vectors is typically most effective. Testing Time: O(B|V t |) where B is the average number of training documents in which a test-document word appears. – Typically B << |D| Sec.14.3

7 Illustration of 3 Nearest Neighbor for Text Vector Space Sec.14.3

8 3 Nearest Neighbor vs. Rocchio Nearest Neighbor tends to handle polymorphic categories better than Rocchio/NB.

9 kNN: Discussion No feature selection necessary Scales well with large number of classes – Don’t need to train n classifiers for n classes Classes can influence each other – Small changes to one class can have ripple effect Scores can be hard to convert to probabilities No training necessary – Actually: perhaps not true. (Data editing, etc.) May be expensive at test time In most cases it’s more accurate than NB or Rocchio Bias/Variance tradeoff – Variance ≈ Capacity kNN has high variance and low bias. – Infinite memory NB has low variance and high bias. – Decision surface has to be linear. Sec.14.3

Types of Classifiers

References Aly M. Survey on Multiclass Classification Methods. Technical report, California Institute of Technology, Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack; Information retrieval ; MIT Press, Andoni, Alexandr, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni Locality-sensitive hashing using stable distributions. In Nearest Neighbor Methods in Learning and Vision: Theory and Practice. MIT Press.