Lazy Learners K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.

Classification and Prediction
Eager learners: given a set of training tuples, an eager learner constructs a generalization model before receiving new tuples to classify. Examples:
Classification by decision tree induction
Rule-based classification
Classification by backpropagation
Support Vector Machines (SVM)
Associative classification

Lazy vs. Eager Learning
Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple.
Eager learning (the methods listed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify.
Lazy learners spend less time in training but more time in predicting.

Lazy Learner: Instance-Based Methods
Typical approaches:
k-nearest neighbor approach: instances are represented as points in a Euclidean space.

The k-Nearest Neighbor Algorithm
All instances correspond to points in n-dimensional space.
The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2).
The target function can be discrete- or real-valued.
For a discrete-valued target, k-NN returns the most common value among the k training examples nearest to the query point x_q.
[Figure: query point x_q surrounded by positive (+) and negative (-) training examples]
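The majority-vote rule for a discrete-valued target can be sketched as follows. This is a minimal illustration, not the slide author's implementation: the function names `euclidean` and `knn_classify` and the toy training points are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k):
    """Return the most common label among the k training points nearest to query.

    train is a list of (point, label) pairs (hypothetical layout for this sketch).
    """
    # Brute force: sort every stored tuple by its distance to the query.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data (assumed): three "+" points near the origin, three "-" points far away.
train = [
    ((1.0, 1.0), "+"), ((1.5, 2.0), "+"), ((2.0, 1.0), "+"),
    ((5.0, 5.0), "-"), ((6.0, 5.5), "-"), ((5.5, 6.5), "-"),
]
x_q = (2.0, 2.0)
print(knn_classify(train, x_q, k=3))
```

Nothing is computed until a query arrives, which is what makes the method "lazy": all the work happens inside `knn_classify`.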

The k-Nearest Neighbor Algorithm
k-NN for real-valued prediction: for a given unknown tuple, returns the mean value of the k nearest neighbors.
Distance-weighted nearest neighbor algorithm: weights the contribution of each of the k neighbors according to its distance to the query x_q, giving greater weight to closer neighbors.
Averaging over the k nearest neighbors makes the method robust to noisy data.
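A sketch of real-valued prediction, with and without distance weighting. The slide does not state a weighting formula; the inverse squared distance used here (weight = 1/d^2) is one common choice and is an assumption, as are the function name `knn_regress` and the toy data.

```python
import math

def knn_regress(train, query, k, weighted=True):
    """Predict a real value from the k nearest neighbors of query.

    train is a list of (point, target) pairs (hypothetical layout).
    weighted=False returns the plain mean of the k neighbor targets.
    weighted=True weights each target by 1/d^2 (an assumed weighting scheme).
    """
    nearest = sorted(
        ((math.dist(point, query), target) for point, target in train),
        key=lambda pair: pair[0],
    )[:k]
    if not weighted:
        return sum(t for _, t in nearest) / len(nearest)
    # A neighbor at distance 0 would get infinite weight, so return its target directly.
    exact = [t for d, t in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d ** 2 for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)

# Toy one-dimensional data (assumed).
train = [((1.0,), 10.0), ((2.0,), 20.0), ((4.0,), 40.0), ((10.0,), 100.0)]
print(knn_regress(train, (3.0,), k=3, weighted=False))
print(knn_regress(train, (3.0,), k=3, weighted=True))
```

In this toy case the weighted prediction sits closer to the targets of the two neighbors at distance 1 than the plain mean does, because the neighbor at distance 2 contributes only a quarter of their weight.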

The k-Nearest Neighbor Algorithm
How can the value of k, the number of neighbors, be determined? In general, the larger the number of training tuples, the larger the value of k can be.
Nearest-neighbor classifiers can be extremely slow when classifying test tuples: with n stored tuples, each classification requires O(n) comparisons.
By presorting and arranging the stored tuples into a search tree, the number of comparisons can be reduced to O(log n).
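The slide gives only a rule of thumb for k. One common practical approach, shown here as an assumption rather than as the slide's method, is to try several candidate values and keep the one with the lowest leave-one-out error. The names `loo_error` and `choose_k` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k stored tuples nearest to query (brute force, O(n))."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def loo_error(data, k):
    """Fraction of tuples misclassified when each one is predicted from all the others."""
    errors = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_classify(rest, point, k) != label:
            errors += 1
    return errors / len(data)

def choose_k(data, candidates):
    """Return the candidate k with the lowest leave-one-out error (ties go to the smaller k)."""
    return min(candidates, key=lambda k: (loo_error(data, k), k))

# Toy data (assumed): two clean clusters plus one noisy "-" point inside the "+" cluster.
data = [
    ((1.0, 1.0), "+"), ((1.0, 2.0), "+"), ((2.0, 1.0), "+"), ((2.0, 2.0), "+"),
    ((6.0, 6.0), "-"), ((6.0, 7.0), "-"), ((7.0, 6.0), "-"), ((7.0, 7.0), "-"),
    ((1.5, 1.5), "-"),
]
print(choose_k(data, [1, 3, 5]))
```

With k = 1 the single noisy point misleads every "+" tuple around it, while larger k values outvote it; this is the noise-robustness argument for averaging over several neighbors.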

The k-Nearest Neighbor Algorithm
Example: k = 5 [figure not reproduced in the transcript]

Fuzzy Set Approaches
Rule-based systems for classification have the disadvantage that they involve sharp cutoffs for continuous attributes. For example:
IF (years_employed > 2) AND (income > 50K) THEN credit_card = approved
What if a customer has been employed for 10 years but has an income of 49K? The rule does not fire, even though the income falls just short of the cutoff.
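The sharp cutoff can be seen by coding the rule directly. This is an illustrative sketch: the function name `crisp_rule` is hypothetical, and the "not approved" result for customers who fail the rule is an assumption, since the slide states only the approving branch.

```python
def crisp_rule(years_employed, income):
    """Apply the slide's crisp rule; income is in dollars.

    Returning "not approved" when the rule does not fire is an assumption.
    """
    if years_employed > 2 and income > 50_000:
        return "approved"
    return "not approved"

# A long-employed customer just under the income cutoff fails the rule,
# while a customer barely over both thresholds passes.
print(crisp_rule(10, 49_000))
print(crisp_rule(3, 50_001))
```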

Fuzzy Set Approaches
Instead, we can discretize income into categories such as {low, medium, high} and then apply fuzzy logic to allow a “fuzzy” threshold for each category.

Fuzzy Set Approaches
Fuzzy set theory, also known as possibility theory, was proposed by Lotfi Zadeh in 1965.
Unlike traditional “crisp” sets, where an element either belongs to a set S or does not, in fuzzy set theory an element can belong to more than one fuzzy set, each to some degree.

Fuzzy Set Approaches
For example, the income value $49K belongs to both the medium and high fuzzy sets, with membership degrees m_medium($49K) = 0.15 and m_high($49K) = 0.96.
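Membership degrees like these come from membership functions defined over the income axis. The slide does not give the functions themselves, so the sketch below assumes simple piecewise-linear shapes. The breakpoints (in thousands of dollars) are hypothetical and were picked only so that $49K yields the two values quoted above.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def trapezoid(x, a, b, c, d):
    """Membership that is 0 outside (a, d), rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical breakpoints, in thousands of dollars (not taken from the slide).
def m_medium(income_k):
    return trapezoid(income_k, 10.0, 20.0, 32.0, 52.0)

def m_high(income_k):
    return ramp_up(income_k, 25.0, 50.0)

print(round(m_medium(49.0), 2), round(m_high(49.0), 2))
```

The two degrees do not sum to 1: fuzzy membership is not a probability, so an income can be largely "high" and slightly "medium" at the same time.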

Fuzzy Set Approaches
Another example: temperature [figure not reproduced in the transcript]

Classifier Accuracy Measures
Classes                        (Real) buy computer = yes   (Real) buy computer = no    Total
(Predict) buy computer = yes
(Predict) buy computer = no
Total                          7000 (buy computer)         3000 (does not buy)         10000

Classifier Accuracy Measures
Alternative accuracy measures (e.g., for cancer diagnosis):
sensitivity = t-pos / pos
specificity = t-neg / neg
precision = t-pos / (t-pos + f-pos)
accuracy =
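These measures can be computed from the four cells of a confusion matrix, as sketched below. The accuracy formula is cut off in the transcript; the sketch assumes the standard definition, (t-pos + t-neg) / (pos + neg). The function name `classifier_measures` and the example counts are hypothetical and are not the values from the slide's table.

```python
def classifier_measures(t_pos, f_neg, f_pos, t_neg):
    """Compute the slide's measures from confusion-matrix counts.

    pos and neg are the numbers of truly positive and truly negative tuples.
    The accuracy formula is the standard definition, assumed here because
    the transcript truncates it.
    """
    pos = t_pos + f_neg
    neg = t_neg + f_pos
    return {
        "sensitivity": t_pos / pos,
        "specificity": t_neg / neg,
        "precision": t_pos / (t_pos + f_pos),
        "accuracy": (t_pos + t_neg) / (pos + neg),
    }

# Hypothetical counts for an imbalanced problem: 100 positives, 900 negatives.
measures = classifier_measures(t_pos=90, f_neg=10, f_pos=30, t_neg=870)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes such as these, accuracy alone can look high while precision is noticeably lower, which is why the alternative measures matter for tasks like cancer diagnosis.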