K Nearest Neighbors Classifier & Decision Trees


Content
- K Nearest Neighbors
- Decision Trees
  - Binary Decision Trees
  - Linear Decision Trees
  - Chi-Squared Automatic Interaction Detector (CHAID)
  - Classification and Regression Trees (CART)

K Nearest Neighbors

Advantages:
- Nonparametric architecture
- Simple
- Powerful
- Requires no training time

Disadvantages:
- Memory intensive
- Classification/estimation is slow

K Nearest Neighbors

The key issues involved in training this model include:
- setting the variable K, typically with validation techniques (e.g., cross-validation)
- choosing the type of distance metric (e.g., the Euclidean measure)

[Figure: K Nearest Neighbors example. Stored training set patterns; X = input pattern for classification; --- = Euclidean distance measure to the nearest three patterns.]

The KNN algorithm:
1. Store all input data in the training set.
2. For each pattern in the test set, search for the K nearest patterns to the input pattern using a Euclidean distance measure.
3. For classification, compute the confidence for each class as Ci/K, where Ci is the number of patterns among the K nearest patterns belonging to class i.
4. The classification for the input pattern is the class with the highest confidence.
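
A minimal NumPy sketch of these steps (the function and variable names, such as knn_classify, are illustrative and not from the slides):

import numpy as np

def knn_classify(X_train, y_train, x, k):
    # Step 2: Euclidean distance from x to every stored training pattern.
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]  # indices of the K nearest patterns
    # Step 3: confidence for class i is Ci / K.
    classes, counts = np.unique(y_train[nearest], return_counts=True)
    confidences = counts / k
    # Step 4: the class with the highest confidence wins.
    return classes[np.argmax(confidences)]

# Usage: two 2-D classes and one query point, classified with K = 3.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0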

Training parameters and typical settings

Number of nearest neighbors
The number of nearest neighbors (K) should be chosen by cross-validation over a range of K settings. K = 1 is a good baseline model to benchmark against. A good rule of thumb is that K should be less than the square root of the total number of training patterns.
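
A hedged sketch of this selection procedure, assuming scikit-learn is available (the slides name cross-validation but no particular library):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def choose_k(X, y, cv=5):
    # Rule of thumb from the slides: search K below sqrt(#patterns).
    k_max = max(1, int(np.sqrt(len(X))))
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=cv).mean()
              for k in range(1, k_max + 1)}
    return max(scores, key=scores.get)  # K with the best mean CV accuracy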

Training parameters and typical settings

Input compression
Since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification. Compression usually costs a little accuracy, but it can sometimes improve performance, because it performs an automatic normalization of the data that equalizes the effect of each input in the Euclidean distance measure.
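
One possible preprocessing pipeline, sketched with scikit-learn; the choice of standardization plus PCA is an assumption, since the slides do not name a specific compression method:

from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# StandardScaler equalizes each input's effect on the Euclidean distance;
# PCA compresses the patterns so fewer values are stored per example.
model = make_pipeline(StandardScaler(), PCA(n_components=2),
                      KNeighborsClassifier(n_neighbors=3))
# model.fit(X_train, y_train); model.predict(X_test)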

Decision trees

Decision trees are popular for pattern recognition because the models they produce are easy to understand.

[Figure: an example tree labeled A, B, C showing the root node, the nodes of the tree, the leaves (terminal nodes), and the branches (decision points).]

Decision trees - Binary decision trees

Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf. Each node of the tree computes an inequality (e.g., BMI < 24, yes or no) based on a single input variable. Each leaf is assigned to a particular class.

[Figure: a node testing BMI < 24 with Yes and No branches.]
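
A minimal traversal sketch; the BMI < 24 test comes from the slide, while the Node class and the class labels are illustrative:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None    # index of the single variable tested
    threshold: Optional[float] = None
    yes: Optional["Node"] = None     # taken when x[feature] < threshold
    no: Optional["Node"] = None
    label: Optional[str] = None      # set only on leaves

def classify(node, x):
    # Traverse from the root, one inequality per node, until a leaf
    # assigns the class.
    while node.label is None:
        node = node.yes if x[node.feature] < node.threshold else node.no
    return node.label

# Usage: a one-node tree whose root tests BMI (feature 0) < 24.
tree = Node(feature=0, threshold=24,
            yes=Node(label="class A"), no=Node(label="class B"))
print(classify(tree, [22.5]))  # -> "class A"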

Decision trees - Binary decision trees

Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.

[Figure: axis-parallel boundaries partitioning the input space into regions such as B and C.]

Decision trees - Linear decision trees

Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables (e.g., aX1 + bX2 < c, yes or no).
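
A sketch of such a node test, assuming the linear form aX1 + bX2 shown on the slide; the coefficients and threshold here are made up:

import numpy as np

def linear_split(x, w, c):
    # The boundary w . x = c is a hyperplane that need not be
    # perpendicular to any single input axis.
    return "yes" if np.dot(w, x) < c else "no"

print(linear_split(np.array([1.0, 2.0]), w=np.array([0.5, -0.3]), c=0.1))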

Decision trees - Chi-Squared Automatic Interaction Detector (CHAID)

CHAID is a non-binary decision tree. The decision or split made at each node is still based on a single variable, but it can result in multiple branches (e.g., Branch #1, Branch #2, Branch #3). The split search algorithm is designed for categorical variables.

Decision trees - Chi-Squared Automatic Interaction Detector (CHAID)

Continuous variables must be grouped into a finite number of bins to create categories. A reasonable number of "equal population bins" can be created for use with CHAID (e.g., with 1000 samples, 10 equal population bins would each contain 100 samples). A chi-squared value is computed for each variable and used to determine the best variable to split on.
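
A hedged sketch of this preprocessing, using NumPy and SciPy (the slides do not name a library); equal population bins are formed from quantiles, then each variable gets a chi-squared score against the class label:

import numpy as np
from scipy.stats import chi2_contingency

def equal_population_bins(x, n_bins):
    # Quantile cut points give bins with (roughly) equal counts.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    return np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

def chi2_for_variable(bins, y):
    # Contingency table of bin membership vs. class label.
    table = np.zeros((bins.max() + 1, y.max() + 1))
    for b, c in zip(bins, y):
        table[b, c] += 1
    stat = chi2_contingency(table)[0]
    return stat  # a larger value means a stronger association with the class

# Usage: 1000 samples, 10 equal population bins of 100 samples each.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = (x + rng.normal(scale=0.5, size=1000) > 0).astype(int)
print(chi2_for_variable(equal_population_bins(x, 10), y))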

Decision trees - Classification and regression trees (CART)

Classification and regression trees (CART) are binary decision trees, which split on a single variable at each node. The CART algorithm recursively goes through an exhaustive search of all variables and split values to find the optimal splitting rule for each node.

Decision trees - Classification and regression trees (CART)

The optimal split s* at a specific node t is the one that maximizes the goodness-of-split measure Φ over all candidate splits s:

Φ(s*|t) = max_s Φ(s|t)

[Figure: CART split search on the training set, starting at the root node; a split s divides node t into tL (the left offspring of node t) and tR (the right offspring of node t), and Φ(s*|t) = max_s Φ(s|t) is evaluated from the class j proportions in each offspring.]
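
A sketch of the exhaustive split search. The slides do not spell out Φ, so this uses the standard CART goodness-of-split measure from Breiman et al., Φ(s|t) = 2 pL pR Σj |P(j|tL) - P(j|tR)|:

import numpy as np

def goodness_of_split(y_left, y_right, classes):
    # Phi(s|t) = 2 * pL * pR * sum_j |P(j|tL) - P(j|tR)|
    n_l, n_r = len(y_left), len(y_right)
    p_l, p_r = n_l / (n_l + n_r), n_r / (n_l + n_r)
    diff = sum(abs(np.mean(y_left == j) - np.mean(y_right == j))
               for j in classes)
    return 2 * p_l * p_r * diff

def best_split(X, y):
    # Exhaustive search over all variables and all split values.
    classes = np.unique(y)
    best = (None, None, -1.0)  # (variable index, split value, Phi)
    for var in range(X.shape[1]):
        for value in np.unique(X[:, var]):
            mask = X[:, var] < value
            if mask.all() or not mask.any():
                continue  # degenerate split, skip
            phi = goodness_of_split(y[mask], y[~mask], classes)
            if phi > best[2]:
                best = (var, value, phi)
    return best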

Decision trees - Classification and regression trees (CART)

Pruning rule: cut off branches of the tree as follows. The cost of a node t is R(t) = r(t)p(t), where r(t) is the misclassification error at node t and p(t) is the proportion of training cases that reach it. For each internal node, g(t) compares this cost against the cost of the subtree Tt rooted at t: g(t) = (R(t) - R(Tt)) / (|Tt| - 1), where R(Tt) sums R over the leaves of Tt and |Tt| is the number of those leaves. The subtree with the smallest g(t) can then be pruned from the tree.
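
A sketch of computing g(t) for this weakest-link pruning; the dict-based tree representation is made up for illustration:

def subtree_cost_and_leaves(t):
    # R(Tt): summed cost over the leaves of the subtree rooted at t.
    if not t["children"]:
        return t["error"] * t["p"], 1
    cost, leaves = 0.0, 0
    for child in t["children"]:
        c, l = subtree_cost_and_leaves(child)
        cost, leaves = cost + c, leaves + l
    return cost, leaves

def g(t):
    # g(t) = (R(t) - R(Tt)) / (|Tt| - 1); prune where g is smallest.
    r_node = t["error"] * t["p"]  # R(t) = r(t) * p(t)
    r_subtree, leaves = subtree_cost_and_leaves(t)
    return (r_node - r_subtree) / (leaves - 1)

# Usage: a node misclassifying 30% of the 40% of cases reaching it,
# with two leaf children each misclassifying 10% of 20% of cases.
node = {"error": 0.3, "p": 0.4, "children": [
    {"error": 0.1, "p": 0.2, "children": []},
    {"error": 0.1, "p": 0.2, "children": []}]}
print(g(node))  # (0.12 - 0.04) / (2 - 1) = 0.08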

End