1
K Nearest Neighbors Classifier & Decision Trees
2
Content
K Nearest Neighbors
Decision Trees
Binary Decision Trees
Linear Decision Trees
Chi-Squared Automatic Interaction Detector (CHAID)
Classification and Regression Trees (CART)
3
K Nearest Neighbors
Advantages
Nonparametric architecture
Simple
Powerful
Requires no training time
Disadvantages
Memory intensive
Classification/estimation is slow
4
K Nearest Neighbors
The key issues involved in training this model include:
Setting the variable K, usually chosen with validation techniques (e.g. cross validation)
The type of distance metric, e.g. the Euclidean measure
5
Figure: K Nearest Neighbors example. The stored training-set patterns are plotted; X marks the input pattern to be classified, with the Euclidean distance measured to the nearest three patterns.
6
Store all input data in the training set.
For each pattern in the test set:
Search for the K nearest patterns to the input pattern using a Euclidean distance measure.
For classification, compute the confidence for each class as Ci/K, where Ci is the number of patterns among the K nearest patterns belonging to class i.
The classification for the input pattern is the class with the highest confidence.
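A minimal sketch of this procedure in Python (NumPy only); the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k):
    """Classify input pattern x by the class with the highest confidence
    Ci/K among its k nearest training patterns (Euclidean distance)."""
    # Euclidean distance from x to every stored training pattern
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Indices of the k nearest patterns
    nearest = np.argsort(dists)[:k]
    # Confidence for each class i is Ci / k
    classes, counts = np.unique(train_y[nearest], return_counts=True)
    confidences = counts / k
    # Predicted class is the one with the highest confidence
    return classes[np.argmax(confidences)], dict(zip(classes, confidences))

# Example: three stored patterns, classify a new point with k = 3
train_X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
train_y = np.array([0, 0, 1])
label, conf = knn_classify(train_X, train_y, np.array([1.1, 0.9]), k=3)
```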
7
Training parameters and typical settings
Number of nearest neighbors
The number of nearest neighbors (K) should be chosen by cross validation over a range of K settings. K=1 is a good baseline model to benchmark against. A good rule of thumb is that K should be less than the square root of the total number of training patterns.
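A hedged sketch of this selection procedure; the leave-one-out scheme and helper names are illustrative assumptions (it reuses knn_classify from the earlier sketch):

```python
import numpy as np

def select_k(train_X, train_y, candidate_ks):
    """Pick K by leave-one-out cross validation over candidate settings."""
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct = 0
        for i in range(len(train_X)):
            # Hold out pattern i, classify it with the remaining patterns
            X = np.delete(train_X, i, axis=0)
            y = np.delete(train_y, i)
            pred, _ = knn_classify(X, y, train_X[i], k)  # from the sketch above
            correct += (pred == train_y[i])
        acc = correct / len(train_X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k

# Rule of thumb: only try K below the square root of the training-set size
n = len(train_X)
candidate_ks = list(range(1, int(np.sqrt(n)) + 1))
```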
8
Training parameters and typical settings
Input compression
Since KNN is very storage intensive, we may want to compress the data patterns as a preprocessing step before classification. Input compression usually results in slightly worse performance. Sometimes, however, compression improves performance because it performs an automatic normalization of the data, which can equalize the effect of each input on the Euclidean distance measure.
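As one concrete illustration of the normalization effect mentioned above, here is a minimal z-score scaling step; this particular scheme is an assumption, since the slides do not specify a compression method:

```python
import numpy as np

def zscore(train_X, x):
    """Scale each input to zero mean and unit variance so that no single
    input dominates the Euclidean distance measure."""
    mu = train_X.mean(axis=0)
    sigma = train_X.std(axis=0) + 1e-12  # avoid division by zero
    return (train_X - mu) / sigma, (x - mu) / sigma
```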
9
Decision trees
Decision trees are popular for pattern recognition because the models they produce are easier to understand.
Figure: a tree with a root node, internal nodes, branches (decision points), and leaves (terminal nodes).
10
Decision trees - Binary decision trees
Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf. Each node of the tree computes an inequality (e.g. BMI < 24, yes or no) based on a single input variable. Each leaf is assigned to a particular class.
Figure: a node testing BMI < 24 with Yes/No branches.
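A minimal sketch of this traversal in Python; the node/leaf structure and field names are illustrative assumptions:

```python
class Leaf:
    def __init__(self, label):
        self.label = label          # class assigned to this terminal node

class Node:
    def __init__(self, var, threshold, yes, no):
        self.var = var              # index of the single input variable tested
        self.threshold = threshold  # inequality: x[var] < threshold ?
        self.yes, self.no = yes, no # subtrees for the two outcomes

def classify(tree, x):
    """Traverse from the root to a leaf, following one inequality per node."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if x[node.var] < node.threshold else node.no
    return node.label

# Example: a single node testing BMI (input 0) < 24
tree = Node(var=0, threshold=24, yes=Leaf("normal"), no=Leaf("overweight"))
print(classify(tree, [22.5]))  # -> "normal"
```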
11
Decision trees - Binary decision trees
Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.
12
Decision trees - Linear decision trees
Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables (e.g. a node testing aX1 + bX2 against a threshold, with Yes/No branches).
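A brief sketch of such a node in Python; the coefficients and threshold are illustrative:

```python
def linear_node(x, coeffs, threshold):
    """Linear decision node: test a*X1 + b*X2 + ... against a threshold.
    Unlike a binary tree node, the split depends on multiple variables."""
    return sum(c * xi for c, xi in zip(coeffs, x)) < threshold

# Example: 2*X1 + 3*X2 < 10
print(linear_node([1.0, 2.0], coeffs=[2.0, 3.0], threshold=10.0))  # True -> "Yes" branch
```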
13
Decision trees - Chi-Squared Automatic Interaction Detector (CHAID)
CHAID is a non-binary decision tree. The decision or split made at each node is still based on a single variable, but it can result in multiple branches (e.g. Branch #1, Branch #2, Branch #3). The split search algorithm is designed for categorical variables.
14
Decision trees - Chi-Squared Automatic Interaction Detector (CHAID)
Continuous variables must be grouped into a finite number of bins to create categories. A reasonable number of "equal population bins" can be created for use with CHAID. For example, with 1000 samples, creating 10 equal population bins would give 10 bins of 100 samples each. A chi-squared value is computed for each variable and used to determine the best variable to split on.
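A hedged sketch of the binning step and the chi-squared comparison; using scipy's chi2_contingency is an assumption about tooling, since the slides do not name a library:

```python
import numpy as np
from scipy.stats import chi2_contingency

def equal_population_bins(values, n_bins):
    """Group a continuous variable into bins holding (roughly) equal numbers
    of samples, using quantile cut points."""
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, cuts)  # bin index per sample

def chi2_for_variable(binned_values, labels):
    """Chi-squared statistic of the (bins x classes) contingency table;
    the variable with the largest value is the best one to split on."""
    bins, classes = np.unique(binned_values), np.unique(labels)
    table = np.array([[np.sum((binned_values == b) & (labels == c))
                       for c in classes] for b in bins])
    stat, p, dof, expected = chi2_contingency(table)
    return stat
```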
15
Decision trees - Classification and regression trees (CART)
Classification and regression trees (CART) are binary decision trees, which split on a single variable at each node. The CART algorithm recursively goes through an exhaustive search of all variables and split values to find the optimal splitting rule for each node.
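A minimal sketch of the exhaustive split search; using the Gini impurity decrease as the goodness measure is an assumption, since CART's criterion can be stated in several ways:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustively search every variable and every split value,
    returning the (variable, threshold) pair with the best goodness."""
    n, d = X.shape
    best = (None, None, -np.inf)
    for var in range(d):
        for threshold in np.unique(X[:, var]):
            left, right = y[X[:, var] < threshold], y[X[:, var] >= threshold]
            if len(left) == 0 or len(right) == 0:
                continue
            # Goodness = impurity decrease produced by the split
            goodness = gini(y) - (len(left) / n) * gini(left) \
                               - (len(right) / n) * gini(right)
            if goodness > best[2]:
                best = (var, threshold, goodness)
    return best
```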
16
Decision trees - Classification and regression trees (CART)
The optimal splitting criterion at a specific node t is found as:
Φ(s'|t) = max_i Φ(s_i|t)
i.e. the chosen split s' is the candidate split s_i that maximizes the goodness-of-split measure Φ at node t.
17
Figure: CART splitting. The training set enters the root node; each candidate split s sends the cases of each class j to a left offspring tL and a right offspring tR, and the split s' maximizing Φ(s|t) is chosen at each node t.
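For reference, one common form of this goodness-of-split measure, written in terms of the quantities labeled in the figure; the exact expression is stated here as an assumption, since the slide's own definition is not legible in the transcript:

```latex
\Phi(s \mid t) \;=\; 2\,P_L\,P_R \sum_{j=1}^{J} \bigl|\,P(j \mid t_L) - P(j \mid t_R)\,\bigr|
```

Here P_L and P_R are the fractions of cases at node t sent to the left and right offspring tL and tR, P(j|tL) and P(j|tR) are the proportions of class j in each offspring, and J is the number of classes.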
18
Decision trees - Classification and regression trees (CART)
Pruning rule
Cut off the weakest branches of the tree. The cost of a node is R(t) = r(t) p(t), where r(t) is the resubstitution (misclassification) error at node t and p(t) is the proportion of cases that reach node t. The sub-tree with the smallest g(t) can then be pruned from the tree.
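For completeness, a standard cost-complexity ("weakest link") definition of g(t); this is stated as an assumption, since the slide uses g(t) without defining it:

```latex
g(t) \;=\; \frac{R(t) - R(T_t)}{\left|\tilde{T}_t\right| - 1}
```

Here T_t is the sub-tree rooted at node t, R(T_t) is the sum of R over its leaves, and |T̃_t| is the number of those leaves; at each step the sub-tree whose root has the smallest g(t) is pruned.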
19
End