
1 Pattern Classification X

2 Content: General Method, K Nearest Neighbors, Decision Trees, Neural Networks

3 General Method. Training: learning knowledge or parameters. Testing: applying the learned model to new instances.

4 KNN in Digit Recognition

5 K Nearest Neighbors. Advantages: nonparametric architecture, simple, powerful, requires no training time. Disadvantages: memory intensive, classification/estimation is slow.

6 K Nearest Neighbors. The key issues involved in training this model include setting the variable K (chosen with validation techniques, e.g. cross validation) and the type of distance metric (e.g. the Euclidean measure).
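The Euclidean measure mentioned here can be written in a few lines of Python; this is a generic sketch, and the function name is illustrative rather than taken from the slides.

import math

def euclidean_distance(p, q):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))   # 5.0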

7 Figure: K Nearest Neighbors example. Stored training set patterns are shown together with an input pattern to be classified; dashed lines mark the Euclidean distance to the nearest three patterns.

8 KNN procedure. Store all input data in the training set. For each pattern in the test set, search for the K nearest patterns to the input pattern using a Euclidean distance measure. For classification, compute the confidence for each class as C_i / K, where C_i is the number of patterns among the K nearest patterns belonging to class i. The classification for the input pattern is the class with the highest confidence.
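A minimal Python sketch of this procedure, assuming patterns are plain numeric vectors; the function and variable names are illustrative, not part of the original slides.

import math
from collections import Counter

def knn_classify(train_patterns, train_labels, x, k):
    # Store all training data; rank it by Euclidean distance to the input pattern x.
    order = sorted(range(len(train_patterns)),
                   key=lambda i: math.dist(train_patterns[i], x))
    nearest = order[:k]
    # Confidence for class i is C_i / K, where C_i counts neighbors of class i.
    counts = Counter(train_labels[i] for i in nearest)
    confidences = {label: c / k for label, c in counts.items()}
    # The classification is the class with the highest confidence.
    return max(confidences, key=confidences.get), confidences

label, conf = knn_classify([(0, 0), (1, 1), (5, 5)], ["a", "a", "b"], (0.5, 0.5), 3)
print(label, conf)   # "a" with confidence 2/3, "b" with confidence 1/3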

9 Training parameters and typical settings: number of nearest neighbors. The number of nearest neighbors (K) should be chosen by cross validation over a range of K settings. K = 1 is a good baseline model to benchmark against. A good rule of thumb is that K should be less than the square root of the total number of training patterns.
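As an illustration of this advice, the sketch below picks K by leave-one-out cross validation over the range suggested by the rule of thumb; leave-one-out is just one possible validation scheme, and the helper names are ours.

import math
from collections import Counter

def loo_accuracy(patterns, labels, k):
    # Leave-one-out: classify each pattern using all the other patterns.
    correct = 0
    for i, x in enumerate(patterns):
        others = [j for j in range(len(patterns)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(patterns[j], x))[:k]
        vote = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        correct += (vote == labels[i])
    return correct / len(patterns)

def choose_k(patterns, labels):
    # Rule of thumb: keep K below the square root of the number of training patterns.
    max_k = max(1, int(math.sqrt(len(patterns))))
    return max(range(1, max_k + 1), key=lambda k: loo_accuracy(patterns, labels, k))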

10 Training parameters and typical settings: input compression. Since KNN is very storage intensive, we may want to compress data patterns as a preprocessing step before classification. Using input compression will generally result in slightly worse performance. Sometimes, however, compression improves performance, because it performs an automatic normalization of the data that can equalize the effect of each input in the Euclidean distance measure.
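The slide does not name a particular compression method, so the sketch below uses a principal-component projection purely as one example of compressing input patterns before KNN; treat the choice of method and the function name as assumptions.

import numpy as np

def compress_patterns(patterns, n_components):
    # Center the data and project it onto its first n_components principal directions.
    X = np.asarray(patterns, dtype=float)
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T   # each pattern now has n_components inputs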

11 Where the Euclidean distance metric fails. [Figure: a pattern to be classified shown next to Prototype A and Prototype B.] Prototype B seems more similar than Prototype A according to Euclidean distance, so the digit "9" is misclassified as "4". A possible solution is to use a distance metric that is invariant to irrelevant transformations.

12 Decision trees. Decision trees are popular for pattern recognition because the models they produce are easy to understand. [Figure: a tree showing (A) nodes, including the root node, (B) leaves (terminal nodes), and (C) branches (decision points).]

13 Decision trees - Binary decision trees. Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf. Each node of the tree computes an inequality (e.g. BMI < 24, yes or no) based on a single input variable. Each leaf is assigned to a particular class. [Figure: a small tree whose root tests BMI < 24, with yes/no branches leading to leaves.]
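A minimal sketch of this traversal, using the BMI < 24 inequality from the slide as the root test; the node class and the class labels at the leaves are illustrative assumptions.

class Node:
    def __init__(self, feature=None, threshold=None, yes=None, no=None, label=None):
        self.feature, self.threshold = feature, threshold   # inequality: x[feature] < threshold
        self.yes, self.no = yes, no                         # branches for the two outcomes
        self.label = label                                  # class label, set only at leaves

def classify(node, x):
    # Traverse from the root, testing one input variable per node, until a leaf is reached.
    while node.label is None:
        node = node.yes if x[node.feature] < node.threshold else node.no
    return node.label

# A one-split tree whose root computes BMI < 24 (BMI stored at input index 0).
tree = Node(feature=0, threshold=24, yes=Node(label="class A"), no=Node(label="class B"))
print(classify(tree, [22.5]))   # class A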

14 Decision trees - Binary decision trees. Since each inequality used to split the input space is based on only one input variable, each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.

15 Decision trees - Linear decision trees. Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables. [Figure: a tree whose root tests a linear combination aX1 + bX2, with yes/no branches.]
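For contrast with the single-variable test on the previous slide, a linear-tree node might look like the sketch below; the coefficients a and b and the threshold are placeholders, not values from the slides.

def linear_node(x, a, b, threshold):
    # The split depends on a linear combination of several inputs, e.g. a*x1 + b*x2,
    # rather than on a single input variable.
    return "yes" if a * x[0] + b * x[1] < threshold else "no"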

16 Biological Neural Systems. Neuron switching time: > 10^-3 secs. Number of neurons in the human brain: ~10^10. Connections (synapses) per neuron: ~10^4 to 10^5. Face recognition: ~0.1 secs. High degree of distributed and parallel computation: highly fault tolerant, highly efficient, and learning is key.

17 Excerpt from Russell and Norvig

18 A Neuron. Computation: input signals, input function (linear), activation function (nonlinear), output signal. [Figure: a unit j receives activations a_k over input links with weights W_kj, combines them into in_j, and sends its output a_j = output(in_j) along the output links.]

19 Part 1. Perceptrons: a Simple NN. Inputs x_1, ..., x_n (each x_i in the range [0, 1]) are combined with weights w_1, ..., w_n into the activation a = sum over i = 1..n of w_i x_i. The output is y = 1 if a ≥ θ, and y = 0 if a < θ.
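A direct Python sketch of this unit; the function name and the example weights are illustrative.

def perceptron(x, w, theta):
    # Activation a is the weighted sum of the inputs; output 1 if a reaches the threshold.
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if a >= theta else 0

print(perceptron((1, 0, 1), (0.5, 0.2, 0.4), 0.8))   # a = 0.9 >= 0.8, so output 1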

20 Decision Surface of a Perceptron. [Figure: in the (x_1, x_2) plane, the decision line w_1 x_1 + w_2 x_2 = θ separates the points labeled 1 from the points labeled 0.]

21 Linear Separability. Logical AND, with w_1 = 1, w_2 = 1, θ = 1.5: (x_1, x_2) = (0, 0) gives a = 0, y = 0; (0, 1) gives a = 1, y = 0; (1, 0) gives a = 1, y = 0; (1, 1) gives a = 2, y = 1. Logical XOR, with w_1 = ?, w_2 = ?, θ = ?: (0, 0) -> 0, (0, 1) -> 1, (1, 0) -> 1, (1, 1) -> 0. AND is linearly separable; XOR is not, so no choice of w_1, w_2, θ works.
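Evaluating the AND weights given on this slide with a small script confirms the table; only the function name is chosen here.

def threshold_unit(x, w, theta):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

# Logical AND with w1 = w2 = 1 and theta = 1.5 reproduces the table above.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, threshold_unit((x1, x2), (1, 1), 1.5))
# Outputs 0, 0, 0, 1 for the four rows; no single (w1, w2, theta) reproduces XOR.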

22 Threshold as Weight: w_0. Add an extra input x_0 = -1 with weight w_0 = θ. Then a = sum over i = 0..n of w_i x_i, and y = 1 if a ≥ 0, y = 0 if a < 0. Thus y = sgn(a) = 0 or 1.
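A sketch of the same unit with the threshold folded into the weights, as described above; the function name is ours.

def perceptron_with_bias(x, w):
    # Prepend the fixed input x0 = -1, so that w[0] plays the role of the threshold.
    a = w[0] * -1 + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if a >= 0 else 0   # y = sgn(a), taken here as 0 or 1

# Same behaviour as an explicit threshold of 0.8: w0 = 0.8, remaining weights unchanged.
print(perceptron_with_bias((1, 0, 1), (0.8, 0.5, 0.2, 0.4)))   # a = 0.1 >= 0, so output 1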

23 Perceptron Learning Rule: w' = w + α(t - y)x, i.e. w_i := w_i + Δw_i = w_i + α(t - y)x_i for i = 1..n. The parameter α is called the learning rate (in Han's book it is written as a lower case l); it determines the magnitude of the weight updates Δw_i. If the output is correct (t = y) the weights are not changed (Δw_i = 0). If the output is incorrect (t ≠ y) the weights w_i are changed such that the weight vector moves toward the input x_i (when t > y) or away from it (when t < y), so the output of the perceptron for the new weights w'_i moves closer to the target.

24 Perceptron Training Algorithm.
Repeat
  for each training vector pair (x, t)
    evaluate the output y when x is the input
    if y ≠ t then form a new weight vector w' according to w' = w + α(t - y)x
    else do nothing
  end for
Until y = t for all training vector pairs, or # iterations > k
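A minimal sketch of this loop in Python, assuming targets t in {-1, 1}, inputs already augmented with a constant bias component, and a small learning rate; these choices are assumptions consistent with the worked example on the next slide.

def train_perceptron(samples, lr=0.1, max_iters=100):
    # samples: list of (x, t) pairs, x an augmented input tuple, t in {-1, 1}.
    w = [0.0] * len(samples[0][0])
    for _ in range(max_iters):
        all_correct = True
        for x, t in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
            if y != t:
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]   # w' = w + lr*(t-y)*x
                all_correct = False
        if all_correct:          # y = t for every training pair
            break
    return w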

25 Perceptron Learning Example. Targets are t = 1 or t = -1, and the weight vector starts at w = [0.25, -0.1, 0.5], giving the decision line x_2 = 0.2 x_1 - 0.5 (output o = 1 on one side, o = -1 on the other). Step 1: (x, t) = ([-1, -1], 1): o = sgn(0.25 + 0.1 - 0.5) = -1, so Δw = [0.2, -0.2, -0.2]. Step 2: (x, t) = ([2, 1], -1): o = sgn(0.45 - 0.6 + 0.3) = 1, so Δw = [-0.2, -0.4, -0.2]. Step 3: (x, t) = ([1, 1], 1): o = sgn(0.25 - 0.7 + 0.1) = -1, so Δw = [0.2, 0.2, 0.2].
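The slide's numbers are consistent with a learning rate of 0.1 and a bias input fixed at 1; under that assumption, the short script below reproduces the three updates shown above.

def sgn(a):
    return 1 if a >= 0 else -1

w, lr = [0.25, -0.1, 0.5], 0.1
samples = [((1, -1, -1), 1), ((1, 2, 1), -1), ((1, 1, 1), 1)]   # (bias, x1, x2), target
for x, t in samples:
    o = sgn(sum(wi * xi for wi, xi in zip(w, x)))
    dw = [lr * (t - o) * xi for xi in x]
    w = [wi + dwi for wi, dwi in zip(w, dw)]
    print(o, dw)
# Prints o = -1, 1, -1 and dw = [0.2, -0.2, -0.2], [-0.2, -0.4, -0.2], [0.2, 0.2, 0.2].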

26 Part 2. Multi Layer Networks. [Figure: an input vector feeds the input nodes, which connect to hidden nodes and then to output nodes producing the output vector.]

27 A multi layer network can learn nonlinear functions such as logical XOR (x_1 x_2 y: 0 0 0; 0 1 1; 1 0 1; 1 1 0), which a single perceptron cannot represent (w_1 = ?, w_2 = ?, θ = ?). But how do we set the weights? [Figure: inputs x1 and x2 feeding hidden nodes 3 and 4 and output node 5, with weights such as w23 and w35.]
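The slides do not give the weights for this network, but one hand-set assignment of threshold units (an illustrative assumption, not the slides' own solution) shows that two layers are enough for XOR; how to learn such weights automatically is exactly the question this slide raises.

def step(a):
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    # Hidden node 3 acts like OR, hidden node 4 like AND; output node 5 fires
    # when OR is true but AND is not, which is exactly XOR.
    h3 = step(x1 + x2 - 0.5)
    h4 = step(x1 + x2 - 1.5)
    return step(h3 - h4 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # 0 0 0, 0 1 1, 1 0 1, 1 1 0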

28 End

