Examples of classification methods


Examples of classification methods CSIT5210

Content
KNN
Decision Tree
Naïve Bayesian
Bayesian Belief Network
Naïve Neural Network
Multilayer Neural Network
SVM

KNN Question: Assignment 1 Q1. Solution: 1) Understand the distance function: the number of attributes whose values differ. For the distance between tuple 2 and tuple 3, one attribute is the same and three attributes are different, so Dist(2,3) = |{Height(low != med), Weight(med != high), BloodPressure(med != high)}| = 3.
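A minimal sketch of this distance in Python; the tuple values below are illustrative placeholders, not the actual assignment data:

```python
# Distance = number of attributes whose values differ (a Hamming-style count).
def dist(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

# Illustrative tuples (Sex, Height, Weight, BloodPressure) -- placeholder values:
t2 = ("male", "low", "med", "med")
t3 = ("male", "med", "high", "high")
print(dist(t2, t3))   # -> 3, matching Dist(2,3) = 3
```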

KNN 2) Calculate the distance table: the distance from each testing tuple (ids 11-20) to each training tuple (ids 1-10). (The full distance table from the original slide is not reproduced here.)

KNN 3) For k=1, find the single nearest neighbor (breaking distance ties by the smaller tuple id) and compare the actual and predicted labels. Training tuple classes: 1(N), 2(Y), 3(Y), 4(Y), 5(N), 6(N), 7(N), 8(Y), 9(N), 10(Y). Results per testing tuple (actual vs. predicted): 11(Y==Y), 12(N==N), 13(Y==Y), 14(Y==Y), 15(N==N), 16(N!=Y), 17(N==N), 18(Y!=N), 19(Y==Y), 20(N==N). There are 2 errors among the 10 test tuples (ids 11-20), so the error rate is 2/10 = 0.2.

KNN 4) For k=3, repeat the procedure and use majority voting among the 3 nearest neighbors to get the prediction. Results per testing tuple (actual vs. predicted): 11(Y==Y), 12(N==N), 13(Y==Y), 14(Y==Y), 15(N!=Y), 16(N!=Y), 17(N==N), 18(Y!=N), 19(Y!=N), 20(N==N). There are 4 errors, so the error rate is 4/10 = 0.4.
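A sketch of this k-NN prediction, reusing the dist function above; the train and test lists are assumed to hold the assignment tuples as (id, attributes, label):

```python
from collections import Counter

def knn_predict(x, train, k):
    # Sort by (distance, id) so the smaller id wins when distances tie,
    # then take a majority vote over the k nearest labels.
    ranked = sorted(train, key=lambda rec: (dist(x, rec[1]), rec[0]))
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Error rate over the testing tuples (ids 11-20), assuming `train`/`test` hold the data:
# errors = sum(knn_predict(x, train, k) != y for _, x, y in test)
# print(errors / len(test))   # 0.2 for k=1 and 0.4 for k=3 on this assignment
```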

Decision Tree Question: Assignment 1 Q2. Solution: There are 6 yes and 6 no in the training data, so Info(D) = I(6,6) = 1. For each of the 4 attributes, calculate the information gain of branching on it. E.g., branching on the attribute "age" splits the data into D(age=old) = {4 yes, 0 no} and D(age=young) = {2 yes, 6 no}, so Info_age(D) = 8/12 * I(2,6) + 4/12 * I(4,0) ≈ 0.541 and Gain(age) = Info(D) - Info_age(D) ≈ 0.459.
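A short sketch of the entropy and expected-information calculation used here (standard ID3-style information gain):

```python
from math import log2

def info(counts):
    # Entropy I(c1, c2, ...) of a class distribution given by absolute counts.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

info_D = info([6, 6])                          # 1.0 bit for 6 yes / 6 no
info_age = 8/12 * info([2, 6]) + 4/12 * info([4, 0])
print(info_D, info_age, info_D - info_age)     # gain(age) ≈ 0.459
```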

Decision Tree Age has the largest gain, so we choose Age as the root. For the age=old branch, all decisions are yes, so no further splitting is needed. For the age=young branch, repeat the gain calculation.

Decision Tree Then we choose Married for splitting. The remaining data is: D(married=yes) = {4 approved=no} and D(married=no) = {2 approved=yes, 2 approved=no}. Since the married=no branch is an even split, we label it approved=yes. Here is the final tree:

Decision Tree The final tree: Age = old -> Yes; Age = young -> test Married: Married = no -> Yes, Married = yes -> No. Apply the tree to the testing data: error rate = 4/6 = 0.667.
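For reference, the final tree written as a small function (class labels as in the slides):

```python
def predict_approved(age, married):
    # Final tree: old -> yes; young and not married -> yes; young and married -> no.
    if age == "old":
        return "yes"
    return "yes" if married == "no" else "no"
```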

Naive Bayesian Question: Assignment 1 Q3 Answer: In the training data, there are 6 approved=yes and 6 approved=no, so P(C1) = P(approved=yes) = 6/12 = 0.5 P(C2) = P(approved=no) = 6/12 = 0.5 For every attribute and class, compute P(X|Ci) P(Sex = “male” | C1) = 4/6 = 0.667 P(Sex = “female” | C1) = 2/6 = 0.333 P(Sex = “male” | C2) = 4/6 = 0.667 P(Sex = “female” | C2) = 2/6 = 0.333

Naive Bayesian P(Age = “old” | C1) = 4/6 = 0.667 P(Age = “young” | C1) = 2/6 = 0.333 P(Age = “old” | C2) = 0/6 = 0 P(Age = “young” | C2) = 6/6 = 1 P(Housing = “yes” | C1) = 1/6 = 0.167 P(Housing = “no” | C1) = 5/6 = 0.833 P(Housing = “yes” | C2) = 4/6 = 0.667 P(Housing = “no” | C2) = 2/6 = 0.333

Naive Bayesian P(Employed = “yes” | C1) = 4/6 = 0.667 P(Employed = “no” | C1) = 2/6 = 0.333 P(Employed = “yes” | C2) = 1/6 = 0.167 P(Employed = “no” | C2) = 5/6 = 0.833 For the first testing tuple: X1 = (Sex = “female”, Age = “young”, Housing = “yes”, Employed = “yes”) P(X1|C1) = 0.333×0.333×0.167×0.667 = 0.012 P(X1|C2) = 0.333×1×0.667×0.167 = 0.037 P(X1|C1) * P(C1) = 0.012 * 0.5 = 0.006 P(X1|C2) * P(C2) = 0.037 * 0.5 = 0.019 > P(X1|C1) * P(C1) So X1 belongs to C2 (Approved=no)
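A minimal sketch of this calculation; the conditional probabilities are the ones computed on the previous slides:

```python
prior = {"yes": 0.5, "no": 0.5}      # P(C1) = P(approved=yes), P(C2) = P(approved=no)
cond = {                              # P(attribute = value | class), from the slides
    "yes": {"sex=female": 0.333, "age=young": 0.333, "housing=yes": 0.167, "employed=yes": 0.667},
    "no":  {"sex=female": 0.333, "age=young": 1.0,   "housing=yes": 0.667, "employed=yes": 0.167},
}

def score(cls, features):
    # P(X | Ci) * P(Ci) under the naive (conditional independence) assumption.
    p = prior[cls]
    for f in features:
        p *= cond[cls][f]
    return p

x1 = ["sex=female", "age=young", "housing=yes", "employed=yes"]
print(score("yes", x1), score("no", x1))  # ~0.006 vs ~0.019 -> predict approved = no
```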

Naive Bayesian For the remaining testing tuples (ids 13-15), repeat the same procedure; 2 of the 3 predictions are wrong, so the error rate is 2/3 = 0.667. (The per-tuple table with columns id, sex, age, housing, employed, Approved (actual), prediction is not reproduced here.)

Bayesian Network Question: Smoking is prohibited on high-speed trains. If someone smokes, the alarm may sound, and other passengers may report it to the police. If the police hear the alarm or receive a report, they will very likely come and arrest the smoker. This can be modeled with the following Bayesian network:

Bayesian Network The alarm is not fully accurate: it misses some smoking and sometimes sounds for no reason. Not every passenger wants to report smokers, and some passengers make mistakes. (The alarm does not affect the passengers.) The police come if they believe someone is smoking; they do not fully trust the alarm, and they may, rarely, patrol the train anyway. The network is Smoking -> Alarm, Smoking -> Report, and {Alarm, Report} -> Police comes, with the conditional probability tables:
Alarm: P(A=T | S=T) = 0.6, P(A=F | S=T) = 0.4; P(A=T | S=F) = 0.2, P(A=F | S=F) = 0.8
Report: P(R=T | S=T) = 0.4, P(R=F | S=T) = 0.6; P(R=T | S=F) = 0.1, P(R=F | S=F) = 0.9
Police comes: P(P=T | A=T, R=T) = 0.8; P(P=T | A=T, R=F) = 0.4; P(P=T | A=F, R=T) = 0.6; P(P=T | A=F, R=F) = 0.01 (P(P=F) is the complement in each case)

Bayesian Network Suppose the probability of someone smoking is 0.5. What is the probability that the police come? Answer: P(S=T) = 0.5 and P(S=F) = 0.5. The alarm sounds: P(A) = P(A|S)*P(S) + P(A|¬S)*P(¬S) = 0.6*0.5 + 0.2*0.5 = 0.4. Passengers report: P(R) = P(R|S)*P(S) + P(R|¬S)*P(¬S) = 0.4*0.5 + 0.1*0.5 = 0.25. The police come: P(P) = P(P|A,R)*P(A)*P(R) + P(P|A,¬R)*P(A)*P(¬R) + P(P|¬A,R)*P(¬A)*P(R) + P(P|¬A,¬R)*P(¬A)*P(¬R) = 0.08 + 0.12 + 0.09 + 0.0045 = 0.2945.
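The same calculation as a short sketch, following the slides' factorization (Alarm and Report are combined as if independent, exactly as in the worked answer above); the P(P=T | A=T, R=F) = 0.4 entry is the one implied by the 0.12 term:

```python
p_s = 0.5                                  # P(Smoking = T)
p_a = 0.6 * p_s + 0.2 * (1 - p_s)          # P(Alarm sounds)      = 0.4
p_r = 0.4 * p_s + 0.1 * (1 - p_s)          # P(Passengers report) = 0.25

# P(Police comes | Alarm, Report) for (A, R) in {T, F} x {T, F}:
p_police = {(1, 1): 0.8, (1, 0): 0.4, (0, 1): 0.6, (0, 0): 0.01}

p_p = sum(p_police[(a, r)]
          * (p_a if a else 1 - p_a)
          * (p_r if r else 1 - p_r)
          for a in (0, 1) for r in (0, 1))
print(p_a, p_r, p_p)                       # 0.4 0.25 0.2945
```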

Naïve Neural Network Question: A perceptron is trained on the samples in the table below (the table is not reproduced in this transcript). The initial weights are W0 = 0.5, W1 = 0.4, W2 = 0.5, and the learning rate α is 0.2. Use the sample data as training data and update W0, W1, and W2.

Naïve Neural Network Answer: Step 1: y = 0 = T1, so there is no need to change the weights. Step 2: a = -w0 + w1*x1 + w2*x2 = -0.5 + 0.4*0 + 0.5*1 = 0, so y = 1 = T2 and there is no need to change the weights.

Naïve Neural Network Step 3: a = -0.5 + 0.4*1 + 0.5*0 = -0.1 < 0, so y = 0 ≠ T3 and the weights must be updated: ∆w0 = α (t-y) x0 = 0.2 * 1 * (-1) = -0.2, ∆w1 = α (t-y) x1 = 0.2 * 1 * 1 = 0.2, ∆w2 = α (t-y) x2 = 0.2 * 1 * 0 = 0. Thus w0 = w0 + ∆w0 = 0.5 - 0.2 = 0.3, w1 = w1 + ∆w1 = 0.4 + 0.2 = 0.6, and w2 = w2 + ∆w2 = 0.5.

Naïve Neural Network Step 4: y = 1 = T4, so there is no need to change the weights. The final weights are: w0 = 0.3, w1 = 0.6, w2 = 0.5.
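A compact sketch of this perceptron update rule (a bias input x0 = -1 carries the threshold w0). The training table itself is not in the transcript; the OR function is assumed here because it is consistent with every step above:

```python
def train_perceptron(samples, w, alpha=0.2):
    # samples: list of ((x1, x2), target); w = [w0, w1, w2].
    for (x1, x2), t in samples:
        x = (-1, x1, x2)                              # x0 = -1 for the threshold
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        if y != t:                                    # update only on an error
            w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
    return w

samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # assumed data (OR)
print(train_perceptron(samples, [0.5, 0.4, 0.5]))                 # -> [0.3, 0.6, 0.5]
```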

Multilayer Neural Network Given the following neural network with weights initialized as in the picture (next page), we are trying to distinguish between nails and screws. Example training tuples: T1 = {0.6, 0.1, nail}, T2 = {0.2, 0.3, screw}. Let the learning rate (l) be 0.1. Do the forward propagation of the signals through the network using T1 as input, then perform the back propagation of the error and show the changes of the weights. With the weights updated from T1, use T2 as input and show whether the prediction is correct.

Multilayer Neural Network

Multilayer Neural Network Answer: First, use T1 as input and then perform the back propagation. At Unit 3: a3 = x1*w13 + x2*w23 + θ3 = 0.14, o3 = 1/(1 + e^-a3) = 0.535. Similarly, at Units 4, 5, 6: a4 = 0.22, o4 = 0.555; a5 = 0.64, o5 = 0.655; a6 = o3*w36 + o4*w46 + o5*w56 + θ6 = 0.1345, o6 = 0.534.

Multilayer Neural Network Now go back and perform the back propagation, starting at Unit 6 (the target for "nail" is t = 1): Err6 = o6 (1 - o6) (t - o6) = 0.534 * (1-0.534) * (1-0.534) = 0.116. ∆w36 = (l) Err6 o3 = 0.1 * 0.116 * 0.535 = 0.0062, w36 = w36 + ∆w36 = -0.394. ∆w46 = (l) Err6 o4 = 0.1 * 0.116 * 0.555 = 0.0064, w46 = w46 + ∆w46 = 0.1064. ∆w56 = (l) Err6 o5 = 0.1 * 0.116 * 0.655 = 0.0076, w56 = w56 + ∆w56 = 0.6076. θ6 = θ6 + (l) Err6 = -0.1 + 0.1 * 0.116 = -0.0884.

Multilayer Neural Network Continue the back propagation.
Error at Unit 3: Err3 = o3 (1 - o3) (w36 Err6) = 0.535 * (1-0.535) * (-0.394*0.116) = -0.0114. w13 = w13 + ∆w13 = w13 + (l) Err3 x1 = 0.1 + 0.1*(-0.0114)*0.6 = 0.09932. w23 = w23 + ∆w23 = w23 + (l) Err3 x2 = -0.2 + 0.1*(-0.0114)*0.1 = -0.200114. θ3 = θ3 + (l) Err3 = 0.1 + 0.1*(-0.0114) = 0.09886.
Error at Unit 4: Err4 = o4 (1 - o4) (w46 Err6) = 0.555 * (1-0.555) * (0.1064*0.116) = 0.003. w14 = w14 + ∆w14 = w14 + (l) Err4 x1 = 0 + 0.1*0.003*0.6 = 0.00018. w24 = w24 + ∆w24 = w24 + (l) Err4 x2 = 0.2 + 0.1*0.003*0.1 = 0.20003. θ4 = θ4 + (l) Err4 = 0.2 + 0.1*0.003 = 0.2003.
Error at Unit 5: Err5 = o5 (1 - o5) (w56 Err6) = 0.655 * (1-0.655) * (0.6076*0.116) = 0.016. w15 = w15 + ∆w15 = w15 + (l) Err5 x1 = 0.3 + 0.1*0.016*0.6 = 0.30096. w25 = w25 + ∆w25 = w25 + (l) Err5 x2 = -0.4 + 0.1*0.016*0.1 = -0.39984. θ5 = θ5 + (l) Err5 = 0.5 + 0.1*0.016 = 0.5016.

Multilayer Neural Network After T1, the updated values are: w13 = 0.09932, w23 = -0.200114, θ3 = 0.09886; w14 = 0.00018, w24 = 0.20003, θ4 = 0.2003; w15 = 0.30096, w25 = -0.39984, θ5 = 0.5016; w36 = -0.394, w46 = 0.1064, w56 = 0.6076, θ6 = -0.0884. Now, with the updated values, use T2 as input. At Unit 3: a3 = x1*w13 + x2*w23 + θ3 = 0.0586898, o3 = 1/(1 + e^-a3) = 0.515.

Multilayer Neural Network Similarly, a4 = 0.260345, o4 = 0.565; a5 = 0.441852, o5 = 0.6087. At Unit 6: a6 = o3*w36 + o4*w46 + o5*w56 + θ6 = 0.13865, o6 = 1/(1 + e^-a6) = 0.5348. Since o6 is closer to 1 than to 0, the prediction is nail, which differs from the given label "screw", so this prediction is NOT correct.
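A sketch of the whole computation for this 2-3-1 network, following the slides' update order (output-layer weights are updated before the hidden-unit errors are propagated). The initial weights are read off the worked steps above, since the picture itself is not in the transcript; the target 1 encodes "nail":

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

# Initial weights and biases: hidden units 3-5, output unit 6.
w_in  = {3: (0.1, -0.2), 4: (0.0, 0.2), 5: (0.3, -0.4)}   # (w1j, w2j)
theta = {3: 0.1, 4: 0.2, 5: 0.5, 6: -0.1}
w_out = {3: -0.4, 4: 0.1, 5: 0.6}                          # wj6

def forward(x1, x2):
    o = {j: sigmoid(w_in[j][0]*x1 + w_in[j][1]*x2 + theta[j]) for j in (3, 4, 5)}
    o[6] = sigmoid(sum(w_out[j] * o[j] for j in (3, 4, 5)) + theta[6])
    return o

def backprop(x1, x2, t, l=0.1):
    o = forward(x1, x2)
    err6 = o[6] * (1 - o[6]) * (t - o[6])
    for j in (3, 4, 5):                      # output-layer weights first,
        w_out[j] += l * err6 * o[j]          # as in the worked solution
    theta[6] += l * err6
    for j in (3, 4, 5):                      # then hidden-unit errors and weights
        errj = o[j] * (1 - o[j]) * (w_out[j] * err6)
        w_in[j] = (w_in[j][0] + l*errj*x1, w_in[j][1] + l*errj*x2)
        theta[j] += l * errj

backprop(0.6, 0.1, t=1)          # T1 = {0.6, 0.1, nail}
print(forward(0.2, 0.3)[6])      # T2 -> about 0.535, i.e. predicted "nail"
```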

SVM Consider the following data points. Use SVM to train a classifier, and then classify the testing points. A point with ai = 1 is a support vector; for example, point 1 (1,2) is a support vector, but point 5 (5,9) is not. Training data and testing data: (tables not reproduced here).

SVM Question: (a) Find the decision boundary; show the calculation in detail. (b) Use the decision boundary you found to classify the testing data; show the calculation in detail, including the intermediate results and the formulas you used.

SVM Answer: a) As the picture shows, P1, P2, P3 are support vectors.

SVM Suppose w = (w1, w2). Since both P1(1,2) and P3(0,1) have y = 1, while P2(2,1) has y = -1, the support-vector constraints give: w1*1 + w2*2 + b = 1, w1*0 + w2*1 + b = 1, w1*2 + w2*1 + b = -1. Solving gives w1 = -1, w2 = 1, b = 0, so the decision boundary is w1*x1 + w2*x2 + b = 0, i.e. -x1 + x2 = 0, as shown in the picture on the next page.
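As a quick check, the three support-vector constraints can be solved as a small linear system (a sketch using numpy):

```python
import numpy as np

A = np.array([[1, 2, 1],      # P1 = (1, 2), y = +1:  1*w1 + 2*w2 + b =  1
              [0, 1, 1],      # P3 = (0, 1), y = +1:  0*w1 + 1*w2 + b =  1
              [2, 1, 1]])     # P2 = (2, 1), y = -1:  2*w1 + 1*w2 + b = -1
rhs = np.array([1, 1, -1])
w1, w2, b = np.linalg.solve(A, rhs)
print(w1, w2, b)              # -1.0 1.0 0.0  ->  boundary -x1 + x2 = 0
```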

SVM

SVM b) Use the decision boundary to classify the testing data. For the point P9 (2,5): -x1 + x2 = -2 + 5 = 3 >= 1, so we choose y = 1. For the point P10 (7,2): -x1 + x2 = -7 + 2 = -5 <= -1, so we choose y = -1. Shown in the picture on the next page.
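The same classification as a one-line decision function (the sign of w·x + b):

```python
def classify(x1, x2):
    # Decision function w·x + b = -x1 + x2; the sign gives the class.
    return 1 if -x1 + x2 >= 0 else -1

print(classify(2, 5))   # P9  -> +1 (value  3)
print(classify(7, 2))   # P10 -> -1 (value -5)
```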

SVM

Q&A