SEG4630 2009-2010 Tutorial 1 – Classification: Decision tree, Naïve Bayes & k-NN (CHANG Lijun)



2 Classification: Definition
- Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes, e.g. a decision tree, Naïve Bayes or k-NN.
- Goal: previously unseen records should be assigned a class as accurately as possible.

3 Decision Tree
- Goal: construct a tree so that instances belonging to different classes are separated.
- Basic algorithm (a greedy algorithm), sketched in code below:
  - The tree is constructed in a top-down, recursive manner.
  - At the start, all the training examples are at the root.
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
  - Examples are partitioned recursively based on the selected attributes.
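As a rough illustration (not part of the original slides), here is a minimal Python sketch of this greedy top-down induction. The list-of-dicts record format and the pluggable score argument are assumptions; score stands in for one of the attribute-selection measures introduced on the following slides.

```python
from collections import Counter

def build_tree(records, attributes, class_attr, score):
    """Greedy top-down induction: pick the best attribute, split, and recurse."""
    labels = [r[class_attr] for r in records]
    # Stop when the node is pure or no attributes remain; predict the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the test attribute with the highest score (e.g. information gain).
    best = max(attributes, key=lambda a: score(records, a, class_attr))
    branches = {}
    for value in set(r[best] for r in records):
        subset = [r for r in records if r[best] == value]
        branches[value] = build_tree(subset, [a for a in attributes if a != best],
                                     class_attr, score)
    return (best, branches)
```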

4 Attribute Selection Measure 1: Information Gain
- Let p_i be the probability that a tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -sum_i p_i log2(p_i)
- Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = sum_{j=1..v} (|D_j| / |D|) * Info(D_j)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)
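A minimal Python sketch of these two quantities; the helper names entropy and info_gain and the list-of-dicts record format are assumptions, and info_gain could be passed as the score argument of the induction sketch above.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum_i p_i * log2(p_i), over the class labels in D."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(records, attr, class_attr):
    """Gain(A) = Info(D) - Info_A(D), for a list of dict records D."""
    before = entropy([r[class_attr] for r in records])
    after = 0.0
    for value in set(r[attr] for r in records):
        subset = [r[class_attr] for r in records if r[attr] == value]
        after += len(subset) / len(records) * entropy(subset)
    return before - after
```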

5 Attribute Selection Measure 2: Gain Ratio
- The information gain measure is biased towards attributes with a large number of values.
- C4.5 (a successor of ID3) uses the gain ratio to overcome this problem (a normalization of the information gain):
  SplitInfo_A(D) = -sum_{j=1..v} (|D_j| / |D|) * log2(|D_j| / |D|)
  GainRatio(A) = Gain(A) / SplitInfo_A(D)
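A small standalone sketch of the gain ratio; split_info and gain_ratio are hypothetical helper names, and the snippet takes an already-computed gain and the partition sizes rather than raw records.

```python
import math

def split_info(partition_sizes):
    """SplitInfo_A(D) = -sum_j |D_j|/|D| * log2(|D_j|/|D|)."""
    total = sum(partition_sizes)
    return -sum(s / total * math.log2(s / total) for s in partition_sizes if s)

def gain_ratio(gain, partition_sizes):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    return gain / split_info(partition_sizes)

# e.g. an attribute that splits 14 records into partitions of sizes 5, 4 and 5
# and has Gain(A) = 0.25:
print(round(gain_ratio(0.25, [5, 4, 5]), 3))   # ~0.158
```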

6 Attribute Selection Measure 3: Gini Index
- If a data set D contains examples from n classes, the gini index gini(D) is defined as
  gini(D) = 1 - sum_{j=1..n} p_j^2
  where p_j is the relative frequency of class j in D.
- If a data set D is split on A into two subsets D_1 and D_2, the gini index gini_A(D) is defined as
  gini_A(D) = (|D_1| / |D|) * gini(D_1) + (|D_2| / |D|) * gini(D_2)
- Reduction in impurity:
  Delta_gini(A) = gini(D) - gini_A(D)
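A minimal sketch of the Gini computations; gini and gini_split are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """gini(D) = 1 - sum_j p_j^2, over the class labels in D."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(d1_labels, d2_labels):
    """gini_A(D) for a binary split of D into D_1 and D_2."""
    n = len(d1_labels) + len(d2_labels)
    return (len(d1_labels) / n) * gini(d1_labels) + (len(d2_labels) / n) * gini(d2_labels)

# e.g. the play-tennis data on the next slide has 9 Yes and 5 No labels:
print(round(gini(['Yes'] * 9 + ['No'] * 5), 3))   # 0.459
```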

7 Example

Outlook   Temperature  Humidity  Wind    Play Tennis
Sunny     >25          High      Weak    No
Sunny     >25          High      Strong  No
Overcast  >25          High      Weak    Yes
Rain      15-25        High      Weak    Yes
Rain      <15          Normal    Weak    Yes
Rain      <15          Normal    Strong  No
Overcast  <15          Normal    Strong  Yes
Sunny     15-25        High      Weak    No
Sunny     <15          Normal    Weak    Yes
Rain      15-25        Normal    Weak    Yes
Sunny     15-25        Normal    Strong  Yes
Overcast  15-25        High      Strong  Yes
Overcast  >25          Normal    Weak    Yes
Rain      15-25        High      Strong  No

8 Tree induction example

Splitting on Outlook:      S [9+, 5-] -> Sunny [2+, 3-], Overcast [4+, 0-], Rain [3+, 2-]
Splitting on Temperature:  S [9+, 5-] -> <15 [3+, 1-], 15-25 [4+, 2-], >25 [2+, 2-]

Info(S) = -9/14(log2(9/14)) - 5/14(log2(5/14)) = 0.94

Gain(Outlook) = 0.94 - 5/14[-2/5(log2(2/5)) - 3/5(log2(3/5))]
                     - 4/14[-4/4(log2(4/4)) - 0/4(log2(0/4))]
                     - 5/14[-3/5(log2(3/5)) - 2/5(log2(2/5))]
              = 0.94 - 0.69 = 0.25

Gain(Temperature) = 0.94 - 4/14[-3/4(log2(3/4)) - 1/4(log2(1/4))]
                         - 6/14[-4/6(log2(4/6)) - 2/6(log2(2/6))]
                         - 4/14[-2/4(log2(2/4)) - 2/4(log2(2/4))]
                  = 0.94 - 0.91 = 0.03
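As a quick numeric check of this arithmetic (not from the slides), a standalone snippet where H is a hypothetical helper for the entropy of a [pos+, neg-] node:

```python
import math

def H(pos, neg):
    """Entropy of a node with pos positive and neg negative examples."""
    total = pos + neg
    return -sum(p / total * math.log2(p / total) for p in (pos, neg) if p)

info_S = H(9, 5)                                                          # 0.940
print(round(info_S - (5/14*H(2, 3) + 4/14*H(4, 0) + 5/14*H(3, 2)), 2))    # Gain(Outlook)     = 0.25
print(round(info_S - (4/14*H(3, 1) + 6/14*H(4, 2) + 4/14*H(2, 2)), 2))    # Gain(Temperature) = 0.03
```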

9
Splitting on Humidity:  S [9+, 5-] -> High [3+, 4-], Normal [6+, 1-]
Splitting on Wind:      S [9+, 5-] -> Weak [6+, 2-], Strong [3+, 3-]

Gain(Humidity) = 0.94 - 7/14[-3/7(log2(3/7)) - 4/7(log2(4/7))]
                      - 7/14[-6/7(log2(6/7)) - 1/7(log2(1/7))]
               = 0.94 - 0.79 = 0.15

Gain(Wind) = 0.94 - 8/14[-6/8(log2(6/8)) - 2/8(log2(2/8))]
                  - 6/14[-3/6(log2(3/6)) - 3/6(log2(3/6))]
           = 0.94 - 0.89 = 0.05

10
Gain(Outlook) = 0.25, Gain(Temperature) = 0.03, Gain(Humidity) = 0.15, Gain(Wind) = 0.05,
so Outlook gives the largest gain and is chosen for the root.

Partial tree:
  Outlook = Overcast -> Yes
  Outlook = Sunny    -> ?
  Outlook = Rain     -> ?

11 Sunny branch

Info(Sunny) = -2/5(log2(2/5)) - 3/5(log2(3/5)) = 0.97

Splitting on Temperature:  Sunny [2+, 3-] -> <15 [1+, 0-], 15-25 [1+, 1-], >25 [0+, 2-]
Splitting on Humidity:     Sunny [2+, 3-] -> High [0+, 3-], Normal [2+, 0-]
Splitting on Wind:         Sunny [2+, 3-] -> Weak [1+, 2-], Strong [1+, 1-]

Gain(Temperature) = 0.97 - 1/5[-1/1(log2(1/1)) - 0/1(log2(0/1))]
                         - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         - 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
                  = 0.97 - 0.40 = 0.57

Gain(Humidity) = 0.97 - 3/5[-0/3(log2(0/3)) - 3/3(log2(3/3))]
                      - 2/5[-2/2(log2(2/2)) - 0/2(log2(0/2))]
               = 0.97 - 0 = 0.97

Gain(Wind) = 0.97 - 3/5[-1/3(log2(1/3)) - 2/3(log2(2/3))]
                  - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
           = 0.97 - 0.95 = 0.02

Humidity gives the largest gain, so it is chosen for the Sunny branch.

12 Partial tree:
  Outlook = Overcast -> Yes
  Outlook = Sunny    -> Humidity: Normal -> Yes, High -> No
  Outlook = Rain     -> ?

13 Rain branch

Info(Rain) = -3/5(log2(3/5)) - 2/5(log2(2/5)) = 0.97

Splitting on Temperature:  Rain [3+, 2-] -> <15 [1+, 1-], 15-25 [2+, 1-], >25 [0+, 0-]
Splitting on Humidity:     Rain [3+, 2-] -> High [1+, 1-], Normal [2+, 1-]
Splitting on Wind:         Rain [3+, 2-] -> Weak [3+, 0-], Strong [0+, 2-]

Gain(Temperature) = 0.97 - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                         - 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
                         - 0/5 * 0     (the >25 partition is empty)
                  = 0.97 - 0.95 = 0.02

Gain(Humidity) = 0.97 - 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))]
                      - 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))]
               = 0.97 - 0.95 = 0.02

Gain(Wind) = 0.97 - 3/5[-3/3(log2(3/3)) - 0/3(log2(0/3))]
                  - 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))]
           = 0.97 - 0 = 0.97

Wind gives the largest gain, so it is chosen for the Rain branch.

14 Final decision tree:
  Outlook = Overcast -> Yes
  Outlook = Sunny    -> Humidity: Normal -> Yes, High -> No
  Outlook = Rain     -> Wind: Weak -> Yes, Strong -> No

15 Bayesian Classification
- A statistical classifier: performs probabilistic prediction, i.e., predicts the class membership probabilities P(C | x_1, ..., x_n), where x_i is the value of attribute A_i. Choose the class label that has the highest probability.
- Foundation: Bayes' Theorem
  P(C | x_1, ..., x_n) = P(x_1, ..., x_n | C) * P(C) / P(x_1, ..., x_n)
  posterior = likelihood * prior / evidence
  The posterior is what we want to predict; the likelihood and the prior form the model, which is computed from the training data.

16 Naïve Bayes Classifier
- Problem: the joint probability P(x_1, ..., x_n | C) is difficult to estimate directly.
- Naïve Bayes assumption: the attributes are conditionally independent given the class, so
  P(x_1, ..., x_n | C) = P(x_1 | C) * P(x_2 | C) * ... * P(x_n | C)

17 Naïve Bayes Classifier

A  B  C
m  b  t
m  s  t
g  q  t
h  s  t
g  q  t
g  q  f
g  s  f
h  b  f
h  q  f
m  b  f

P(C=t) = 1/2        P(C=f) = 1/2
P(A=m|C=t) = 2/5    P(A=m|C=f) = 1/5
P(B=q|C=t) = 2/5    P(B=q|C=f) = 2/5

Test record: A=m, B=q, C=?

18 Naïve Bayes Classifier
- For C = t:
  P(A=m|C=t) * P(B=q|C=t) * P(C=t) = 2/5 * 2/5 * 1/2 = 2/25
  P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q)
- For C = f:
  P(A=m|C=f) * P(B=q|C=f) * P(C=f) = 1/5 * 2/5 * 1/2 = 1/25
  P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q)
- The posterior for C = t is higher (2/25 > 1/25), so the test record A=m, B=q is classified as C = t.
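A minimal standalone sketch (not from the slides) that reproduces this computation by counting directly over the table from slide 17; naive_bayes_score is a hypothetical helper name.

```python
# Training records as (A, B, C) tuples, copied from the table on the previous slide.
data = [('m','b','t'), ('m','s','t'), ('g','q','t'), ('h','s','t'), ('g','q','t'),
        ('g','q','f'), ('g','s','f'), ('h','b','f'), ('h','q','f'), ('m','b','f')]

def naive_bayes_score(a, b, c):
    """P(A=a|C=c) * P(B=b|C=c) * P(C=c), estimated by counting."""
    rows = [r for r in data if r[2] == c]
    p_c = len(rows) / len(data)
    p_a = sum(1 for r in rows if r[0] == a) / len(rows)
    p_b = sum(1 for r in rows if r[1] == b) / len(rows)
    return p_a * p_b * p_c

print(naive_bayes_score('m', 'q', 't'))   # 2/5 * 2/5 * 1/2 = 0.08  (= 2/25)
print(naive_bayes_score('m', 'q', 'f'))   # 1/5 * 2/5 * 1/2 = 0.04  (= 1/25)
# The score for C=t is higher, so the record is classified as C=t.
```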

19 Nearest Neighbor Classification
- Input:
  - A set of stored records
  - k: the number of nearest neighbors
- Output: the class label of an unknown record, obtained by:
  - Computing the distance to every stored record, e.g. the Euclidean distance d(p, q) = sqrt(sum_i (p_i - q_i)^2)
  - Identifying the k nearest neighbors
  - Determining the class label of the unknown record from the class labels of those neighbors (i.e., by taking a majority vote)

20 Nearest Neighbor Classification: A Discrete Example
- Given 8 training instances:
  P1 (4, 2)     -> Orange
  P2 (0.5, 2.5) -> Orange
  P3 (2.5, 2.5) -> Orange
  P4 (3, 3.5)   -> Orange
  P5 (5.5, 3.5) -> Orange
  P6 (2, 4)     -> Black
  P7 (4, 5)     -> Black
  P8 (2.5, 5.5) -> Black
- k = 1 and k = 3; new instance: Pn (4, 4) -> ???
- Calculate the distances:
  d(P1, Pn) = 2.00    d(P2, Pn) = 3.80    d(P3, Pn) = 2.12    d(P4, Pn) = 1.12
  d(P5, Pn) = 1.58    d(P6, Pn) = 2.00    d(P7, Pn) = 1.00    d(P8, Pn) = 2.12

21 Nearest Neighbor Classification
[Plots of the 8 training instances and Pn for k = 1 and k = 3]
- k = 1: the single nearest neighbor is P7 (Black), so Pn is classified as Black.
- k = 3: the three nearest neighbors are P7 (Black), P4 (Orange) and P5 (Orange), so the majority vote classifies Pn as Orange.
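A small standalone sketch (not from the slides) that reproduces the k = 1 and k = 3 votes for Pn = (4, 4); knn_predict is a hypothetical helper name and Euclidean distance is assumed.

```python
import math
from collections import Counter

points = {'P1': ((4, 2), 'Orange'),   'P2': ((0.5, 2.5), 'Orange'),
          'P3': ((2.5, 2.5), 'Orange'), 'P4': ((3, 3.5), 'Orange'),
          'P5': ((5.5, 3.5), 'Orange'), 'P6': ((2, 4), 'Black'),
          'P7': ((4, 5), 'Black'),      'P8': ((2.5, 5.5), 'Black')}

def knn_predict(query, k):
    """Majority vote over the k training points closest to query (Euclidean distance)."""
    dists = sorted((math.dist(p, query), label) for p, label in points.values())
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((4, 4), k=1))   # 'Black'  (nearest neighbor is P7)
print(knn_predict((4, 4), k=3))   # 'Orange' (P7, P4, P5 -> 2 Orange vs 1 Black)
```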

22 Nearest Neighbor Classification...
- Scaling issues: attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes.
  - Each attribute should fall in the same range.
  - Min-Max normalization: v' = (v - min_A) / (max_A - min_A)
- Example: two data records a = (1, 1000) and b = (0.5, 1). dis(a, b) = ?
  Without scaling, the distance is dominated almost entirely by the second attribute.
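A tiny sketch of the min-max normalization mentioned above; min_max_normalize is a hypothetical helper name.

```python
def min_max_normalize(values):
    """Rescale a list of attribute values into [0, 1]: v' = (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# e.g. the second attribute of a = (1, 1000) and b = (0.5, 1):
print(min_max_normalize([1000, 1]))   # [1.0, 0.0]
```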

23 Lazy & Eager Learning
- Two types of learning methodologies:
  - Lazy learning: instance-based learning (e.g., k-NN)
  - Eager learning: decision tree and Bayesian classification, as well as ANN and SVM

24 Lazy & Eager Learning
- Key differences:
  - Lazy learning:
    - Does not require model building
    - Less time training but more time predicting
    - Effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function
  - Eager learning:
    - Requires model building
    - More time training but less time predicting
    - Must commit to a single hypothesis that covers the entire instance space