Introduction to machine learning KH Wong


Chapter 15: Classification using Classification and Regression Trees (CART)

We will learn the Classification and Regression Tree (CART), also called a decision tree:
A binary tree, i.e. each internal node has exactly two branches
It can perform either a classification or a regression function
Easy to build and widely used
Pruning can be applied to solve the over-fitting problem
https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/

To build the tree you need training data, and you should have enough data for training; CART is a supervised learning algorithm. Divide the whole dataset (100%) into:
Training set (70%): for training your classifier
Validation set (10%): for tuning the parameters
Test set (20%): for testing the performance of your classifier
(A minimal sketch of such a split follows this list.)
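As a minimal sketch (not part of the original slides), such a split can be produced with scikit-learn's train_test_split applied twice; the 70/10/20 proportions follow the list above:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# First hold out the 20% test set...
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
# ...then split the remaining 80% into 70% training and 10% validation (of the full data):
# 0.10 / 0.80 = 0.125
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.125, random_state=0)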

CART can perform classification or regression functions, so when do we use each?
Classification trees: outputs are class symbols, not real numbers, e.g. high, medium, low.
Regression trees: outputs are target variables (real numbers), e.g. 1.234, 5.678.
See http://www.simafore.com/blog/bid/62482/2-main-differences-between-classification-and-regression-trees

Classification tree approaches: famous decision-tree algorithms are ID3, C4.5 and CART. What are the differences? We only learn CART here. https://www.quora.com/What-are-the-differences-between-ID3-C4-5-and-CART

[Slide figure: a tree showing nodes, branches, leaves, attributes and target classes. The root node tests an attribute (e.g. Weather = raining?); internal nodes test further attributes (e.g. Weather = sunny?, Activity = driving?); each branch is labeled Yes/No; each leaf node carries a target class (umbrella / no umbrella). Source: https://www-users.cs.umn.edu/~kumar001/dmbook/ch4.pdf]

CART Model Representation. CART is a binary tree. Each internal node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric). The leaf nodes of the tree contain an output variable (y) which is used to make a prediction. Given a dataset with two inputs (x), height in centimeters and weight in kilograms, and the output sex (male or female), the next slide shows a crude example of a binary decision tree (completely fictitious, for demonstration purposes only). [Figure labels: root node = attribute/variable; leaf node = class variable or prediction.] https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/

A simple example of a decision tree: use height and weight to guess the sex of a person.
Rules:
1. If Height > 180 cm Then Male
2. If Height <= 180 cm AND Weight > 80 kg Then Male
3. If Height <= 180 cm AND Weight <= 80 kg Then Female
Make predictions with CART models: the decision tree splits the input space into rectangles (when p = 2 input variables) or hyper-rectangles with more inputs. Testing to see if a person is male or not: Height > 180 cm: No; Weight > 80 kg: No; therefore: Female.
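The three rules translate directly into code; a minimal Python sketch (the function name predict_sex is an assumption for illustration):

def predict_sex(height_cm, weight_kg):
    # Rule 1: Height > 180 cm -> Male
    if height_cm > 180:
        return "Male"
    # Rule 2: Height <= 180 cm and Weight > 80 kg -> Male
    if weight_kg > 80:
        return "Male"
    # Rule 3: Height <= 180 cm and Weight <= 80 kg -> Female
    return "Female"

print(predict_sex(183, 77))  # Male, by rule 1
print(predict_sex(173, 79))  # Female, by rule 3
print(predict_sex(177, 85))  # Male, by rule 2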

Exercise 1
Why is it a binary tree? Answer: ____________________
How many decision nodes and leaves? Answer: ________________
Male or Female if:
183 cm, 77 kg? ANS: ______
173 cm, 79 kg? ANS: ______
177 cm, 85 kg? ANS: ______

Exercise 1, ANSWER
Why is it a binary tree? Answer: each internal node has exactly two branches.
How many decision nodes and leaves? Answer: nodes: 2, leaves: 3.
Male or Female if:
183 cm, 77 kg? ANS: Male (Height > 180 cm)
173 cm, 79 kg? ANS: Female (Height <= 180 cm and Weight <= 80 kg)
177 cm, 85 kg? ANS: Male (Height <= 180 cm but Weight > 80 kg, rule 2)

How to create a CART:
Greedy splitting: grow the tree
Stopping criterion: when to stop growing
Pruning the tree: remove unnecessary leaves to make it more efficient and to solve over-fitting problems

1) Greedy Splitting. While growing the tree, you grow leaves from a node by splitting, and you need a metric to evaluate whether a split is good or not, e.g. one of the following (written out below):
Gini (impurity) index
Information gain
Entropy
Variance reduction
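Written out explicitly (with p_i the fraction of samples of class i at a node), the impurity measures used in the worked examples below are:

H(S) = -\sum_i p_i \log_2 p_i        (entropy)
\mathrm{Gini}(S) = 1 - \sum_i p_i^2        (Gini impurity index)
\mathrm{Err}(S) = 1 - \max_i p_i        (classification error)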

Gini impurity index calculation: http://people.revoledu.com/kardi/tutorial/DecisionTree/how-to-measure-impurity.htm

1) Split metric: Entropy
Prob(bus) = 4/10 = 0.4; Prob(car) = 3/10 = 0.3; Prob(train) = 3/10 = 0.3
Entropy = -0.4*log_2(0.4) - 0.3*log_2(0.3) - 0.3*log_2(0.3) = 1.571 (note: log_2 is log base 2)
Another example: if P(bus) = 1, P(car) = 0, P(train) = 0, then Entropy = -1*log_2(1) - 0 - 0 = 0 (terms with zero probability contribute 0, which avoids log_2(0)).
Entropy = 0 means the node is very pure; impurity is 0.

Exercise 2
2) Split metric: Gini (impurity) index
Prob(bus) = 4/10 = 0.4; Prob(car) = 3/10 = 0.3; Prob(train) = 3/10 = 0.3
Gini index = _____________________________
Another example, if the class has only bus: P(bus) = 1, P(car) = 0, P(train) = 0
Gini impurity index = _____________________________

Answer 2
2) Split metric: Gini (impurity) index
Prob(bus) = 4/10 = 0.4; Prob(car) = 3/10 = 0.3; Prob(train) = 3/10 = 0.3
Gini index = 1 - (0.4*0.4 + 0.3*0.3 + 0.3*0.3) = 0.66
Another example, if the class has only bus: P(bus) = 1, P(car) = 0, P(train) = 0
Gini impurity index = 1 - 1*1 - 0*0 - 0*0 = 0; impurity is 0

Exercise 3
If the first 2 rows are not bus but train, find the entropy and Gini index.
Prob(bus) = 2/10 = 0.2; Prob(car) = 3/10 = 0.3; Prob(train) = 5/10 = 0.5
Entropy = _______________________________
Gini index = _____________________________

ANSWER 3
If the first 2 rows are not bus but train, find the entropy and Gini index.
Prob(bus) = 2/10 = 0.2; Prob(car) = 3/10 = 0.3; Prob(train) = 5/10 = 0.5
Entropy = -0.2*log_2(0.2) - 0.3*log_2(0.3) - 0.5*log_2(0.5) = 1.485
Gini index = 1 - (0.2*0.2 + 0.3*0.3 + 0.5*0.5) = 0.62

3) Split metric: Classification error
Classification error = 1 - max(0.4, 0.3, 0.3) = 1 - 0.4 = 0.6
Another example: if P(bus) = 1, P(car) = 0, P(train) = 0,
Classification error = 1 - max(1, 0, 0) = 0; impurity is 0 if there is only bus.
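A minimal Python sketch (not from the slides) that reproduces the three metrics and the numbers computed above:

import math

def entropy(counts):
    # counts: number of samples per class, e.g. [4, 3, 3]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def classification_error(counts):
    return 1 - max(counts) / sum(counts)

counts = [4, 3, 3]  # bus, car, train
print(entropy(counts))               # 1.571
print(gini(counts))                  # 0.66
print(classification_error(counts))  # 0.6
print(entropy([2, 3, 5]), gini([2, 3, 5]))  # Exercise 3: 1.485, 0.62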

4) Split metric: Variance reduction
Introduced in CART, variance reduction is often employed when the target variable is continuous (regression tree), meaning that many other metrics would first require discretization before being applied. The variance reduction of a node N is defined as the total reduction of the variance of the target variable x due to the split at this node (the formula itself was lost in the slide export; see below). https://en.wikipedia.org/wiki/Decision_tree_learning
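In its common weighted form (an assumption here, consistent with the cited article), the variance reduction from splitting a node's sample set S into children S_L and S_R is:

\Delta\mathrm{Var} = \mathrm{Var}(S) - \frac{|S_L|}{|S|}\,\mathrm{Var}(S_L) - \frac{|S_R|}{|S|}\,\mathrm{Var}(S_R)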

Splitting procedure: recursive partitioning for CART
Take all of your training data.
Consider all possible values of all variables.
Select the variable/value (X = t1) (e.g. X1 = Height) that produces the greatest "separation" (or maximum homogeneity, i.e. less impurity within each of the new parts) in the target. (X = t1) is called a "split".
If X < t1 (e.g. Height < 180 cm) then send the data point to the "left"; otherwise, send it to the "right".
Now repeat the same process on these two "nodes" and you get a "tree"; a minimal sketch follows this list.
Note: CART only uses binary splits.
https://www.casact.org/education/specsem/f2005/handouts/cart.ppt
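A minimal Python sketch of one greedy split using Gini as the metric (function names and the toy data are assumptions, not the author's code):

def weighted_gini(groups, classes):
    # groups: the two lists of rows produced by a candidate split; the label is the last column
    total = sum(len(g) for g in groups)
    score = 0.0
    for g in groups:
        if not g:
            continue
        labels = [row[-1] for row in g]
        purity = sum((labels.count(c) / len(g)) ** 2 for c in classes)
        score += (1.0 - purity) * len(g) / total   # weight each group by its size
    return score

def best_split(rows):
    classes = sorted(set(row[-1] for row in rows))
    best = None
    for col in range(len(rows[0]) - 1):   # consider all variables...
        for row in rows:                  # ...and all observed values as thresholds
            t = row[col]
            left = [r for r in rows if r[col] < t]
            right = [r for r in rows if r[col] >= t]
            g = weighted_gini([left, right], classes)
            if best is None or g < best[0]:
                best = (g, col, t)
    return best  # (impurity, variable index, split value); recurse on the two parts to grow a tree

# Toy data: [height_cm, weight_kg, sex]
rows = [[185, 85, 'M'], [181, 80, 'M'], [170, 60, 'F'], [165, 55, 'F'], [178, 82, 'M']]
print(best_split(rows))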

An example
Coding: Weather: 1 = Sunny, 2 = Cloudy, 3 = Rainy; Driving: 1 = Yes, 2 = No; Class = Umbrella: 1 = Yes, 2 = No
Weather  Driving  Class=Umbrella
1        1        2
1        2        2
2        1        2
3        1        2
2        2        1
3        2        1
2        2        2
(Only these rows survived the slide export; the worked example on the following slides assumes N = 9 samples.)

How to build the tree
First question: 4 possible cases for the root node.
If attribute "Weather" is the root node: Sunny or not (case 1); Cloudy or not (case 2); Rainy or not (case 3).
If attribute "Driving" is the root node: Driving or not (case 4).
So which case is the best?

Parent entropy, using Weather as the root node (covering cases 1, 2, 3):
Number of umbrella yes = 6; number of umbrella no = 3
Parent_entropy = -(6/9)*log2(6/9) - (3/9)*log2(3/9) = 0.918

For case 1: Weather = sunny? as the root node
N = 9 = number of samples; M = 2 = number of sunny cases; weight W1 = M/N = 2/9
Within the sunny branch: N1y = 0 umbrella-yes, N1n = 2 umbrella-no
Gini(sunny branch) = 1 - (N1y/M)^2 - (N1n/M)^2 = 1 - 0 - 1 = 0
Entropy(sunny branch) = 0 (a pure node; p*log2(p) terms with p = 0 contribute 0, so the entropy is 0, not -inf)

For case 2: Weather = cloudy? as the root node
N = 9 = number of samples; M = 4 = number of cloudy cases; weight W2 = 4/9
Within the cloudy branch: Ny = 2 umbrella-yes, Nn = 2 umbrella-no
Gini(cloudy branch) = 1 - (2/4)^2 - (2/4)^2 = 0.5
Entropy(cloudy branch) = -(2/4)*log2(2/4) - (2/4)*log2(2/4) = 1.0

Exercise 4, for case 3: Weather = rainy? as the root node
N = 9 = number of samples; M = 2 = number of rainy cases; weight W3 = 2/9
Within the rainy branch: Ny = 0 umbrella-yes, Nn = 2 umbrella-no
Gini = ? Entropy = ?

Answer 4, for case 3: Weather = rainy? as the root node
N = 9; M = 2; weight W3 = 2/9; Ny = 0 umbrella-yes, Nn = 2 umbrella-no
Gini(rainy branch) = 1 - (0/2)^2 - (2/2)^2 = 0
Entropy(rainy branch) = 0 (a pure node, as in case 1)

Separation measurement: information gain
Information gain = parent (root) entropy - weighted sum of the child (leaf) entropies
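In symbols, with S the parent sample set and S_v the subset sent down branch v of attribute A:

IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)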

For case 4: Driving? as the root node
N = 9 = number of samples; the split covers all 9 cases (M = 9)
Over the whole set: Ny = 3 umbrella-yes, Nn = 6 umbrella-no
Gini = 1 - (3/9)^2 - (6/9)^2 = 0.444
Entropy = -(3/9)*log2(3/9) - (6/9)*log2(6/9) = 0.918
(Compute the weighted impurity of the two Driving branches in the same way as cases 1-3 to compare the splits.)

Example
Coding: Temperature: 1 = Low, 2 = Medium, 3 = High; Humidity: 1 = Low, 2 = Medium, 3 = High; Weather: 1 = Sunny, 2 = Cloudy, 3 = Rain; Drive/walk: 1 = Drive, 2 = Walk; Class = Umbrella: 1 = Yes, 2 = No
Temperature  Humidity  Weather  Drive/walk  Class=Umbrella
1            1         1        1           2
1            2         1        2           1
2            2         1        1           2
2            1         1        2           1
1            1         2        1           2
2            2         2        1           2
2            2         3        2           2
3            3         3        2           1
3            3         3        1           2
http://dni-institute.in/blogs/cart-algorithm-for-decision-tree/
http://people.revoledu.com/kardi/tutorial/DecisionTree/how-decision-tree-algorithm-work.htm

Exercise 5: which attribute do we need to pick first? https://medium.com/deep-math-machine-learning-ai/chapter-4-decision-trees-algorithms-b93975f7a1f1

Answer 5: which attribute do we need to pick first? Answer: determine the attribute that best classifies the training data (e.g. the one with the highest information gain) and use it at the root of the tree; repeat this process for each branch. https://medium.com/deep-math-machine-learning-ai/chapter-4-decision-trees-algorithms-b93975f7a1f1
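A minimal Python sketch (not from the slides) that applies this rule to the coded table above to rank the attributes; variable names are assumptions:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Rows as recovered from the table: (Temperature, Humidity, Weather, Drive/walk, Umbrella)
data = [(1,1,1,1,2), (1,2,1,2,1), (2,2,1,1,2), (2,1,1,2,1), (1,1,2,1,2),
        (2,2,2,1,2), (2,2,3,2,2), (3,3,3,2,1), (3,3,3,1,2)]
names = ["Temperature", "Humidity", "Weather", "Drive/walk"]

labels = [r[-1] for r in data]
parent = entropy(labels)
for i, name in enumerate(names):
    gain = parent
    for v in set(r[i] for r in data):          # each branch of attribute i
        branch = [r[-1] for r in data if r[i] == v]
        gain -= len(branch) / len(data) * entropy(branch)
    print(name, round(gain, 3))
# The attribute with the largest information gain is picked as the root.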

Overfitting: problem and solution

Overfitting problem and solution
Problem: your trained model works only on the training data but fails on new or unseen data.
Solution: use the validation set to prune your tree (remove some leaves) to avoid overfitting.
Reference: https://www.investopedia.com/terms/o/overfitting.asp

Pruning methods
Idea: remove leaves that contribute little.
Pruning method: cost-complexity pruning. If the original tree T has a subtree T2, we prune T2 to obtain the pruned tree T - T2.
https://en.wikipedia.org/wiki/Pruning_(decision_trees)
http://mlwiki.org/index.php/Cost-Complexity_Pruning
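The criterion behind cost-complexity pruning (standard CART, as in the links above) scores each candidate tree T by

R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|

where R(T) is the error of T on the training data, |\tilde{T}| is the number of leaves, and \alpha >= 0 trades accuracy against tree size; pruning removes the subtree whose removal increases R(T) the least per leaf removed.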

MATLAB demo: https://www.mathworks.com/help/stats/examples/classification.html

Defining terms
For the whole dataset: use about 70% for training data and 30% for testing (pruning and cross-validation use), http://mlwiki.org/index.php/Cross-Validation; choose examples for the training/testing sets randomly.
Training data is used to construct the decision tree (which will then be pruned).
Testing data is used for pruning.
f = error on training data
N = number of instances covered by the leaves
z = z-score of a normal distribution, https://en.wikipedia.org/wiki/Standard_normal_table
e = error on testing data (estimated from f, N, z)

Post-pruning using error estimation
http://www.saedsayad.com/decision_tree_overfitting.htm
In the following example we set z to 0.69, which corresponds to a confidence level of 75% on the normal distribution curve; see https://www.rapidtables.com/math/probability/normal_distribution.html
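The estimate used on that page is the standard pessimistic upper confidence bound on the binomial error (stated here from that derivation rather than from the slide itself):

e = \frac{f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}}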

Post-pruning using cost-complexity: http://mlwiki.org/index.php/Cost-Complexity_Pruning

[Slide figure: using the test set to find the best pruning result. Candidate pruned subtrees are compared by their error R on the train set and on the test set (via cross-validation); the subtree with the smallest cross-validated error is selected. Source: https://onlinecourses.science.psu.edu/stat857/node/58/]

http://people.revoledu.com/kardi/tutorial/DecisionTree/how-decision-tree-algorithm-work.htm
https://onlinecourses.science.psu.edu/stat857/node/60/

Appendix

Example using sklearn
https://github.com/alameenkhader/spam_classifier

from sklearn import tree

# You may hard-code your data as below, or import csv and fetch your data from a .csv file.
# Assume we have a two-dimensional feature space with two classes we would like to distinguish.
dataTable = [[2, 9], [4, 10], [5, 7], [8, 3], [9, 1]]
dataLabels = ["Class A", "Class A", "Class B", "Class B", "Class B"]

# Declare our classifier
trained_classifier = tree.DecisionTreeClassifier()

# Train our classifier with the data we have
trained_classifier = trained_classifier.fit(dataTable, dataLabels)

# We are done with training, so it is time to test it!
someDataOutOfTrainingSet = [[10, 2]]
label = trained_classifier.predict(someDataOutOfTrainingSet)

# Show the prediction of the trained classifier for the data [10, 2]
print(label[0])

Iris test using sklearn; this will generate a dt.dot file

import numpy as np
from sklearn import datasets
from sklearn import tree

# Load iris
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Build decision tree classifier
dt = tree.DecisionTreeClassifier(criterion='entropy')
dt.fit(X, y)
dotfile = open("dt.dot", 'w')
tree.export_graphviz(dt, out_file=dotfile, feature_names=iris.feature_names)
dotfile.close()

Iris dataset: http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py

print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    cmap=plt.cm.RdYlBu, edgecolor='black', s=15)

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.show()

A working implementation in pure Python: https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/

Code (MATLAB): information gains for the classic 14-sample weather/play dataset

function tt4
clear
parent_en = entropy_cal([9,5])
% humidity -------------------
en1 = entropy_cal([3,4])
en2 = entropy_cal([6,1])
Information_gain(1) = parent_en - (7/14)*en1 - (7/14)*en2
clear en1 en2
% outlook --------------------
en1 = entropy_cal([3,2])
en2 = entropy_cal([4,0])
en3 = entropy_cal([2,3])
Information_gain(2) = parent_en - (5/14)*en1 - (4/14)*en2 - (5/14)*en3
clear en1 en2 en3
% wind -----------------------
en1 = entropy_cal([6,2])
en2 = entropy_cal([3,3])
Information_gain(3) = parent_en - (8/14)*en1 - (6/14)*en2
% temperature ----------------
en1 = entropy_cal([2,2]) % hot: 2 yes, 2 no
en2 = entropy_cal([3,1]) % mild: 3 yes, 1 no
en3 = entropy_cal([4,2]) % cool: 4 yes, 2 no
Information_gain(4) = parent_en - (4/14)*en1 - (4/14)*en2 - (6/14)*en3
Information_gain
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [en] = entropy_cal(e)
% e: vector of class counts; en: entropy in bits
n = length(e);
base = sum(e);
% probability of the elements in the input
for i = 1:n
    p(i) = e(i)/base;
end
en = 0;
for i = 1:n
    if p(i) ~= 0   % skip zero probabilities to avoid -inf
        en = en - p(i)*log2(p(i));
    end
end
end