COMP5331 Classification
Prepared by Raymond Wong. Presented by Raymond Wong. The examples used in the Decision Tree part are borrowed from LW Chan's notes.

Classification

Decision tree:
  root
    child = yes              -> 100% Yes, 0% No
    child = no
      Income = high          -> 100% Yes, 0% No
      Income = low           -> 0% Yes, 100% No

Suppose there is a person:
  Race   Income  Child  Insurance
  white  high    no     ?

Classification

Decision tree (as above):
  root
    child = yes              -> 100% Yes, 0% No
    child = no
      Income = high          -> 100% Yes, 0% No
      Income = low           -> 0% Yes, 100% No

Training set:
  Race   Income  Child  Insurance
  black  high    no     yes
  white  high    yes    yes
  white  low     yes    yes
  white  low     yes    yes
  black  low     no     no
  black  low     no     no
  black  low     no     no
  white  low     no     no

Test set -- suppose there is a person:
  Race   Income  Child  Insurance
  white  high    no     ?

Applications
  Insurance: according to the attributes of customers, determine which customers will buy an insurance policy.
  Marketing: according to the attributes of customers, determine which customers will buy a product such as computers.
  Bank loan: according to the attributes of customers, determine which customers are "risky" customers or "safe" customers.

Applications
  Network: according to the traffic patterns, determine whether the patterns are related to some "security attacks".
  Software: according to the experience of programmers, determine which programmers can fix certain bugs.

Same/Difference
  Classification
  Clustering

Classification Methods
  Decision Tree
  Bayesian Classifier
  Nearest Neighbor Classifier

Decision Trees
  ID3 (Iterative Dichotomiser 3)
  C4.5
  CART (Classification And Regression Trees)

Entropy
Example 1: consider a random variable which has a uniform distribution over 32 outcomes. To identify an outcome, we need a label that can take 32 different values; thus, 5-bit strings suffice as labels.

Entropy
Entropy is used to measure how informative a node is. If we are given a probability distribution P = (p_1, p_2, ..., p_n), then the information conveyed by this distribution, also called the entropy of P, is:
  I(P) = -(p_1 log p_1 + p_2 log p_2 + ... + p_n log p_n)
All logarithms here are in base 2.

Entropy
For example:
  If P is (0.5, 0.5), then I(P) is 1.
  If P is (0.67, 0.33), then I(P) is 0.92.
  If P is (1, 0), then I(P) is 0.
Entropy is a way to measure the amount of information. The smaller the entropy, the purer the distribution and the more informative the node.
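The example values above can be checked with a few lines of Python. A minimal sketch, assuming base-2 logarithms as stated on the previous slide (the `entropy` helper name is mine, not from the slides):

```python
import math

def entropy(probs):
    """I(P) = -sum(p * log2 p), treating 0 * log 0 as 0."""
    return -sum(p * math.log(p, 2) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0
print(entropy([0.67, 0.33]))  # about 0.92
print(entropy([1.0, 0.0]))    # 0.0
```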

Entropy
(Training set as above.)
Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Race:
  Info(T_black) = -3/4 log 3/4 - 1/4 log 1/4 = 0.8113
  Info(T_white) = -3/4 log 3/4 - 1/4 log 1/4 = 0.8113
  Info(Race, T) = 1/2 x Info(T_black) + 1/2 x Info(T_white) = 0.8113
  Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887
So far: Gain(Race, T) = 0.1887

Entropy
(Training set as above.)
Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Income:
  Info(T_high) = -1 log 1 - 0 log 0 = 0
  Info(T_low) = -1/3 log 1/3 - 2/3 log 2/3 = 0.9183
  Info(Income, T) = 1/4 x Info(T_high) + 3/4 x Info(T_low) = 0.6887
  Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113
So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3113

(Training set as above.)
Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Child:
  Info(T_yes) = -1 log 1 - 0 log 0 = 0
  Info(T_no) = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219
  Info(Child, T) = 3/8 x Info(T_yes) + 5/8 x Info(T_no) = 0.4512
  Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488
So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3113; Gain(Child, T) = 0.5488
Child has the largest gain, so we split on Child first:
  root
    child = yes -> records {2, 3, 4}: Insurance: 3 Yes, 0 No (100% Yes, 0% No)
    child = no  -> records {1, 5, 6, 7, 8}: Insurance: 1 Yes, 4 No (20% Yes, 80% No)
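The three gains computed on the last three slides can be reproduced with a short script over the eight training records. A sketch under my own naming (`data`, `info`, and `gain` are hypothetical helper names, not from the slides):

```python
import math
from collections import Counter

# Training set from the slides: (Race, Income, Child, Insurance)
data = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]
CLASS = 3  # index of the Insurance (class) column

def info(records):
    """Entropy of the class distribution within a set of records."""
    counts = Counter(r[CLASS] for r in records)
    n = len(records)
    return -sum(c / n * math.log(c / n, 2) for c in counts.values())

def gain(records, attr):
    """Information gain of splitting `records` on attribute index `attr`."""
    n = len(records)
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / n * info(subset)
    return info(records) - remainder

for name, idx in [("Race", 0), ("Income", 1), ("Child", 2)]:
    print(name, round(gain(data, idx), 4))
# Race 0.1887, Income 0.3113, Child 0.5488 -> Child is chosen first
```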

(Training set as above; we now work inside the child = no node, so T = records {1, 5, 6, 7, 8} with 1 Yes and 4 No.)
Info(T) = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219
For attribute Race:
  Info(T_black) = -1/4 log 1/4 - 3/4 log 3/4 = 0.8113
  Info(T_white) = -0 log 0 - 1 log 1 = 0
  Info(Race, T) = 4/5 x Info(T_black) + 1/5 x Info(T_white) = 0.6490
  Gain(Race, T) = Info(T) - Info(Race, T) = 0.7219 - 0.6490 = 0.0729
So far: Gain(Race, T) = 0.0729

(Still inside the child = no node, T = {1, 5, 6, 7, 8}.)
Info(T) = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219
For attribute Income:
  Info(T_high) = -1 log 1 - 0 log 0 = 0
  Info(T_low) = -0 log 0 - 1 log 1 = 0
  Info(Income, T) = 1/5 x Info(T_high) + 4/5 x Info(T_low) = 0
  Gain(Income, T) = Info(T) - Info(Income, T) = 0.7219 - 0 = 0.7219
So far: Gain(Race, T) = 0.0729; Gain(Income, T) = 0.7219

Income has the larger gain, so the child = no node is split on Income, and every leaf becomes pure:
  root
    child = yes              -> {2, 3, 4}: Insurance: 3 Yes, 0 No (100% Yes, 0% No)
    child = no               (20% Yes, 80% No)
      Income = high          -> {1}: Insurance: 1 Yes, 0 No (100% Yes, 0% No)
      Income = low           -> {5, 6, 7, 8}: Insurance: 0 Yes, 4 No (0% Yes, 100% No)
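The two-level tree above is what a small recursive ID3 routine produces once it stops at pure nodes. A sketch that reuses the `data`, `CLASS`, `info`, and `gain` definitions (and the `Counter` import) from the earlier snippet; the `id3` function and its tuple/dict tree encoding are my own choices:

```python
def id3(records, attrs):
    """Grow an ID3 tree: a leaf is a class label, an internal node is
    (attribute index, {attribute value: subtree})."""
    labels = [r[CLASS] for r in records]
    if len(set(labels)) == 1 or not attrs:            # pure node, or nothing left to split on
        return Counter(labels).most_common(1)[0][0]   # majority class label
    best = max(attrs, key=lambda a: gain(records, a))
    children = {}
    for value in set(r[best] for r in records):
        subset = [r for r in records if r[best] == value]
        children[value] = id3(subset, [a for a in attrs if a != best])
    return (best, children)

print(id3(data, [0, 1, 2]))
# (2, {'yes': 'yes', 'no': (1, {'high': 'yes', 'low': 'no'})})   (key order may vary)
# i.e. split on Child first, then on Income inside the child = no branch
```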

(Training set as above.)
Final decision tree:
  root
    child = yes              -> 100% Yes, 0% No
    child = no
      Income = high          -> 100% Yes, 0% No
      Income = low           -> 0% Yes, 100% No
Suppose there is a new person:
  Race   Income  Child  Insurance
  white  high    no     ?
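Reading the new person off the finished tree amounts to two nested tests. A minimal sketch of that lookup (the dictionary layout of a "person" is mine; the tree itself is the one learned above):

```python
def classify(person):
    """Follow the learned decision tree: split on Child, then on Income."""
    if person["Child"] == "yes":
        return "yes"              # 100% Yes / 0% No leaf
    if person["Income"] == "high":
        return "yes"              # 100% Yes / 0% No leaf
    return "no"                   # 0% Yes / 100% No leaf

print(classify({"Race": "white", "Income": "high", "Child": "no"}))  # -> "yes"
```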

(Decision tree and training set as above.)
Termination criteria?
  e.g., the height of the tree
  e.g., the accuracy of each node

Decision Trees
  ID3
  C4.5
  CART

C4.5
ID3 impurity measurement:
  Gain(A, T) = Info(T) - Info(A, T)
C4.5 impurity measurement:
  Gain(A, T) = (Info(T) - Info(A, T)) / SplitInfo(A)
  where SplitInfo(A) = -sum over values v of A of p(v) log p(v)

Entropy (C4.5)
(Training set as above.)
Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Race:
  Info(T_black) = -3/4 log 3/4 - 1/4 log 1/4 = 0.8113
  Info(T_white) = -3/4 log 3/4 - 1/4 log 1/4 = 0.8113
  Info(Race, T) = 1/2 x Info(T_black) + 1/2 x Info(T_white) = 0.8113
  SplitInfo(Race) = -1/2 log 1/2 - 1/2 log 1/2 = 1
  Gain(Race, T) = (Info(T) - Info(Race, T)) / SplitInfo(Race) = (1 - 0.8113) / 1 = 0.1887
So far: Gain(Race, T) = 0.1887

Entropy (C4.5)
(Training set as above.)
Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Income:
  Info(T_high) = -1 log 1 - 0 log 0 = 0
  Info(T_low) = -1/3 log 1/3 - 2/3 log 2/3 = 0.9183
  Info(Income, T) = 1/4 x Info(T_high) + 3/4 x Info(T_low) = 0.6887
  SplitInfo(Income) = -2/8 log 2/8 - 6/8 log 6/8 = 0.8113
  Gain(Income, T) = (Info(T) - Info(Income, T)) / SplitInfo(Income) = (1 - 0.6887) / 0.8113 = 0.3837
So far: Gain(Race, T) = 0.1887; Gain(Income, T) = 0.3837
For attribute Child, Gain(Child, T) = ?
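The C4.5 numbers on the last two slides, and the missing value for Child, follow from dividing the ID3 gain by SplitInfo. A sketch reusing `data`, `info`, `gain`, `math`, and `Counter` from the ID3 snippet (`split_info` and `gain_ratio` are my own helper names):

```python
def split_info(records, attr):
    """SplitInfo(A) = -sum over values v of p(v) * log2 p(v)."""
    n = len(records)
    counts = Counter(r[attr] for r in records)
    return -sum(c / n * math.log(c / n, 2) for c in counts.values())

def gain_ratio(records, attr):
    return gain(records, attr) / split_info(records, attr)

for name, idx in [("Race", 0), ("Income", 1), ("Child", 2)]:
    print(name, round(gain_ratio(data, idx), 4))
# Race 0.1887, Income 0.3837, Child 0.575 -> Child is still the best split
```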

Decision Trees
  ID3
  C4.5
  CART

CART
Impurity measurement (Gini):
  I(P) = 1 - sum over j of p_j^2

Gini
(Training set as above.)
Info(T) = 1 - (1/2)^2 - (1/2)^2 = 1/2
For attribute Race:
  Info(T_black) = 1 - (3/4)^2 - (1/4)^2 = 0.375
  Info(T_white) = 1 - (3/4)^2 - (1/4)^2 = 0.375
  Info(Race, T) = 1/2 x Info(T_black) + 1/2 x Info(T_white) = 0.375
  Gain(Race, T) = Info(T) - Info(Race, T) = 1/2 - 0.375 = 0.125
So far: Gain(Race, T) = 0.125

Gini
(Training set as above.)
Info(T) = 1 - (1/2)^2 - (1/2)^2 = 1/2
For attribute Income:
  Info(T_high) = 1 - 1^2 - 0^2 = 0
  Info(T_low) = 1 - (1/3)^2 - (2/3)^2 = 0.4444
  Info(Income, T) = 1/4 x Info(T_high) + 3/4 x Info(T_low) = 0.3333
  Gain(Income, T) = Info(T) - Info(Income, T) = 1/2 - 0.3333 = 0.1667
So far: Gain(Race, T) = 0.125; Gain(Income, T) = 0.1667
For attribute Child, Gain(Child, T) = ?
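The CART/Gini numbers, including the Child attribute left as a question, can be checked the same way. A sketch reusing `data`, `CLASS`, and `Counter` from the earlier snippet (`gini` and `gini_gain` are my own helper names):

```python
def gini(records):
    """Gini impurity: 1 - sum of squared class proportions."""
    counts = Counter(r[CLASS] for r in records)
    n = len(records)
    return 1 - sum((c / n) ** 2 for c in counts.values())

def gini_gain(records, attr):
    """Drop in Gini impurity after splitting `records` on attribute `attr`."""
    n = len(records)
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [r for r in records if r[attr] == value]
        remainder += len(subset) / n * gini(subset)
    return gini(records) - remainder

for name, idx in [("Race", 0), ("Income", 1), ("Child", 2)]:
    print(name, round(gini_gain(data, idx), 4))
# Race 0.125, Income 0.1667, Child 0.3 -> Child is again the best split
```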

Classification Methods
  Decision Tree
  Bayesian Classifier
  Nearest Neighbor Classifier

Bayesian Classifier
  Naïve Bayes Classifier
  Bayesian Belief Networks

Naïve Bayes Classifier
  Statistical classifiers
  Probabilities
  Conditional probabilities

Naïve Bayes Classifier
Conditional probability:
  A: a random variable
  B: a random variable
  P(A | B) = P(AB) / P(B)

Naïve Bayes Classifier
Bayes rule:
  A: a random variable
  B: a random variable
  P(A | B) = P(B | A) P(A) / P(B)

Naïve Bayes Classifier
Independence assumption: the attributes are independent of each other given the class,
  e.g., P(X, Y, Z | A) = P(X | A) x P(Y | A) x P(Z | A)
(Training set as above.)

Naïve Bayes Classifier
(Training set as above.)
For attribute Race:
  P(Race = black | Yes) = 1/4    P(Race = white | Yes) = 3/4
  P(Race = black | No)  = 3/4    P(Race = white | No)  = 1/4
For attribute Income:
  P(Income = high | Yes) = 1/2   P(Income = low | Yes) = 1/2
  P(Income = high | No)  = 0     P(Income = low | No)  = 1
For attribute Child:
  P(Child = yes | Yes) = 3/4     P(Child = no | Yes) = 1/4
  P(Child = yes | No)  = 0       P(Child = no | No)  = 1
For the class (Insurance): P(Yes) = 1/2, P(No) = 1/2
Suppose there is a new person:
  Race   Income  Child  Insurance
  white  high    no     ?
P(Race = white, Income = high, Child = no | Yes)
  = P(Race = white | Yes) x P(Income = high | Yes) x P(Child = no | Yes)
  = 3/4 x 1/2 x 1/4 = 0.09375
P(Race = white, Income = high, Child = no | No)
  = P(Race = white | No) x P(Income = high | No) x P(Child = no | No)
  = 1/4 x 0 x 1 = 0

Naïve Bayes Classifier
(Conditional probabilities and priors as above; new person: Race = white, Income = high, Child = no.)
P(Yes | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | Yes) x P(Yes) / P(Race = white, Income = high, Child = no)
  = 0.09375 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0.046875 / P(Race = white, Income = high, Child = no)

Naïve Bayes Classifier
P(No | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | No) x P(No) / P(Race = white, Income = high, Child = no)
  = 0 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0
So: P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
    P(No  | Race = white, Income = high, Child = no) = 0

Naïve Bayes Classifier
Suppose there is a new person:
  Race   Income  Child  Insurance
  white  high    no     ?
P(Yes | Race = white, Income = high, Child = no) = 0.046875 / P(Race = white, Income = high, Child = no)
P(No  | Race = white, Income = high, Child = no) = 0
Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that this new person will buy an insurance policy.
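The whole Naïve Bayes argument above fits in a short script: estimate the prior and the per-attribute conditional probabilities from the training set, multiply them for each class, and pick the larger product (the common denominator P(Race = white, Income = high, Child = no) cancels, so it can be ignored). A sketch reusing the `data` list from the decision-tree snippet; function and variable names are my own:

```python
ATTRS = {"Race": 0, "Income": 1, "Child": 2}
CLASS = 3  # Insurance column

def naive_bayes_score(records, person, label):
    """Unnormalised posterior: P(label) * product of P(attribute value | label)."""
    in_class = [r for r in records if r[CLASS] == label]
    score = len(in_class) / len(records)            # prior P(label)
    for attr, idx in ATTRS.items():
        matching = [r for r in in_class if r[idx] == person[attr]]
        score *= len(matching) / len(in_class)      # P(attribute value | label)
    return score

person = {"Race": "white", "Income": "high", "Child": "no"}
scores = {c: naive_bayes_score(data, person, c) for c in ("yes", "no")}
print(scores)                        # {'yes': 0.046875, 'no': 0.0}
print(max(scores, key=scores.get))   # 'yes' -> predict that the person buys insurance
```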

Bayesian Classifier
  Naïve Bayes Classifier
  Bayesian Belief Networks

Bayesian Belief Network
  Naïve Bayes Classifier: makes the independence assumption.
  Bayesian Belief Network: does not make the independence assumption.

Bayesian Belief Network
  Exercise  Diet       Heartburn  Blood Pressure  Chest Pain  Heart Disease
  Yes       Healthy    No         High            Yes         No
  ...       Unhealthy  Yes        Low             Yes         No
  ...       Healthy    Yes        High            No          Yes
  ...       ...        ...        ...             ...         ...
(Exercise: Yes/No; Diet: Healthy/Unhealthy; Heartburn: Yes/No; Blood Pressure: High/Low; Chest Pain: Yes/No; Heart Disease: Yes/No)
Some attributes are dependent on other attributes, e.g., doing exercise may reduce the probability of suffering from heart disease:
  Exercise (E) -> Heart Disease

Bayesian Belief Network
Network structure:
  Exercise (E) -> Heart Disease (HD) <- Diet (D)
  Diet (D) -> Heartburn (Hb)
  Heart Disease (HD) -> Blood Pressure (BP)
  Heart Disease (HD) -> Chest Pain (CP) <- Heartburn (Hb)
Conditional probability tables:
  P(E = Yes) = 0.7
  P(D = Healthy) = 0.25
  P(HD = Yes | E, D):   E = Yes, D = Healthy: 0.25    E = Yes, D = Unhealthy: 0.45
                        E = No,  D = Healthy: 0.55    E = No,  D = Unhealthy: 0.75
  P(Hb = Yes | D):      D = Healthy: 0.85             D = Unhealthy: 0.2
  P(BP = High | HD):    HD = Yes: 0.85                HD = No: 0.2
  P(CP = Yes | HD, Hb): HD = Yes, Hb = Yes: 0.8       HD = Yes, Hb = No: 0.6
                        HD = No,  Hb = Yes: 0.4       HD = No,  Hb = No: 0.1

Bayesian Belief Network
Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if the following holds:
  P(X | Y, Z) = P(X | Z)
Lemma: if X is conditionally independent of Y given Z, then
  P(X, Y | Z) = P(X | Z) x P(Y | Z)    (why?)

Bayesian Belief Network
(Network structure as above.)
Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z).
  e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes)
        "BP = High" is conditionally independent of "D = Healthy" given "HD = Yes".
  e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes)
        "BP = High" is conditionally independent of "CP = Yes" given "HD = Yes".
Property: a node is conditionally independent of its non-descendants if its parents are known.

Bayesian Belief Network
Suppose there is a new person and I want to know whether he is likely to have heart disease.
Query 1 -- nothing is known about him:
  Exercise  Diet  Heartburn  Blood Pressure  Chest Pain  Heart Disease
  ?         ?     ?          ?               ?           ?
Query 2 -- his blood pressure is known to be high:
  Exercise  Diet  Heartburn  Blood Pressure  Chest Pain  Heart Disease
  ?         ?     ?          High            ?           ?
Query 3 -- he exercises, has a healthy diet, and has high blood pressure:
  Exercise  Diet     Heartburn  Blood Pressure  Chest Pain  Heart Disease
  Yes       Healthy  ?          High            ?           ?

Bayesian Belief Network
Suppose there is a new person and I want to know whether he is likely to have heart disease (nothing else is known about him).
P(HD = Yes)
  = sum over x in {Yes, No} and y in {Healthy, Unhealthy} of P(HD = Yes | E = x, D = y) x P(E = x, D = y)
  = sum over x, y of P(HD = Yes | E = x, D = y) x P(E = x) x P(D = y)
  = 0.25 x 0.7 x 0.25 + 0.45 x 0.7 x 0.75 + 0.55 x 0.3 x 0.25 + 0.75 x 0.3 x 0.75
  = 0.49
P(HD = No) = 1 - P(HD = Yes) = 1 - 0.49 = 0.51
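The 0.49 above is the HD row of the conditional probability table weighted by the four (Exercise, Diet) combinations. A minimal sketch of that sum, with the numbers copied from the network slide (the dictionary layout is my own):

```python
P_E_YES = 0.7
P_D_HEALTHY = 0.25
# P(HD = Yes | E, D), from the conditional probability table on the network slide
P_HD_YES = {("Yes", "Healthy"): 0.25, ("Yes", "Unhealthy"): 0.45,
            ("No",  "Healthy"): 0.55, ("No",  "Unhealthy"): 0.75}

p_hd_yes = 0.0
for e, p_e in [("Yes", P_E_YES), ("No", 1 - P_E_YES)]:
    for d, p_d in [("Healthy", P_D_HEALTHY), ("Unhealthy", 1 - P_D_HEALTHY)]:
        p_hd_yes += P_HD_YES[(e, d)] * p_e * p_d   # marginalise over Exercise and Diet

print(round(p_hd_yes, 2), round(1 - p_hd_yes, 2))  # 0.49 0.51
```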

Bayesian Belief Network
Suppose there is a new person whose blood pressure is known to be high, and I want to know whether he is likely to have heart disease.
P(BP = High)
  = sum over x in {Yes, No} of P(BP = High | HD = x) x P(HD = x)
  = 0.85 x 0.49 + 0.2 x 0.51
  = 0.5185
P(HD = Yes | BP = High)
  = P(BP = High | HD = Yes) x P(HD = Yes) / P(BP = High)
  = 0.85 x 0.49 / 0.5185
  = 0.8033
P(HD = No | BP = High) = 1 - P(HD = Yes | BP = High) = 1 - 0.8033 = 0.1967

Bayesian Belief Network
Suppose there is a new person who exercises, has a healthy diet, and has high blood pressure; I want to know whether he is likely to have heart disease.
P(HD = Yes | BP = High, D = Healthy, E = Yes)
  = P(BP = High | HD = Yes, D = Healthy, E = Yes) x P(HD = Yes | D = Healthy, E = Yes) / P(BP = High | D = Healthy, E = Yes)
  = P(BP = High | HD = Yes) x P(HD = Yes | D = Healthy, E = Yes)
    / sum over x in {Yes, No} of P(BP = High | HD = x) x P(HD = x | D = Healthy, E = Yes)
  = 0.85 x 0.25 / (0.85 x 0.25 + 0.2 x 0.75)
  = 0.2125 / 0.3625
  = 0.5862
P(HD = No | BP = High, D = Healthy, E = Yes) = 1 - P(HD = Yes | BP = High, D = Healthy, E = Yes) = 1 - 0.5862 = 0.4138
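Both conditional queries (blood pressure alone, and blood pressure plus exercise and diet) reduce to the same Bayes-rule pattern. A sketch continuing from the previous snippet, so `p_hd_yes` and `P_HD_YES` are assumed to be defined there:

```python
P_BP_HIGH = {"Yes": 0.85, "No": 0.2}   # P(BP = High | HD), from the CPT slide

# Query 2: P(HD = Yes | BP = High)
p_bp_high = P_BP_HIGH["Yes"] * p_hd_yes + P_BP_HIGH["No"] * (1 - p_hd_yes)
p_hd_given_bp = P_BP_HIGH["Yes"] * p_hd_yes / p_bp_high
print(round(p_bp_high, 4), round(p_hd_given_bp, 4))   # 0.5185 0.8033

# Query 3: P(HD = Yes | BP = High, D = Healthy, E = Yes)
prior = P_HD_YES[("Yes", "Healthy")]   # P(HD = Yes | E = Yes, D = Healthy) = 0.25
numer = P_BP_HIGH["Yes"] * prior
denom = P_BP_HIGH["Yes"] * prior + P_BP_HIGH["No"] * (1 - prior)
print(round(numer / denom, 4))         # 0.5862
```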

Classification Methods
  Decision Tree
  Bayesian Classifier
  Nearest Neighbor Classifier

Nearest Neighbor Classifier
(Table with attributes Computer and History; figure: the records plotted in a two-dimensional space with axes Computer and History.)

Nearest Neighbor Classifier
  Computer  History  Buy Book?
  100       40       No  (-)
  90        45       Yes (+)
  20        95       Yes (+)
  ...       ...      ...
(Figure: the records plotted in the (Computer, History) space, marked + for Yes and - for No.)

Nearest Neighbor Classifier
(Training records and plot as above.)
Suppose there is a new person:
  Computer  History  Buy Book?
  95        35       ?
Nearest Neighbor Classifier:
  Step 1: Find the nearest neighbor.
  Step 2: Use the "label" of this neighbor.

Nearest Neighbor Classifier
(Training records and plot as above.)
Suppose there is a new person:
  Computer  History  Buy Book?
  95        35       ?
k-Nearest Neighbor Classifier:
  Step 1: Find the k nearest neighbors.
  Step 2: Use the majority of the labels of these neighbors.
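Both the 1-NN and k-NN procedures are a distance computation plus a vote. A minimal sketch on the three-row toy table, using Euclidean distance (the `knn_predict` name and the choice of distance are mine; the slides do not fix a distance measure):

```python
import math
from collections import Counter

# (Computer, History, Buy Book?)
books = [(100, 40, "No"), (90, 45, "Yes"), (20, 95, "Yes")]

def knn_predict(query, records, k=1):
    """Label the query point by a majority vote among its k nearest neighbours."""
    by_distance = sorted(records, key=lambda r: math.dist(query, r[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((95, 35), books, k=1))   # "No"  -- the nearest neighbour is (100, 40)
print(knn_predict((95, 35), books, k=3))   # "Yes" -- two of the three neighbours say Yes
```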