1
Data Mining
Chapter 4 Algorithms: The Basic Methods
Reporter: Yuen-Kuei Hsueh
2
Outline
– Inferring rudimentary rules
– Statistical modeling
3
Simplicity first
Simple algorithms often work very well!
There are many kinds of simple structure, e.g.:
– One attribute does all the work
– All attributes contribute equally & independently
– A weighted linear combination might do
– Instance-based: use a few prototypes
– Use simple logical rules
Success of a method depends on the domain
4
Inferring rudimentary rules
1R: learns a 1-level decision tree
– I.e., rules that all test one particular attribute
Basic version:
– One branch for each value
– Each branch assigns the most frequent class
– Error rate: proportion of instances that don't belong to the majority class of their corresponding branch
– Choose the attribute with the lowest error rate (assumes nominal attributes)
5
Pseudo-code for 1R

For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate
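This pseudo-code translates almost directly into Python. Below is a minimal sketch, assuming the data set is a list of dicts plus a parallel list of class labels (function and variable names are illustrative, not from the slides):

    from collections import Counter, defaultdict

    def one_r(instances, labels, attributes):
        """Return (best attribute, its rules, its error count)."""
        best = (None, None, float("inf"))
        for attr in attributes:
            # Count how often each class appears for each attribute value
            counts = defaultdict(Counter)
            for inst, label in zip(instances, labels):
                counts[inst[attr]][label] += 1
            # Each value predicts its most frequent class;
            # everything else in that branch is an error
            rules = {val: c.most_common(1)[0][0] for val, c in counts.items()}
            errors = sum(sum(c.values()) - max(c.values())
                         for c in counts.values())
            if errors < best[2]:
                best = (attr, rules, errors)
        return best

On the weather data of the next slide this returns Outlook with 4/14 errors (Humidity ties at 4/14; the first attribute tried wins).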
6
Evaluating the weather attributes

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Table 1. Weather data (nominal); the first four columns are attributes, Play is the class
7
Evaluating the weather attributes
The attribute Outlook takes 3 values (Sunny, Overcast and Rainy)
There are 14 instances in the data set
8
Evaluating the weather attributes

Attribute  Rule             Errors  Total errors
Outlook    Sunny -> No      2/5     4/14
           Overcast -> Yes  0/4
           Rainy -> Yes     2/5
Temp       Hot -> No*       2/4     5/14
           Mild -> Yes      2/6
           Cool -> Yes      1/4
Humidity   High -> No       3/7     4/14
           Normal -> Yes    1/7
Windy      False -> Yes     2/8     5/14
           True -> No*      3/6

Table 2. Rules for weather data (nominal); "*" marks an arbitrary choice between equally frequent classes
9
Dealing with numeric attributes
Discretize numeric attributes
Divide each attribute's range into intervals:
– Sort instances according to the attribute's values
– Place breakpoints where the class changes
– This minimizes the total error
10
Dealing with numeric attributes
Temperature from weather data (see Table 3&4):

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Y | N | Y   Y   Y | N   N   Y | Y   Y | N | Y   Y | N
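A small Python sketch of this breakpoint placement. One assumption is made explicit here: a breakpoint may only fall between two distinct values, so when a class change happens inside a run of tied values (the two 72s above), the split is deferred to the next legal position; names are illustrative:

    def class_change_breakpoints(values, classes):
        # Sort instances by attribute value, then walk adjacent pairs
        pairs = sorted(zip(values, classes))
        breakpoints, pending = [], False
        for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
            if c1 != c2:
                pending = True            # a split is needed around here
            if pending and v1 != v2:      # first place two values differ
                breakpoints.append((v1 + v2) / 2)
                pending = False
        return breakpoints

    temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
    play = "Y N Y Y Y N N Y Y Y N Y Y N".split()
    print(class_change_breakpoints(temps, play))
    # [64.5, 66.5, 70.5, 73.5, 77.5, 80.5, 84.0] -- the thresholds of Table 5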
11
Attribute    Rule                      Errors  Total errors
Temperature  ≤ 64.5 -> Yes             0/1     1/14
             > 64.5 and ≤ 66.5 -> No   0/1
             > 66.5 and ≤ 70.5 -> Yes  0/3
             > 70.5 and ≤ 73.5 -> No   1/3
             > 73.5 and ≤ 77.5 -> Yes  0/2
             > 77.5 and ≤ 80.5 -> No   0/1
             > 80.5 and ≤ 84 -> Yes    0/2
             > 84 -> No                0/1

Table 5. Rules for temperature from weather data (overfitting)
12
The problem of overfitting
This procedure is very sensitive to noise:
– One instance with an incorrect class label will probably produce a separate interval
Simple solution:
– Enforce a minimum number of instances in the majority class per interval
13
The problem of overfitting
Example (with min = 3):

Breakpoints at every class change:
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Y | N | Y   Y   Y | N   N   Y | Y   Y | N | Y   Y | N

With at least 3 majority-class instances per interval:
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Y   N   Y   Y   Y | N   N   Y   Y   Y | N   Y   Y   N
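One way to code the min = 3 rule, as a sketch under one reading of the procedure: grow the current interval until its majority class has at least min_count members, then keep absorbing following instances of that same class before starting a new interval:

    from collections import Counter

    def partition_min_majority(pairs, min_count=3):
        intervals, current = [], []
        for value, cls in pairs:              # pairs sorted by value
            if current:
                majority, m = Counter(c for _, c in current).most_common(1)[0]
                if m >= min_count and cls != majority:
                    intervals.append(current)     # close this interval
                    current = []
            current.append((value, cls))
        if current:
            intervals.append(current)
        return intervals

On the temperature data this yields exactly the three intervals shown above; merging the first two (both majority "Y") then gives the single breakpoint at 77.5 used in Table 6.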
14
With overfitting avoidance

Attribute    Rule           Errors  Total errors
Temperature  ≤ 77.5 -> Yes  3/10    5/14
             > 77.5 -> No*  2/4

Table 6. Rules for temperature from weather data (with overfitting avoidance); adjacent intervals with the same majority class have been merged, leaving a single breakpoint at 77.5
15
Statistical modeling
Opposite of 1R: use all the attributes
Two assumptions: attributes are
– equally important
– statistically independent (given the class value)
The independence assumption is never correct!
But… this scheme works well in practice.
16
Probabilities for the weather data

Outlook   Yes      No        Temperature  Yes      No
Sunny     2 (2/9)  3 (3/5)   Hot          2 (2/9)  2 (2/5)
Overcast  4 (4/9)  0 (0/5)   Mild         4 (4/9)  2 (2/5)
Rainy     3 (3/9)  2 (2/5)   Cool         3 (3/9)  1 (1/5)

Humidity  Yes      No        Windy  Yes      No       Play  Yes       No
High      3 (3/9)  4 (4/5)   False  6 (6/9)  2 (2/5)        9 (9/14)  5 (5/14)
Normal    6 (6/9)  1 (1/5)   True   3 (3/9)  3 (3/5)

Table 7. Probabilities for the weather data: counts, with relative frequencies in parentheses
17
Probabilities for the weather data
A new day:

Outlook  Temperature  Humidity  Windy  Play
Sunny    Cool         High      True   ?

Likelihood of the two classes:
For "Yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "No" = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into probabilities by normalization:
P("Yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("No") = 0.0206 / (0.0053 + 0.0206) = 0.795
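The whole calculation is a few lines of Python. A sketch using the probabilities read off Table 7 (the dictionary layout is illustrative):

    # Conditional probabilities from Table 7, plus the class priors
    cond = {
        "Yes": {"Outlook=Sunny": 2/9, "Temp=Cool": 3/9,
                "Humidity=High": 3/9, "Windy=True": 3/9},
        "No":  {"Outlook=Sunny": 3/5, "Temp=Cool": 1/5,
                "Humidity=High": 4/5, "Windy=True": 3/5},
    }
    prior = {"Yes": 9/14, "No": 5/14}
    evidence = ["Outlook=Sunny", "Temp=Cool", "Humidity=High", "Windy=True"]

    likelihood = {}
    for cls in cond:
        l = prior[cls]
        for ev in evidence:
            l *= cond[cls][ev]          # multiply the independent factors
        likelihood[cls] = l

    total = sum(likelihood.values())
    posterior = {cls: l / total for cls, l in likelihood.items()}
    print(likelihood)   # Yes ~ 0.0053, No ~ 0.0206
    print(posterior)    # Yes ~ 0.205,  No ~ 0.795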
18
Bayes's rule
Probability of event H given evidence E:

P(H | E) = P(E | H) · P(H) / P(E)

A priori probability of H: P(H)
– Probability of the event before the evidence is seen
A posteriori probability of H: P(H | E)
– Probability of the event after the evidence is seen
19
Naïve Bayes for classification
Classification learning: what's the probability of the class given an instance?
– Evidence E = instance; event H = class value of the instance
Naïve assumption: the evidence splits into parts (i.e. attributes) that are independent given the class, so

P(H | E) = P(E1 | H) · P(E2 | H) · … · P(En | H) · P(H) / P(E)
20
Weather data example

Outlook  Temperature  Humidity  Windy  Play
Sunny    Cool         High      True   ?

Here the evidence E is the conjunction of the four attribute values, so for class "Yes":

P(Yes | E) = P(Sunny | Yes) · P(Cool | Yes) · P(High | Yes) · P(True | Yes) · P(Yes) / P(E)
           = (2/9 · 3/9 · 3/9 · 3/9 · 9/14) / P(E)
21
The "zero-frequency problem"
What if an attribute value does not occur with every class value? (e.g. "Outlook = Overcast" never occurs with class "No")
– That conditional probability will be zero
– The a posteriori probability will also be zero, no matter how likely the other attribute values are!
Remedy: add 1 to the count for every attribute value-class combination (the Laplace estimator), as sketched below
Result: probabilities will never be zero!
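A sketch of the add-one (Laplace) estimate: 1 is added to each count, and the number of distinct attribute values to the denominator, so the smoothed probabilities still sum to 1 over the values (names illustrative):

    def laplace(count, class_total, n_values):
        # (count + 1) successes out of (class_total + n_values) trials
        return (count + 1) / (class_total + n_values)

    # Outlook given class "Yes": counts 2, 4, 3 out of 9, three values
    print([laplace(c, 9, 3) for c in (2, 4, 3)])   # 3/12, 5/12, 4/12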
22
Modified probability estimates
In some cases adding a constant different from 1 might be more appropriate
Example: attribute Outlook for class "Yes", with a total weight μ split across the three values as p1, p2, p3:

Sunny:    (2 + μ·p1) / (9 + μ)
Overcast: (4 + μ·p2) / (9 + μ)
Rainy:    (3 + μ·p3) / (9 + μ)

The weights p1, p2, p3 don't need to be equal (but they must sum to 1)
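The same idea as a sketch, with hypothetical parameter names mu and p; with mu equal to the number of attribute values and a uniform prior p, this reduces to the Laplace estimate above:

    def m_estimate(count, class_total, p, mu):
        # count + mu*p successes out of class_total + mu "virtual" trials
        return (count + mu * p) / (class_total + mu)

    print(m_estimate(2, 9, 1/3, 3))   # Sunny|Yes -> 3/12, same as add-one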
23
Missing values
Training: the instance is not included in the frequency counts for that attribute value-class combination
Classification: the attribute is omitted from the calculation
Example:

Outlook  Temp.  Humidity  Windy  Play
?        Cool   High      True   ?

Likelihood of the two classes:
For "Yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
For "No" = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
Conversion into probabilities by normalization:
P("Yes") = 0.0238 / (0.0238 + 0.0343) = 0.41
P("No") = 0.0343 / (0.0238 + 0.0343) = 0.59
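At prediction time a missing attribute simply contributes no factor to the product. A sketch continuing the earlier naïve Bayes code, with cond and prior as defined there and None standing in for a missing value:

    def likelihood_of(cls, instance, cond, prior):
        l = prior[cls]
        for attr, val in instance.items():
            if val is not None:                  # omit missing attributes
                l *= cond[cls][f"{attr}={val}"]
        return l

    day = {"Outlook": None, "Temp": "Cool", "Humidity": "High", "Windy": "True"}
    # likelihood_of("Yes", day, cond, prior) -> ~0.0238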
24
Numeric attributes
Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
The probability density function for the normal distribution is defined by two parameters:
– Sample mean: μ = (1/n) · Σ xi
– Standard deviation: σ = √( Σ (xi − μ)² / (n − 1) )
– Then the density function f(x) is

  f(x) = (1 / (√(2π) · σ)) · e^(−(x − μ)² / (2σ²))
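The density function as a small Python sketch:

    import math

    def gaussian_pdf(x, mean, std):
        """Normal density with the given sample mean and standard deviation."""
        return (math.exp(-(x - mean) ** 2 / (2 * std ** 2))
                / (math.sqrt(2 * math.pi) * std))

    # Temperature = 66 for class "Yes" (mean 73, std dev 6.2, next slide):
    print(gaussian_pdf(66, 73, 6.2))   # ~0.0340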
25
Statistics for weather data

Outlook   Yes  No     Windy  Yes  No     Play  Yes   No
Sunny     2/9  3/5    False  6/9  2/5          9/14  5/14
Overcast  4/9  0/5    True   3/9  3/5
Rainy     3/9  2/5

Temperature  Yes: 64, 68, 70, 72, 75, …  (mean 73, std dev 6.2)
             No:  65, 71, 72, 80, 85, …  (mean 74.6, std dev 7.9)
Humidity     Yes: 65, 70, 70, 75, 80, …  (mean 79.1, std dev 10.2)
             No:  70, 85, 90, 91, 95, …  (mean 86.2, std dev 9.7)

Example density value:
f(temperature = 66 | Yes) = (1 / (√(2π) · 6.2)) · e^(−(66 − 73)² / (2 · 6.2²)) = 0.0340
26
Classifying a new day
A new day:

Outlook  Temperature  Humidity  Windy  Play
Sunny    66           90        True   ?

Likelihood of the two classes:
For "Yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
For "No" = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
Conversion into probabilities by normalization:
P("Yes") = 0.000036 / (0.000036 + 0.000108) = 0.25
P("No") = 0.000108 / (0.000036 + 0.000108) = 0.75
27
Multinomial naïve Bayes I
Version of naïve Bayes used for document classification with the bag-of-words model
n1, n2, …, nk: number of times word i occurs in the document
P1, P2, …, Pk: probability of obtaining word i when sampling from documents in class H
Probability of observing document E given class H (based on the multinomial distribution):

Pr[E | H] = N! · ∏i ( Pi^ni / ni! ),   where N = n1 + n2 + … + nk

This ignores the probability of generating a document of the right length (that probability is assumed constant for each class)
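The formula as a Python sketch (names illustrative), applied to the example of the next slide:

    import math

    def multinomial_likelihood(probs, counts):
        # Pr[E|H] = N! * product(Pi^ni / ni!), with N the document length
        result = math.factorial(sum(counts))
        for p, n in zip(probs, counts):
            result *= p ** n / math.factorial(n)
        return result

    # "blue yellow blue": yellow once, blue twice
    print(multinomial_likelihood([0.75, 0.25], [1, 2]))   # 9/64 ~ 0.14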
28
Multinomial naïve Bayes II
Suppose the dictionary has two words, "yellow" and "blue"
Suppose Pr[yellow | H] = 75% and Pr[blue | H] = 25%
Suppose E is the document "blue yellow blue"
Probability of observing the document:

Pr[E | H] = 3! · (0.75¹ / 1!) · (0.25² / 2!) = 9/64 ≈ 0.14

Suppose there is another class H' with Pr[yellow | H'] = 10% and Pr[blue | H'] = 90%; then

Pr[E | H'] = 3! · (0.10¹ / 1!) · (0.90² / 2!) ≈ 0.24
29
Naïve Bayes: discussion
Naïve Bayes works surprisingly well (even if the independence assumption is clearly violated)
Why? Because classification does not require accurate probability estimates.
However: adding too many redundant attributes will cause problems
Note also: many numeric attributes are not normally distributed!