Data Science Algorithms: The Basic Methods


Data Science Algorithms: The Basic Methods
Statistical Modeling (WFH: Data Mining, Chapter 4.2)
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Algorithms: The Basic Methods
- Inferring rudimentary rules
- Naïve Bayes, probabilistic model
- Constructing decision trees
- Constructing rules
- Association rule learning
- Linear models
- Instance-based learning
- Clustering

Bayes' Rule
If you pick up a seashell and hold it up to your ear, you can hear the ocean. What's the probability that you will hear the ocean if you do not hold the seashell to your ear?

Probabilistic Modeling
The "opposite" of 1R: use all the attributes.
Two assumptions: attributes are
- Equally important
- Statistically independent (given the class value), i.e., knowing the value of one attribute says nothing about the value of another (if the class is known)
The independence assumption is rarely correct, but this scheme works well in practice.

Bayes' Rule
Probability of event h given evidence e: P(h | e) = P(e | h) P(h) / P(e)
A priori probability of h, P(h): probability of the event before the evidence is seen.
A posteriori probability of h, P(h | e): probability of the event after the evidence is seen.
Thomas Bayes. Born: 1702 in London, England. Died: 1761 in Tunbridge Wells, Kent, England.


Bayes' Rule
Probability of event h given evidence e: P(h | e) = P(e | h) P(h) / P(e)
Independence assumption: if the evidence splits into independent parts e1, e2, ..., en (given h), then
P(h | e) = P(e1 | h) P(e2 | h) ... P(en | h) P(h) / P(e)

Naïve Bayes for Classification Classification learning: what’s the probability of the class given an instance? Evidence e = instance Hypothesis h = class value for the data instance Naïve assumption: evidence splits into parts (i.e. attributes) that are independent
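Not part of the original slides, but a minimal Python sketch of this decision rule may help make it concrete. The names priors and cond are hypothetical: they stand for the class priors P(h) and the conditional probabilities P(value | h) estimated from training data (such as the weather counts on the following slides).

```python
def naive_bayes_scores(instance, priors, cond):
    """Unnormalized Naive Bayes scores: P(h) * product over attributes of P(value | h).

    instance: dict mapping attribute name -> observed value
    priors:   dict mapping class -> P(class)           (hypothetical container)
    cond:     dict mapping (attribute, value, class) -> P(value | class)
    """
    scores = {}
    for h, p_h in priors.items():
        score = p_h
        for attribute, value in instance.items():
            score *= cond[(attribute, value, h)]
        scores[h] = score
    return scores


def normalize(scores):
    """Turn unnormalized class scores into probabilities that sum to 1."""
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}
```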

Weather Data Example
Evidence E, a new instance to classify:
Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True, Play = ?
Probability of class "yes":
P(yes | E) = P(Outlook = Sunny | yes) × P(Temperature = Cool | yes) × P(Humidity = High | yes) × P(Windy = True | yes) × P(yes) / P(E)

Probabilities for Weather Data
Counts and relative frequencies per class, computed from the 14-instance weather data set:

                        Yes        No
Outlook     Sunny       2  2/9     3  3/5
            Overcast    4  4/9     0  0/5
            Rainy       3  3/9     2  2/5
Temperature Hot         2  2/9     2  2/5
            Mild        4  4/9     2  2/5
            Cool        3  3/9     1  1/5
Humidity    High        3  3/9     4  4/5
            Normal      6  6/9     1  1/5
Windy       False       6  6/9     2  2/5
            True        3  3/9     3  3/5
Play                    9  9/14    5  5/14

Probabilities for Weather Data
A new day:
Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True, Play = ?

Likelihood of the two classes:
For "yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no"  = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into a probability by normalization:
P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795
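Plugging the relevant fractions from the table into the earlier sketch reproduces these numbers (the variable names are again hypothetical, and the code reuses naive_bayes_scores and normalize from above):

```python
priors = {"yes": 9/14, "no": 5/14}
cond = {
    ("outlook", "sunny", "yes"): 2/9,     ("outlook", "sunny", "no"): 3/5,
    ("temperature", "cool", "yes"): 3/9,  ("temperature", "cool", "no"): 1/5,
    ("humidity", "high", "yes"): 3/9,     ("humidity", "high", "no"): 4/5,
    ("windy", True, "yes"): 3/9,          ("windy", True, "no"): 3/5,
}
new_day = {"outlook": "sunny", "temperature": "cool",
           "humidity": "high", "windy": True}

scores = naive_bayes_scores(new_day, priors, cond)
print(scores)             # {'yes': ~0.0053, 'no': ~0.0206}
print(normalize(scores))  # {'yes': ~0.205, 'no': ~0.795}
```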

The "Zero-frequency Problem"
What if an attribute value doesn't occur with every class value (e.g., "Outlook = Overcast" for class "no")? Its conditional probability will be zero, so the a posteriori probability will also be zero, no matter how likely the other values are!
Remedy: add 1 to the count for every attribute value-class combination (the Laplace estimator).
Result: probabilities will never be zero (this also stabilizes the probability estimates).
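A minimal sketch of the Laplace estimator (the function name is mine, not from the slides):

```python
def laplace_estimate(count, class_total, n_values):
    """Smoothed estimate of P(value | class): add 1 to every value's count."""
    return (count + 1) / (class_total + n_values)

# "Outlook = Overcast" for class "no": the raw estimate is 0/5 = 0, but with
# add-one smoothing over Outlook's 3 possible values it becomes 1/8 = 0.125.
print(laplace_estimate(0, 5, 3))
```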

Modified Probability Estimates
In some cases adding a constant different from 1 might be more appropriate. Example: attribute Outlook for class "yes", with a constant μ split evenly over the three values:
Sunny:    (2 + μ/3) / (9 + μ)
Overcast: (4 + μ/3) / (9 + μ)
Rainy:    (3 + μ/3) / (9 + μ)
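A sketch of the same estimate in code (names are hypothetical; the per-value weights are assumed uniform here, although unequal weights can be used):

```python
def m_estimate(count, class_total, mu, prior_weight=1/3):
    """P(value | class) using a constant mu distributed according to a prior weight."""
    return (count + mu * prior_weight) / (class_total + mu)

# Attribute Outlook for class "yes" (counts 2, 4, 3 out of 9 "yes" instances):
for value, count in [("sunny", 2), ("overcast", 4), ("rainy", 3)]:
    print(value, m_estimate(count, 9, mu=3.0))  # with mu = 3 this equals the Laplace estimator
```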

Missing Values
Training: the instance is not included in the frequency count for that attribute value-class combination.
Classification: the attribute is omitted from the calculation.
Example, a new day with Outlook missing:
Outlook = ?, Temperature = Cool, Humidity = High, Windy = True, Play = ?
Likelihood of "yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of "no"  = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
P("no")  = 0.0343 / (0.0238 + 0.0343) = 59%
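In the sketch from earlier this simply means skipping missing attributes when forming the product (a hypothetical illustration that continues the priors/cond example above):

```python
def naive_bayes_scores_skip_missing(instance, priors, cond):
    """As naive_bayes_scores, but attributes whose value is None are omitted."""
    scores = {}
    for h, p_h in priors.items():
        score = p_h
        for attribute, value in instance.items():
            if value is None:   # missing value: drop this factor entirely
                continue
            score *= cond[(attribute, value, h)]
        scores[h] = score
    return scores

new_day = {"outlook": None, "temperature": "cool",
           "humidity": "high", "windy": True}
print(normalize(naive_bayes_scores_skip_missing(new_day, priors, cond)))
# {'yes': ~0.41, 'no': ~0.59}
```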

Numeric Attributes
Usual assumption: attributes have a normal (Gaussian) probability distribution given the class. The probability density function for the normal distribution is defined by two parameters, the mean μ and the standard deviation σ:
f(x) = 1 / (√(2π) σ) · exp( -(x - μ)² / (2σ²) )

Statistics for Weather Data
For the numeric attributes, the values observed with each class are summarized by their mean and standard deviation; the nominal attributes keep their counts and fractions:

                        Yes                                   No
Temperature   64, 68, 69, 70, 72, …                 65, 71, 72, 80, 85, …
              mean = 73, std dev = 6.2              mean = 75, std dev = 7.9
Humidity      65, 70, 70, 75, 80, …                 70, 85, 90, 91, 95, …
              mean = 79, std dev = 10.2             mean = 86, std dev = 9.7
Outlook       Sunny 2/9, Overcast 4/9, Rainy 3/9    Sunny 3/5, Overcast 0/5, Rainy 2/5
Windy         False 6/9, True 3/9                   False 2/5, True 3/5
Play          9/14                                  5/14

Example density value:
f(Temperature = 66 | yes) = 1 / (√(2π) · 6.2) · exp( -(66 - 73)² / (2 · 6.2²) ) = 0.0340
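A sketch of the density calculation in code (the function name is mine), reproducing the example value:

```python
import math

def gaussian_density(x, mean, std):
    """Normal probability density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# f(Temperature = 66 | yes), using mean 73 and standard deviation 6.2:
print(gaussian_density(66, 73, 6.2))   # ~0.0340
```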

Classifying a New Day
A new day:
Outlook = Sunny, Temperature = 66, Humidity = 90, Windy = true, Play = ?
Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of "no"  = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
P("yes") = 0.000036 / (0.000036 + 0.000108) = 25%
P("no")  = 0.000108 / (0.000036 + 0.000108) = 75%
Note: missing values during training are not included in the calculation of the mean and standard deviation.
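The arithmetic can be checked directly (a sketch; the density values 0.0340, 0.0221, and 0.0381 are taken from the slide rather than recomputed):

```python
# Likelihoods for the new day: Sunny, temperature 66, humidity 90, windy true.
like_yes = (2/9) * 0.0340 * 0.0221 * (3/9) * (9/14)
like_no  = (3/5) * 0.0221 * 0.0381 * (3/5) * (5/14)

total = like_yes + like_no
print(like_yes, like_no)                  # ~0.000036 and ~0.000108
print(like_yes / total, like_no / total)  # ~0.25 and ~0.75
```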

Naïve Bayes: Discussion
Naïve Bayes works surprisingly well, even if the independence assumption is clearly violated. Why? Because classification doesn't require accurate probability estimates as long as the maximum probability is assigned to the correct class. However, adding too many redundant attributes will cause problems (e.g., identical attributes). Note also that many numeric attributes are not normally distributed.