Chapter 4, Doing Data Science


Naïve Bayes
Chapter 4, Doing Data Science

Goals
Classification is placing things where they belong. Why classify? To learn from classification, to discover patterns, and to learn from history what our response should be to a given class of events, for example.

Classification
Classification relies on a priori reference structures that divide the space of all possible data points into a set of non-overlapping classes. (What do you do if the data points overlap?)
What problems can classification solve? What are some of the common classification methods? Which one is better for a given situation? (A meta-classifier can help decide.)

Classification examples in daily life
Restaurant menu: appetizers, salads, soups, entrées, desserts, drinks, ...
The Library of Congress classification system classifies books according to a standard scheme.
Injuries and diseases are classified by physicians and healthcare workers.
Classification of all living things: e.g., Homo sapiens (genus, species).
Classification has very large applications in the automobile domain: services (classes), parts (classes), incidents (classes), etc.

Categories of classification algorithms
With respect to the underlying technique, there are two broad categories:
Statistical algorithms: regression for forecasting; the Bayes classifier, which captures how the class depends on the various attributes of the classification problem.
Structural algorithms: rule-based algorithms (if-else rules, decision trees); distance-based algorithms (similarity, nearest neighbor); neural networks.

Classifiers

Advantages and Disadvantages
Decision trees: simple and powerful; work well for discrete (0/1, yes/no) rules.
Neural nets: a black-box approach; the results are hard to interpret.
Distance-based methods work well for low-dimensionality spaces.

This decision tree hangs in the ER of Cook County Hospital, Chicago, IL.

Naïve Bayes
The Naïve Bayes classifier is one of the most celebrated and well-known classification algorithms of all time. It is a probabilistic algorithm. It is typically applied under the assumption of independent attributes, but it has been found to work well even when some dependencies are present. It was discovered centuries ago, yet it is heavily used today in many predictive analytics applications.
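
As a concrete statement of the independence assumption mentioned above, the standard Naïve Bayes formulation (written here in LaTeX; it is not spelled out on the slide) factors the class posterior into a product of per-attribute likelihoods:

    P(C \mid x_1, \ldots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)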

Life Cycle of a Classifier: training, testing and production

Training Stage
Provide the classifier with data points for which we have already assigned an appropriate class. The purpose of this stage is to determine the parameters of the model.
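
A minimal sketch of what "determining the parameters" can look like for a Naïve Bayes-style classifier, assuming the training data is a list of (text, label) pairs; the function and variable names are illustrative, not from the chapter:

    from collections import Counter, defaultdict

    def train_naive_bayes(labeled_docs):
        """Estimate class priors and per-class word counts from (text, label) pairs."""
        class_counts = Counter()              # number of documents per class
        word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in labeled_docs:
            class_counts[label] += 1
            word_counts[label].update(text.lower().split())
        total = sum(class_counts.values())
        priors = {c: n / total for c, n in class_counts.items()}   # P(class)
        return priors, word_counts

    # priors, word_counts = train_naive_bayes([("win a lottery now", "spam"),
    #                                          ("meeting at noon", "ham")])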

Validation Stage
In the testing (validation) stage we validate the classifier to ensure credibility of its results. The primary goal of this stage is to determine the classification errors. The quality of the results should be evaluated using various metrics. The training and testing stages may be repeated several times before a classifier transitions to the production stage. We could evaluate several types of classifiers and pick one, or combine all the classifiers into a metaclassifier scheme.
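
As an illustration of the kind of metrics mentioned above (not from the chapter), here is a small sketch that computes accuracy and a confusion-matrix count from true and predicted labels:

    from collections import Counter

    def evaluate(y_true, y_pred):
        """Return accuracy and counts of (true, predicted) label pairs."""
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        accuracy = correct / len(y_true)
        confusion = Counter(zip(y_true, y_pred))   # e.g. ('spam', 'ham') -> count
        return accuracy, confusion

    # evaluate(["spam", "ham", "ham"], ["spam", "spam", "ham"])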

Production Stage
The classifier(s) is used here in a live production system. It is possible to enhance the production results by allowing human-in-the-loop feedback. The three stages are repeated as we get more data from the production system.

Bayesian Inference
H – hypothesis, E – evidence.
Prior = P(H), the probability of the hypothesis before seeing the evidence.
Likelihood = P(E|H), the probability of the evidence given the hypothesis.
Evidence = P(E), the probability of the evidence.
Posterior = P(H|E), the probability of H given E.
Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)

Naïve Bayes Example
Reference: http://en.wikipedia.org/wiki/Bayes_Theorem
Suppose there is a school with 60% boys and 40% girls as its students. The female students wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance, and all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.

Discussion
The event A is that the student observed is a girl, and the event B is that the student observed is wearing trousers. To compute P(A|B), we first need to know:
P(A), the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the fraction of girls among the students is 40%, this probability equals 0.4.
P(B|A), the probability of the student wearing trousers given that the student is a girl. Since girls are as likely to wear skirts as trousers, this is 0.5.
P(B), the probability of a (randomly selected) student wearing trousers regardless of any other information. Since half of the girls and all of the boys wear trousers, this is 0.5*0.4 + 1.0*0.6 = 0.8.
Given all this information, the probability that the observed student is a girl, given that the student is wearing trousers, is obtained by substituting these values into the formula:
P(A|B) = P(B|A) * P(A) / P(B) = (0.5 * 0.4) / 0.8 = 0.25
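
A quick check of this arithmetic in Python (a throwaway sketch, not part of the chapter):

    # Girl/trousers example from the slide
    p_girl = 0.4                         # P(A)
    p_trousers_given_girl = 0.5          # P(B|A)
    p_trousers = 0.5 * 0.4 + 1.0 * 0.6   # P(B) = 0.8
    p_girl_given_trousers = p_trousers_given_girl * p_girl / p_trousers
    print(p_girl_given_trousers)         # 0.25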

Intuition
Here is its derivation from the first principles of probability:
P(A|B) = P(A&B) / P(B)
P(B|A) = P(A&B) / P(A), so P(B|A) P(A) = P(A&B)
Therefore P(A|B) = P(B|A) P(A) / P(B)
Now let's look at a very common application of Bayes' theorem for supervised learning in classification: spam filtering.

Let's Review
A rare disease affects 1% of the population. We have a highly sensitive and specific test that is 99% positive for sick patients and 99% negative for non-sick patients. If a patient tests positive, what is the probability that he/she is sick?
Approach: let "sick" denote that the patient is sick and "+" denote a positive test.
P(sick|+) = P(+|sick) P(sick) / P(+)
P(+) = P(+|sick) P(sick) + P(+|not sick) P(not sick) = 0.99*0.01 + 0.01*0.99 = 0.0198
P(sick|+) = (0.99*0.01) / 0.0198 = 0.0099 / 0.0198 = 0.5
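
The same computation as a small Python sketch (illustrative only; the variable names are mine, not the chapter's):

    # Rare-disease example: 1% prevalence, 99% sensitivity, 99% specificity
    p_sick = 0.01
    p_pos_given_sick = 0.99
    p_pos_given_healthy = 1 - 0.99       # false-positive rate
    p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)
    p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
    print(p_sick_given_pos)              # 0.5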

Classification
Training set → design a model
Test set → validate the model
Classify the data set using the model
Goal of classification: to label the items in the set with one of the given/known classes.
For spam filtering it is a binary class: spam or not spam (ham).

Why not use the methods we discussed earlier?
Linear regression is about continuous variables, not a binary class.
k-NN cannot accommodate many features: the curse of dimensionality. 1 distinct word → 1 feature; 10,000 words → 10,000 features!
Then what can we use? Naïve Bayes.

Spam Filter for Individual Words
Classifying mail into spam and not spam: binary classification.
Let's say we get an email that says "you have won a lottery" – right away you know it is spam. We will assume that if a word qualifies as spam, then the email is spam...
P(spam|word) = P(word|spam) P(spam) / P(word)

Sample Data
Enron data set: Enron employee emails.
A small subset chosen for EDA: 1500 spam, 3672 ham.
The test word is "meeting"; that is, your goal is to label an email containing the word "meeting" as spam or ham (not spam).
Run a simple shell script and find that "meeting" occurs in 16 spam emails and 153 ham emails.
Right away, what is your intuition? Now prove it using Bayes.
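
A minimal Python sketch of the kind of count the slide refers to, assuming the emails sit in spam/ and ham/ directories as one file per email (the chapter itself mentions a shell script; the paths and helper below are illustrative assumptions):

    import glob

    def count_emails_containing(word, pattern):
        """Count how many email files matching `pattern` contain `word`."""
        count = 0
        for path in glob.glob(pattern):
            with open(path, errors="ignore") as f:
                if word in f.read().lower():
                    count += 1
        return count

    # spam_hits = count_emails_containing("meeting", "spam/*.txt")   # slide reports 16
    # ham_hits  = count_emails_containing("meeting", "ham/*.txt")    # slide reports 153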

Further Discussion
Let's call good emails "ham": P(ham) = 1 - P(spam)
P(word) = P(word|spam) P(spam) + P(word|ham) P(ham)

Calculations
P(spam) = 1500 / (1500 + 3672) = 0.29
P(ham) = 0.71
P(meeting|spam) = 16 / 1500 = 0.0106
P(meeting|ham) = 153 / 3672 = 0.0416
P(meeting) = P(meeting|spam) P(spam) + P(meeting|ham) P(ham) = 0.0106*0.29 + 0.0416*0.71 = 0.0326
P(spam|meeting) = P(meeting|spam) P(spam) / P(meeting) = 0.0106*0.29 / 0.0326 = 0.094, i.e. about 9.4%
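
The same calculation end to end as a Python sketch (the counts are the slide's; the function name is mine):

    def p_spam_given_word(word_in_spam, n_spam, word_in_ham, n_ham):
        """Single-word naive Bayes: P(spam | word) from raw counts."""
        p_spam = n_spam / (n_spam + n_ham)
        p_ham = 1 - p_spam
        p_word_given_spam = word_in_spam / n_spam
        p_word_given_ham = word_in_ham / n_ham
        p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
        return p_word_given_spam * p_spam / p_word

    print(p_spam_given_word(16, 1500, 153, 3672))   # ≈ 0.094, i.e. about 9.4%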

Summary
Learn the Naïve Bayes rule and its application to spam filtering in emails.
Work through and understand the examples discussed in class: disease detection, a spam filter, ...
Possible exam question: given a problem statement, build a classification model using Naïve Bayes.
If you have time, summer reading: https://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf
This is a comparison of naïve Bayes and logistic regression; we'll study logistic regression next.