
Naïve Bayes Classifier

Measurements from sensors, scales, etc.: Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21. Classification: Apple.

Let's look at one dimension.

What if we wanted to ask the question "what is the probability that some fruit with a given redness value is an apple?"
Could we just look at how far away it is from the apple peak? Is it the highest PDF above the x-value in question?

If a fruit has a redness of 4.05, do we know the probability that it's an apple? What do we know?
If it is a histogram of counts, then it is straightforward; getting the probability is simple:
Probability it's an apple: 28.57%
Probability it's an orange: 71.43%
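As a sketch of where those percentages come from (assuming, consistent with the counts quoted a few slides later, that the 4.05 bin holds about 10 apples and 25 oranges):

$$ P(\mathrm{apple} \mid \mathrm{red}=4.05) = \frac{10}{10+25} \approx 28.57\%, \qquad P(\mathrm{orange} \mid \mathrm{red}=4.05) = \frac{25}{10+25} \approx 71.43\% $$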

Probability density function: continuous, a probability density rather than a count.
We might be tempted to use the same approach.
Parametric (e.g. Gaussian, with μ and σ parameters) vs. non-parametric.
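As a concrete picture of the parametric route, here is a minimal sketch (the μ and σ values are made up for illustration): fit a Gaussian per class and evaluate its density at the redness value in question. The next slide explains why comparing these densities alone can mislead.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical fitted parameters for the redness dimension.
apple_mu, apple_sigma = 4.0, 0.8
orange_mu, orange_sigma = 6.0, 1.2

x = 4.05
print(gaussian_pdf(x, apple_mu, apple_sigma))    # class-conditional density p(x | apple)
print(gaussian_pdf(x, orange_mu, orange_sigma))  # class-conditional density p(x | orange)
# Comparing these densities directly ignores how many apples vs. oranges exist (the priors).
```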

What if we had a trillion oranges and only 100 apples?
4.05 might be the most common apple redness, so the apple PDF could have a higher value at 4.05 than the orange PDF, even though the universe would have far more oranges at that value.

2506 apples, 2486 oranges.
If a fruit has a redness of 4.05, do we know the probability that it's an apple if we don't have specific counts at 4.05?

Above (from the book): h is the hypothesis, D is the training data. Does this make sense?
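The equation shown on the slide is not preserved in the transcript; the standard statement of Bayes' theorem in this notation (hypothesis h, training data D) is:

$$ P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)} $$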

2506 apples, 2486 oranges.
Probability that redness would be 4.05 if we know it's an apple: about 10/2506.
P(apple)? 2506/(2506 + 2486).
P(redness = 4.05)? About (10 + 25)/(2506 + 2486).
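Putting those pieces into Bayes' theorem recovers the same 28.57% as before (a sketch, using the approximate bin counts of 10 apples and 25 oranges at redness 4.05 and the total of 2506 + 2486 = 4992 fruit):

$$ P(\mathrm{apple} \mid \mathrm{red}=4.05) = \frac{P(\mathrm{red}=4.05 \mid \mathrm{apple})\, P(\mathrm{apple})}{P(\mathrm{red}=4.05)} \approx \frac{(10/2506)\,(2506/4992)}{35/4992} = \frac{10}{35} \approx 28.57\% $$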

Whether we have counts or a PDF, how do we classify? Simply find the most probable class.

I think of the ratio of P(h) to P(D) as an adjustment to the easily determined P(D|h), in order to account for differences in sample size.
P(h) is the prior probability (the prior); P(h|D) is the posterior probability.

Maximum a posteriori (MAP) hypothesis.
A posteriori: relating to or derived by reasoning from observed facts; inductive.
A priori: relating to or derived by reasoning from self-evident propositions; deductive.
Approach: brute-force MAP learning algorithm.
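The slide's formula is not preserved in the transcript; the usual definition of the MAP hypothesis (e.g. as in Mitchell's textbook) is:

$$ h_{\mathrm{MAP}} = \underset{h \in H}{\operatorname{argmax}}\; P(h \mid D) = \underset{h \in H}{\operatorname{argmax}}\; P(D \mid h)\, P(h) $$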

More dimensions can be helpful: plotting red intensity (normalized) against mass (normalized), the two classes are linearly separable.

How do we handle multiple dimensions?
Color (red and yellow) says apple, but mass and volume say orange? Take a vote?

Assume each dimension is independent (doesn't co-vary with any other dimension).
Then we can use the product rule.
The probability that a fruit is an apple given a set of measurements (dimensions) is:
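The equation itself is not in the transcript; under the independence assumption it takes the form (writing r, y, m, v for redness, yellowness, mass, and volume):

$$ P(\mathrm{apple} \mid r, y, m, v) \propto P(\mathrm{apple})\, P(r \mid \mathrm{apple})\, P(y \mid \mathrm{apple})\, P(m \mid \mathrm{apple})\, P(v \mid \mathrm{apple}) $$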

This is known as a Naïve Bayes classifier, where v_j is a class and a_i is an attribute.
Derivation:
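The slide's derivation is not preserved, but the resulting decision rule, in the standard notation matching v_j for class and a_i for attribute, is:

$$ v_{NB} = \underset{v_j \in V}{\operatorname{argmax}}\; P(v_j) \prod_i P(a_i \mid v_j) $$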

You wish to classify an instance with the following attributes: the first column is redness, then yellowness, followed by mass, then volume.
In the redness histogram bin in which the instance falls, the training data has 0 apples, 0 peaches, 9 oranges, and 22 lemons.
In the bin for yellowness there are 235, 262, 263, and 239.
In the bin for mass there are 106, 176, 143, and 239.
In the bin for volume there are 3, 57, 7, and 184.
What are the probabilities that it is an apple, a peach, an orange, and a lemon?

            Red   Yellow   Mass   Vol
apples        0      235    106     3
peaches       0      262    176    57
oranges       9      263    143     7
lemons       22      239    239   184

Total apples, peaches, oranges, lemons: (the class totals appear on the slide but are not preserved in the transcript)

Is it really a zero percent chance that it's an apple?
Are these really probabilities (hint: they do not sum to 1)?
What of the bin size?
(Same table of bin counts and class totals as on the previous slide.)
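A minimal sketch in Python of how these scores would be computed from the bin counts above. The per-class training totals and the number of histogram bins are not given in the transcript, so CLASS_TOTALS and NUM_BINS below are hypothetical placeholders. The zero count in the redness bin drives the apple score to exactly zero, which is one standard motivation for Laplace (add-one) smoothing; the returned scores are unnormalized, which is why they do not sum to 1.

```python
# Bin counts from the slide for the histogram bin each attribute of the
# test instance falls into (order: apples, peaches, oranges, lemons).
BIN_COUNTS = {
    "red":    {"apple": 0,   "peach": 0,   "orange": 9,   "lemon": 22},
    "yellow": {"apple": 235, "peach": 262, "orange": 263, "lemon": 239},
    "mass":   {"apple": 106, "peach": 176, "orange": 143, "lemon": 239},
    "vol":    {"apple": 3,   "peach": 57,  "orange": 7,   "lemon": 184},
}

# HYPOTHETICAL values: the real class totals and bin count are on the slide
# but are not preserved in the transcript.
CLASS_TOTALS = {"apple": 2500, "peach": 2500, "orange": 2500, "lemon": 2500}
NUM_BINS = 10
TOTAL = sum(CLASS_TOTALS.values())

def naive_bayes_scores(smoothing=0.0):
    """Return P(class) * prod_i P(bin_i | class) for each class (unnormalized)."""
    scores = {}
    for cls, n_cls in CLASS_TOTALS.items():
        score = n_cls / TOTAL                      # prior P(class)
        for counts in BIN_COUNTS.values():
            # fraction of this class's training fruit that fell into the bin,
            # with optional Laplace-style smoothing over the bins
            score *= (counts[cls] + smoothing) / (n_cls + smoothing * NUM_BINS)
        scores[cls] = score
    return scores

print(naive_bayes_scores())               # apple score is exactly 0 (zero count in the red bin)
print(naive_bayes_scores(smoothing=1.0))  # smoothed: apple score is tiny but nonzero
```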


Do too many dimensions hurt? What if only some dimensions contribute to the ability to classify? What would the other dimensions do to the probabilities?

With imagination and innovation, you can learn to classify many things you wouldn't expect.
What if you wanted to learn to classify documents? How might you go about it?

Learning to classify text:
Collect all the words in the examples.
Calculate P(v_j) and P(w_k | v_j).
Each instance will be a vector of size |vocabulary|.
The classes (the v's) are the categories.
Each word (w) is a dimension.
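A minimal sketch of this recipe, assuming a tiny hypothetical training set: count word occurrences per class, then classify a new document with the same product-of-likelihoods rule (using logs for numerical stability and add-one smoothing to avoid zero counts).

```python
from collections import Counter
import math

# Hypothetical toy training data: (document, class) pairs.
train = [
    ("the game was a great win", "sports"),
    ("the team lost the match", "sports"),
    ("new phone released with better camera", "tech"),
    ("software update improves battery", "tech"),
]

vocab = {w for doc, _ in train for w in doc.split()}
class_docs = Counter(c for _, c in train)             # used for the prior P(v_j)
word_counts = {c: Counter() for c in class_docs}      # counts of word w_k within class v_j
for doc, c in train:
    word_counts[c].update(doc.split())

def classify(doc):
    words = doc.split()
    best_class, best_logp = None, float("-inf")
    for c in class_docs:
        # log P(v_j) + sum_k log P(w_k | v_j), with add-one smoothing
        logp = math.log(class_docs[c] / len(train))
        total = sum(word_counts[c].values())
        for w in words:
            logp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class

print(classify("the team won the game"))   # expected: sports
```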

20 newsgroups, 1000 training documents from each group; the groups were the classes.
89% classification accuracy: 89 out of every 100 times, the classifier could tell which newsgroup a document came from.

Rift Valley fever virus.
Basically RNA (like DNA but with an extra oxygen; the D in DNA is deoxy).
Encapsulated in a protein sheath.
An important protein involved in the encapsulation process is the nucleocapsid.

SELEX (Systematic Evolution of Ligands by Exponential Enrichment).
Identify RNA segments that have a high affinity for the nucleocapsid (aptamer vs. non-aptamer).

Each known aptamer was 30 nucleotides long: a 30-character string over 4 nucleotides (ACGU).
What would the data look like? How would we "bin" the data?
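One plausible reading of the question (a sketch, not necessarily what the original slides propose): treat each of the 30 positions as a categorical attribute whose "bins" are simply the four nucleotides, and estimate P(position_i = base | class) by counting. The sequences below are made up for illustration.

```python
from collections import Counter

# Hypothetical toy examples: (30-nucleotide sequence, label).
train = [
    ("ACGUACGUACGUACGUACGUACGUACGUAC", "aptamer"),
    ("GGGUACGUACGUACGUACGUACGUACGUAC", "aptamer"),
    ("UUUUACGUACGUACGUACGUACGUACGUAC", "non-aptamer"),
]

# counts[label][i] is a Counter over {A, C, G, U} seen at position i
counts = {}
for seq, label in train:
    per_pos = counts.setdefault(label, [Counter() for _ in range(30)])
    for i, base in enumerate(seq):
        per_pos[i][base] += 1

# Estimated P(position 0 == 'G' | aptamer): 1 of the 2 aptamers starts with G
n_aptamers = sum(1 for _, lab in train if lab == "aptamer")
print(counts["aptamer"][0]["G"] / n_aptamers)   # 0.5
```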

We have seen: the fruit example, documents, and RNA (nucleotides).
Which is the best fit for a Bayesian classifier?


The brighter the spot, the greater the mRNA concentration.

Thousands of genes (dimensions).
Many genes are not affected (the distributions for disease and normal are the same in that dimension).

patient \ gene    g_1     g_2     g_3    ...   g_n    disease
p_1              x_1,1   x_1,2   x_1,3   ...  x_1,n      Y
p_2              x_2,1   x_2,2   x_2,3   ...  x_2,n      N
...
p_m              x_m,1   x_m,2   x_m,3   ...  x_m,n      ?

Perhaps characterize good growth locations by: pH, average temperature, average sunlight exposure, salinity, average length of day.
What else? What would the data look like?
