Machine Learning. k-Nearest Neighbor Classifiers.


Machine Learning

k-Nearest Neighbor Classifiers

1-Nearest Neighbor Classifier: Test Examples (what class should be assigned to these?)

1-Nearest Neighbor

2-Nearest Neighbor

3-Nearest Neighbor

8-Nearest Neighbor

Controlling COMPLEXITY in k-NN

Measuring similarity with distance. Locating the tomato's nearest neighbors requires a distance function: a formula that measures the similarity between two instances. There are many different ways to calculate distance. Traditionally, the k-NN algorithm uses Euclidean distance, which is the distance one would measure if it were possible to use a ruler to connect two points, illustrated in the previous figure by the dotted lines connecting the tomato to its neighbors.

Euclidean distance. Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:

dist(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
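As a small illustration (not part of the original slides), a minimal Python sketch of this distance calculation could look like the following; the neighbor's feature values are an assumption chosen just for the example:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Distance between the tomato (sweetness=6, crunchiness=4) and a
# hypothetical neighbor at (sweetness=8, crunchiness=5).
print(euclidean_distance([6, 4], [8, 5]))  # about 2.24
```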

Application of k-NN. Which class does the tomato belong to, given its feature values: tomato (sweetness = 6, crunchiness = 4)?
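A minimal k-NN sketch for this kind of query is shown below. The labeled training points are illustrative assumptions (a few hypothetical foods), not the data shown on the slides:

```python
import math
from collections import Counter

# Hypothetical training set of (sweetness, crunchiness) -> label pairs.
# These feature values are illustrative assumptions only.
training_data = [
    ((10, 9), "fruit"),      # e.g. apple
    ((1, 4), "protein"),     # e.g. bacon
    ((10, 1), "fruit"),      # e.g. banana
    ((7, 10), "vegetable"),  # e.g. carrot
    ((3, 10), "vegetable"),  # e.g. celery
    ((8, 5), "fruit"),       # e.g. grape
]

def knn_classify(query, data, k):
    """Classify `query` by majority vote among its k nearest neighbors."""
    neighbors = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((6, 4), training_data, k=3))  # 'fruit' for this toy data
```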

K = 3, 5, 7, 9

K = 11, 13, 15, 17

Bayesian Classifiers

Understanding probability. The probability of an event is estimated from the observed data by dividing the number of trials in which the event occurred by the total number of trials. For instance, if it rained 3 out of 10 days with similar conditions as today, the probability of rain today can be estimated as 3 / 10 = 0.30, or 30 percent. Similarly, if 10 out of 50 prior messages were spam, then the probability of any incoming message being spam can be estimated as 10 / 50 = 0.20, or 20 percent. Note: the probabilities of all the possible outcomes of a trial must always sum to 1. For example, given the value P(spam) = 0.20, we can calculate P(ham) = 1 - 0.20 = 0.80.

Understanding probability (cont.). Because an event cannot simultaneously happen and not happen, an event is always mutually exclusive and exhaustive with its complement. The complement of event A is typically denoted Ac or A'. Additionally, the shorthand notation P(¬A) can be used to denote the probability of event A not occurring, as in P(¬spam) = 0.80. This notation is equivalent to P(Ac).

Understanding joint probability. Often, we are interested in monitoring several non-mutually exclusive events for the same trial. [Figure: Venn diagram over all emails, showing Spam (20%), Ham (80%), and messages containing the word Lottery (5%).]

Understanding joint probability (cont.). We want to estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery). The notation A ∩ B refers to the event in which both A and B occur. [Figure: Venn diagram regions showing Lottery without appearing in spam, Lottery appearing in ham, and Lottery appearing in spam.]

Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, or how the probability of one event is related to the probability of the other. If the two events are totally unrelated, they are called independent events. If P(spam) and P(Lottery) were independent, we could easily calculate P(spam ∩ Lottery), the probability of both events happening at the same time. Because 20 percent of all messages are spam, and 5 percent of all emails contain the word Lottery, we could assume that 1 percent of all messages are spam containing the term Lottery. More generally, for independent events A and B, the probability of both happening is P(A ∩ B) = P(A) * P(B). Here, 0.05 * 0.20 = 0.01.

Bayes Rule  Bayes Rule : The most important Equation in ML!! Posterior Probability (Probability of class AFTER seeing the data) Class PriorData Likelihood given Class Data Prior (Marginal)

Naïve Bayes Classifier

Conditional Independence  Simple Independence between two variables:  Class Conditional Independence assumption:

Naïve Bayes Classifier: conditional independence among the variables given the class!
• A simplifying assumption
• A useful baseline model, especially when there is a large number of features
• Combining Bayes rule with this assumption, then taking the log and ignoring the denominator, the predicted class is the one maximizing
  log P(C) + log P(x1 | C) + ... + log P(xn | C)
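A minimal sketch of this log-space decision rule, assuming the class priors and per-feature conditional probability tables have already been estimated (the data layout of the two dictionaries is a hypothetical choice made for this sketch):

```python
import math

def naive_bayes_predict(x, class_priors, cond_probs):
    """Return the class maximizing log P(C) + sum_i log P(x_i | C).

    class_priors: {class: P(C)}
    cond_probs:   {class: [ {feature_value: P(value | C)} for each feature i ]}
    """
    best_class, best_score = None, float("-inf")
    for c, prior in class_priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            score += math.log(cond_probs[c][i][value])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

The tables class_priors and cond_probs are estimated from counts, as described on the following slides.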

Naïve Bayes Classifier for Categorical Valued Variables

Let's Naïve Bayes!

#Examples | Color | Shape    | Like
20        | Red   | Square   | Y
10        | Red   | Circle   | Y
10        | Red   | Triangle | N
10        | Green | Square   | N
5         | Green | Circle   | Y
5         | Green | Triangle | N
10        | Blue  | Square   | N
10        | Blue  | Circle   | N
20        | Blue  | Triangle | Y

Parameter Estimation  What / How many Parameters?  Class Priors:  Conditional Probabilities:

Naïve Bayes Classifier for Text Classification

Text Classification Example
• Doc1 = {buy two shirts get one shirt half off}
• Doc2 = {get a free watch. send your contact details now}
• Doc3 = {your flight to chennai is delayed by two hours}
• Doc4 = {you have three tweets}
Four-class problem: Spam, Promotions, Social, Main

Bag-of-Words Representation
• Structured (e.g., multivariate) data: fixed number of features
• Unstructured (e.g., text) data:
  – arbitrary-length documents,
  – high-dimensional feature space (many words in the vocabulary),
  – sparse (only a small fraction of the vocabulary words appears in any one document)
• Bag-of-words representation:
  – ignore the sequential order of words
  – represent the document as a weighted set: the term frequency of each term
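A minimal sketch of building term-frequency bag-of-words vectors for the example documents above; the lowercase regex tokenization is an illustrative assumption, not something specified on the slides:

```python
import re
from collections import Counter

docs = {
    "Doc1": "buy two shirts get one shirt half off",
    "Doc2": "get a free watch. send your contact details now",
    "Doc3": "your flight to chennai is delayed by two hours",
    "Doc4": "you have three tweets",
}

def bag_of_words(text):
    """Term-frequency representation: ignore word order, keep counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

bows = {name: bag_of_words(text) for name, text in docs.items()}
print(bows["Doc1"])  # Counter({'buy': 1, 'two': 1, 'shirts': 1, 'get': 1, ...})
```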

Naïve Bayes Classifier with BoW
• Make an "independence assumption" about words given the class:
  P(doc | class) = product over words w in the doc of P(w | class)^tf(w, doc)

Naïve Bayes Text Classifiers
• Log-likelihood of a document given a class:
  log P(doc | class) = sum over words w in the doc of tf(w, doc) * log P(w | class)
• Parameters in Naïve Bayes text classifiers: the class priors P(class) and the per-word likelihoods P(w | class).

Naïve Bayes Parameters
• The likelihood of a word given a class, for each word and each class.
• Estimating these parameters from data:
  P(w | class) = (count of w in documents of that class) / (total count of all words in documents of that class)
  P(class) = (number of documents of that class) / (total number of documents)
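Putting the pieces together, here is a minimal sketch of estimating these parameters and scoring a new document. The document-to-class assignments and the add-one (Laplace) smoothing are illustrative assumptions, not taken from the slides:

```python
import math
import re
from collections import Counter, defaultdict

# Assumed labels for the example documents (illustrative only).
labeled_docs = [
    ("buy two shirts get one shirt half off", "Promotions"),
    ("get a free watch. send your contact details now", "Spam"),
    ("your flight to chennai is delayed by two hours", "Main"),
    ("you have three tweets", "Social"),
]

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

# Estimate class priors and per-class word counts.
class_doc_counts = Counter(label for _, label in labeled_docs)
word_counts = defaultdict(Counter)
for text, label in labeled_docs:
    word_counts[label].update(tokens(text))

vocab = {w for counts in word_counts.values() for w in counts}
priors = {c: n / len(labeled_docs) for c, n in class_doc_counts.items()}

def word_likelihood(w, c):
    """P(w | c) with add-one smoothing (an assumption, not from the slides)."""
    return (word_counts[c][w] + 1) / (sum(word_counts[c].values()) + len(vocab))

def classify(text):
    """Score log P(c) + sum_w tf(w) * log P(w | c) and return the best class."""
    tf = Counter(tokens(text))
    scores = {
        c: math.log(priors[c]) + sum(n * math.log(word_likelihood(w, c)) for w, n in tf.items())
        for c in priors
    }
    return max(scores, key=scores.get)

print(classify("free shirts offer buy now"))  # 'Promotions' under these assumptions
```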

Bayesian Classifier for Multivariate Real-Valued Data

Bayes Rule (recap)

P(Class | Data) = P(Data | Class) * P(Class) / P(Data)

Posterior probability (the probability of the class AFTER seeing the data) = data likelihood given the class * class prior / data prior (marginal).

Simple Bayesian Classifier
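For multivariate real-valued data, one common simple Bayesian classifier models each feature with a per-class Gaussian. The sketch below follows that choice; the Gaussian assumption and the toy data are ours, not taken from the slides:

```python
import math
from collections import defaultdict

def log_gaussian(x, mean, var):
    """Log of the univariate Gaussian density."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature (mean, variance) for every class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for c, rows in by_class.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9 for j in range(d)]
        model[c] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class maximizing log prior + sum of per-feature log densities."""
    def score(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            log_gaussian(xj, m, v) for xj, m, v in zip(x, means, variances)
        )
    return max(model, key=score)

# Toy real-valued data (illustrative assumptions only).
X = [(6.0, 4.0), (6.5, 4.5), (1.0, 1.5), (1.2, 1.0)]
y = ["fruit", "fruit", "protein", "protein"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (6.2, 4.2)))  # 'fruit' for this toy data
```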

Controlling COMPLEXITY