
Classification, continued

Prediction and Classification
Last week we discussed the classification problem.
– Used the Naïve Bayes method.
Today we will dive into more detail. But first: how do we evaluate a classifier?

Abstract Binary Classification Problem
Given n data samples (x_i, y_i), i = 1, …, n, where x_i is a data vector and y_i is a label in {-1, 1}. The aim is to learn a function f: X → Y such that f is "accurate" on unseen data. [ill-specified as defined]

Algorithms to Learn a Classifier
We can use an algorithm A to learn the function f: X → Y; we then write f as f_A. One example of A is Naïve Bayes. Other examples: {Logistic Regression, Neural Networks, Support Vector Machines, Decision Trees, Random Forests, …}

Training vs. Test Data
In practice, to take care of the "unseen" part, we split the data into a training set and a test set. We learn f_A on the training set using an algorithm A; the learned function f_A is then evaluated on the test set.
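A minimal Matlab sketch of such a split (assuming the samples sit in an N-by-d matrix X with one row per sample and the labels in an N-by-1 vector y; the names and the 70/30 ratio are illustrative):

N = size(X, 1);
idx = randperm(N);                 % shuffle the sample indices
nTrain = round(0.7 * N);           % e.g. 70% train / 30% test
Xtrain = X(idx(1:nTrain), :);      ytrain = y(idx(1:nTrain));
Xtest  = X(idx(nTrain+1:end), :);  ytest  = y(idx(nTrain+1:end));
% Learn f_A on (Xtrain, ytrain); evaluate it on (Xtest, ytest) only.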

Example
Suppose we learn a function F on the training set. Our test set consists of four data points (z1, 1), (z2, -1), (z3, 1), (z4, -1). We apply F to the four data points (without their labels) and get F(z1) = 1, F(z2) = 1, F(z3) = -1 and F(z4) = -1. Then F correctly classified z1 and z4 but incorrectly classified z2 and z3.

Confusion Matrix

                       Actual Label (1)       Actual Label (-1)
Predicted Label (1)    True Positives (N1)    False Positives (N2)
Predicted Label (-1)   False Negatives (N3)   True Negatives (N4)

Label 1 is called Positive; label -1 is called Negative. Let the number of test samples be N, so N = N1 + N2 + N3 + N4.
True Positive Rate (TPR) = N1/(N1+N3)
True Negative Rate (TNR) = N4/(N4+N2)
False Positive Rate (FPR) = N2/(N2+N4)
False Negative Rate (FNR) = N3/(N1+N3)
Accuracy = (N1+N4)/(N1+N2+N3+N4)
Precision = N1/(N1+N2)
Recall = N1/(N1+N3)
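All of these quantities are simple counts. A minimal Matlab sketch of computing them (assuming the true test labels sit in a vector ytest and the predictions in ypred, both with entries in {-1, 1}; the variable names are illustrative):

N1 = sum(ypred == 1  & ytest == 1);    % true positives
N2 = sum(ypred == 1  & ytest == -1);   % false positives
N3 = sum(ypred == -1 & ytest == 1);    % false negatives
N4 = sum(ypred == -1 & ytest == -1);   % true negatives
TPR = N1 / (N1 + N3);   FPR = N2 / (N2 + N4);
TNR = N4 / (N4 + N2);   FNR = N3 / (N1 + N3);
accuracy  = (N1 + N4) / (N1 + N2 + N3 + N4);
precision = N1 / (N1 + N2);
recall    = N1 / (N1 + N3);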

Example

                       Actual Label (1)   Actual Label (-1)
Predicted Label (1)           10                  3
Predicted Label (-1)           2                 20

TPR = 10/12 = 5/6; TNR = 20/23; FPR = 3/23; FNR = 2/12; Accuracy = 30/35; Precision = 10/13; Recall = 10/12.

ROC (Receiver Operating Characteristic) Curves
Generally a learning algorithm A will return a real number, but what we want is a label in {1, -1}. We can apply a threshold T to the returned scores.
[Slide table: the same set of scores thresholded at two different values of T and compared against the true labels. One threshold gives TPR = 3/4, FPR = 2/5; the other gives TPR = 2/4, FPR = 2/5.]

ROC Curve
An ROC curve is the plot where the x-axis is FPR and the y-axis is TPR; for each threshold t, the point on the plot represents the pair (FPR(t), TPR(t)). Let's look at the Wikipedia ROC entry.
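A rough Matlab sketch of how those points are traced (assuming the classifier's real-valued outputs are in a vector scores and the true {-1, 1} labels in ytest; both names are illustrative):

ts = sort(scores, 'descend');           % candidate thresholds: the scores themselves
TPRs = zeros(size(ts));  FPRs = zeros(size(ts));
for i = 1:numel(ts)
    ypred = 2 * (scores >= ts(i)) - 1;  % score >= t -> label 1, else -1
    TPRs(i) = sum(ypred == 1 & ytest == 1)  / sum(ytest == 1);
    FPRs(i) = sum(ypred == 1 & ytest == -1) / sum(ytest == -1);
end
plot(FPRs, TPRs);  xlabel('FPR');  ylabel('TPR');  % the ROC curve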

Discussion
If F: Symptoms → {Disease, No-Disease}:
– Higher recall or higher precision?
– What is the relative cost of a misdiagnosis (and in which direction)?
If F: Banner Ad → {Click, No-Click}:
– Does higher precision mean more revenue?

Random Variables
A random variable (r.v.) is a numerical quantity associated with the events of an experiment. Suppose we roll two dice and let X be the sum of the two faces. X can take values in {2, …, 12}.
P(X = 12) = 1/36. Why? The event associated with X = 12 is {(6,6)}.
P(X = 7) = 6/36 = 1/6. Associated event: {(1,6),(6,1),(2,5),(5,2),(3,4),(4,3)}.
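These probabilities are easy to verify by brute force; a small Matlab sketch enumerating all 36 equally likely outcomes:

[d1, d2] = meshgrid(1:6, 1:6);   % all 36 ordered (die 1, die 2) outcomes
X = d1 + d2;                     % the sum for each outcome
fprintf('P(X=7)  = %g\n', sum(X(:) == 7)  / 36);   % 6/36 = 0.1667
fprintf('P(X=12) = %g\n', sum(X(:) == 12) / 36);   % 1/36 = 0.0278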

Random Variable
A random variable X can take values in a set which is:
– Discrete and finite. Toss a coin and let X = 1 on heads and X = 0 on tails; X is a random variable with values in {0, 1}.
– Discrete and infinite (countable). Let X be the number of accidents in Sydney in a day; then X = 0, 1, 2, …
– Infinite (uncountable). Let X be the height of a Sydney-sider; X can take any value in a continuous range (150 cm, 150.5 cm, …).

Random Variable Properties
Let X be a discrete-valued random variable taking values in a set S.
The expected (average) value of X is E(X) = Σ_{x ∈ S} x · P(X = x).
The variance is Var(X) = E[(X − E(X))²] = Σ_{x ∈ S} (x − E(X))² · P(X = x).
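As a quick check, a Matlab sketch applying both formulas to the two-dice sum from the previous slide, whose pmf over S = {2, …, 12} is known exactly:

s = 2:12;                              % the support S
p = [1 2 3 4 5 6 5 4 3 2 1] / 36;      % P(X = s) for the two-dice sum
EX   = sum(s .* p);                    % E(X)   = 7
VarX = sum((s - EX).^2 .* p);          % Var(X) = 35/6, about 5.83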

Examples
Let X be a random variable which takes the value 1 with probability p and 0 with probability 1−p. Then
E(X) = 1·p + 0·(1−p) = p, and
Var(X) = (1−p)²·p + (0−p)²·(1−p) = p(1−p).

Examples
Let X be a random variable denoting the number of spam emails in a batch of n emails, assuming each email is spam with probability p, independently. X takes values in {0, 1, 2, …, n} (e.g. for n = 5, X ∈ {0,1,2,3,4,5}). X is an r.v. which follows a binomial distribution with parameters (n, p): X ~ Binomial(n, p).
– E(X) = np; Var(X) = np(1−p)
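A small Matlab sketch using the Statistics Toolbox (taking the n = 5 batch above, with an assumed spam probability p = 0.1; the value of p is illustrative only):

n = 5;  p = 0.1;
P3 = binopdf(3, n, p);   % P(X = 3) = C(5,3) * 0.1^3 * 0.9^2 = 0.0081
EX = n * p;              % 0.5, matching E(X) = np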

Examples
Let X be a random variable denoting the number of TCP packets that arrive in a unit of time. Then X can be modeled as following a Poisson distribution with rate parameter λ:
E(X) = Var(X) = λ
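A matching Matlab sketch (with an assumed arrival rate λ = 4 packets per unit time; the rate is an illustrative value only):

lambda = 4;
P0 = poisspdf(0, lambda);     % P(X = 0) = exp(-4), about 0.018
Pk = poisspdf(0:10, lambda);  % P(X = k) for k = 0, 1, ..., 10
tailMass = 1 - sum(Pk);       % small: most of the mass is near lambda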

Continuous Distributions
Of course, the most common continuous distribution is the Normal/Gaussian distribution, denoted N(μ, σ²), where μ is the mean and σ² is the variance.

How to Use Random Variables for Classification
To use random variables in classification, we have to make an assumption.
– For example: Sepal Length follows a Normal distribution.
– Is this a good/reasonable assumption?
Then we use data to estimate the parameters of the distribution.
– The parameters of a Normal distribution are the mean and the variance (the square of the standard deviation).
– For the moment we can just use Matlab (or another program) to do that.
– Once we have the parameters we can use the distribution to estimate the "probability" of Sepal Length taking a new value.
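A minimal Matlab sketch of this workflow for one attribute (assuming vectors sepalClass1 and sepalClass2 hold the training Sepal Length values of two classes; the variable names and the test value are illustrative):

[mu1, sd1] = normfit(sepalClass1);   % fit a Normal to class 1's values
[mu2, sd2] = normfit(sepalClass2);   % fit a Normal to class 2's values
xnew = 5.1;                          % a new Sepal Length measurement
L1 = normpdf(xnew, mu1, sd1);        % density of xnew under class 1's fit
L2 = normpdf(xnew, mu2, sd2);        % density of xnew under class 2's fit
% With equal class priors, predict the class with the larger density.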

Fitting Distributions: Examples
Data: 0, 1, 0, 1, 0, 0
– Assume the data come from a Binomial distribution: 6 trials, 2 successes.
– In Matlab: >> binofit(2,6) gives 0.3333 (= 2/6).
Data: 20, 5, 3, 3, 100
– Assume the data are from a Poisson distribution.
– X = [20 5 3 3 100];
– poissfit(X) gives 26.2 (the sample mean).
What is happening? We are just taking sample averages. The more data we have, the more reliable these estimates become.
Suppose we take the Sepal Length data vector x:
>> [mu, sigma] = normfit(x)
gives mu = 5.8, sigma = 0.81.

Return to the Iris Example
We will redo the Iris classification example, but now we will use "continuous" values for the attributes.