Classification Vikram Pudi IIIT Hyderabad.

1 Talk Outline
- Introduction
- Classification Problem
- Applications
- Metrics
- Combining classifiers
- Classification Techniques

2 The Classification Problem ("Play Outside?")

Outlook  | Temp (°F) | Humidity (%) | Windy? | Class
sunny    | 75        | 70           | true   | play
sunny    | 80        | 90           | true   | don't play
sunny    | 85        | 85           | false  | don't play
sunny    | 72        | 95           | false  | don't play
sunny    | 69        | 70           | false  | play
overcast | 72        | 90           | true   | play
overcast | 83        | 78           | false  | play
overcast | 64        | 65           | true   | play
overcast | 81        | 75           | false  | play
rain     | 71        | 80           | true   | don't play
rain     | 65        | 70           | true   | don't play
rain     | 75        | 80           | false  | play
rain     | 68        | 80           | false  | play
rain     | 70        | 96           | false  | play

Model the relationship between the class labels and the attributes, then assign class labels to new data with unknown labels:

sunny    | 77        | 69           | true   | ?
rain     | 73        | 76           | false  | ?

e.g. outlook = overcast => class = play
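The rule on this slide can be checked mechanically. Below is a minimal sketch (not part of the original slides) that encodes the toy weather dataset as Python tuples and verifies that "outlook = overcast => class = play" holds for every training record:

```python
# The toy weather dataset from the slide, as (outlook, temp, humidity,
# windy, class) tuples.
data = [
    ("sunny", 75, 70, True, "play"),
    ("sunny", 80, 90, True, "don't play"),
    ("sunny", 85, 85, False, "don't play"),
    ("sunny", 72, 95, False, "don't play"),
    ("sunny", 69, 70, False, "play"),
    ("overcast", 72, 90, True, "play"),
    ("overcast", 83, 78, False, "play"),
    ("overcast", 64, 65, True, "play"),
    ("overcast", 81, 75, False, "play"),
    ("rain", 71, 80, True, "don't play"),
    ("rain", 65, 70, True, "don't play"),
    ("rain", 75, 80, False, "play"),
    ("rain", 68, 80, False, "play"),
    ("rain", 70, 96, False, "play"),
]

# Check the candidate rule: every overcast record is labelled "play".
overcast = [row for row in data if row[0] == "overcast"]
print(all(row[4] == "play" for row in overcast))
```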

3 Applications
- Text classification
  - Classify emails into spam / non-spam
  - Classify web-pages into a Yahoo-style hierarchy
- NLP problems
  - Tagging: classify words into verbs, nouns, etc.
- Risk management, fraud detection, computer intrusion detection
  - Given the properties of a transaction (items purchased, amount, location, customer profile, etc.), determine whether it is a fraud
- Machine learning / pattern recognition applications
  - Vision
  - Speech recognition
  - etc.

All of science and knowledge is about predicting the future in terms of the past, so classification is a fundamental problem with an ultra-wide scope of applications.

4 Metrics
1. accuracy
2. classification time per new record
3. training time
4. main memory usage (during classification)
5. model size

5 Accuracy Measure
- A prediction is just like tossing a coin (random variable X)
  - "Head" is "success" in classification: X = 1
  - "Tail" is "error": X = 0
  - X is actually a mapping: {"success": 1, "error": 0}
- In statistics, a succession of independent events like this is called a Bernoulli process
- Accuracy = P(X = 1) = p
  - mean = μ = E[X] = p·1 + (1−p)·0 = p
  - variance = σ² = E[(X−μ)²] = p(1−p)
- Confidence intervals: instead of saying accuracy = 85%, we want to say: accuracy ∈ [83%, 87%] with a confidence of 95%

6 Binomial Distribution
- Treat each classified record as a Bernoulli trial
- If there are n records, there are n independent and identically distributed (iid) Bernoulli trials X_i, i = 1, …, n
- Then the random variable X = Σ_{i=1..n} X_i is said to follow a binomial distribution
  - P(X = k) = C(n, k) · p^k · (1−p)^(n−k)
- Problem: difficult to compute for large n
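A direct sketch of the binomial probability mass function above, using Python's exact integer binomial coefficient (the concrete n, k, p values are illustrative):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k): exactly k successes in n iid Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. probability of exactly 85 correct classifications out of 100
# records when the true accuracy is 0.85
print(binom_pmf(100, 85, 0.85))
```

Note that `comb(n, k)` grows explosively with n, which is the computational difficulty the slide mentions; this motivates the normal approximation on the next slide.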

7 Normal Distribution
- Continuous distribution with parameters μ (mean) and σ² (variance)
- Probability density: f(x) = (1/√(2πσ²)) exp(−(x−μ)² / (2σ²))
- Central limit theorem: under certain conditions, the distribution of the sum of a large number of iid random variables is approximately normal
- A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 0 or 1
  - The approximating normal distribution has mean μ = np and variance σ² = np(1−p), i.e. standard deviation σ = √(np(1−p))
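The approximation can be sketched numerically: near the mean, the binomial pmf and the normal density with μ = np and σ² = np(1−p) nearly coincide for moderate n (the values n = 100, p = 0.85 below are illustrative):

```python
from math import comb, exp, pi, sqrt

def binom_pmf(n, k, p):
    # Exact binomial probability P(X = k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, var):
    # Density of the normal distribution with mean mu, variance var
    return exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var)

n, p = 100, 0.85
mu, var = n * p, n * p * (1 - p)   # mu = 85, var = 12.75

# Near the mean the two values should agree closely
print(binom_pmf(n, 85, p), normal_pdf(85, mu, var))
```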

8 Confidence Intervals
- For the standard normal distribution (mean = 0, variance = 1):
  E.g. P[−1.65 ≤ X ≤ 1.65] = 1 − 2·P[X ≥ 1.65] = 90%
- To use this we have to transform our random variable to have mean = 0 and variance = 1
  - Subtract the mean from X and divide by the standard deviation

Pr[X ≥ z] | z
0.1%      | 3.09
0.5%      | 2.58
1%        | 2.33
5%        | 1.65
10%       | 1.28
20%       | 0.84
40%       | 0.25
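Putting the last three slides together, here is a minimal sketch of a normal-approximation confidence interval for classifier accuracy. The choice z = 1.96 (which leaves 2.5% in each tail, giving a 95% two-sided interval) and the inputs p̂ = 0.85, n = 1000 are assumptions for illustration; for a 90% interval, z = 1.65 from the table would be used instead:

```python
from math import sqrt

def accuracy_ci(p_hat, n, z=1.96):
    """Two-sided confidence interval for accuracy, by standardizing:
    the observed accuracy p_hat has (approximate) standard deviation
    sqrt(p_hat * (1 - p_hat) / n) under the normal approximation."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# 85% observed accuracy over 1000 classified records
lo, hi = accuracy_ci(0.85, 1000)
print(round(lo, 3), round(hi, 3))
```

With these illustrative numbers the interval is roughly [0.83, 0.87], matching the target statement "accuracy ∈ [83%, 87%] with a confidence of 95%" from the Accuracy Measure slide.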