Discrimination and Classification

Discrimination & Classification

Discrimination:
- Goal: separate individual cases into known population groups based on their measurements on several variables, using graphical and algebraic methods.
- Aims to separate the groups as much as possible on the basis of the numeric values.
- Referred to as "separation."

Classification:
- New cases are observed along with their numeric values and assigned to groups based on those values.
- Uses an algorithm built from cases of known origin, applied to new cases whose population is unknown.
- Referred to as "allocation."
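A minimal sketch of the two tasks (an assumed example, not from the slides; scikit-learn and the simulated data are my choices): a discriminant rule is fit on cases of known origin (separation) and then used to allocate new cases of unknown origin.

```python
# Minimal sketch (illustrative, not from the slides): fit a linear
# discriminant on cases of known origin ("separation"), then assign new
# cases of unknown origin to a population ("allocation").
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two simulated bivariate normal populations (illustrative data)
x1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
x2 = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=50)
X = np.vstack([x1, x2])
y = np.array([1] * 50 + [2] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)   # separation
x_new = np.array([[0.5, 0.2], [2.8, 3.1]])     # cases of unknown origin
print(lda.predict(x_new))                      # allocation
```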

Notation and Concepts

Notation:
- Populations: π1, π2
- Measured variables: x

Conceptual settings in which the population is unknown:
- Incomplete knowledge of the outcome: the outcome lies in the future and cannot be observed when x is measured.
- Destruction necessary to observe the outcome: e.g., a product must be destroyed to determine its quality status.
- Unavailable or expensive assessment of the outcome: e.g., authorship is unknown, or assessment requires an expensive gold-standard test.

Setting up a Discriminant Function

- Prior probabilities for the two populations: assumes knowledge of the relative population sizes. The rule will tend to classify cases into the "larger" population unless there is strong evidence in favor of the "smaller" one.
- Misclassification cost: is the cost of misclassification the same for objects from each of the two populations?
- Probability density functions: the distributions of the numeric variables for the elements of the two populations. Population 1: f1(x); Population 2: f2(x).
- Classification regions: given an observation's x values, it is assigned to Population 1 or 2. R1 ≡ the set of x for which an observation is classified to Population 1; R2 ≡ Ω − R1 is the set of x for which it is classified to Population 2.
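The ingredients above (priors, costs, and densities) combine into a classification rule. A minimal numeric sketch, where the specific densities, priors, and costs are illustrative assumptions rather than values from the slides:

```python
# Sketch of a two-population rule built from priors p1, p2; costs
# c(1|2), c(2|1); and densities f1, f2. All numbers are assumptions.
import math

p1, p2 = 0.7, 0.3          # prior probabilities (assumed)
c12, c21 = 1.0, 1.0        # c(1|2): cost of calling a pi2 case pi1, etc.

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def f1(x):                 # density under population 1 (assumed N(0, 1))
    return normal_pdf(x, 0.0, 1.0)

def f2(x):                 # density under population 2 (assumed N(3, 1))
    return normal_pdf(x, 3.0, 1.0)

def classify(x):
    # Minimum expected cost of misclassification: allocate to population 1
    # when f1(x)/f2(x) >= (c(1|2)/c(2|1)) * (p2/p1), i.e. x lies in R1.
    return 1 if f1(x) / f2(x) >= (c12 / c21) * (p2 / p1) else 2

print(classify(0.5), classify(2.9))
```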

Mathematical Notation

Regions that Minimize Expected Cost of Misclassification
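The formulas for this slide are images and did not survive in the transcript. For two populations with densities f1, f2, priors p1, p2, and misclassification costs c(1|2), c(2|1), the standard regions minimizing the expected cost of misclassification (ECM) are:

```latex
% Allocate x to \pi_1 when the density ratio exceeds the cost--prior ratio:
R_1:\ \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ \ge\
\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\frac{p_2}{p_1}\right),
\qquad
R_2:\ \frac{f_1(\mathbf{x})}{f_2(\mathbf{x})} \ <\
\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\frac{p_2}{p_1}\right)
```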

Allocation of New Observation x0 to Population

Normal Populations with Equal Σ
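The equations for this slide are not in the transcript. When both populations are multivariate normal with means μ1, μ2 and a common covariance matrix Σ, the minimum-ECM rule reduces to a rule linear in x: allocate x0 to π1 when

```latex
(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)'\boldsymbol{\Sigma}^{-1}\mathbf{x}_0
-\tfrac{1}{2}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)'\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1+\boldsymbol{\mu}_2)
\ \ge\ \ln\!\left[\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\frac{p_2}{p_1}\right)\right]
```

and to π2 otherwise. In practice the sample means and the pooled sample covariance replace the population quantities.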

Sample Based Discrimination

Fisher’s Method for 2 Populations
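A numpy-only sketch of Fisher's linear discriminant for two samples (the simulated data are illustrative assumptions, not from the slides): project onto a = S_pooled^{-1}(m1 − m2) and split at the midpoint of the projected means.

```python
# Sketch of Fisher's linear discriminant for two samples (numpy only;
# the data are illustrative assumptions, not from the slides).
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=40)
x2 = rng.multivariate_normal([2, 2], [[1, 0.3], [0.3, 1]], size=40)

m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
# Pooled (within-group) sample covariance
s_pooled = ((len(x1) - 1) * np.cov(x1, rowvar=False)
            + (len(x2) - 1) * np.cov(x2, rowvar=False)) / (len(x1) + len(x2) - 2)

# Fisher's discriminant direction a = S_pooled^{-1} (m1 - m2)
a = np.linalg.solve(s_pooled, m1 - m2)
midpoint = 0.5 * a @ (m1 + m2)

def classify(x):
    # Allocate to population 1 when the projection a'x is at least the midpoint
    return 1 if a @ x >= midpoint else 2
```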

Classification of Multivariate Normal Populations when Σ1 ≠ Σ2
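The slide's formulas are missing from the transcript. With unequal covariance matrices the minimum-ECM rule is quadratic in x: allocate x0 to π1 when

```latex
-\tfrac{1}{2}\,\mathbf{x}_0'\left(\boldsymbol{\Sigma}_1^{-1}-\boldsymbol{\Sigma}_2^{-1}\right)\mathbf{x}_0
+\left(\boldsymbol{\mu}_1'\boldsymbol{\Sigma}_1^{-1}-\boldsymbol{\mu}_2'\boldsymbol{\Sigma}_2^{-1}\right)\mathbf{x}_0
- k \ \ge\ \ln\!\left[\left(\frac{c(1\mid 2)}{c(2\mid 1)}\right)\left(\frac{p_2}{p_1}\right)\right],
\qquad
k=\tfrac{1}{2}\ln\!\left(\frac{|\boldsymbol{\Sigma}_1|}{|\boldsymbol{\Sigma}_2|}\right)
+\tfrac{1}{2}\left(\boldsymbol{\mu}_1'\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1
-\boldsymbol{\mu}_2'\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right)
```

and to π2 otherwise; this is the quadratic discriminant rule, which no longer yields a linear boundary between R1 and R2.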

Evaluation of Classification Functions

Jackknife Cross-Validation (Lachenbruch's Holdout Method)

- For Population 1: remove each observation one at a time and fit the classifier on all (n1 − 1) + n2 remaining cases. Classify the held-out case. Repeat for all n1 cases from Population 1. n1m(H) ≡ # misclassified as π2.
- Repeat for all n2 cases from Population 2. n2m(H) ≡ # misclassified as π1.
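The holdout procedure above can be sketched as follows; the data and the simple nearest-mean classifier standing in for the discriminant rule are illustrative assumptions, not from the slides:

```python
# Sketch of Lachenbruch's holdout (leave-one-out) error-rate estimate,
# using a nearest-mean classifier as a stand-in for the discriminant rule.
# Data and classifier are illustrative assumptions, not from the slides.
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, size=(30, 2))   # sample from population 1
x2 = rng.normal(2.5, 1.0, size=(30, 2))   # sample from population 2

def nearest_mean(x, m1, m2):
    return 1 if np.linalg.norm(x - m1) <= np.linalg.norm(x - m2) else 2

def holdout_misclassifications(sample, other, label):
    """Remove each case in `sample` one at a time, refit on the remaining
    (n - 1) + n_other cases, and classify the held-out case."""
    miss = 0
    for i in range(len(sample)):
        rest = np.delete(sample, i, axis=0)
        means = {label: rest.mean(axis=0), 3 - label: other.mean(axis=0)}
        if nearest_mean(sample[i], means[1], means[2]) != label:
            miss += 1
    return miss

n1m_h = holdout_misclassifications(x1, x2, 1)   # pi1 cases called pi2
n2m_h = holdout_misclassifications(x2, x1, 2)   # pi2 cases called pi1
error_rate = (n1m_h + n2m_h) / (len(x1) + len(x2))
print(n1m_h, n2m_h, round(error_rate, 3))
```

The holdout estimate of the error rate is then (n1m(H) + n2m(H)) / (n1 + n2), which is nearly unbiased, unlike the apparent error rate computed by reclassifying the same data used to fit the rule.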