Chapter 8 Discriminant Analysis

8.1 Introduction
- Classification is an important issue in multivariate analysis and data mining.
- Classification: constructs a model from the training set and the values (class labels) of a classifying attribute, then uses the model to classify new data, i.e., to predict unknown or missing values.

Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  - The set of tuples used for model construction is the training set.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Prediction: classifying future or unknown objects
  - Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model.
  - The accuracy rate is the percentage of test set samples that are correctly classified by the model.
  - The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected.
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.
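The accuracy estimate in the prediction step can be sketched as follows. This is a minimal illustration, not part of the slides: `toy_model` and the held-out `test_set` are hypothetical, and the model is assumed to be any function mapping features to a class label.

```python
def accuracy_rate(model, test_set):
    """Percentage of test samples whose known label matches the model's prediction."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return 100.0 * correct / len(test_set)


# Hypothetical model and held-out test set (independent of any training data).
def toy_model(x):
    return "high" if x > 5 else "low"


test_set = [(2, "low"), (7, "high"), (6, "low"), (9, "high")]

# Three of the four test samples are classified correctly.
print(accuracy_rate(toy_model, test_set))  # 75.0
```

The model is only used on unlabeled tuples once this rate is judged acceptable.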

Classification Process: Model Construction
Training Data -> Classification Algorithms -> Classifier (Model)
Example of a learned classifier: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process: Use the Model in Prediction
The classifier is first checked against testing data and then applied to unseen data, e.g. (Jeff, Professor, 4): Tenured?
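As a small illustration, the rule from the model-construction slide can be applied to the unseen tuple (Jeff, Professor, 4). The function name `tenured` and the case-insensitive comparison of the rank are assumptions made for this sketch.

```python
def tenured(rank, years):
    """Rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Unseen tuple from the slide: (name, rank, years)
name, rank, years = ("Jeff", "Professor", 4)
print(name, tenured(rank, years))  # Jeff yes
```

Jeff has only 4 years, but the rank condition alone satisfies the OR rule.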

Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  - New data are classified based on the training set.
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown.
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Discrimination: Introduction
Discrimination is a technique concerned with allocating new observations to previously defined groups. There are k samples, one from each of k distinct populations. One wants to find the so-called discriminant function and the related rule for allocating new observations.

Example 11.3 Bivariate case

Discriminant function and rule

Example 11.1: Riding mowers
Consider two groups in a city: riding-mower owners and those without riding mowers. To identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or non-owners on the basis of income and lot size.


8.2 Discriminant by Distance
Assume k=2 for simplicity.
Consider the Mahalanobis distance from an observation x to each of the two populations. The distance rule allocates x to the population to which it is closer.
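In the usual textbook form, the distance rule for two populations can be written as below. The notation is assumed here rather than taken from the slides: $\mu_i$ and $\Sigma_i$ denote the mean vector and covariance matrix of population $G_i$, and $\bar{\mu} = (\mu_1 + \mu_2)/2$.

```latex
d^2(x, G_i) = (x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i), \qquad i = 1, 2,
\\
x \in G_1 \ \text{if}\ d^2(x, G_1) \le d^2(x, G_2), \qquad x \in G_2 \ \text{otherwise}.
\\
\text{If } \Sigma_1 = \Sigma_2 = \Sigma: \quad
d^2(x, G_2) - d^2(x, G_1) = 2\, W(x), \qquad
W(x) = (x - \bar{\mu})^\top \Sigma^{-1} (\mu_1 - \mu_2),
\\
x \in G_1 \ \text{if}\ W(x) \ge 0, \qquad x \in G_2 \ \text{if}\ W(x) < 0 .
```

With a common covariance matrix the quadratic terms cancel, so the rule is linear in x.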

Example: Univariate case with equal variance
[Figure: univariate case with equal variance; labels a and a*]
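For the univariate case with equal variance, the distance rule reduces to comparing x with the midpoint of the two means. The sketch below illustrates this with hypothetical values for `mu1`, `mu2` and `sigma`; it assumes `mu1 < mu2` and breaks ties in favour of the first population.

```python
def classify_by_distance(x, mu1, mu2, sigma):
    """Allocate x to the population with the smaller standardized distance."""
    d1 = abs(x - mu1) / sigma
    d2 = abs(x - mu2) / sigma
    return 1 if d1 <= d2 else 2


# Hypothetical parameters (not from the slides), with mu1 < mu2.
mu1, mu2, sigma = 0.0, 4.0, 1.5
cut = (mu1 + mu2) / 2  # with equal variances the boundary is the midpoint

for x in (1.0, 1.9, 2.1, 5.0):
    by_distance = classify_by_distance(x, mu1, mu2, sigma)
    by_cut = 1 if x <= cut else 2
    print(x, by_distance, by_cut)  # the two rules agree
```

Because both distances are divided by the same sigma, its value does not affect the allocation.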

8.3 Fisher’s Discriminant Function
Idea: projection, ANOVA

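Training samples: a sample of observations is available from each of the k populations.
Project the data onto a direction and form the F-statistic of the one-way ANOVA for the projected values, i.e. the ratio of between-group to within-group variation.
Find the direction that maximizes this F-statistic. The solution is the eigenvector associated with the largest eigenvalue of the matrix formed from the within-group and between-group sums of squares and products. The discriminant function is the projection of an observation onto this direction.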
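A sketch of the standard formulation follows. The symbols are assumptions, not the slides' own notation: $x_j^{(i)}$ is the $j$-th training observation from $G_i$, $n_i$ the size of sample $i$, $n = \sum_i n_i$, $\bar{x}_i$ the group means, $\bar{x}$ the overall mean, and $a$ the projection direction.

```latex
B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^\top, \qquad
E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_j^{(i)} - \bar{x}_i)(x_j^{(i)} - \bar{x}_i)^\top,
\\
F(a) = \frac{a^\top B a / (k-1)}{a^\top E a / (n-k)},
\\
\max_{a \ne 0} \frac{a^\top B a}{a^\top E a} = \lambda_1, \qquad
E^{-1} B\, a = \lambda_1 a \quad (\lambda_1 \text{ the largest eigenvalue of } E^{-1}B),
\\
u(x) = a^\top x .
```

Since the constants $k-1$ and $n-k$ do not depend on $a$, maximizing $F(a)$ is the same as maximizing the ratio $a^\top B a / a^\top E a$.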

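(B) Two Populations
Note: with two populations the between-group matrix has rank one, so there is only one non-zero eigenvalue.
The associated eigenvector is proportional to the inverse of the within-group matrix applied to the difference of the two sample mean vectors.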

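The within-group matrix may be replaced by the pooled sample covariance matrix; the two differ only by a constant factor, so the direction of the eigenvector is unchanged.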
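For two populations the Fisher direction has the closed form $E^{-1}(\bar{x}_1 - \bar{x}_2)$, with $E$ the within-group matrix of sums of squares and products. The sketch below computes it for two variables in plain Python. The toy data and all names are hypothetical, and using the midpoint of the projected group means as cutting point is an assumed (common) choice, not necessarily the one on the slides.

```python
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """Sums of squares and products of one group about its mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(g1, g2):
    """Return (a, cut) with a = E^{-1}(mean1 - mean2) and a midpoint cutting point."""
    m1, m2 = mean(g1), mean(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    e = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = e[0][0] * e[1][1] - e[0][1] * e[1][0]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    # Closed-form inverse of the 2x2 matrix E applied to the mean difference.
    a = [(e[1][1] * d[0] - e[0][1] * d[1]) / det,
         (-e[1][0] * d[0] + e[0][0] * d[1]) / det]
    # Assumed cutting point: midpoint of the two projected group means.
    cut = 0.5 * (a[0] * (m1[0] + m2[0]) + a[1] * (m1[1] + m2[1]))
    return a, cut


def classify(x, a, cut):
    """Group 1 projects above the cutting point because a'(mean1 - mean2) > 0."""
    return 1 if a[0] * x[0] + a[1] * x[1] >= cut else 2


# Hypothetical training samples for two groups.
g1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.5)]
g2 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.5)]
a, cut = fisher_direction(g1, g2)
print([classify(x, a, cut) for x in g1 + g2])
```

Rescaling `a` by any positive constant rescales `cut` in the same way and leaves the classification unchanged.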

Example: Insect Classification
Note: the data x1 and x2 are characteristics of the insects (Hoel, 1947); n.g. means the natural group (species), c.g. the classified group, and y the value of the discriminant function.

The eigenvalue of [...] is [...] and the associated eigenvector is [...].
The discriminant function is [...] and the associated value of each observation is given in the table. The cutting point is [...]. Classification is [...]. If we use [...], we have the same classification.

8.4 Bayes’ Discriminant Analysis
A. Idea
There are k populations $G_1, \dots, G_k$ in $R^p$. A partition $R_1, \dots, R_k$ of $R^p$ is determined based on a training sample.
Rule: $x \in G_i$ if $x$ falls into $R_i$.
Loss: $c(j \mid i)$ if $x$ is from $G_i$ but falls into $R_j$.
The probability of this misclassification is $P(j \mid i) = \int_{R_j} f_i(x)\,dx$, where $f_i(x)$ is the density of $G_i$.
The expected cost of misclassification is
$$ECM(R_1, \dots, R_k) = \sum_{i=1}^{k} q_i \sum_{j \ne i} c(j \mid i)\, P(j \mid i),$$
where $q_1, \dots, q_k$ are prior probabilities. We want to minimize $ECM(R_1, \dots, R_k)$ with respect to $R_1, \dots, R_k$.

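B. Method
Theorem: Let the expected loss of allocating x to $G_t$ be the sum, over the other populations, of prior probability times loss times density at x. Then the optimal $R_t$'s are the sets of x on which this expected loss is smallest among t = 1, ..., k.
Corollary 1: Take the loss to be 1 if $i \ne j$ and 0 if $i = j$. Then the optimal rule allocates x to the population with the largest product of prior probability and density at x. Proof: with this loss, the expected loss of allocating x to $G_t$ equals the sum of all prior-weighted densities minus that of $G_t$, so minimizing the former maximizes the latter.
Corollary 2: In the case of k=2, x is allocated to $G_1$ when the ratio of the density of $G_1$ to that of $G_2$ at x is at least the ratio of the corresponding prior-weighted losses.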
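In symbols, the theorem and its first two corollaries take the following standard form. The notation is assumed for this sketch: $f_j$ is the density of $G_j$, $q_j$ its prior probability, $c(t \mid j)$ the loss of allocating to $G_t$ an observation from $G_j$, and $h_t$ is a name introduced here for the expected loss.

```latex
h_t(x) = \sum_{j \ne t} q_j\, c(t \mid j)\, f_j(x), \qquad
R_t = \{\, x : h_t(x) \le h_j(x),\ j = 1, \dots, k \,\}, \quad t = 1, \dots, k,
\\
\text{0-1 loss:} \quad
h_t(x) = \sum_{j=1}^{k} q_j f_j(x) - q_t f_t(x)
\ \Longrightarrow\
R_t = \{\, x : q_t f_t(x) \ge q_j f_j(x),\ j = 1, \dots, k \,\},
\\
k = 2: \quad
R_1 = \Big\{ x : \frac{f_1(x)}{f_2(x)} \ge \frac{c(1 \mid 2)\, q_2}{c(2 \mid 1)\, q_1} \Big\}, \qquad
R_2 = R^p \setminus R_1 .
```

The result follows from writing $ECM = \sum_t \int_{R_t} h_t(x)\,dx$, which is minimized by placing each x in a region where its integrand is smallest. Points where two $h_t$ tie can be assigned to either region.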

Corollary 3: In the case of k=2 and [...], then [...].
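The minimum-ECM rule can be sketched in a few lines. Everything concrete here is hypothetical and chosen only for illustration: univariate normal densities, equal priors, and the cost tables. Populations are indexed from 0 in the code, and `cost[t][j]` is assumed to hold the loss of allocating to population `t` an observation that comes from population `j`.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def min_ecm_class(x, priors, densities, cost):
    """Index t that minimizes sum over j != t of q_j * c(t|j) * f_j(x)."""
    k = len(priors)

    def h(t):
        return sum(priors[j] * cost[t][j] * densities[j](x)
                   for j in range(k) if j != t)

    return min(range(k), key=h)


# Hypothetical two-population setup (0-based indices).
priors = [0.5, 0.5]
densities = [lambda x: normal_pdf(x, 0.0, 1.0),
             lambda x: normal_pdf(x, 2.0, 1.0)]

equal_cost = [[0, 1], [1, 0]]
print(min_ecm_class(0.9, priors, densities, equal_cost))  # 0
print(min_ecm_class(1.1, priors, densities, equal_cost))  # 1

# Making it five times as costly to allocate a population-1 observation
# to population 0 shrinks the region allocated to population 0.
unequal_cost = [[0, 5], [1, 0]]
print(min_ecm_class(0.9, priors, densities, unequal_cost))  # 1
```

With equal priors, equal costs and equal variances the boundary is the midpoint 1.0 of the two means; the unequal costs move it to about 0.195, where the density ratio equals 5.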

C. Example 11.3: Detection of hemophilia A carriers
To construct a procedure for detecting potential hemophilia A carriers, blood samples were assayed for two groups of women and measurements on two variables were recorded. The first group of 30 women was selected from a population of women who did not carry the hemophilia gene; this group was called the normal group. The second group of 45 women was selected from known hemophilia A carriers; this group was called the obligatory carriers.

Variables:
- $\log_{10}$(AHF activity)
- $\log_{10}$(AHF-like antigen)
Populations:
- population of women who did not carry the hemophilia gene ($n_1 = 30$)
- population of women who are known hemophilia A carriers ($n_2 = 45$)


Data set: the table lists log10(AHF activity) and log10(AHF-like antigen) for the normal group and for the obligatory carriers.

SAS output
