Classification: supervised and unsupervised. Tormod Næs, Matforsk and University of Oslo

Classification
Unsupervised (cluster analysis)
– Searching for groups in the data (suspicion or general exploration)
– Hierarchical methods, partitioning methods
Supervised (discriminant analysis)
– Groups determined by other information (external or from a cluster analysis)
– Understand differences between groups
– Allocate new objects to the groups (scoring, finding degree of membership)

Figure: two groups (Group 1, Group 2) in X-space and a new object. The questions: what is the difference between the groups, where does it lie, and to which group does the new object belong?

Why supervised classification?
Authenticity studies
– Adulteration, impurities, different origin, species etc.
– Raw materials
– Consumer products according to specification
When quality classes are more important than chemical values
– Raw materials acceptable or not
– Raw materials for different products

Flow chart for discriminant analysis

Main problems
Selectivity
– Multivariate methods are needed
Collinearity
– Data compression is needed
Complex group structures
– Ellipses, squares or "bananas"?

Figure: the selectivity problem. Authentic and adulterated samples plotted against variables X1 and X2; neither variable alone separates the groups.

Solving the selectivity problem
Using several measurements at the same time
– The information is there!
Multivariate methods: these combine several instrumental NIR variables in order to determine the property of interest
Mathematical "purification" instead of wet chemical analysis

Multivariate methods
Too many variables can also sometimes create problems
– Interpretation
– Computations: time and numerical stability
– Simple and difficult regions (nonlinearity)
– Overfitting is easier (dependent on the method used)
Sometimes important to find good compromises (variable selection)

Figure: the conflict between flexibility and stability. As model flexibility increases, model error decreases while estimation error increases.

Some main classes of methods
Classical Bayes classification
– LDA, QDA
Variants and modifications used to solve the collinearity problem
– RDA, DASCO, SIMCA
Classification based on regression analysis
– DPLS, DPCR
KNN methods, flexible with respect to the shape of the groups

Bayes classification
Assume prior probabilities p_j for the groups
– If unknown, fix them to be p_j = 1/C or equal to the proportions in the dataset
Assume a known probability model f_j(x) within each class
– Estimated from the data, usually covariance matrices and means

Bayes classification
+ Well understood, much used, often good properties, easy to validate
+ Easy to modify for collinear data
+ Easy to update (means and covariances)
+ Can be modified for cost
+ Outlier diagnostics (not directly, but can be done via Mahalanobis distance)
– Cannot handle too complex group structures; designed for elliptic structures
– Not so easy to interpret directly; often followed by Fisher's linear discriminant analysis, which is directly related to interpreting differences between groups

Bayes rule
Maximise the posterior probability

$$ p(j \mid x) = \frac{p_j f_j(x)}{\sum_k p_k f_k(x)} $$

For normal data, this is equivalent to minimising

$$ d_j(x) = (x - \mu_j)^{\mathsf{T}} \Sigma_j^{-1} (x - \mu_j) + \log\lvert\Sigma_j\rvert - 2\log p_j $$

i.e. a Mahalanobis distance plus a determinant term minus a prior-probability term. The model parameters \( \mu_j \) and \( \Sigma_j \) are estimated from the data.
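As a concrete illustration, here is a minimal NumPy sketch of this allocation rule (the function names are illustrative, not from the slides):

```python
import numpy as np

def fit_group(X):
    """Estimate the mean vector and covariance matrix for one group."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def qda_score(x, mu, Sigma, prior):
    """Quantity to minimise: Mahalanobis distance plus log-determinant
    minus twice the log prior probability."""
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)
    return maha + np.log(np.linalg.det(Sigma)) - 2.0 * np.log(prior)

def allocate(x, group_params, priors):
    """Assign x to the group with the smallest score (highest posterior)."""
    scores = [qda_score(x, mu, Sigma, p)
              for (mu, Sigma), p in zip(group_params, priors)]
    return int(np.argmin(scores))
```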

Different covariance structures

Mahalanobis distance is constant on ellipsoids
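In symbols (a standard identity, not spelled out on the slide), the squared Mahalanobis distance to group j is

$$ d_j^2(x) = (x - \mu_j)^{\mathsf{T}} \Sigma_j^{-1} (x - \mu_j) $$

Since \( \Sigma_j^{-1} \) is positive definite, the level sets \( \{x : d_j^2(x) = c\} \) are ellipsoids centred at \( \mu_j \), with axes along the eigenvectors of \( \Sigma_j \); hence the distance is constant on ellipsoids.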

Best known members
Equal covariance matrix for each group
– LDA
Unequal covariance matrices
– QDA
Collinear data give an unstable inverted covariance matrix (see the equation under Bayes rule)
– Use principal components (or PLS components); see the sketch below
– RDA and DASCO estimate stable inverse covariance matrices
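A minimal sketch of the principal-component remedy, assuming scikit-learn (the slides do not name any software, and the component counts here are arbitrary):

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

# Compress collinear X onto a few principal components, then run LDA (or QDA)
# on the stable, low-dimensional scores.
pca_lda = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
pca_qda = make_pipeline(PCA(n_components=10), QuadraticDiscriminantAnalysis())
# pca_lda.fit(X_train, y_train); pca_lda.predict(X_new)
```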

Classification by regression
0/1 dummy variables for each group
Run PLS2 (or PCR, or any other method which solves the collinearity problem)
Predict class membership
– The class with the highest predicted value gets the vote
All regular interpretation tools are available: variable selection, plotting, outlier diagnostics etc.
Linear borders between subgroups, so not too complicated groups; related to LDA (not covered here)
For large data sets, we can use more flexible methods
A minimal sketch is given below.
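A DPLS sketch under the same scikit-learn assumption (PLSRegression stands in for the PLS2 implementation; names like dpls_fit are illustrative):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def dpls_fit(X, y, n_components=5):
    """Code the groups as 0/1 dummy columns and fit PLS2.
    y is assumed to be a 1-D NumPy array of group labels."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # dummy matrix
    return PLSRegression(n_components=n_components).fit(X, Y), classes

def dpls_predict(pls, classes, X_new):
    """The class with the highest predicted dummy value gets the vote."""
    return classes[np.argmax(pls.predict(X_new), axis=1)]
```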

Example: classification of mayonnaise based on different oils
The oils were soybean, sunflower, canola, olive, corn, grapeseed
16 samples in each group; feasibility study, authenticity
Indahl et al. (1999), Chemolab

Figure: classification properties of QDA, LDA and regression on the mayonnaise data.

Comparison
LDA and QDA gave almost identical results
It was substantially better to use LDA/QDA based on PLS/PCA components than to use PLS directly

Fisher's linear discriminant analysis
Closely related to LDA
Focuses on interpretation
– Use "spectral loadings" or group averages
Finds the directions in space which distinguish the groups the most
– The directions are uncorrelated
Sensitive to overfitting; use PCs first (see the sketch below)
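A sketch of canonical variates computed on PC scores, again assuming scikit-learn (LDA's transform returns Fisher's discriminant coordinates):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def canonical_variates(X, y, n_pcs=10):
    """Compress X to PC scores first (to limit overfitting), then return
    the uncorrelated canonical-variate coordinates for plotting."""
    scores = PCA(n_components=n_pcs).fit_transform(X)
    return LinearDiscriminantAnalysis().fit(scores, y).transform(scores)

# cv = canonical_variates(X, y); plot cv[:, 0] against cv[:, 1]
```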

Figure: Fisher's method. From Næs, Isaksson, Fearn and Davies (2001), A User-Friendly Guide to Multivariate Calibration and Classification.

Figure: plot of PC1 vs PC2 for the mayonnaise data; not possible to distinguish the groups from each other.

Figure: canonical variates based on PCs for the mayonnaise data; clear separation.

Example: Italian wines from the same region but based on different cultivars (Barolo, Grignolino, Barbera); 27 chromatic and chemical variables. Forina et al. (1986), Vitis.
Figure: PCA scores versus Fisher's method canonical variates for the three cultivars.

Error rates (validated properly)
LDA
– Barolo 100%, Grignolino 97.7%, Barbera 100%
QDA
– Barolo 100%, Grignolino 100%, Barbera 100%

KNN methods
No model assumptions
– Therefore: needs data from "everywhere" and many data points
Flexible; handles complex data structures
Sensitive to overfitting; use PCs (see the sketch below)
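A minimal KNN sketch following the slide's advice (PCs first), again assuming scikit-learn; K = 3 matches the figure on the next slide:

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# No distributional model: each new sample is assigned by majority vote
# among its K nearest training samples in PC space.
knn = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=3))
# knn.fit(X_train, y_train); knn.predict(X_new)
```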

Figure: for a new sample, KNN finds the K training samples which are closest; in this case K = 3.

Cluster analysis
Unsupervised classification
Identifying groups in the data
– Explorative

Examples of use
Forina et al. (1982), olive oil from different regions (fatty acid composition), Ann. Chim.
Armanino et al. (1989), olive oils from different Tuscan provinces (acids, sterols, alcohols), Chemolab.

Methods
PCA (informal/graphical)
– Look for structures in scores plots
– Interpretation of subgroups using loadings plots
Hierarchical methods (more formal)
– Based on distances between objects (Euclidean or Mahalanobis)
– Repeatedly join the two most similar objects/clusters
– Interpret dendrograms (see the sketch below)
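A sketch of the hierarchical route using SciPy (an assumption; the linkage choice is not specified on the slide):

```python
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

def hierarchical_clusters(X):
    """Pairwise Euclidean distances between objects, then repeatedly join
    the two most similar clusters; inspect the result as a dendrogram."""
    D = pdist(X, metric='euclidean')
    Z = linkage(D, method='average')
    return Z

# Z = hierarchical_clusters(X); dendrogram(Z)  # drawing requires matplotlib
```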

Armanino et al. (1989), Chemolab. 120 olive oils from one region in Italy, 29 variables (fatty acids, sterols, etc.)