LINEAR CLASSIFICATION METHODS
STAT 597 E
Fengjuan Xuan, Caimiao Wei, Bogdan Ilie

Introduction The observations in the dataset we will work on ("BUPA liver disorders") were collected by BUPA Medical Research Ltd and consist of 7 variables and 345 observed vectors. The subjects are single male individuals. The first 5 variables are blood-test measurements thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. The sixth variable records the subject's daily alcohol consumption. The seventh variable is a selector used to split the dataset into two classes: 145 observations belong to the liver-disorder group (selector value 2) and 200 to the liver-normal group.

Description of variables
1. mcv: mean corpuscular volume
2. alkphos: alkaline phosphatase
3. sgpt: alanine aminotransferase
4. sgot: aspartate aminotransferase
5. gammagt: gamma-glutamyl transpeptidase
6. drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. selector: field used to split the data into two sets; a binary categorical variable with indicators 1 and 2 (2 corresponding to liver disorder)
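To make the later slides concrete, here is a minimal R sketch for loading the data. The file name bupa.data and the comma-separated, header-less UCI layout are assumptions; the slides do not show any code.

```r
# Load the BUPA liver-disorders data; column names follow the slide above.
bupa <- read.csv("bupa.data", header = FALSE,
                 col.names = c("mcv", "alkphos", "sgpt", "sgot",
                               "gammagt", "drinks", "selector"))
# Code the selector as a factor: 1 = normal, 2 = liver disorder.
bupa$selector <- factor(bupa$selector, levels = c(1, 2),
                        labels = c("normal", "disorder"))
str(bupa)  # should show 345 obs. of 7 variables
```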

Matrix Plot of the variables

Logistic regression in the full space Coefficients (value, standard error, and t value) are reported for the intercept and each of the six predictors: mcv, alkphos, sgpt, sgot, gammagt, drinks. The resulting classification rule is G(x) = 2 (liver disorder) if p(x) > 1/2 and G(x) = 1 otherwise, where logit p(x) = b0 + b1 mcv + b2 alkphos + b3 sgpt + b4 sgot + b5 gammagt + b6 drinks.
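A sketch of how this fit could be reproduced with glm() in R, assuming the bupa data frame defined earlier; the slides show only the summary output, so the code itself is an assumption.

```r
# Full-space logistic regression of disorder status on all six predictors.
fit <- glm(selector ~ mcv + alkphos + sgpt + sgot + gammagt + drinks,
           data = bupa, family = binomial)
summary(fit)  # coefficient estimates, standard errors, test statistics

# Classification rule G(x): predict "disorder" when the fitted
# probability exceeds 0.5.
phat <- predict(fit, type = "response")
ghat <- ifelse(phat > 0.5, "disorder", "normal")
```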

Classification error rate We first compute the error rate, sensitivity, and specificity on the whole training data set, and then estimate the same quantities by 10-fold cross-validation, which also yields standard errors: error rate (standard error 0.0271), sensitivity (standard error 0.0203), specificity (standard error 0.0699).
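A hand-rolled 10-fold cross-validation sketch in R for these three measures. The random fold assignment, the 0.5 cutoff, and the standard-error formula sd/sqrt(10) are assumptions about how the reported numbers were obtained.

```r
# 10-fold CV estimates of error rate, sensitivity, and specificity.
set.seed(1)                                   # fold split is random
folds <- sample(rep(1:10, length.out = nrow(bupa)))
err <- sens <- spec <- numeric(10)
for (k in 1:10) {
  train <- bupa[folds != k, ]
  test  <- bupa[folds == k, ]
  f <- glm(selector ~ ., data = train, family = binomial)
  p <- predict(f, newdata = test, type = "response")
  pred <- ifelse(p > 0.5, "disorder", "normal")
  err[k]  <- mean(pred != test$selector)
  sens[k] <- mean(pred[test$selector == "disorder"] == "disorder")
  spec[k] <- mean(pred[test$selector == "normal"] == "normal")
}
rbind(estimate = c(error = mean(err), sens = mean(sens), spec = mean(spec)),
      std.err  = c(sd(err), sd(sens), sd(spec)) / sqrt(10))
```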

Backward stepwise model selection based on AIC Five variables are selected after stepwise model selection; the first variable, mcv, is deleted. The 10-fold cross-validated error rate, sensitivity, and specificity (with standard errors) are computed as before. COMMENT: This method has a larger classification error rate than the full model, so stepwise selection does not improve the classification.
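The backward search can be reproduced with R's step(); a short sketch, assuming the full-model fit from the earlier slide.

```r
# Backward stepwise selection by AIC, starting from the full model.
fit.step <- step(fit, direction = "backward", trace = FALSE)
formula(fit.step)  # per the slide, mcv is the variable that gets dropped
```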

Scree plot for the PCA
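A sketch of the PCA behind this plot. The slides do not say whether the components come from the covariance or the correlation matrix; the later remark that three components carry over 97% of the variation suggests the unscaled (covariance) version, which is also prcomp()'s default.

```r
# PCA of the six predictors (covariance-matrix PCA, i.e. unscaled).
pca <- prcomp(bupa[, 1:6], scale. = FALSE)
summary(pca)                 # proportion of variance per component
screeplot(pca, type = "lines", main = "Scree plot for the PCA")
```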

The performance of logistic regression on the reduced space The reduced space is obtained by selecting the first three principal components; the standard errors of the error rate, sensitivity, and specificity are again obtained by 10-fold cross-validation. Comment: the classification error rate is around 50%, which is not much better than random guessing.
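A sketch of the reduced-space fit, reusing the pca object above: the first three component scores replace the original predictors in the logistic model.

```r
# Logistic regression on the first three principal component scores.
scores <- as.data.frame(pca$x[, 1:3])   # columns PC1, PC2, PC3
scores$selector <- bupa$selector
fit.pc <- glm(selector ~ PC1 + PC2 + PC3, data = scores, family = binomial)
```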

The classification plot on the plane of the first two principal components

Linear Discriminant Analysis LDA assumes a multivariate normal distribution, so we apply log transformations to some of the variables: Y1 = mcv, Y2 = log(alkphos), Y3 = log(sgpt), Y4 = log(sgot), Y5 = log(gammagt), Y6 = log(drinks + 1).
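A sketch of the transformed-variable LDA with MASS::lda(); the slide shows only the transformations, so the fitting code is an assumption.

```r
library(MASS)
# Build the transformed data set; drinks can be zero, hence log(drinks + 1).
tbupa <- with(bupa, data.frame(
  y1 = mcv,          y2 = log(alkphos),
  y3 = log(sgpt),    y4 = log(sgot),
  y5 = log(gammagt), y6 = log(drinks + 1),
  selector = selector))
fit.lda <- lda(selector ~ ., data = tbupa)
pred <- predict(fit.lda)$class
table(pred, tbupa$selector)        # confusion matrix
mean(pred != tbupa$selector)       # training error rate
```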

The histogram of the sgpt variable and its log transformation

The performance of LDA based on the transformed data The error rate, sensitivity, and specificity are reported as before. Comment: the classification error is the smallest among all methods and the sensitivity is the largest. The log transformations make the assumption of multivariate normality reasonable, so the classification improves.

LDA after PCA Among the reported measures, the sensitivity is 0.88. Comment: the performance is not improved by PCA.
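For completeness, a one-line sketch of this variant, reusing the PC scores from the earlier slide.

```r
# LDA on the first three principal component scores instead of raw variables.
fit.lda.pc <- lda(selector ~ PC1 + PC2 + PC3, data = scores)
```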

Conclusion Four different methods were applied to the liver-disorder data set. LDA based on the transformed variables works best, and logistic regression on the original data set comes second. The classification methods based on principal components do not work well: although the first three principal components contain more than 97% of the variation, they may still lose the information that matters most for classification. Transformations can make LDA work better in some cases. LDA assumes normality, which is a very strong assumption for many data sets; in our data, for example, all variables except the first are seriously skewed, which is why the log transform helps.